[PATCH V2] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-09 Thread Anshuman Khandual
From: James Morse 

As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
might contain a range of values to describe supported translation granules
(4K and 16K page sizes in particular) instead of just enabled or disabled
values. This changes the __enable_mmu() function to handle the complete
acceptable range of values (depending on whether the field is signed or
unsigned), now represented by the ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX]
pair. While here, also fix similar situations in the EFI stub and KVM.
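A C sketch of the resulting check (illustrative only: the constant values below are for the signed 4K-page TGRAN4 field, and the helper is a made-up stand-in for the assembly in __enable_mmu()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Values for the 4K-page (TGRAN4) field, which is signed: 0xF means -1,
 * i.e. "not supported". Names mirror the patch; tgran_supported() is a
 * stand-in for the ubfx/cmp/b.lt/b.gt sequence in __enable_mmu().
 */
#define ID_AA64MMFR0_TGRAN_SHIFT		28
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN	0x0
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX	0x7

static bool tgran_supported(uint64_t mmfr0)
{
	/* Extract the 4-bit field with sign extension. */
	int64_t tgran =
		(int64_t)(mmfr0 << (64 - ID_AA64MMFR0_TGRAN_SHIFT - 4)) >> 60;

	/* A single equality test no longer suffices: accept the whole range. */
	return tgran >= ID_AA64MMFR0_TGRAN_SUPPORTED_MIN &&
	       tgran <= ID_AA64MMFR0_TGRAN_SUPPORTED_MAX;
}
```

The signed comparison mirrors the b.lt/b.gt conditions the patch adds to head.S.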

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: James Morse 
Cc: Suzuki K Poulose 
Cc: Ard Biesheuvel 
Cc: Mark Rutland 
Cc: linux-arm-ker...@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Marc Zyngier 
Signed-off-by: James Morse 
Signed-off-by: Anshuman Khandual 
---
Changes in V2:

- Changes back to switch construct in kvm_set_ipa_limit() per Marc

Changes in V1:

https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=442817

 arch/arm64/include/asm/sysreg.h   | 20 ++--
 arch/arm64/kernel/head.S  |  6 --
 arch/arm64/kvm/reset.c| 10 ++
 drivers/firmware/efi/libstub/arm64-stub.c |  2 +-
 4 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index dfd4edb..d4a5fca9 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -796,6 +796,11 @@
 #define ID_AA64MMFR0_PARANGE_48	0x5
 #define ID_AA64MMFR0_PARANGE_52	0x6
 
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT 0x0
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE0x1
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN 0x2
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX 0x7
+
 #ifdef CONFIG_ARM64_PA_BITS_52
 #define ID_AA64MMFR0_PARANGE_MAX   ID_AA64MMFR0_PARANGE_52
 #else
@@ -961,14 +966,17 @@
 #define ID_PFR1_PROGMOD_SHIFT  0
 
 #if defined(CONFIG_ARM64_4K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN4_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN4_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN4_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN4_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0x7
 #elif defined(CONFIG_ARM64_16K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN16_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN16_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN16_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN16_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0xF
 #elif defined(CONFIG_ARM64_64K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN64_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN64_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN64_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN64_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0x7
 #endif
 
 #define MVFR2_FPMISC_SHIFT 4
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 66b0e0b..8b469f1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -655,8 +655,10 @@ SYM_FUNC_END(__secondary_too_slow)
 SYM_FUNC_START(__enable_mmu)
mrs x2, ID_AA64MMFR0_EL1
	ubfx	x2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
-   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
-   b.ne__no_granule_support
+   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN
+   b.lt__no_granule_support
+   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
+   b.gt__no_granule_support
update_early_cpu_boot_status 0, x2, x3
	adrp	x2, idmap_pg_dir
phys_to_ttbr x1, x1
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 47f3f03..e81c7ec 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -311,16 +311,18 @@ int kvm_set_ipa_limit(void)
}
 
switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
-   default:
-   case 1:
+   case ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE:
kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
return -EINVAL;
-   case 0:
+   case ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT:
kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
break;
-   case 2:
+   case ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN ... ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX:
kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
break;
+   default:
+   kvm_err("Unsupported value for TGRAN_2, giving up\n");
+   return -EINVAL;
}
 
kvm_ipa_limit = id_aa64mmfr0_parange_to_phys_shift(parange);
diff 

[RFC PATCH 0/4] KVM: stats: Retrieve statistics data in binary format

2021-03-09 Thread Jing Zhang
This patchset extends the ioctl interface to retrieve KVM statistics data
in an aggregated binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for userspace telemetry applications that pull
statistics data periodically on large-scale systems.
The capability is indicated by KVM_CAP_STATS_BINARY_FORM.
Ioctl KVM_STATS_GET_INFO is used to get information about VM or vCPU
statistics data (the number of supported statistics, which is used for
buffer allocation).
Ioctl KVM_STATS_GET_NAMES is used to get the list of name strings of
all supported statistics data.
Ioctl KVM_STATS_GET_DATA is used to get the aggregated statistics data
per VM or vCPU in the same order as the list of name strings. This is
the ioctl which would be called periodically to retrieve statistics
data per VM or vCPU.
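The allocation arithmetic implied by that flow can be sketched as follows (hypothetical userspace helpers; the struct layouts follow the uapi definitions proposed in patch 2, with flexible array members in place of `[0]` arrays):

```c
#include <stddef.h>
#include <stdint.h>

#define KVM_STATS_NAME_LEN 32	/* from the series */

/* Mirrors of the proposed uapi structures (patch 2). */
struct kvm_stats_names { uint32_t size; uint8_t  names[]; };
struct kvm_stats_data  { uint32_t size; uint64_t data[];  };

/*
 * After KVM_STATS_GET_INFO reports num_stats, userspace allocates two
 * buffers once, then polls KVM_STATS_GET_DATA:
 *
 *   1. ioctl(fd, KVM_STATS_GET_INFO,  &info);   -> info.num_stats
 *   2. ioctl(fd, KVM_STATS_GET_NAMES, names);   -> one-time name lookup
 *   3. ioctl(fd, KVM_STATS_GET_DATA,  data);    -> repeated pulls
 */
static size_t names_buf_size(uint32_t num_stats)
{
	return sizeof(struct kvm_stats_names) +
	       (size_t)num_stats * KVM_STATS_NAME_LEN;
}

static size_t data_buf_size(uint32_t num_stats)
{
	return sizeof(struct kvm_stats_data) +
	       (size_t)num_stats * sizeof(uint64_t);
}
```

Each statistic costs one fixed-width name entry in the names buffer and one u64 slot in the data buffer, in the same order.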

Jing Zhang (4):
  KVM: stats: Separate statistics name strings from debugfs code
  KVM: stats: Define APIs for aggregated stats retrieval in binary
format
  KVM: stats: Add ioctl commands to pull statistics in binary format
  KVM: selftests: Add selftest for KVM binary form statistics interface

 Documentation/virt/kvm/api.rst|  79 +
 arch/arm64/kvm/guest.c|  47 ++-
 arch/mips/kvm/mips.c  | 114 +--
 arch/powerpc/kvm/book3s.c | 107 --
 arch/powerpc/kvm/booke.c  |  84 +++--
 arch/s390/kvm/kvm-s390.c  | 320 --
 arch/x86/kvm/x86.c| 127 ---
 include/linux/kvm_host.h  |  30 +-
 include/uapi/linux/kvm.h  |  60 
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   3 +
 .../selftests/kvm/kvm_bin_form_stats.c|  89 +
 virt/kvm/kvm_main.c   | 115 +++
 13 files changed, 935 insertions(+), 241 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c


base-commit: 357ad203d45c0f9d76a8feadbd5a1c5d460c638b
-- 
2.30.1.766.gb4fecdf3b7-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH 1/4] KVM: stats: Separate statistics name strings from debugfs code

2021-03-09 Thread Jing Zhang
Prepare the statistics name strings for supporting binary-format
aggregated statistics data retrieval.

No functional change intended.
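The sync is enforced at build time: each stats struct gets a parallel fixed-width name table whose size a static_assert ties to the struct's field count. A generic sketch of the pattern (the demo struct and names here are placeholders, not the real kvm_vcpu_stat):

```c
#include <assert.h>	/* static_assert (C11) */

#define KVM_STATS_NAME_LEN 32

/* Stand-in for struct kvm_vcpu_stat: one u64-sized counter per field. */
struct vcpu_stat_demo {
	unsigned long long halt_wakeup;
	unsigned long long exits;
};
#define VCPU_STAT_COUNT \
	(sizeof(struct vcpu_stat_demo) / sizeof(unsigned long long))

/* One fixed-width entry per counter, in declaration order. */
static const char vcpu_stat_strings[][KVM_STATS_NAME_LEN] = {
	"halt_wakeup",
	"exits",
};

/* Build fails if a counter is added without a matching name, or vice versa. */
static_assert(sizeof(vcpu_stat_strings) == VCPU_STAT_COUNT * KVM_STATS_NAME_LEN,
	      "stat name table out of sync with stat struct");
```

Because the compiler sizes the 2-D array from its initializer, any mismatch between the field list and the name list breaks the build rather than silently skewing the name/data pairing.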

Signed-off-by: Jing Zhang 
---
 arch/arm64/kvm/guest.c|  47 --
 arch/mips/kvm/mips.c  | 114 ++
 arch/powerpc/kvm/book3s.c | 107 +
 arch/powerpc/kvm/booke.c  |  84 +++---
 arch/s390/kvm/kvm-s390.c  | 320 ++
 arch/x86/kvm/x86.c| 127 ++-
 include/linux/kvm_host.h  |  31 +++-
 7 files changed, 589 insertions(+), 241 deletions(-)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 9bbd30e62799..fb3aafe76b52 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -28,19 +28,42 @@
 
 #include "trace.h"
 
+const char kvm_vm_stat_strings[][KVM_STATS_NAME_LEN] = {
+   "remote_tlb_flush",
+};
+static_assert(sizeof(kvm_vm_stat_strings) ==
+   VM_STAT_COUNT * KVM_STATS_NAME_LEN);
+
+const char kvm_vcpu_stat_strings[][KVM_STATS_NAME_LEN] = {
+   "halt_successful_poll",
+   "halt_attempted_poll",
+   "halt_poll_success_ns",
+   "halt_poll_fail_ns",
+   "halt_poll_invalid",
+   "halt_wakeup",
+   "hvc_exit_stat",
+   "wfe_exit_stat",
+   "wfi_exit_stat",
+   "mmio_exit_user",
+   "mmio_exit_kernel",
+   "exits",
+};
+static_assert(sizeof(kvm_vcpu_stat_strings) ==
+   VCPU_STAT_COUNT * KVM_STATS_NAME_LEN);
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
-   VCPU_STAT("halt_successful_poll", halt_successful_poll),
-   VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-   VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-   VCPU_STAT("halt_wakeup", halt_wakeup),
-   VCPU_STAT("hvc_exit_stat", hvc_exit_stat),
-   VCPU_STAT("wfe_exit_stat", wfe_exit_stat),
-   VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
-   VCPU_STAT("mmio_exit_user", mmio_exit_user),
-   VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel),
-   VCPU_STAT("exits", exits),
-   VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-   VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+   VCPU_STAT(halt_successful_poll),
+   VCPU_STAT(halt_attempted_poll),
+   VCPU_STAT(halt_poll_invalid),
+   VCPU_STAT(halt_wakeup),
+   VCPU_STAT(hvc_exit_stat),
+   VCPU_STAT(wfe_exit_stat),
+   VCPU_STAT(wfi_exit_stat),
+   VCPU_STAT(mmio_exit_user),
+   VCPU_STAT(mmio_exit_kernel),
+   VCPU_STAT(exits),
+   VCPU_STAT(halt_poll_success_ns),
+   VCPU_STAT(halt_poll_fail_ns),
{ NULL }
 };
 
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 3d6a7f5827b1..8b9cbd9d6205 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -39,44 +39,92 @@
 #define VECTORSPACING 0x100/* for EI/VI mode */
 #endif
 
+const char kvm_vm_stat_strings[][KVM_STATS_NAME_LEN] = {
+   "remote_tlb_flush",
+};
+static_assert(sizeof(kvm_vm_stat_strings) ==
+   VM_STAT_COUNT * KVM_STATS_NAME_LEN);
+
+const char kvm_vcpu_stat_strings[][KVM_STATS_NAME_LEN] = {
+   "wait",
+   "cache",
+   "signal",
+   "interrupt",
+   "cop_unusable",
+   "tlbmod",
+   "tlbmiss_ld",
+   "tlbmiss_st",
+   "addrerr_st",
+   "addrerr_ld",
+   "syscall",
+   "resvd_inst",
+   "break_inst",
+   "trap_inst",
+   "msa_fpe",
+   "fpe",
+   "msa_disabled",
+   "flush_dcache",
+#ifdef CONFIG_KVM_MIPS_VZ
+   "vz_gpsi",
+   "vz_gsfc",
+   "vz_hc",
+   "vz_grr",
+   "vz_gva",
+   "vz_ghfc",
+   "vz_gpa",
+   "vz_resvd",
+#ifdef CONFIG_CPU_LOONGSON64
+   "vz_cpucfg",
+#endif
+#endif
+   "halt_successful_poll",
+   "halt_attempted_poll",
+   "halt_poll_success_ns",
+   "halt_poll_fail_ns",
+   "halt_poll_invalid",
+   "halt_wakeup",
+};
+static_assert(sizeof(kvm_vcpu_stat_strings) ==
+   VCPU_STAT_COUNT * KVM_STATS_NAME_LEN);
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
-   VCPU_STAT("wait", wait_exits),
-   VCPU_STAT("cache", cache_exits),
-   VCPU_STAT("signal", signal_exits),
-   VCPU_STAT("interrupt", int_exits),
-   VCPU_STAT("cop_unusable", cop_unusable_exits),
-   VCPU_STAT("tlbmod", tlbmod_exits),
-   VCPU_STAT("tlbmiss_ld", tlbmiss_ld_exits),
-   VCPU_STAT("tlbmiss_st", tlbmiss_st_exits),
-   VCPU_STAT("addrerr_st", addrerr_st_exits),
-   VCPU_STAT("addrerr_ld", addrerr_ld_exits),
-   VCPU_STAT("syscall", syscall_exits),
-   VCPU_STAT("resvd_inst", resvd_inst_exits),
-   VCPU_STAT("break_inst", break_inst_exits),
-   VCPU_STAT("trap_inst", trap_inst_exits),
-   VCPU_STAT("msa_fpe", msa_fpe_exits),
-   VCPU_STAT("fpe", fpe_exits),
-   VCPU_STAT("msa_disabled", msa_disabled_exits),
-   VCPU_STAT("flush_dcache", flush_dcache_exits),
+   VCPU_STAT(wait_exits),
+   VCPU_STAT(cache_exits),
+   VCPU_STAT(signal_exits),
+  

[RFC PATCH 3/4] KVM: stats: Add ioctl commands to pull statistics in binary format

2021-03-09 Thread Jing Zhang
Three ioctl commands are added to support binary-form statistics data
retrieval: KVM_STATS_GET_INFO, KVM_STATS_GET_NAMES and KVM_STATS_GET_DATA.
KVM_CAP_STATS_BINARY_FORM indicates the capability.

Signed-off-by: Jing Zhang 
---
 virt/kvm/kvm_main.c | 115 
 1 file changed, 115 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 383df23514b9..87dd62516c8b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3464,6 +3464,51 @@ static long kvm_vcpu_ioctl(struct file *filp,
r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
break;
}
+   case KVM_STATS_GET_INFO: {
+   struct kvm_stats_info stats_info;
+
+   r = -EFAULT;
+   stats_info.num_stats = VCPU_STAT_COUNT;
+   if (copy_to_user(argp, &stats_info, sizeof(stats_info)))
+   goto out;
+   r = 0;
+   break;
+   }
+   case KVM_STATS_GET_NAMES: {
+   struct kvm_stats_names stats_names;
+
+   r = -EFAULT;
+   if (copy_from_user(&stats_names, argp, sizeof(stats_names)))
+   goto out;
+   r = -EINVAL;
+   if (stats_names.size < VCPU_STAT_COUNT * KVM_STATS_NAME_LEN)
+   goto out;
+
+   r = -EFAULT;
+   if (copy_to_user(argp + sizeof(stats_names),
+   kvm_vcpu_stat_strings,
+   VCPU_STAT_COUNT * KVM_STATS_NAME_LEN))
+   goto out;
+   r = 0;
+   break;
+   }
+   case KVM_STATS_GET_DATA: {
+   struct kvm_stats_data stats_data;
+
+   r = -EFAULT;
+   if (copy_from_user(&stats_data, argp, sizeof(stats_data)))
+   goto out;
+   r = -EINVAL;
+   if (stats_data.size < sizeof(vcpu->stat))
+   goto out;
+
+   r = -EFAULT;
+   argp += sizeof(stats_data);
+   if (copy_to_user(argp, &vcpu->stat, sizeof(vcpu->stat)))
+   goto out;
+   r = 0;
+   break;
+   }
default:
r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
}
@@ -3695,6 +3740,7 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
case KVM_CAP_CHECK_EXTENSION_VM:
case KVM_CAP_ENABLE_CAP_VM:
case KVM_CAP_HALT_POLL:
+   case KVM_CAP_STATS_BINARY_FORM:
return 1;
 #ifdef CONFIG_KVM_MMIO
case KVM_CAP_COALESCED_MMIO:
@@ -3825,6 +3871,40 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
}
 }
 
+static long kvm_vm_ioctl_stats_get_data(struct kvm *kvm, unsigned long arg)
+{
+   void __user *argp = (void __user *)arg;
+   struct kvm_vcpu *vcpu;
+   struct kvm_stats_data stats_data;
+   u64 *data = NULL, *pdata;
+   int i, j, ret = 0;
+   size_t dsize = (VM_STAT_COUNT + VCPU_STAT_COUNT) * sizeof(*data);
+
+
+   if (copy_from_user(&stats_data, argp, sizeof(stats_data)))
+   return -EFAULT;
+   if (stats_data.size < dsize)
+   return -EINVAL;
+   data = kzalloc(dsize, GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return -ENOMEM;
+
+   for (i = 0; i < VM_STAT_COUNT; i++)
+   *(data + i) = *((ulong *)&kvm->stat + i);
+
+   kvm_for_each_vcpu(j, vcpu, kvm) {
+   pdata = data + VM_STAT_COUNT;
+   for (i = 0; i < VCPU_STAT_COUNT; i++, pdata++)
+   *pdata += *((u64 *)&vcpu->stat + i);
+   }
+
+   if (copy_to_user(argp + sizeof(stats_data), data, dsize))
+   ret = -EFAULT;
+
+   kfree(data);
+   return ret;
+}
+
 static long kvm_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -4001,6 +4081,41 @@ static long kvm_vm_ioctl(struct file *filp,
r = 0;
break;
}
+   case KVM_STATS_GET_INFO: {
+   struct kvm_stats_info stats_info;
+
+   r = -EFAULT;
+   stats_info.num_stats = VM_STAT_COUNT + VCPU_STAT_COUNT;
+   if (copy_to_user(argp, &stats_info, sizeof(stats_info)))
+   goto out;
+   r = 0;
+   break;
+   }
+   case KVM_STATS_GET_NAMES: {
+   struct kvm_stats_names stats_names;
+
+   r = -EFAULT;
+   if (copy_from_user(&stats_names, argp, sizeof(stats_names)))
+   goto out;
+   r = -EINVAL;
+   if (stats_names.size <
+   (VM_STAT_COUNT + VCPU_STAT_COUNT) * KVM_STATS_NAME_LEN)
+   goto out;
+   r = -EFAULT;
+   argp += sizeof(stats_names);
+   if (copy_to_user(argp, kvm_vm_stat_strings,
+   VM_STAT_COUNT * KVM_STATS_NAME_LEN))
+   

[RFC PATCH 4/4] KVM: selftests: Add selftest for KVM binary form statistics interface

2021-03-09 Thread Jing Zhang
Check whether the KVM binary-form statistics interface works correctly and
whether the statistics name strings are synced correctly with the KVM
internal stats data.

Signed-off-by: Jing Zhang 
---
 tools/testing/selftests/kvm/.gitignore|  1 +
 tools/testing/selftests/kvm/Makefile  |  3 +
 .../selftests/kvm/kvm_bin_form_stats.c| 89 +++
 3 files changed, 93 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 32b87cc77c8e..0c8241bd9a17 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -38,3 +38,4 @@
 /memslot_modification_stress_test
 /set_memory_region_test
 /steal_time
+/kvm_bin_form_stats
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a6d61f451f88..5cdd52ccedf2 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -72,6 +72,7 @@ TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
+TEST_GEN_PROGS_x86_64 += kvm_bin_form_stats
 
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
@@ -81,6 +82,7 @@ TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_aarch64 += set_memory_region_test
 TEST_GEN_PROGS_aarch64 += steal_time
+TEST_GEN_PROGS_aarch64 += kvm_bin_form_stats
 
 TEST_GEN_PROGS_s390x = s390x/memop
 TEST_GEN_PROGS_s390x += s390x/resets
@@ -89,6 +91,7 @@ TEST_GEN_PROGS_s390x += demand_paging_test
 TEST_GEN_PROGS_s390x += dirty_log_test
 TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
 TEST_GEN_PROGS_s390x += set_memory_region_test
+TEST_GEN_PROGS_s390x += kvm_bin_form_stats
 
 TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
 LIBKVM += $(LIBKVM_$(UNAME_M))
diff --git a/tools/testing/selftests/kvm/kvm_bin_form_stats.c b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
new file mode 100644
index ..36cf206470b1
--- /dev/null
+++ b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * kvm_bin_form_stats
+ *
+ * Copyright (C) 2021, Google LLC.
+ *
+ * Test for fd-based IOCTL commands for retrieving KVM statistics data in
+ * binary form. KVM_CAP_STATS_BINARY_FORM, KVM_STATS_GET_INFO,
+ * KVM_STATS_GET_NAMES and KVM_STATS_GET_DATA are checked.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include 
+#include 
+#include 
+#include 
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+#include "asm/kvm.h"
+#include "linux/kvm.h"
+
+int main(int argc, char *argv[])
+{
+   struct kvm_stats_info stats_info = {0};
+   struct kvm_stats_names *stats_names;
+   struct kvm_stats_data *stats_data;
+   struct kvm_vm *kvm;
+   int i, ret;
+
+   kvm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
+
+   ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FORM);
+   if (ret < 0) {
+   pr_info("Binary form statistics interface is not supported!\n");
+   goto out_free_kvm;
+   }
+
+   ret = -1;
+   vm_ioctl(kvm, KVM_STATS_GET_INFO, &stats_info);
+   if (stats_info.num_stats == 0) {
+   pr_info("KVM_STATS_GET_INFO failed!\n");
+   pr_info("Or the number of statistics data is zero.\n");
+   goto out_free_kvm;
+   }
+
+   /* Allocate memory for stats name strings and value */
+   stats_names = malloc(sizeof(*stats_names) +
+   stats_info.num_stats * KVM_STATS_NAME_LEN);
+   if (!stats_names) {
+   pr_info("Memory allocation failed!\n");
+   goto out_free_kvm;
+   }
+
+   stats_data = malloc(sizeof(*stats_data) +
+   stats_info.num_stats * sizeof(__u64));
+   if (!stats_data) {
+   pr_info("Memory allocation failed!\n");
+   goto out_free_names;
+   }
+
+   /* Retrieve the name strings and data */
+   stats_names->size = stats_info.num_stats * KVM_STATS_NAME_LEN;
+   vm_ioctl(kvm, KVM_STATS_GET_NAMES, stats_names);
+
+   stats_data->size = stats_info.num_stats * sizeof(__u64);
+   vm_ioctl(kvm, KVM_STATS_GET_DATA, stats_data);
+
+   /* Display supported statistics names */
+   for (i = 0; i < (int)stats_info.num_stats; i++) {
+   char *name = (char *)stats_names->names + i * KVM_STATS_NAME_LEN;
+
+   if (strnlen(name, KVM_STATS_NAME_LEN) == 0) {
+   pr_info("Empty stats name at offset %d!\n", i);
+   goto out_free_data;
+   }
+   pr_info("%s\n", name);
+   }
+
+   ret = 0;
+out_free_data:
+   free(stats_data);
+out_free_names:
+   free(stats_names);

[RFC PATCH 2/4] KVM: stats: Define APIs for aggregated stats retrieval in binary format

2021-03-09 Thread Jing Zhang
Define ioctl commands for VM/vCPU aggregated statistics data retrieval
in binary format and update corresponding API documentation.

The capability and ioctl are not enabled for now.
No functional change intended.

Signed-off-by: Jing Zhang 
---
 Documentation/virt/kvm/api.rst | 79 ++
 include/linux/kvm_host.h   |  1 -
 include/uapi/linux/kvm.h   | 60 ++
 3 files changed, 139 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1a2b5210cdbf..aa4b5270c966 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4938,6 +4938,76 @@ see KVM_XEN_VCPU_SET_ATTR above.
 The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
 with the KVM_XEN_VCPU_GET_ATTR ioctl.
 
+4.131 KVM_STATS_GET_INFO
+------------------------
+
+:Capability: KVM_CAP_STATS_BINARY_FORM
+:Architectures: all
+:Type: vm ioctl, vcpu ioctl
+:Parameters: struct kvm_stats_info (out)
+:Returns: 0 on success, < 0 on error
+
+::
+
+  struct kvm_stats_info {
+__u32 num_stats;
+  };
+
+This ioctl is used to get information about VM or vCPU statistics data.
+The number of statistics is returned in the num_stats field of
+struct kvm_stats_info. The result is the same for all running VMs on a
+given architecture, so this ioctl only needs to be called once.
+
+4.132 KVM_STATS_GET_NAMES
+-------------------------
+
+:Capability: KVM_CAP_STATS_BINARY_FORM
+:Architectures: all
+:Type: vm ioctl, vcpu ioctl
+:Parameters: struct kvm_stats_names (in/out)
+:Returns: 0 on success, < 0 on error
+
+::
+
+  #define KVM_STATS_NAME_LEN   32
+  struct kvm_stats_names {
+   __u32 size;
+   __u8  names[0];
+  };
+
+This ioctl is used to get the string names of all the statistics data for a
+VM or vCPU. Users must use KVM_STATS_GET_INFO to find the number of
+statistics, then allocate a buffer of size num_stats * KVM_STATS_NAME_LEN
+immediately following struct kvm_stats_names. The size field of
+kvm_stats_names must contain the buffer size as an input.
+The buffer can be treated like a string array; each name string is
+null-padded to a size of KVM_STATS_NAME_LEN.
+The result is the same for all running VMs on a given architecture, so this
+ioctl only needs to be called once.
+
+4.133 KVM_STATS_GET_DATA
+------------------------
+
+:Capability: KVM_CAP_STATS_BINARY_FORM
+:Architectures: all
+:Type: vm ioctl, vcpu ioctl
+:Parameters: struct kvm_stats_data (in/out)
+:Returns: 0 on success, < 0 on error
+
+::
+
+  struct kvm_stats_data {
+   __u32 size;
+   __u64 data[0];
+  };
+
+This ioctl is used to get the aggregated statistics data for a VM or vCPU.
+Users must use KVM_STATS_GET_INFO to find the number of statistics, then
+allocate a buffer of size num_stats * sizeof(data[0]) immediately following
+struct kvm_stats_data. The size field of kvm_stats_data must contain the
+buffer size as an input.
+The data buffer maps one-to-one to the name strings buffer, in the same
+order. This ioctl is usually called periodically to pull statistics data.
+
 5. The kvm_run structure
 
 
@@ -6721,3 +6791,12 @@ vcpu_info is set.
 The KVM_XEN_HVM_CONFIG_RUNSTATE flag indicates that the runstate-related
 features KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR/_CURRENT/_DATA/_ADJUST are
 supported by the KVM_XEN_VCPU_SET_ATTR/KVM_XEN_VCPU_GET_ATTR ioctls.
+
+8.31 KVM_CAP_STATS_BINARY_FORM
+------------------------------
+
+:Architectures: all
+
+This capability indicates that KVM supports retrieving aggregated statistics
+data in binary format with corresponding VM/VCPU ioctl KVM_STATS_GET_INFO,
+KVM_STATS_GET_NAMES and KVM_STATS_GET_DATA.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1ea297458306..f2dabf457717 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1164,7 +1164,6 @@ static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa)
 
 #define VM_STAT_COUNT  (sizeof(struct kvm_vm_stat)/sizeof(ulong))
 #define VCPU_STAT_COUNT	(sizeof(struct kvm_vcpu_stat)/sizeof(u64))
-#define KVM_STATS_NAME_LEN 32
 
 /* Make sure it is synced with fields in struct kvm_vm_stat. */
 extern const char kvm_vm_stat_strings[][KVM_STATS_NAME_LEN];
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620..ad185d4c5e42 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1078,6 +1078,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING 192
 #define KVM_CAP_X86_BUS_LOCK_EXIT 193
 #define KVM_CAP_PPC_DAWR1 194
+#define KVM_CAP_STATS_BINARY_FORM 195
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1853,4 +1854,63 @@ struct kvm_dirty_gfn {
 #define KVM_BUS_LOCK_DETECTION_OFF (1 << 0)
 #define KVM_BUS_LOCK_DETECTION_EXIT(1 << 1)
 
+/* Available with KVM_CAP_STATS_BINARY_FORM */
+
+#define KVM_STATS_NAME_LEN 32
+
+/**
+ * struct kvm_stats_info - statistics information
+ *
+ * Used as 

Re: [PATCH] KVM: arm64: Ensure I-cache isolation between vcpus of a same VM

2021-03-09 Thread Marc Zyngier
On Wed, 3 Mar 2021 16:45:05 +, Marc Zyngier wrote:
> It recently became apparent that the ARMv8 architecture has interesting
> rules regarding attributes being used when fetching instructions
> if the MMU is off at Stage-1.
> 
> In this situation, the CPU is allowed to fetch from the PoC and
> allocate into the I-cache (unless the memory is mapped with
> the XN attribute at Stage-2).
> 
> [...]

Applied to fixes, thanks!

[1/1] KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
  commit: 01dc9262ff5797b675c32c0c6bc682777d23de05

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




Re: [PATCH v9 5/6] KVM: arm64: ioctl to fetch/store tags in a guest

2021-03-09 Thread Marc Zyngier
On Mon, 01 Mar 2021 14:23:14 +,
Steven Price  wrote:
> 
> The VMM may not wish to have it's own mapping of guest memory mapped
> with PROT_MTE because this causes problems if the VMM has tag checking
> enabled (the guest controls the tags in physical RAM and it's unlikely
> the tags are correct for the VMM).
> 
> Instead add a new ioctl which allows the VMM to easily read/write the
> tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE
> while the VMM can still read/write the tags for the purpose of
> migration.
> 
> Signed-off-by: Steven Price 
> ---
>  arch/arm64/include/uapi/asm/kvm.h | 13 +++
>  arch/arm64/kvm/arm.c  | 57 +++
>  include/uapi/linux/kvm.h  |  1 +
>  3 files changed, 71 insertions(+)
> 
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 24223adae150..5fc2534ac5df 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -184,6 +184,19 @@ struct kvm_vcpu_events {
>   __u32 reserved[12];
>  };
>  
> +struct kvm_arm_copy_mte_tags {
> + __u64 guest_ipa;
> + __u64 length;
> + union {
> + void __user *addr;
> + __u64 padding;
> + };
> + __u64 flags;

I'd be keen on a couple of reserved __u64s. Just in case...

> +};
> +
> +#define KVM_ARM_TAGS_TO_GUEST		0
> +#define KVM_ARM_TAGS_FROM_GUEST  1
> +
>  /* If you need to interpret the index values, here is the key: */
>  #define KVM_REG_ARM_COPROC_MASK  0x0FFF
>  #define KVM_REG_ARM_COPROC_SHIFT 16
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 46bf319f6cb7..01d404833e24 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1297,6 +1297,53 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm 
> *kvm,
>   }
>  }
>  
> +static int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
> +   struct kvm_arm_copy_mte_tags *copy_tags)
> +{
> + gpa_t guest_ipa = copy_tags->guest_ipa;
> + size_t length = copy_tags->length;
> + void __user *tags = copy_tags->addr;
> + gpa_t gfn;
> + bool write = !(copy_tags->flags & KVM_ARM_TAGS_FROM_GUEST);
> +
> + if (copy_tags->flags & ~KVM_ARM_TAGS_FROM_GUEST)
> + return -EINVAL;
> +
> + if (length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK)
> + return -EINVAL;

It is a bit odd to require userspace to provide a page-aligned
addr/size, as it now has to find out about the kernel's page
size. MTE_GRANULE_SIZE-aligned values would make more sense. Is there
an underlying reason for this?
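For illustration, the MTE_GRANULE_SIZE-based check being suggested might look like this (a sketch only, not the patch's code; MTE tags cover 16-byte granules):

```c
#include <stdbool.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE 16UL	/* one tag per 16 bytes, per the MTE architecture */

/*
 * Validate against the tag granule instead of PAGE_SIZE, so userspace
 * does not need to discover the kernel's page size first.
 */
static bool mte_copy_args_valid(uint64_t guest_ipa, uint64_t length)
{
	return !(length & (MTE_GRANULE_SIZE - 1)) &&
	       !(guest_ipa & (MTE_GRANULE_SIZE - 1));
}
```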

> +
> + gfn = gpa_to_gfn(guest_ipa);
> +
> + while (length > 0) {
> + kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
> + void *maddr;
> + unsigned long num_tags = PAGE_SIZE / MTE_GRANULE_SIZE;
> +
> + if (is_error_noslot_pfn(pfn))
> + return -ENOENT;
> +
> + maddr = page_address(pfn_to_page(pfn));
> +
> + if (!write) {
> + num_tags = mte_copy_tags_to_user(tags, maddr, num_tags);
> + kvm_release_pfn_clean(pfn);
> + } else {
> + num_tags = mte_copy_tags_from_user(maddr, tags,
> +num_tags);
> + kvm_release_pfn_dirty(pfn);
> + }
> +

Is it actually safe to do this without holding any lock, without
checking anything against the mmu_notifier_seq? What if the pages are
being swapped out? Or the memslot removed from under your feet?

It looks... dangerous. Do you even want to allow this while vcpus are
actually running?

> + if (num_tags != PAGE_SIZE / MTE_GRANULE_SIZE)
> + return -EFAULT;
> +
> + gfn++;
> + tags += num_tags;
> + length -= PAGE_SIZE;
> + }
> +
> + return 0;
> +}
> +
>  long kvm_arch_vm_ioctl(struct file *filp,
>  unsigned int ioctl, unsigned long arg)
>  {
> @@ -1333,6 +1380,16 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  
>   return 0;
>   }
> + case KVM_ARM_MTE_COPY_TAGS: {
> + struct kvm_arm_copy_mte_tags copy_tags;
> +
> + if (!kvm_has_mte(kvm))
> + return -EINVAL;
> +
> + if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
> + return -EFAULT;
> + return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
> + }
>   default:
>   return -EINVAL;
>   }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 05618a4abf7e..b75af0f9ba55 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1423,6 +1423,7 @@ struct kvm_s390_ucas_mapping {
>  /* Available with KVM_CAP_PMU_EVENT_FILTER */
>  #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct 
> 

Re: [PATCH v9 3/6] arm64: kvm: Save/restore MTE registers

2021-03-09 Thread Marc Zyngier
On Mon, 01 Mar 2021 14:23:12 +,
Steven Price  wrote:
> 
> Define the new system registers that MTE introduces and context switch
> them. The MTE feature is still hidden from the ID register as it isn't
> supported in a VM yet.
> 
> Signed-off-by: Steven Price 
> ---
>  arch/arm64/include/asm/kvm_host.h  |  6 ++
>  arch/arm64/include/asm/kvm_mte.h   | 66 ++
>  arch/arm64/include/asm/sysreg.h|  3 +-
>  arch/arm64/kernel/asm-offsets.c|  3 +
>  arch/arm64/kvm/hyp/entry.S |  7 +++
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 21 +++
>  arch/arm64/kvm/sys_regs.c  | 22 ++--
>  7 files changed, 123 insertions(+), 5 deletions(-)
>  create mode 100644 arch/arm64/include/asm/kvm_mte.h
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 1170ee137096..d00cc3590f6e 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -208,6 +208,12 @@ enum vcpu_sysreg {
>   CNTP_CVAL_EL0,
>   CNTP_CTL_EL0,
>  
> + /* Memory Tagging Extension registers */
> + RGSR_EL1,   /* Random Allocation Tag Seed Register */
> + GCR_EL1,/* Tag Control Register */
> + TFSR_EL1,   /* Tag Fault Status Register (EL1) */
> + TFSRE0_EL1, /* Tag Fault Status Register (EL0) */
> +
>   /* 32bit specific registers. Keep them at the end of the range */
>   DACR32_EL2, /* Domain Access Control Register */
>   IFSR32_EL2, /* Instruction Fault Status Register */
> diff --git a/arch/arm64/include/asm/kvm_mte.h 
> b/arch/arm64/include/asm/kvm_mte.h
> new file mode 100644
> index ..6541c7d6ce06
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_mte.h
> @@ -0,0 +1,66 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2020 ARM Ltd.
> + */
> +#ifndef __ASM_KVM_MTE_H
> +#define __ASM_KVM_MTE_H
> +
> +#ifdef __ASSEMBLY__
> +
> +#include 
> +
> +#ifdef CONFIG_ARM64_MTE
> +
> +.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
> +alternative_if_not ARM64_MTE
> + b   .L__skip_switch\@
> +alternative_else_nop_endif
> + mrs \reg1, hcr_el2
> + and \reg1, \reg1, #(HCR_ATA)
> + cbz \reg1, .L__skip_switch\@
> +
> + mrs_s   \reg1, SYS_RGSR_EL1
> + str \reg1, [\h_ctxt, #CPU_RGSR_EL1]
> + mrs_s   \reg1, SYS_GCR_EL1
> + str \reg1, [\h_ctxt, #CPU_GCR_EL1]
> +
> + ldr \reg1, [\g_ctxt, #CPU_RGSR_EL1]
> + msr_s   SYS_RGSR_EL1, \reg1
> + ldr \reg1, [\g_ctxt, #CPU_GCR_EL1]
> + msr_s   SYS_GCR_EL1, \reg1
> +
> +.L__skip_switch\@:
> +.endm
> +
> +.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
> +alternative_if_not ARM64_MTE
> + b   .L__skip_switch\@
> +alternative_else_nop_endif
> + mrs \reg1, hcr_el2
> + and \reg1, \reg1, #(HCR_ATA)
> + cbz \reg1, .L__skip_switch\@
> +
> + mrs_s   \reg1, SYS_RGSR_EL1
> + str \reg1, [\g_ctxt, #CPU_RGSR_EL1]
> + mrs_s   \reg1, SYS_GCR_EL1
> + str \reg1, [\g_ctxt, #CPU_GCR_EL1]
> +
> + ldr \reg1, [\h_ctxt, #CPU_RGSR_EL1]
> + msr_s   SYS_RGSR_EL1, \reg1
> + ldr \reg1, [\h_ctxt, #CPU_GCR_EL1]
> + msr_s   SYS_GCR_EL1, \reg1
> +
> +.L__skip_switch\@:
> +.endm
> +
> +#else /* CONFIG_ARM64_MTE */
> +
> +.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
> +.endm
> +
> +.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
> +.endm
> +
> +#endif /* CONFIG_ARM64_MTE */
> +#endif /* __ASSEMBLY__ */
> +#endif /* __ASM_KVM_MTE_H */
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index dfd4edbfe360..5424d195cf96 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -580,7 +580,8 @@
>  #define SCTLR_ELx_M  (BIT(0))
>  
>  #define SCTLR_ELx_FLAGS  (SCTLR_ELx_M  | SCTLR_ELx_A | SCTLR_ELx_C | \
> -  SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB)
> +  SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \
> +  SCTLR_ELx_ITFSB)
>  
>  /* SCTLR_EL2 specific flags. */
>  #define SCTLR_EL2_RES1   ((BIT(4))  | (BIT(5))  | (BIT(11)) | (BIT(16)) 
> | \
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index a36e2fc330d4..944e4f1f45d9 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -108,6 +108,9 @@ int main(void)
>DEFINE(VCPU_WORKAROUND_FLAGS,  offsetof(struct kvm_vcpu, 
> arch.workaround_flags));
>DEFINE(VCPU_HCR_EL2,   offsetof(struct kvm_vcpu, 
> arch.hcr_el2));
>DEFINE(CPU_USER_PT_REGS,   offsetof(struct kvm_cpu_context, regs));
> +  DEFINE(CPU_RGSR_EL1,   offsetof(struct kvm_cpu_context, 
> sys_regs[RGSR_EL1]));
> +  DEFINE(CPU_GCR_EL1,offsetof(struct kvm_cpu_context, 
> sys_regs[GCR_EL1]));
> +  DEFINE(CPU_TFSRE0_EL1, offsetof(struct kvm_cpu_context, 
> sys_regs[TFSRE0_EL1]));
>

Re: [PATCH] KVM: arm64: Ensure I-cache isolation between vcpus of a same VM

2021-03-09 Thread Alexandru Elisei
Hi Marc,

On 3/8/21 8:03 PM, Marc Zyngier wrote:
> Hi Alex,
>
> On Mon, 08 Mar 2021 16:53:09 +,
> Alexandru Elisei  wrote:
>> Hello,
>>
>> It's not clear to me why this patch is needed. If one VCPU in the VM is 
>> generating
>> code, is it not the software running in the VM responsible for keeping track 
>> of
>> the MMU state of the other VCPUs and making sure the new code is executed
>> correctly? Why should KVM get involved?
>>
>> I don't see how this is different than running on bare metal (no
>> hypervisor), and one CPU with the MMU on generates code that another
>> CPU with the MMU off must execute.
> The difference is that so far, we have always considered i-caches to
> be private to each CPU. With a hypervisor that allows migration of
> vcpus from one physical CPU to another, the i-cache isn't private
> anymore from the perspective of the vcpus.

I think I understand what the problem is. VCPU X running on CPU A with MMU off
fetches instructions from PoC and allocates them into the icache. VCPU Y running
on CPU B generates code and does dcache clean to PoU + icache invalidate, gets
scheduled on CPU A and executes the stale instructions fetched by VCPU X from 
PoC.

>
>> Some comments below.
>>
>> On 3/6/21 2:15 PM, Catalin Marinas wrote:
>>> On Sat, Mar 06, 2021 at 10:54:47AM +, Marc Zyngier wrote:
 On Fri, 05 Mar 2021 19:07:09 +,
 Catalin Marinas  wrote:
> On Wed, Mar 03, 2021 at 04:45:05PM +, Marc Zyngier wrote:
>> It recently became apparent that the ARMv8 architecture has interesting
>> rules regarding attributes being used when fetching instructions
>> if the MMU is off at Stage-1.
>>
>> In this situation, the CPU is allowed to fetch from the PoC and
>> allocate into the I-cache (unless the memory is mapped with
>> the XN attribute at Stage-2).
> Digging through the ARM ARM is hard. Do we have this behaviour with FWB
> as well?
 The ARM ARM doesn't seem to mention FWB at all when it comes to
 instruction fetch, which is sort of expected as it only covers the
 D-side. I *think* we could sidestep this when CTR_EL0.DIC is set
 though, as the I-side would then snoop the D-side.
>>> Not sure this helps. CTR_EL0.DIC refers to the need for maintenance to
>>> PoU while the SCTLR_EL1.M == 0 causes the I-cache to fetch from PoC. I
>>> don't think I-cache snooping the D-cache would happen to the PoU when
>>> the S1 MMU is off.
>> FEAT_FWB requires that CLIDR_EL1.{LoUIS, LoUU} = {0, 0} which means
>> that no dcache clean is required for instruction to data coherence
>> (page D13-3086). I interpret that as saying that with FEAT_FWB,
>> CTR_EL0.IDC is effectively 1, which means that dcache clean is not
>> required for instruction generation, and icache invalidation is
>> required only if CTR_EL0.DIC = 0 (according to B2-158).
>>
>>> My reading of D4.4.4 is that when SCTLR_EL1.M == 0 both I and D accesses
>>> are Normal Non-cacheable with a note in D4.4.6 that Non-cacheable
>>> accesses may be held in the I-cache.
>> Nitpicking, but SCTLR_EL1.M == 0 and SCTLR_EL1.I == 1 means that
>> instruction fetches are to Normal Cacheable, Inner and Outer
>> Read-Allocate memory (ARM DDI 0487G.a, pages D5-2709 and indirectly
>> at D13-3586).
> I think that's the allocation in unified caches, and not necessarily
> the i-cache, given that it also mention things such as "Inner
> Write-Through", which makes no sense for the i-cache.
>> Like you've pointed out, as mentioned in D4.4.6, it is always
>> possible that instruction fetches are held in the instruction cache,
>> regardless of the state of the SCTLR_EL1.M bit.
> Exactly, and that's what breaks things.
>
>>> The FWB rules on combining S1 and S2 says that Normal Non-cacheable at
>>> S1 is "upgraded" to cacheable. This should happen irrespective of
>>> whether the S1 MMU is on or off and should apply to both I and D
>>> accesses (since it does not explicitly says). So I think we could skip
>>> this IC IALLU when FWB is present.
>>>
>>> The same logic should apply when the VMM copies the VM text. With FWB,
>>> we probably only need D-cache maintenance to PoU and only if
>>> CTR_EL0.IDC==0. I haven't checked what the code currently does.
>> When FEAT_FWB, CTR_EL0.IDC is effectively 1 (see above), so we don't
>> need a dcache clean in this case.
> But that isn't what concerns me. FWB is exclusively documented in
> terms of d-cache, and doesn't describe how that affects the
> instruction fetch (which is why I'm reluctant to attribute any effect
> to it).

I tend to agree with this. FEAT_S2FWB is described in terms of resultant memory
type, cacheability attribute and cacheability hints, which in the architecture
don't affect the need to do instruction cache invalidation or data cache clean
when generating instructions.

There's also this part which is specifically targeted at instruction generation
(page D5-2761):

"When FEAT_S2FWB is implemented, the architecture requires that CLIDR_EL1.{LOUU,

Re: [PATCH v9 2/6] arm64: kvm: Introduce MTE VM feature

2021-03-09 Thread Marc Zyngier
On Mon, 01 Mar 2021 14:23:11 +,
Steven Price  wrote:
> 
> Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging
> for a VM. This will expose the feature to the guest and automatically
> tag memory pages touched by the VM as PG_mte_tagged (and clear the tag
> storage) to ensure that the guest cannot see stale tags, and so that
> the tags are correctly saved/restored across swap.
> 
> Actually exposing the new capability to user space happens in a later
> patch.
> 
> Signed-off-by: Steven Price 
> ---
>  arch/arm64/include/asm/kvm_emulate.h |  3 +++
>  arch/arm64/include/asm/kvm_host.h|  3 +++
>  arch/arm64/kvm/hyp/exception.c   |  3 ++-
>  arch/arm64/kvm/mmu.c | 16 
>  arch/arm64/kvm/sys_regs.c|  3 ++-
>  include/uapi/linux/kvm.h |  1 +
>  6 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index f612c090f2e4..6bf776c2399c 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -84,6 +84,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>   if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
>   vcpu_el1_is_32bit(vcpu))
>   vcpu->arch.hcr_el2 |= HCR_TID2;
> +
> + if (kvm_has_mte(vcpu->kvm))
> + vcpu->arch.hcr_el2 |= HCR_ATA;
>  }
>  
>  static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 3d10e6527f7d..1170ee137096 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -132,6 +132,8 @@ struct kvm_arch {
>  
>   u8 pfr0_csv2;
>   u8 pfr0_csv3;
> + /* Memory Tagging Extension enabled for the guest */
> + bool mte_enabled;
>  };
>  
>  struct kvm_vcpu_fault_info {
> @@ -767,6 +769,7 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
>  #define kvm_arm_vcpu_sve_finalized(vcpu) \
>   ((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
>  
> +#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
>  #define kvm_vcpu_has_pmu(vcpu)   \
>   (test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
>  
> diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> index 73629094f903..56426565600c 100644
> --- a/arch/arm64/kvm/hyp/exception.c
> +++ b/arch/arm64/kvm/hyp/exception.c
> @@ -112,7 +112,8 @@ static void enter_exception64(struct kvm_vcpu *vcpu, 
> unsigned long target_mode,
>   new |= (old & PSR_C_BIT);
>   new |= (old & PSR_V_BIT);
>  
> - // TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
> + if (kvm_has_mte(vcpu->kvm))
> + new |= PSR_TCO_BIT;
>  
>   new |= (old & PSR_DIT_BIT);
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 77cb2d28f2a4..fdb6ab604fd0 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -879,6 +879,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>   if (vma_pagesize == PAGE_SIZE && !force_pte)
>   vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>  &pfn, &fault_ipa);
> +
> + if (kvm_has_mte(kvm) && pfn_valid(pfn)) {
> + /*
> +  * VM will be able to see the page's tags, so we must ensure
> +  * they have been initialised. if PG_mte_tagged is set, tags
> +  * have already been initialised.
> +  */
> + struct page *page = pfn_to_page(pfn);
> + unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
> +
> + for (i = 0; i < nr_pages; i++, page++) {
> + if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> + mte_clear_page_tags(page_address(page));
> + }
> + }

Is there any reason to do this dance for anything but a translation
fault?

> +
>   if (writable)
>   prot |= KVM_PGTABLE_PROT_W;
>  
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 4f2f1e3145de..e09dbc00b0a2 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1046,7 +1046,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
>   val |= FIELD_PREP(FEATURE(ID_AA64PFR0_CSV3), 
> (u64)vcpu->kvm->arch.pfr0_csv3);
>   break;
>   case SYS_ID_AA64PFR1_EL1:
> - val &= ~FEATURE(ID_AA64PFR1_MTE);
> + if (!kvm_has_mte(vcpu->kvm))
> + val &= ~FEATURE(ID_AA64PFR1_MTE);

Are we happy to expose *any* of the MTE flavours? Or should we
restrict it in any way?

>   break;
>   case SYS_ID_AA64ISAR1_EL1:
>   if (!vcpu_has_ptrauth(vcpu))
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 

Re: [PATCH kvmtool v2 22/22] hw/rtc: ARM/arm64: Use MMIO at higher addresses

2021-03-09 Thread Alexandru Elisei
Hi Andre,

On 2/25/21 12:59 AM, Andre Przywara wrote:
> Using the RTC device at its legacy I/O address as set by IBM in 1981
> was a kludge we used for simplicity on ARM platforms as well.
> However this imposes problems due to their missing alignment and overlap
> with the PCI I/O address space.
>
> Now that we can switch a device easily between using ioports and
> MMIO, let's move the RTC out of the first 4K of memory on ARM platforms.
>
> That should be transparent for well behaved guests, since the change is
> naturally reflected in the device tree.

Everything looks correct, compile tested for arm64 and x86-64. Also grep'ed for
0x70 (device ioport address) and DEVICE_BUS_IOPORT, no occurrence in the file
other than in the defines:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex

>
> Signed-off-by: Andre Przywara 
> ---
>  arm/include/arm-common/kvm-arch.h |  3 +++
>  hw/rtc.c  | 24 
>  2 files changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/arm/include/arm-common/kvm-arch.h 
> b/arm/include/arm-common/kvm-arch.h
> index 633ea8fa..02100c48 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -31,6 +31,9 @@
>  #define ARM_UART_MMIO_BASE   ARM_MMIO_AREA
>  #define ARM_UART_MMIO_SIZE   0x1
>  
> +#define ARM_RTC_MMIO_BASE(ARM_UART_MMIO_BASE + ARM_UART_MMIO_SIZE)
> +#define ARM_RTC_MMIO_SIZE0x1
> +
>  #define KVM_FLASH_MMIO_BASE  (ARM_MMIO_AREA + 0x100)
>  #define KVM_FLASH_MAX_SIZE   0x100
>  
> diff --git a/hw/rtc.c b/hw/rtc.c
> index ee4c9102..aec31c52 100644
> --- a/hw/rtc.c
> +++ b/hw/rtc.c
> @@ -5,6 +5,15 @@
>  
>  #include 
>  
> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +#define RTC_BUS_TYPE DEVICE_BUS_MMIO
> +#define RTC_BASE_ADDRESS ARM_RTC_MMIO_BASE
> +#else
> +/* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
> +#define RTC_BUS_TYPE DEVICE_BUS_IOPORT
> +#define RTC_BASE_ADDRESS 0x70
> +#endif
> +
>  /*
>   * MC146818 RTC registers
>   */
> @@ -49,7 +58,7 @@ static void cmos_ram_io(struct kvm_cpu *vcpu, u64 addr, u8 
> *data,
>   time_t ti;
>  
>   if (is_write) {
> - if (addr == 0x70) { /* index register */
> + if (addr == RTC_BASE_ADDRESS) { /* index register */
>   u8 value = ioport__read8(data);
>  
>   vcpu->kvm->nmi_disabled = value & (1UL << 7);
> @@ -70,7 +79,7 @@ static void cmos_ram_io(struct kvm_cpu *vcpu, u64 addr, u8 
> *data,
>   return;
>   }
>  
> - if (addr == 0x70)
> + if (addr == RTC_BASE_ADDRESS)   /* index register is write-only */
>   return;
>  
>   time();
> @@ -127,7 +136,7 @@ static void generate_rtc_fdt_node(void *fdt,
>   u8 irq,
>   enum irq_type))
>  {
> - u64 reg_prop[2] = { cpu_to_fdt64(0x70), cpu_to_fdt64(2) };
> + u64 reg_prop[2] = { cpu_to_fdt64(RTC_BASE_ADDRESS), cpu_to_fdt64(2) };
>  
>   _FDT(fdt_begin_node(fdt, "rtc"));
>   _FDT(fdt_property_string(fdt, "compatible", "motorola,mc146818"));
> @@ -139,7 +148,7 @@ static void generate_rtc_fdt_node(void *fdt,
>  #endif
>  
>  struct device_header rtc_dev_hdr = {
> - .bus_type = DEVICE_BUS_IOPORT,
> + .bus_type = RTC_BUS_TYPE,
>   .data = generate_rtc_fdt_node,
>  };
>  
> @@ -151,8 +160,8 @@ int rtc__init(struct kvm *kvm)
>   if (r < 0)
>   return r;
>  
> - /* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
> - r = kvm__register_pio(kvm, 0x0070, 2, cmos_ram_io, NULL);
> + r = kvm__register_iotrap(kvm, RTC_BASE_ADDRESS, 2, cmos_ram_io, NULL,
> +  RTC_BUS_TYPE);
>   if (r < 0)
>   goto out_device;
>  
> @@ -170,8 +179,7 @@ dev_init(rtc__init);
>  
>  int rtc__exit(struct kvm *kvm)
>  {
> - /* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
> - kvm__deregister_pio(kvm, 0x0070);
> + kvm__deregister_iotrap(kvm, RTC_BASE_ADDRESS, RTC_BUS_TYPE);
>  
>   return 0;
>  }
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool v2 21/22] hw/serial: ARM/arm64: Use MMIO at higher addresses

2021-03-09 Thread Alexandru Elisei
Hi Andre,

I think you forgot to change the way the address is generated in
serial8250_generate_fdt_node, it's still KVM_IOPORT_AREA + dev->iobase. It's
technically correct, as KVM_IOPORT_AREA == ARM_IOPORT_AREA == 0x0, but very
confusing (and prone to breakage if something changes in the memory layout).

One more comment below.

On 2/25/21 12:59 AM, Andre Przywara wrote:
> Using the UART devices at their legacy I/O addresses as set by IBM in
> 1981 was a kludge we used for simplicity on ARM platforms as well.
> However this imposes problems due to their missing alignment and overlap
> with the PCI I/O address space.
>
> Now that we can switch a device easily between using ioports and MMIO,
> let's move the UARTs out of the first 4K of memory on ARM platforms.
>
> That should be transparent for well behaved guests, since the change is
> naturally reflected in the device tree. Even "earlycon" keeps working,
> as the stdout-path property is adjusted automatically.
>
> People providing direct earlycon parameters via the command line need to
> adjust it to: "earlycon=uart,mmio,0x100".
>
> Signed-off-by: Andre Przywara 
> ---
>  arm/include/arm-common/kvm-arch.h |  3 +++
>  hw/serial.c   | 45 ---
>  2 files changed, 32 insertions(+), 16 deletions(-)
>
> diff --git a/arm/include/arm-common/kvm-arch.h 
> b/arm/include/arm-common/kvm-arch.h
> index b12255b0..633ea8fa 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -28,6 +28,9 @@
>  #define ARM_IOPORT_SIZE  (1U << 16)
>  
>  
> +#define ARM_UART_MMIO_BASE   ARM_MMIO_AREA
> +#define ARM_UART_MMIO_SIZE   0x1
> +
>  #define KVM_FLASH_MMIO_BASE  (ARM_MMIO_AREA + 0x100)
>  #define KVM_FLASH_MAX_SIZE   0x100
>  
> diff --git a/hw/serial.c b/hw/serial.c
> index 4be188a1..1854add2 100644
> --- a/hw/serial.c
> +++ b/hw/serial.c
> @@ -13,6 +13,17 @@
>  
>  #include 
>  
> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +#define serial_iobase(nr)(ARM_UART_MMIO_BASE + (nr) * 0x1000)
> +#define serial_irq(nr)   (32 + (nr))
> +#define SERIAL8250_BUS_TYPE  DEVICE_BUS_MMIO
> +#else
> +#define serial_iobase(nr)((((nr) & 1) ? 0x200 : 0x300) + \
> +  ((nr) >= 2 ? 0xe8 : 0xf8))
> +#define serial_irq(nr)   (((nr) & 1) ? 3 : 4)

Those two defines are hard to read, is there a reason for changing them from v1?
They looked a lot more readable in v1.

Thanks,

Alex

> +#define SERIAL8250_BUS_TYPE  DEVICE_BUS_IOPORT
> +#endif
> +
>  /*
>   * This fakes a U6_16550A. The fifo len needs to be 64 as the kernel
>   * expects that for autodetection.
> @@ -27,7 +38,7 @@ struct serial8250_device {
>   struct mutexmutex;
>   u8  id;
>  
> - u16 iobase;
> + u32 iobase;
>   u8  irq;
>   u8  irq_state;
>   int txcnt;
> @@ -65,56 +76,56 @@ static struct serial8250_device devices[] = {
>   /* ttyS0 */
>   [0] = {
>   .dev_hdr = {
> - .bus_type   = DEVICE_BUS_IOPORT,
> + .bus_type   = SERIAL8250_BUS_TYPE,
>   .data   = serial8250_generate_fdt_node,
>   },
>   .mutex  = MUTEX_INITIALIZER,
>  
>   .id = 0,
> - .iobase = 0x3f8,
> - .irq= 4,
> + .iobase = serial_iobase(0),
> + .irq= serial_irq(0),
>  
>   SERIAL_REGS_SETTING
>   },
>   /* ttyS1 */
>   [1] = {
>   .dev_hdr = {
> - .bus_type   = DEVICE_BUS_IOPORT,
> + .bus_type   = SERIAL8250_BUS_TYPE,
>   .data   = serial8250_generate_fdt_node,
>   },
>   .mutex  = MUTEX_INITIALIZER,
>  
>   .id = 1,
> - .iobase = 0x2f8,
> - .irq= 3,
> + .iobase = serial_iobase(1),
> + .irq= serial_irq(1),
>  
>   SERIAL_REGS_SETTING
>   },
>   /* ttyS2 */
>   [2] = {
>   .dev_hdr = {
> - .bus_type   = DEVICE_BUS_IOPORT,
> + .bus_type   = SERIAL8250_BUS_TYPE,
>   .data   = serial8250_generate_fdt_node,
>   },
>   .mutex  = MUTEX_INITIALIZER,
>  
>   .id = 2,
> - .iobase = 0x3e8,
> - .irq= 4,
> + .iobase = serial_iobase(2),
> + .irq= 

Re: [PATCH kvmtool v2 20/22] arm: Reorganise and document memory map

2021-03-09 Thread Alexandru Elisei
Hi Andre,

This is a really good idea, thank you for implementing it!

Some comments below.

On 2/25/21 12:59 AM, Andre Przywara wrote:
> The hardcoded memory map we expose to a guest is currently described
> using a series of partially interconnected preprocessor constants,
> which is hard to read and follow.
>
> In preparation for moving the UART and RTC to some different MMIO
> region, document the current map with some ASCII art, and clean up the
> definition of the sections.
>
> No functional change.
>
> Signed-off-by: Andre Przywara 
> ---
>  arm/include/arm-common/kvm-arch.h | 41 ++-
>  1 file changed, 29 insertions(+), 12 deletions(-)
>
> diff --git a/arm/include/arm-common/kvm-arch.h 
> b/arm/include/arm-common/kvm-arch.h
> index d84e50cd..b12255b0 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -7,14 +7,33 @@
>  
>  #include "arm-common/gic.h"
>  
> +/*
> + * The memory map used for ARM guests (not to scale):
> + *
> + * 0  64K  16M 32M 48M1GB   2GB
> + * +---+-..-+---+---+----+-+--.--+---..
> + * | (PCI) || int.  |   || | |
> + * |  I/O  || MMIO: | Flash | virtio | GIC |   PCI   |  DRAM
> + * | ports || UART, |   |  MMIO  | |  (AXI)  |
> + * |   || RTC   |   || | |
> + * +---+-..-+---+---+----+-+--.--+---..
> + */

Nitpick: I searched the PCI Local Bus Specification revision 3.0 (which kvmtool
currently implements) for the term I/O ports, and found one mention in a 
schematic
for an add-in card. The I/O region is called in the spec I/O Space.

I don't know what "int." means in the region for the UART and RTC.

The comment says that the art is not to scale, so I don't think there's any need
for the "..." between the corners of the regions. To my eyes, it makes the ASCII
art look crooked.

The next patches add the UART and RTC outside the first 64K, I think the region
should be documented in the patches where the changes are made, not here. 
Another
alternative would be to move this patch to the end of the series instead of
incrementally changing the memory ASCII art (which I imagine is time consuming).

Otherwise, the numbers look OK.

> +
>  #define ARM_IOPORT_AREA  _AC(0x, UL)
> -#define ARM_FLASH_AREA   _AC(0x0200, UL)
> -#define ARM_MMIO_AREA_AC(0x0300, UL)
> +#define ARM_MMIO_AREA_AC(0x0100, UL)

The patch says it is *documenting* the memory layout, but here it is *changing*
the layout. Other than that, I like the shuffling of definitions so the kvmtool
global defines are closer to the arch values.

Thanks,

Alex

>  #define ARM_AXI_AREA _AC(0x4000, UL)
>  #define ARM_MEMORY_AREA  _AC(0x8000, UL)
>  
> -#define ARM_LOMAP_MAX_MEMORY ((1ULL << 32) - ARM_MEMORY_AREA)
> -#define ARM_HIMAP_MAX_MEMORY ((1ULL << 40) - ARM_MEMORY_AREA)
> +#define KVM_IOPORT_AREA  ARM_IOPORT_AREA
> +#define ARM_IOPORT_SIZE  (1U << 16)
> +
> +
> +#define KVM_FLASH_MMIO_BASE  (ARM_MMIO_AREA + 0x100)
> +#define KVM_FLASH_MAX_SIZE   0x100
> +
> +#define KVM_VIRTIO_MMIO_AREA (KVM_FLASH_MMIO_BASE + KVM_FLASH_MAX_SIZE)
> +#define ARM_VIRTIO_MMIO_SIZE (ARM_AXI_AREA - \
> + (KVM_VIRTIO_MMIO_AREA + ARM_GIC_SIZE))
>  
>  #define ARM_GIC_DIST_BASE(ARM_AXI_AREA - ARM_GIC_DIST_SIZE)
>  #define ARM_GIC_CPUI_BASE(ARM_GIC_DIST_BASE - ARM_GIC_CPUI_SIZE)
> @@ -22,19 +41,17 @@
>  #define ARM_GIC_DIST_SIZE0x1
>  #define ARM_GIC_CPUI_SIZE0x2
>  
> -#define KVM_FLASH_MMIO_BASE  ARM_FLASH_AREA
> -#define KVM_FLASH_MAX_SIZE   (ARM_MMIO_AREA - ARM_FLASH_AREA)
>  
> -#define ARM_IOPORT_SIZE  (1U << 16)
> -#define ARM_VIRTIO_MMIO_SIZE (ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
> +#define KVM_PCI_CFG_AREA ARM_AXI_AREA
>  #define ARM_PCI_CFG_SIZE (1ULL << 24)
> +#define KVM_PCI_MMIO_AREA(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
>  #define ARM_PCI_MMIO_SIZE(ARM_MEMORY_AREA - \
>   (ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>  
> -#define KVM_IOPORT_AREA  ARM_IOPORT_AREA
> -#define KVM_PCI_CFG_AREA ARM_AXI_AREA
> -#define KVM_PCI_MMIO_AREA(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
> -#define KVM_VIRTIO_MMIO_AREA ARM_MMIO_AREA
> +
> +#define ARM_LOMAP_MAX_MEMORY ((1ULL << 32) - ARM_MEMORY_AREA)
> +#define ARM_HIMAP_MAX_MEMORY ((1ULL << 40) - ARM_MEMORY_AREA)
> +
>  
>  #define KVM_IOEVENTFD_HAS_PIO0
>  


Re: [PATCH kvmtool v2 19/22] Remove ioport specific routines

2021-03-09 Thread Alexandru Elisei
Hi Andre,

On 2/25/21 12:59 AM, Andre Przywara wrote:
> Now that all users of the dedicated ioport trap handler interface are
> gone, we can retire the code associated with it.
>
> This removes ioport.c and ioport.h, along with removing prototypes from
> other header files.
>
> This also transfers the responsibility for port I/O trap handling
> entirely into the new routine in mmio.c.

Looks good, and everything compiles on arm64 and x86-64:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex

>
> Signed-off-by: Andre Przywara 
> ---
>  Makefile |   1 -
>  include/kvm/ioport.h |  27 --
>  include/kvm/kvm.h|   2 -
>  ioport.c | 195 ---
>  mmio.c   |   2 +-
>  5 files changed, 1 insertion(+), 226 deletions(-)
>  delete mode 100644 ioport.c
>
> diff --git a/Makefile b/Makefile
> index 35bb1182..94ff5da6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -56,7 +56,6 @@ OBJS+= framebuffer.o
>  OBJS += guest_compat.o
>  OBJS += hw/rtc.o
>  OBJS += hw/serial.o
> -OBJS += ioport.o
>  OBJS += irq.o
>  OBJS += kvm-cpu.o
>  OBJS += kvm.o
> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> index a61038e2..b6f579cb 100644
> --- a/include/kvm/ioport.h
> +++ b/include/kvm/ioport.h
> @@ -1,13 +1,8 @@
>  #ifndef KVM__IOPORT_H
>  #define KVM__IOPORT_H
>  
> -#include "kvm/devices.h"
>  #include "kvm/kvm-cpu.h"
> -#include "kvm/rbtree-interval.h"
> -#include "kvm/fdt.h"
>  
> -#include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -15,30 +10,8 @@
>  /* some ports we reserve for own use */
>  #define IOPORT_DBG   0xe0
>  
> -struct kvm;
> -
> -struct ioport {
> - struct rb_int_node  node;
> - struct ioport_operations*ops;
> - void*priv;
> - struct device_headerdev_hdr;
> - u32 refcount;
> - boolremove;
> -};
> -
> -struct ioport_operations {
> - bool (*io_in)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> void *data, int size);
> - bool (*io_out)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> void *data, int size);
> -};
> -
>  void ioport__map_irq(u8 *irq);
>  
> -int __must_check ioport__register(struct kvm *kvm, u16 port, struct 
> ioport_operations *ops,
> -   int count, void *param);
> -int ioport__unregister(struct kvm *kvm, u16 port);
> -int ioport__init(struct kvm *kvm);
> -int ioport__exit(struct kvm *kvm);
> -
>  static inline u8 ioport__read8(u8 *data)
>  {
>   return *data;
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 306b258a..6c28afa3 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -126,8 +126,6 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>  void kvm__irq_trigger(struct kvm *kvm, int irq);
>  bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int 
> direction, int size, u32 count);
>  bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 
> len, u8 is_write);
> -bool kvm__emulate_pio(struct kvm_cpu *vcpu, u16 port, void *data,
> -   int direction, int size, u32 count);
>  int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void 
> *userspace_addr);
>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void 
> *userspace_addr,
> enum kvm_mem_type type);
> diff --git a/ioport.c b/ioport.c
> deleted file mode 100644
> index ce29e7e7..
> --- a/ioport.c
> +++ /dev/null
> @@ -1,195 +0,0 @@
> -#include "kvm/ioport.h"
> -
> -#include "kvm/kvm.h"
> -#include "kvm/util.h"
> -#include "kvm/rbtree-interval.h"
> -#include "kvm/mutex.h"
> -
> -#include/* for KVM_EXIT_* */
> -#include 
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#define ioport_node(n) rb_entry(n, struct ioport, node)
> -
> -static DEFINE_MUTEX(ioport_lock);
> -
> -static struct rb_rootioport_tree = RB_ROOT;
> -
> -static struct ioport *ioport_search(struct rb_root *root, u64 addr)
> -{
> - struct rb_int_node *node;
> -
> - node = rb_int_search_single(root, addr);
> - if (node == NULL)
> - return NULL;
> -
> - return ioport_node(node);
> -}
> -
> -static int ioport_insert(struct rb_root *root, struct ioport *data)
> -{
> - return rb_int_insert(root, &data->node);
> -}
> -
> -static void ioport_remove(struct rb_root *root, struct ioport *data)
> -{
> - rb_int_erase(root, &data->node);
> -}
> -
> -static struct ioport *ioport_get(struct rb_root *root, u64 addr)
> -{
> - struct ioport *ioport;
> -
> - mutex_lock(&ioport_lock);
> - ioport = ioport_search(root, addr);
> - if (ioport)
> - ioport->refcount++;
> - mutex_unlock(&ioport_lock);
> -
> - return ioport;
> -}
> -
> -/* Called with ioport_lock held. */
> -static void ioport_unregister(struct rb_root *root, struct ioport *data)
> -{
> - 

Re: [PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-09 Thread Anshuman Khandual



On 3/9/21 7:35 PM, Will Deacon wrote:
> On Mon, Mar 08, 2021 at 02:42:00PM +, Marc Zyngier wrote:
>> On Fri, 05 Mar 2021 14:36:09 +,
>> Anshuman Khandual  wrote:
>>> -   switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
>>> -   default:
>>> -   case 1:
>>> +   tgran_2 = cpuid_feature_extract_unsigned_field(mmfr0, tgran_2_shift);
>>> +   if (tgran_2 == ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE) {
>>> kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
>>> return -EINVAL;
>>> -   case 0:
>>> +   } else if (tgran_2 == ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT) {
>>> kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
>>> -   break;
>>> -   case 2:
>>> +   } else if (tgran_2 >= ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN &&
>>> +  tgran_2 <= ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX) {
>>> kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
>>> -   break;
>>> +   } else {
>>> +   kvm_err("Unsupported value, giving up\n");
>>> +   return -EINVAL;
>>
>> nit: this doesn't say *what* value is unsupported, and I really
>> preferred the switch-case version, such as this:
>>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index 1f22b36a0eff..d267e4b1aec6 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -312,15 +312,18 @@ int kvm_set_ipa_limit(void)
>>  
>>  switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
>> -default:
>> -case 1:
>> +case ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE:
>>  kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
>>  return -EINVAL;
>> -case 0:
>> +case ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT:
>>  kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
>>  break;
>> -case 2:
>> +case ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN ... 
>> ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX:
>>  kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
>>  break;
>> +default:
>> +kvm_err("Unsupported value for TGRAN_2, giving up\n");
>> +return -EINVAL;
>>  }
>>  
>>  kvm_ipa_limit = id_aa64mmfr0_parange_to_phys_shift(parange);
>>
>>
>> Otherwise:
>>
>> Acked-by: Marc Zyngier 
> 
> Anshuman -- please can you spin a v2 with the switch syntax as suggested
> above by Marc?

Sure, will do.



Re: [PATCH] KVM: arm64: Cap default IPA size to the host's own size

2021-03-09 Thread Marc Zyngier
On Tue, 09 Mar 2021 14:29:10 +,
Andrew Jones  wrote:
> 
> On Tue, Mar 09, 2021 at 01:43:40PM +, Marc Zyngier wrote:
> > Hi Andrew,
> > 
> > On Tue, 09 Mar 2021 13:20:21 +,
> > Andrew Jones  wrote:
> > > 
> > > Hi Marc,
> > > 
> > > On Mon, Mar 08, 2021 at 05:46:43PM +, Marc Zyngier wrote:
> > > > KVM/arm64 has forever used a 40bit default IPA space, partially
> > > > due to its 32bit heritage (where the only choice is 40bit).
> > > > 
> > > > However, there are implementations in the wild that have a *cough*
> > > > much smaller *cough* IPA space, which leads to a misprogramming of
> > > > VTCR_EL2, and a guest that is stuck on its first memory access
> > > > if userspace dares to ask for the default IPA setting (which most
> > > > VMMs do).
> > > > 
> > > > Instead, cap the default IPA size to what the host can actually
> > > > do, and spit out a one-off message on the console. The boot warning
> > > > is turned into a more meaningful message, and the new behaviour
> > > > is also documented.
> > > > 
> > > > Although this is a userspace ABI change, it doesn't really change
> > > > much for userspace:
> > > > 
> > > > - the guest couldn't run before this change, while it now has
> > > >   a chance to if the memory range fits the reduced IPA space
> > > > 
> > > > - a memory slot that was accepted because it did fit the default
> > > >   IPA space but didn't fit the HW constraints is now properly
> > > >   rejected
> > > 
> > > I'm not sure deferring the misconfiguration error until memslot
> > > request time is better than just failing to create a VM. If
> > > userspace doesn't use KVM_CAP_ARM_VM_IPA_SIZE to determine the
> > > limit (which it hasn't been obliged to do) and it is able to
> > > successfully create a VM, then it will assume up to 40-bit IPAs
> > > are supported. Later, when it tries to add memslots and fails
> > > it may be confused, especially if that later is much, much later
> > > with memory hotplug.
> > 
> > That's a fair point. However, no existing userspace will work on these
> > systems. Is that what we want to do? I don't care much, but having
> > non-usable defaults feels a bit... odd. I do spit out a warning, but I
> > agree this isn't great either.
> 
> I can send patches for QEMU, KVM selftests, and maybe even rust-vmm.
> Can you point me to something about these systems I can reference
> in my postings? Or I can just reference this mail thread.

The system of choice to see this is an Apple M1 box. Not supported in
mainline yet, but things are progressing pretty quickly.

> 
> > 
> > > > The other thing that's left doing is to convince userspace to
> > > > actually use the IPA space setting instead of relying on the
> > > > antiquated default.
> > > 
> > > Failing to create any VM which hasn't selected a valid IPA limit
> > > should be pretty convincing :-)
> > 
> > I'll make sure to redirect the reports your way! :D
> 
> What's the current error message when this occurs? Is it good enough, or
> should we improve it to help provide people hints? Please don't change
> it to "Invalid IPA limit, please mail Andrew Jones" :-)

Well, that's part of the problem. Currently, you don't get a message,
and the guest faults on its first memory access forever (level 0
translation fault), as the VTCR_EL2.T0SZ value is bogus.

I can change this patch to reject 40bit IPA when requested as a
default with something saying "Userspace using unsupported default IPA
limit, upgrade your VMM".

Now, there is another nit[1] which I just found with my kvmtool setup
that computes the optimal IPA space for a given VM. And that one is
even more problematic...

Thanks,

M.

[1] https://lore.kernel.org/r/87lfawxv40.wl-...@kernel.org

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH kvmtool v2 17/22] virtio: Switch trap handling to use MMIO handler

2021-03-09 Thread Alexandru Elisei
Hi Andre,

On 2/25/21 12:59 AM, Andre Przywara wrote:
> With the planned retirement of the special ioport emulation code, we
> need to provide an emulation function compatible with the MMIO prototype.
>
> Adjust the existing MMIO callback routine to automatically determine
> the region this trap came through, and call the existing I/O handlers.
> Register the ioport region using the new registration function.

Looks good to me:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex

>
> Signed-off-by: Andre Przywara 
> ---
>  virtio/pci.c | 46 ++
>  1 file changed, 14 insertions(+), 32 deletions(-)
>
> diff --git a/virtio/pci.c b/virtio/pci.c
> index 6eea6c68..eb91f512 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -178,15 +178,6 @@ static bool virtio_pci__data_in(struct kvm_cpu *vcpu, struct virtio_device *vdev
>   return ret;
>  }
>  
> -static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> -{
> - struct virtio_device *vdev = ioport->priv;
> - struct virtio_pci *vpci = vdev->virtio;
> - unsigned long offset = port - virtio_pci__port_addr(vpci);
> -
> - return virtio_pci__data_in(vcpu, vdev, offset, data, size);
> -}
> -
>  static void update_msix_map(struct virtio_pci *vpci,
>   struct msix_table *msix_entry, u32 vecnum)
>  {
> @@ -334,20 +325,6 @@ static bool virtio_pci__data_out(struct kvm_cpu *vcpu, struct virtio_device *vde
>   return ret;
>  }
>  
> -static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> -{
> - struct virtio_device *vdev = ioport->priv;
> - struct virtio_pci *vpci = vdev->virtio;
> - unsigned long offset = port - virtio_pci__port_addr(vpci);
> -
> - return virtio_pci__data_out(vcpu, vdev, offset, data, size);
> -}
> -
> -static struct ioport_operations virtio_pci__io_ops = {
> - .io_in  = virtio_pci__io_in,
> - .io_out = virtio_pci__io_out,
> -};
> -
>  static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
>  u64 addr, u8 *data, u32 len,
>  u8 is_write, void *ptr)
> @@ -455,12 +432,19 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>  {
>   struct virtio_device *vdev = ptr;
>   struct virtio_pci *vpci = vdev->virtio;
> - u32 mmio_addr = virtio_pci__mmio_addr(vpci);
> + u32 ioport_addr = virtio_pci__port_addr(vpci);
> + u32 base_addr;
> +
> + if (addr >= ioport_addr &&
> + addr < ioport_addr + pci__bar_size(&vdev->pci_hdr, 0))
> + base_addr = ioport_addr;
> + else
> + base_addr = virtio_pci__mmio_addr(vpci);
>  
>   if (!is_write)
> - virtio_pci__data_in(vcpu, vdev, addr - mmio_addr, data, len);
> + virtio_pci__data_in(vcpu, vdev, addr - base_addr, data, len);
>   else
> - virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
> + virtio_pci__data_out(vcpu, vdev, addr - base_addr, data, len);
>  }
>  
>  static int virtio_pci__bar_activate(struct kvm *kvm,
> @@ -478,10 +462,8 @@ static int virtio_pci__bar_activate(struct kvm *kvm,
>  
>   switch (bar_num) {
>   case 0:
> - r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
> -  bar_size, vdev);
> - if (r > 0)
> - r = 0;
> + r = kvm__register_pio(kvm, bar_addr, bar_size,
> +   virtio_pci__io_mmio_callback, vdev);
>   break;
>   case 1:
>   r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
> @@ -510,7 +492,7 @@ static int virtio_pci__bar_deactivate(struct kvm *kvm,
>  
>   switch (bar_num) {
>   case 0:
> - r = ioport__unregister(kvm, bar_addr);
> + r = kvm__deregister_pio(kvm, bar_addr);
>   break;
>   case 1:
>   case 2:
> @@ -625,7 +607,7 @@ int virtio_pci__exit(struct kvm *kvm, struct virtio_device *vdev)
>   virtio_pci__reset(kvm, vdev);
>   kvm__deregister_mmio(kvm, virtio_pci__mmio_addr(vpci));
>   kvm__deregister_mmio(kvm, virtio_pci__msix_io_addr(vpci));
> - ioport__unregister(kvm, virtio_pci__port_addr(vpci));
> + kvm__deregister_pio(kvm, virtio_pci__port_addr(vpci));
>  
>   return 0;
>  }


Re: [PATCH] KVM: arm64: Ensure I-cache isolation between vcpus of a same VM

2021-03-09 Thread Catalin Marinas
On Tue, Mar 09, 2021 at 01:26:46PM +, Will Deacon wrote:
> On Wed, Mar 03, 2021 at 04:45:05PM +, Marc Zyngier wrote:
> > It recently became apparent that the ARMv8 architecture has interesting
> > rules regarding attributes being used when fetching instructions
> > if the MMU is off at Stage-1.
> > 
> > In this situation, the CPU is allowed to fetch from the PoC and
> > allocate into the I-cache (unless the memory is mapped with
> > the XN attribute at Stage-2).
> > 
> > If we transpose this to vcpus sharing a single physical CPU,
> > it is possible for a vcpu running with its MMU off to influence
> > another vcpu running with its MMU on, as the latter is expected to
> > fetch from the PoU (and self-patching code doesn't flush below that
> > level).
> > 
> > In order to solve this, reuse the vcpu-private TLB invalidation
> > code to apply the same policy to the I-cache, nuking it every time
> > the vcpu runs on a physical CPU that ran another vcpu of the same
> > VM in the past.
> > 
> > This involves renaming __kvm_tlb_flush_local_vmid() to
> > __kvm_flush_cpu_context(), and inserting a local i-cache invalidation
> > there.
> > 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/include/asm/kvm_asm.h   | 4 ++--
> >  arch/arm64/kvm/arm.c   | 7 ++-
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +++---
> >  arch/arm64/kvm/hyp/nvhe/tlb.c  | 3 ++-
> >  arch/arm64/kvm/hyp/vhe/tlb.c   | 3 ++-
> >  5 files changed, 15 insertions(+), 8 deletions(-)
> 
> Since the FWB discussion doesn't affect the correctness of this patch:
> 
> Acked-by: Will Deacon 

I agree. We can optimise it later for FWB.

Acked-by: Catalin Marinas 


Re: [PATCH] KVM: arm64: Cap default IPA size to the host's own size

2021-03-09 Thread Andrew Jones
On Tue, Mar 09, 2021 at 01:43:40PM +, Marc Zyngier wrote:
> Hi Andrew,
> 
> On Tue, 09 Mar 2021 13:20:21 +,
> Andrew Jones  wrote:
> > 
> > Hi Marc,
> > 
> > On Mon, Mar 08, 2021 at 05:46:43PM +, Marc Zyngier wrote:
> > > KVM/arm64 has forever used a 40bit default IPA space, partially
> > > due to its 32bit heritage (where the only choice is 40bit).
> > > 
> > > However, there are implementations in the wild that have a *cough*
> > > much smaller *cough* IPA space, which leads to a misprogramming of
> > > VTCR_EL2, and a guest that is stuck on its first memory access
> > > if userspace dares to ask for the default IPA setting (which most
> > > VMMs do).
> > > 
> > > Instead, cap the default IPA size to what the host can actually
> > > do, and spit out a one-off message on the console. The boot warning
> > > is turned into a more meaningful message, and the new behaviour
> > > is also documented.
> > > 
> > > Although this is a userspace ABI change, it doesn't really change
> > > much for userspace:
> > > 
> > > - the guest couldn't run before this change, while it now has
> > >   a chance to if the memory range fits the reduced IPA space
> > > 
> > > - a memory slot that was accepted because it did fit the default
> > >   IPA space but didn't fit the HW constraints is now properly
> > >   rejected
> > 
> > I'm not sure deferring the misconfiguration error until memslot
> > request time is better than just failing to create a VM. If
> > userspace doesn't use KVM_CAP_ARM_VM_IPA_SIZE to determine the
> > limit (which it hasn't been obliged to do) and it is able to
> > successfully create a VM, then it will assume up to 40-bit IPAs
> > are supported. Later, when it tries to add memslots and fails
> > it may be confused, especially if that later is much, much later
> > with memory hotplug.
> 
> That's a fair point. However, no existing userspace will work on these
> systems. Is that what we want to do? I don't care much, but having
> non-usable defaults feels a bit... odd. I do spit out a warning, but I
> agree this isn't great either.

I can send patches for QEMU, KVM selftests, and maybe even rust-vmm.
Can you point me to something about these systems I can reference
in my postings? Or I can just reference this mail thread.

> 
> > > The other thing that's left doing is to convince userspace to
> > > actually use the IPA space setting instead of relying on the
> > > antiquated default.
> > 
> > Failing to create any VM which hasn't selected a valid IPA limit
> > should be pretty convincing :-)
> 
> I'll make sure to redirect the reports your way! :D

What's the current error message when this occurs? Is it good enough, or
should we improve it to help provide people hints? Please don't change
it to "Invalid IPA limit, please mail Andrew Jones" :-)

Thanks,
drew



Re: [PATCH] KVM: arm64: Cap default IPA size to the host's own size

2021-03-09 Thread Marc Zyngier
Hi Suzuki,

On Tue, 09 Mar 2021 11:09:48 +,
Suzuki Poulose  wrote:
> 
> 
> 
> > On 8 Mar 2021, at 17:46, Marc Zyngier  wrote:
> > 
> > KVM/arm64 has forever used a 40bit default IPA space, partially
> > due to its 32bit heritage (where the only choice is 40bit).
> > 
> > However, there are implementations in the wild that have a *cough*
> > much smaller *cough* IPA space, which leads to a misprogramming of
> > VTCR_EL2, and a guest that is stuck on its first memory access
> > if userspace dares to ask for the default IPA setting (which most
> > VMMs do).
> > 
> > Instead, cap the default IPA size to what the host can actually
> > do, and spit out a one-off message on the console. The boot warning
> > is turned into a more meaningful message, and the new behaviour
> > is also documented.
> > 
> > Although this is a userspace ABI change, it doesn't really change
> > much for userspace:
> > 
> > - the guest couldn't run before this change, while it now has
> >  a chance to if the memory range fits the reduced IPA space
> > 
> > - a memory slot that was accepted because it did fit the default
> >  IPA space but didn't fit the HW constraints is now properly
> >  rejected
> > 
> > The other thing that's left doing is to convince userspace to
> > actually use the IPA space setting instead of relying on the
> > antiquated default.
> > 
> > Signed-off-by: Marc Zyngier 
> > ---
> 
> Reviewed-by: Suzuki K Poulose 

Thanks for that. Whilst I have your attention and given that you are
responsible for most of the variable IPA stuff... ;-)

I think we have another issue around the handling of our IPA
size. Let's say I create a VM with a 32bit IPA space. If I register a
2GB memslot at 0x80000000, I'm getting an error, which I think is
bogus.

I came to the conclusion that kvm_arch_prepare_memory_region() is a
bit overzealous when rejecting the memslot, and I used the following
patchlet to address it.

Does this seem sensible to you?

M.

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..8711894db8c2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1312,8 +1312,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 * Prevent userspace from creating a memory region outside of the IPA
 * space addressable by the KVM guest IPA space.
 */
-   if (memslot->base_gfn + memslot->npages >=
-   (kvm_phys_size(kvm) >> PAGE_SHIFT))
+   if ((memslot->base_gfn + memslot->npages) > (kvm_phys_size(kvm) >> PAGE_SHIFT))
return -EFAULT;
 
mmap_read_lock(current->mm);

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH kvmtool v2 15/22] vfio: Refactor ioport trap handler

2021-03-09 Thread Alexandru Elisei
Hi Andre,

On 2/25/21 12:59 AM, Andre Przywara wrote:
> With the planned retirement of the special ioport emulation code, we
> need to provide an emulation function compatible with the MMIO prototype.
>
> Adjust the I/O port trap handler to use that new function, and provide
> shims to implement the old ioport interface, for now.
>
> Signed-off-by: Andre Przywara 
> ---
>  vfio/core.c | 51 ---
>  1 file changed, 36 insertions(+), 15 deletions(-)
>
> diff --git a/vfio/core.c b/vfio/core.c
> index 0b45e78b..ddd3c2c7 100644
> --- a/vfio/core.c
> +++ b/vfio/core.c
> @@ -81,15 +81,12 @@ out_free_buf:
>   return ret;
>  }
>  
> -static bool vfio_ioport_in(struct ioport *ioport, struct kvm_cpu *vcpu,
> -u16 port, void *data, int len)
> +static bool _vfio_ioport_in(struct vfio_region *region, u32 offset,
> + void *data, int len)
>  {
> - u32 val;
> - ssize_t nr;
> - struct vfio_region *region = ioport->priv;
>   struct vfio_device *vdev = region->vdev;
> -
> - u32 offset = port - region->port_base;
> + ssize_t nr;
> + u32 val;
>  
>   if (!(region->info.flags & VFIO_REGION_INFO_FLAG_READ))
>   return false;
> @@ -97,7 +94,7 @@ static bool vfio_ioport_in(struct ioport *ioport, struct kvm_cpu *vcpu,
>   nr = pread(vdev->fd, &val, len, region->info.offset + offset);
>   if (nr != len) {
>   vfio_dev_err(vdev, "could not read %d bytes from I/O port 0x%x\n",
> -  len, port);
> +  len, offset + region->port_base);
>   return false;
>   }
>  
> @@ -118,15 +115,13 @@ static bool vfio_ioport_in(struct ioport *ioport, struct kvm_cpu *vcpu,
>   return true;
>  }
>  
> -static bool vfio_ioport_out(struct ioport *ioport, struct kvm_cpu *vcpu,
> - u16 port, void *data, int len)
> +static bool _vfio_ioport_out(struct vfio_region *region, u32 offset,
> +  void *data, int len)
>  {
> - u32 val;
> - ssize_t nr;
> - struct vfio_region *region = ioport->priv;
>   struct vfio_device *vdev = region->vdev;
> + ssize_t nr;
> + u32 val;
>  
> - u32 offset = port - region->port_base;
>  
>   if (!(region->info.flags & VFIO_REGION_INFO_FLAG_WRITE))
>   return false;
> @@ -148,11 +143,37 @@ static bool vfio_ioport_out(struct ioport *ioport, struct kvm_cpu *vcpu,
>   nr = pwrite(vdev->fd, &val, len, region->info.offset + offset);
>   if (nr != len)
>   vfio_dev_err(vdev, "could not write %d bytes to I/O port 0x%x",
> -  len, port);
> +  len, offset + region->port_base);
>  
>   return nr == len;
>  }
>  
> +static void vfio_ioport_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
> +  u8 is_write, void *ptr)
> +{
> + struct vfio_region *region = ptr;
> + u32 offset = addr - region->port_base;
> +
> + if (is_write)
> + _vfio_ioport_out(region, offset, data, len);
> + else
> + _vfio_ioport_in(region, offset, data, len);
> +}
> +
> +static bool vfio_ioport_out(struct ioport *ioport, struct kvm_cpu *vcpu,
> + u16 port, void *data, int len)
> +{
> + vfio_ioport_mmio(vcpu, port, data, len, true, ioport->priv);
> + return true;
> +}
> +
> +static bool vfio_ioport_in(struct ioport *ioport, struct kvm_cpu *vcpu,
> +u16 port, void *data, int len)
> +{
> + vfio_ioport_mmio(vcpu, port, data, len, false, ioport->priv);
> + return true;
> +}
> +
>  static struct ioport_operations vfio_ioport_ops = {
>   .io_in  = vfio_ioport_in,
>   .io_out = vfio_ioport_out,

The new code looks functionally identical to the old one:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex



Re: [PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-09 Thread Will Deacon
On Mon, Mar 08, 2021 at 02:42:00PM +, Marc Zyngier wrote:
> On Fri, 05 Mar 2021 14:36:09 +,
> Anshuman Khandual  wrote:
> > -   switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
> > -   default:
> > -   case 1:
> > +   tgran_2 = cpuid_feature_extract_unsigned_field(mmfr0, tgran_2_shift);
> > +   if (tgran_2 == ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE) {
> > kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
> > return -EINVAL;
> > -   case 0:
> > +   } else if (tgran_2 == ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT) {
> > kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
> > -   break;
> > -   case 2:
> > +   } else if (tgran_2 >= ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN &&
> > +  tgran_2 <= ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX) {
> > kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
> > -   break;
> > +   } else {
> > +   kvm_err("Unsupported value, giving up\n");
> > +   return -EINVAL;
> 
> nit: this doesn't say *what* value is unsupported, and I really
> preferred the switch-case version, such as this:
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 1f22b36a0eff..d267e4b1aec6 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -312,15 +312,18 @@ int kvm_set_ipa_limit(void)
>  
>   switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
>   default:
> - case 1:
> + case ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE:
>   kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
>   return -EINVAL;
> - case 0:
> + case ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT:
>   kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
>   break;
> - case 2:
> + case ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN ... ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX:
>   kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
>   break;
> + default:
> + kvm_err("Unsupported value for TGRAN_2, giving up\n");
> + return -EINVAL;
>   }
>  
>   kvm_ipa_limit = id_aa64mmfr0_parange_to_phys_shift(parange);
> 
> 
> Otherwise:
> 
> Acked-by: Marc Zyngier 

Anshuman -- please can you spin a v2 with the switch syntax as suggested
above by Marc?

Will


Re: [PATCH] KVM: arm64: Cap default IPA size to the host's own size

2021-03-09 Thread Marc Zyngier
Hi Andrew,

On Tue, 09 Mar 2021 13:20:21 +,
Andrew Jones  wrote:
> 
> Hi Marc,
> 
> On Mon, Mar 08, 2021 at 05:46:43PM +, Marc Zyngier wrote:
> > KVM/arm64 has forever used a 40bit default IPA space, partially
> > due to its 32bit heritage (where the only choice is 40bit).
> > 
> > However, there are implementations in the wild that have a *cough*
> > much smaller *cough* IPA space, which leads to a misprogramming of
> > VTCR_EL2, and a guest that is stuck on its first memory access
> > if userspace dares to ask for the default IPA setting (which most
> > VMMs do).
> > 
> > Instead, cap the default IPA size to what the host can actually
> > do, and spit out a one-off message on the console. The boot warning
> > is turned into a more meaningful message, and the new behaviour
> > is also documented.
> > 
> > Although this is a userspace ABI change, it doesn't really change
> > much for userspace:
> > 
> > - the guest couldn't run before this change, while it now has
> >   a chance to if the memory range fits the reduced IPA space
> > 
> > - a memory slot that was accepted because it did fit the default
> >   IPA space but didn't fit the HW constraints is now properly
> >   rejected
> 
> I'm not sure deferring the misconfiguration error until memslot
> request time is better than just failing to create a VM. If
> userspace doesn't use KVM_CAP_ARM_VM_IPA_SIZE to determine the
> limit (which it hasn't been obliged to do) and it is able to
> successfully create a VM, then it will assume up to 40-bit IPAs
> are supported. Later, when it tries to add memslots and fails
> it may be confused, especially if that later is much, much later
> with memory hotplug.

That's a fair point. However, no existing userspace will work on these
systems. Is that what we want to do? I don't care much, but having
non-usable defaults feels a bit... odd. I do spit out a warning, but I
agree this isn't great either.

> > The other thing that's left doing is to convince userspace to
> > actually use the IPA space setting instead of relying on the
> > antiquated default.
> 
> Failing to create any VM which hasn't selected a valid IPA limit
> should be pretty convincing :-)

I'll make sure to redirect the reports your way! :D

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH] KVM: arm64: Ensure I-cache isolation between vcpus of a same VM

2021-03-09 Thread Will Deacon
On Wed, Mar 03, 2021 at 04:45:05PM +, Marc Zyngier wrote:
> It recently became apparent that the ARMv8 architecture has interesting
> rules regarding attributes being used when fetching instructions
> if the MMU is off at Stage-1.
> 
> In this situation, the CPU is allowed to fetch from the PoC and
> allocate into the I-cache (unless the memory is mapped with
> the XN attribute at Stage-2).
> 
> If we transpose this to vcpus sharing a single physical CPU,
> it is possible for a vcpu running with its MMU off to influence
> another vcpu running with its MMU on, as the latter is expected to
> fetch from the PoU (and self-patching code doesn't flush below that
> level).
> 
> In order to solve this, reuse the vcpu-private TLB invalidation
> code to apply the same policy to the I-cache, nuking it every time
> the vcpu runs on a physical CPU that ran another vcpu of the same
> VM in the past.
> 
> This involves renaming __kvm_tlb_flush_local_vmid() to
> __kvm_flush_cpu_context(), and inserting a local i-cache invalidation
> there.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/kvm_asm.h   | 4 ++--
>  arch/arm64/kvm/arm.c   | 7 ++-
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +++---
>  arch/arm64/kvm/hyp/nvhe/tlb.c  | 3 ++-
>  arch/arm64/kvm/hyp/vhe/tlb.c   | 3 ++-
>  5 files changed, 15 insertions(+), 8 deletions(-)

Since the FWB discussion doesn't affect the correctness of this patch:

Acked-by: Will Deacon 

Will


Re: [PATCH] KVM: arm64: Cap default IPA size to the host's own size

2021-03-09 Thread Andrew Jones
Hi Marc,

On Mon, Mar 08, 2021 at 05:46:43PM +, Marc Zyngier wrote:
> KVM/arm64 has forever used a 40bit default IPA space, partially
> due to its 32bit heritage (where the only choice is 40bit).
> 
> However, there are implementations in the wild that have a *cough*
> much smaller *cough* IPA space, which leads to a misprogramming of
> VTCR_EL2, and a guest that is stuck on its first memory access
> if userspace dares to ask for the default IPA setting (which most
> VMMs do).
> 
> Instead, cap the default IPA size to what the host can actually
> do, and spit out a one-off message on the console. The boot warning
> is turned into a more meaningful message, and the new behaviour
> is also documented.
> 
> Although this is a userspace ABI change, it doesn't really change
> much for userspace:
> 
> - the guest couldn't run before this change, while it now has
>   a chance to if the memory range fits the reduced IPA space
> 
> - a memory slot that was accepted because it did fit the default
>   IPA space but didn't fit the HW constraints is now properly
>   rejected

I'm not sure deferring the misconfiguration error until memslot
request time is better than just failing to create a VM. If
userspace doesn't use KVM_CAP_ARM_VM_IPA_SIZE to determine the
limit (which it hasn't been obliged to do) and it is able to
successfully create a VM, then it will assume up to 40-bit IPAs
are supported. Later, when it tries to add memslots and fails
it may be confused, especially if that later is much, much later
with memory hotplug.

> 
> The other thing that's left doing is to convince userspace to
> actually use the IPA space setting instead of relying on the
> antiquated default.

Failing to create any VM which hasn't selected a valid IPA limit
should be pretty convincing :-)

Thanks,
drew



Re: [PATCH kvmtool v2 09/22] x86/ioport: Switch to new trap handlers

2021-03-09 Thread Alexandru Elisei
Hi Andre,

On 2/25/21 12:59 AM, Andre Przywara wrote:
> Now that the x86 I/O ports have trap handlers adhering to the MMIO fault
> handler prototype, let's switch over to the joint registration routine.
>
> This allows us to get rid of the ioport shim routines.
>
> Since the debug output was done in ioport.c, we would lose this
> functionality when moving over to the MMIO handlers. So bring this back
> here explicitly, by introducing debug_mmio().
>
> Signed-off-by: Andre Przywara 
> ---
>  x86/ioport.c | 102 +++
>  1 file changed, 37 insertions(+), 65 deletions(-)
>
> diff --git a/x86/ioport.c b/x86/ioport.c
> index 78f9a863..9fcbb6c9 100644
> --- a/x86/ioport.c
> +++ b/x86/ioport.c
> @@ -3,21 +3,35 @@
>  #include 
>  #include 
>  
> -static void dummy_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
> +static void debug_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
>  u8 is_write, void *ptr)

Since the function is about ioports (it even checks cfg.ioport_debug), shouldn't
something like debug_io/debug_pio/debug_ioport be more appropriate?

Otherwise looks good: the only emulation callback that could fail was
debug_ops->debug_io_out, which triggered a print if cfg.ioport_debug was set,
and the callback is replaced by debug_mmio. With the name change:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex

>  {
> + if (!vcpu->kvm->cfg.ioport_debug)
> + return;
> +
> + fprintf(stderr, "debug port %s from VCPU%lu: port=0x%lx, size=%u",
> + is_write ? "write" : "read", vcpu->cpu_id,
> + (unsigned long)addr, len);
> + if (is_write) {
> + u32 value;
> +
> + switch (len) {
> + case 1: value = ioport__read8(data); break;
> + case 2: value = ioport__read16((u16*)data); break;
> + case 4: value = ioport__read32((u32*)data); break;
> + default: value = 0; break;
> + }
> + fprintf(stderr, ", data: 0x%x\n", value);
> + } else {
> + fprintf(stderr, "\n");
> + }
>  }
>  
> -static bool debug_io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> +static void dummy_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
> +u8 is_write, void *ptr)
>  {
> - dummy_mmio(vcpu, port, data, size, true, NULL);
> - return 0;
>  }
>  
> -static struct ioport_operations debug_ops = {
> - .io_out = debug_io_out,
> -};
> -
>  static void seabios_debug_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data,
>  u32 len, u8 is_write, void *ptr)
>  {
> @@ -31,37 +45,6 @@ static void seabios_debug_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data,
>   putchar(ch);
>  }
>  
> -static bool seabios_debug_io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> -{
> - seabios_debug_mmio(vcpu, port, data, size, true, NULL);
> - return 0;
> -}
> -
> -static struct ioport_operations seabios_debug_ops = {
> - .io_out = seabios_debug_io_out,
> -};
> -
> -static bool dummy_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> -{
> - dummy_mmio(vcpu, port, data, size, false, NULL);
> - return true;
> -}
> -
> -static bool dummy_io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> -{
> - dummy_mmio(vcpu, port, data, size, true, NULL);
> - return true;
> -}
> -
> -static struct ioport_operations dummy_read_write_ioport_ops = {
> - .io_in  = dummy_io_in,
> - .io_out = dummy_io_out,
> -};
> -
> -static struct ioport_operations dummy_write_only_ioport_ops = {
> - .io_out = dummy_io_out,
> -};
> -
>  /*
>   * The "fast A20 gate"
>   */
> @@ -76,17 +59,6 @@ static void ps2_control_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
>   ioport__write8(data, 0x02);
>  }
>  
> -static bool ps2_control_a_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> -{
> - ps2_control_mmio(vcpu, port, data, size, false, NULL);
> - return true;
> -}
> -
> -static struct ioport_operations ps2_control_a_ops = {
> - .io_in  = ps2_control_a_io_in,
> - .io_out = dummy_io_out,
> -};
> -
>  void ioport__map_irq(u8 *irq)
>  {
>  }
> @@ -98,75 +70,75 @@ static int ioport__setup_arch(struct kvm *kvm)
>   /* Legacy ioport setup */
>  
>   /* 0x0000 - 0x001F - DMA1 controller */
> - r = ioport__register(kvm, 0x0000, &dummy_read_write_ioport_ops, 32, NULL);
> + r = kvm__register_pio(kvm, 0x0000, 32, dummy_mmio, NULL);
>   if (r < 0)
>   return r;
>  
>   /* 0x0020 - 0x003F - 8259A PIC 1 */
> - r = ioport__register(kvm, 0x0020, &dummy_read_write_ioport_ops, 2, NULL);
> + r = kvm__register_pio(kvm, 0x0020, 2, dummy_mmio, NULL);
>   if (r < 0)
>   

Re: [PATCH kvmtool v2 08/22] x86/ioport: Refactor trap handlers

2021-03-09 Thread Alexandru Elisei
Hi Andre,

Regarding the naming of the functions: these are real ioport emulation functions,
which are executed because of a KVM_EXIT_IO exit reason from KVM_RUN. Wouldn't
naming the functions something like *_pio or *_io be more appropriate?

Thanks,

Alex

On 2/25/21 12:59 AM, Andre Przywara wrote:
> With the planned retirement of the special ioport emulation code, we
> need to provide emulation functions compatible with the MMIO
> prototype.
>
> Adjust the trap handlers to use that new function, and provide shims to
> implement the old ioport interface, for now.
>
> Signed-off-by: Andre Przywara 
> Reviewed-by: Alexandru Elisei 
> ---
>  x86/ioport.c | 30 ++
>  1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/x86/ioport.c b/x86/ioport.c
> index a8d2bb1a..78f9a863 100644
> --- a/x86/ioport.c
> +++ b/x86/ioport.c
> @@ -3,8 +3,14 @@
>  #include 
>  #include 
>  
> +static void dummy_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
> +u8 is_write, void *ptr)
> +{
> +}
> +
>  static bool debug_io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>  {
> + dummy_mmio(vcpu, port, data, size, true, NULL);
>   return 0;
>  }
>  
> @@ -12,15 +18,23 @@ static struct ioport_operations debug_ops = {
>   .io_out = debug_io_out,
>  };
>  
> -static bool seabios_debug_io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> +static void seabios_debug_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data,
> +                               u32 len, u8 is_write, void *ptr)
>  {
>   char ch;
>  
> + if (!is_write)
> + return;
> +
>   ch = ioport__read8(data);
>  
>   putchar(ch);
> +}
>  
> - return true;
> +static bool seabios_debug_io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> +{
> + seabios_debug_mmio(vcpu, port, data, size, true, NULL);
> + return 0;
>  }
>  
>  static struct ioport_operations seabios_debug_ops = {
> @@ -29,11 +43,13 @@ static struct ioport_operations seabios_debug_ops = {
>  
>  static bool dummy_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>  {
> + dummy_mmio(vcpu, port, data, size, false, NULL);
>   return true;
>  }
>  
>  static bool dummy_io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>  {
> + dummy_mmio(vcpu, port, data, size, true, NULL);
>   return true;
>  }
>  
> @@ -50,13 +66,19 @@ static struct ioport_operations dummy_write_only_ioport_ops = {
>   * The "fast A20 gate"
>   */
>  
> -static bool ps2_control_a_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> +static void ps2_control_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
> +                             u8 is_write, void *ptr)
>  {
>   /*
>* A20 is always enabled.
>*/
> - ioport__write8(data, 0x02);
> + if (!is_write)
> + ioport__write8(data, 0x02);
> +}
>  
> +static bool ps2_control_a_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> +{
> + ps2_control_mmio(vcpu, port, data, size, false, NULL);
>   return true;
>  }
>  
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Cap default IPA size to the host's own size

2021-03-09 Thread Will Deacon
On Tue, Mar 09, 2021 at 11:35:54AM +, Marc Zyngier wrote:
> On Tue, 09 Mar 2021 11:26:59 +,
> Will Deacon  wrote:
> > 
> > On Mon, Mar 08, 2021 at 05:46:43PM +, Marc Zyngier wrote:
> > > KVM/arm64 has forever used a 40bit default IPA space, partially
> > > due to its 32bit heritage (where the only choice is 40bit).
> > > 
> > > However, there are implementations in the wild that have a *cough*
> > > much smaller *cough* IPA space, which leads to a misprogramming of
> > > VTCR_EL2, and a guest that is stuck on its first memory access
> > > if userspace dares to ask for the default IPA setting (which most
> > > VMMs do).
> > > 
> > > Instead, cap the default IPA size to what the host can actually
> > > do, and spit out a one-off message on the console. The boot warning
> > > is turned into a more meaningful message, and the new behaviour
> > > is also documented.
> > > 
> > > Although this is a userspace ABI change, it doesn't really change
> > > much for userspace:
> > > 
> > > - the guest couldn't run before this change, while it now has
> > >   a chance to if the memory range fits the reduced IPA space
> > > 
> > > - a memory slot that was accepted because it did fit the default
> > >   IPA space but didn't fit the HW constraints is now properly
> > >   rejected
> > > 
> > > The other thing that's left to do is to convince userspace to
> > > actually use the IPA space setting instead of relying on the
> > > antiquated default.
> > 
> > Is there a way for userspace to discover the default IPA size, or does
> > it have to try setting values until it finds one that sticks?
> 
> Yes, since 233a7cb23531 ("kvm: arm64: Allow tuning the physical
> address size for VM").
> 
> The VMM can issue a KVM_CAP_ARM_VM_IPA_SIZE ioctl(), and get in return
> the maximum IPA size (I have a patch for kvmtool that does this).

Great, thanks -- that's exactly what I was thinking about when I asked the
question!

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Cap default IPA size to the host's own size

2021-03-09 Thread Marc Zyngier
On Tue, 09 Mar 2021 11:26:59 +,
Will Deacon  wrote:
> 
> On Mon, Mar 08, 2021 at 05:46:43PM +, Marc Zyngier wrote:
> > KVM/arm64 has forever used a 40bit default IPA space, partially
> > due to its 32bit heritage (where the only choice is 40bit).
> > 
> > However, there are implementations in the wild that have a *cough*
> > much smaller *cough* IPA space, which leads to a misprogramming of
> > VTCR_EL2, and a guest that is stuck on its first memory access
> > if userspace dares to ask for the default IPA setting (which most
> > VMMs do).
> > 
> > Instead, cap the default IPA size to what the host can actually
> > do, and spit out a one-off message on the console. The boot warning
> > is turned into a more meaningful message, and the new behaviour
> > is also documented.
> > 
> > Although this is a userspace ABI change, it doesn't really change
> > much for userspace:
> > 
> > - the guest couldn't run before this change, while it now has
> >   a chance to if the memory range fits the reduced IPA space
> > 
> > - a memory slot that was accepted because it did fit the default
> >   IPA space but didn't fit the HW constraints is now properly
> >   rejected
> > 
> > The other thing that's left to do is to convince userspace to
> > actually use the IPA space setting instead of relying on the
> > antiquated default.
> 
> Is there a way for userspace to discover the default IPA size, or does
> it have to try setting values until it finds one that sticks?

Yes, since 233a7cb23531 ("kvm: arm64: Allow tuning the physical
address size for VM").

The VMM can issue a KVM_CAP_ARM_VM_IPA_SIZE ioctl(), and get in return
the maximum IPA size (I have a patch for kvmtool that does this).

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Cap default IPA size to the host's own size

2021-03-09 Thread Will Deacon
On Mon, Mar 08, 2021 at 05:46:43PM +, Marc Zyngier wrote:
> KVM/arm64 has forever used a 40bit default IPA space, partially
> due to its 32bit heritage (where the only choice is 40bit).
> 
> However, there are implementations in the wild that have a *cough*
> much smaller *cough* IPA space, which leads to a misprogramming of
> VTCR_EL2, and a guest that is stuck on its first memory access
> if userspace dares to ask for the default IPA setting (which most
> VMMs do).
> 
> Instead, cap the default IPA size to what the host can actually
> do, and spit out a one-off message on the console. The boot warning
> is turned into a more meaningful message, and the new behaviour
> is also documented.
> 
> Although this is a userspace ABI change, it doesn't really change
> much for userspace:
> 
> - the guest couldn't run before this change, while it now has
>   a chance to if the memory range fits the reduced IPA space
> 
> - a memory slot that was accepted because it did fit the default
>   IPA space but didn't fit the HW constraints is now properly
>   rejected
> 
> The other thing that's left to do is to convince userspace to
> actually use the IPA space setting instead of relying on the
> antiquated default.

Is there a way for userspace to discover the default IPA size, or does
it have to try setting values until it finds one that sticks?

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


RE: [RFC PATCH 3/5] KVM: ARM64: Add support for pinned VMIDs

2021-03-09 Thread Shameerali Kolothum Thodi
Hi Marc,

> -Original Message-
> From: Marc Zyngier [mailto:m...@kernel.org]
> Sent: 09 March 2021 10:33
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; io...@lists.linux-foundation.org;
> kvmarm@lists.cs.columbia.edu; alex.william...@redhat.com;
> jean-phili...@linaro.org; eric.au...@redhat.com; zhangfei@linaro.org;
> Jonathan Cameron ; Zengtao (B)
> ; linux...@openeuler.org; Will Deacon
> 
> Subject: Re: [RFC PATCH 3/5] KVM: ARM64: Add support for pinned VMIDs
> 
> Hi Shameer,
> 
> [+Will]
> 
> On Mon, 22 Feb 2021 15:53:36 +,
> Shameer Kolothum  wrote:
> >
> > On an ARM64 system with a SMMUv3 implementation that fully supports
> > Broadcast TLB Maintenance(BTM) feature, the CPU TLB invalidate
> > instructions are received by SMMU. This is very useful when the
> > SMMU shares the page tables with the CPU(eg: Guest SVA use case).
> > For this to work, the SMMU must use the same VMID that is allocated
> > by KVM to configure the stage 2 translations.
> >
> > At present KVM VMID allocations are recycled on rollover and may
> > change as a result. This will create issues if we have to share
> > the KVM VMID with the SMMU. Hence, we split the KVM VMID space into
> > two: the first half follows the normal recycle-on-rollover policy
> > while the second half of the VMID space is used to allocate pinned
> > VMIDs. This feature is enabled based on a command line option
> > "kvm-arm.pinned_vmid_enable".
> 
> I think this is the wrong approach. Instead of shoving the notion of
> pinned VMID into the current allocator, which really isn't designed
> for this, it'd be a lot better if we aligned the KVM VMID allocator
> with the ASID allocator, which already has support for pinning and is
> in general much more efficient.

Ok. Agree that this is not efficient, but was easy to prototype something :)

> Julien Grall worked on such a series[1] a long while ago, which got
> stalled because of the 32bit KVM port. Since we don't have this burden
> anymore, I'd rather you look in that direction instead of wasting half
> of the VMID space on potentially pinned VMIDs.

Sure. I will check that and work on it.

Thanks,
Shameer

> Thanks,
> 
>   M.
> 
> [1]
> https://patchwork.kernel.org/project/linux-arm-kernel/cover/20190724162534
> .7390-1-julien.gr...@arm.com/
> 
> 
> --
> Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Cap default IPA size to the host's own size

2021-03-09 Thread Suzuki Poulose



> On 8 Mar 2021, at 17:46, Marc Zyngier  wrote:
> 
> KVM/arm64 has forever used a 40bit default IPA space, partially
> due to its 32bit heritage (where the only choice is 40bit).
> 
> However, there are implementations in the wild that have a *cough*
> much smaller *cough* IPA space, which leads to a misprogramming of
> VTCR_EL2, and a guest that is stuck on its first memory access
> if userspace dares to ask for the default IPA setting (which most
> VMMs do).
> 
> Instead, cap the default IPA size to what the host can actually
> do, and spit out a one-off message on the console. The boot warning
> is turned into a more meaningfull message, and the new behaviour
> is also documented.
> 
> Although this is a userspace ABI change, it doesn't really change
> much for userspace:
> 
> - the guest couldn't run before this change, while it now has
>  a chance to if the memory range fits the reduced IPA space
> 
> - a memory slot that was accepted because it did fit the default
>  IPA space but didn't fit the HW constraints is now properly
>  rejected
> 
> The other thing that's left doing is to convince userspace to
> actually use the IPA space setting instead of relying on the
> antiquated default.
> 
> Signed-off-by: Marc Zyngier 
> ---

Reviewed-by: Suzuki K Poulose 


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v9 6/6] KVM: arm64: Document MTE capability and ioctl

2021-03-09 Thread Peter Maydell
On Mon, 1 Mar 2021 at 14:23, Steven Price  wrote:
>
> A new capability (KVM_CAP_ARM_MTE) identifies that the kernel supports
> granting a guest access to the tags, and provides a mechanism for the
> VMM to enable it.
>
> A new ioctl (KVM_ARM_MTE_COPY_TAGS) provides a simple way for a VMM to
> access the tags of a guest without having to maintain a PROT_MTE mapping
> in userspace. The above capability gates access to the ioctl.
>
> Signed-off-by: Steven Price 
> ---
>  Documentation/virt/kvm/api.rst | 37 ++
>  1 file changed, 37 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index aed52b0fc16e..1406ea138127 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4939,6 +4939,23 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
>  Allows Xen vCPU attributes to be read. For the structure and types,
>  see KVM_XEN_VCPU_SET_ATTR above.
>
> +4.131 KVM_ARM_MTE_COPY_TAGS
> +---
> +
> +:Capability: KVM_CAP_ARM_MTE
> +:Architectures: arm64
> +:Type: vm ioctl
> +:Parameters: struct kvm_arm_copy_mte_tags
> +:Returns: 0 on success, < 0 on error
> +
> +Copies Memory Tagging Extension (MTE) tags to/from guest tag memory.

Elsewhere, virt/kvm/api.rst generally includes documentation of the
associated structs, something like:

::

  struct kvm_arm_copy_mte_tags {
        __u64 guest_ipa;
        __u64 length;
        union {
                void __user *addr;
                __u64 padding;
        };
        __u64 flags;
  };


which saves the reader having to cross-reference against the header file.
It also means you can more naturally use the actual field names in the doc,
eg:

> +The
> +starting address and length of guest memory must be ``PAGE_SIZE`` aligned.

you could say "The guest_ipa and length fields" here.

Also "The addr field must point to a buffer which the tags will
be copied to or from." I assume.

> +The size of the buffer to store the tags is ``(length / MTE_GRANULE_SIZE)``
> +bytes (i.e. 1/16th of the corresponding size).

> + Each byte contains a single tag
> +value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
> +``PTRACE_POKEMTETAGS``.

What are the valid values for 'flags' ? It looks like they specify which
direction the copy is, which we definitely need to document here.

What happens if the caller requests a tag copy for an area of guest
address space which doesn't have tags (eg it has nothing mapped),
or for an area of guest address space which has tags in some parts
but not in others ?

> +
>  5. The kvm_run structure
>  
>
> @@ -6227,6 +6244,25 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them.
>  This capability can be used to check / enable 2nd DAWR feature provided
>  by POWER10 processor.
>
> +7.23 KVM_CAP_ARM_MTE
> +
> +
> +:Architectures: arm64
> +:Parameters: none
> +
> +This capability indicates that KVM (and the hardware) supports exposing the
> +Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
> +VMM before the guest will be granted access.
> +
> +When enabled the guest is able to access tags associated with any memory given
> +to the guest. KVM will ensure that the pages are flagged ``PG_mte_tagged`` so
> +that the tags are maintained during swap or hibernation of the host, however

s/,/;/

> +the VMM needs to manually save/restore the tags as appropriate if the VM is
> +migrated.
> +
> +When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
> +perform a bulk copy of tags to/from the guest

"guest."

> +
>  8. Other capabilities.
>  ==
>
> @@ -6716,3 +6752,4 @@ KVM_XEN_HVM_SET_ATTR, KVM_XEN_HVM_GET_ATTR, KVM_XEN_VCPU_SET_ATTR and
>  KVM_XEN_VCPU_GET_ATTR ioctls, as well as the delivery of exception vectors
>  for event channel upcalls when the evtchn_upcall_pending field of a vcpu's
>  vcpu_info is set.
> +
> --
> 2.20.1


Stray whitespace change ?

thanks
-- PMM
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH 3/5] KVM: ARM64: Add support for pinned VMIDs

2021-03-09 Thread Marc Zyngier
Hi Shameer,

[+Will]

On Mon, 22 Feb 2021 15:53:36 +,
Shameer Kolothum  wrote:
> 
> On an ARM64 system with a SMMUv3 implementation that fully supports
> Broadcast TLB Maintenance(BTM) feature, the CPU TLB invalidate
> instructions are received by SMMU. This is very useful when the
> SMMU shares the page tables with the CPU(eg: Guest SVA use case).
> For this to work, the SMMU must use the same VMID that is allocated
> by KVM to configure the stage 2 translations.
> 
> At present KVM VMID allocations are recycled on rollover and may
> change as a result. This will create issues if we have to share
> the KVM VMID with the SMMU. Hence, we split the KVM VMID space into
> two: the first half follows the normal recycle-on-rollover policy
> while the second half of the VMID space is used to allocate pinned
> VMIDs. This feature is enabled based on a command line option
> "kvm-arm.pinned_vmid_enable".

I think this is the wrong approach. Instead of shoving the notion of
pinned VMID into the current allocator, which really isn't designed
for this, it'd be a lot better if we aligned the KVM VMID allocator
with the ASID allocator, which already has support for pinning and is
in general much more efficient.

Julien Grall worked on such a series[1] a long while ago, which got
stalled because of the 32bit KVM port. Since we don't have this burden
anymore, I'd rather you look in that direction instead of wasting half
of the VMID space on potentially pinned VMIDs.

Thanks,

M.

[1] 
https://patchwork.kernel.org/project/linux-arm-kernel/cover/20190724162534.7390-1-julien.gr...@arm.com/


-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 1/1] kvm: arm64: Add SVE support for nVHE.

2021-03-09 Thread Marc Zyngier
Hi Daniel,

On Tue, 02 Mar 2021 16:48:50 +,
Daniel Kiss  wrote:
> 
> CPUs that support SVE are architecturally required to support the
> Virtualization Host Extensions (VHE), so until now the kernel has
> supported SVE alongside KVM only with VHE enabled. In some cases it
> is desirable to run an nVHE configuration even when VHE is available.
> This patch adds support for SVE in the nVHE configuration too.
> 
> Tested on FVP with a Linux guest VM that ran with a different VL than
> the host system.
> 
> Signed-off-by: Daniel Kiss 
> ---
>  arch/arm64/Kconfig  |  7 -
>  arch/arm64/include/asm/el2_setup.h  |  2 +-
>  arch/arm64/include/asm/fpsimd.h |  6 
>  arch/arm64/include/asm/fpsimdmacros.h   | 24 ++--
>  arch/arm64/include/asm/kvm_arm.h|  6 
>  arch/arm64/include/asm/kvm_host.h   | 17 +++
>  arch/arm64/kernel/entry-fpsimd.S|  5 
>  arch/arm64/kvm/arm.c|  5 
>  arch/arm64/kvm/fpsimd.c | 38 -
>  arch/arm64/kvm/hyp/fpsimd.S | 15 ++
>  arch/arm64/kvm/hyp/include/hyp/switch.h | 34 +++---
>  arch/arm64/kvm/hyp/nvhe/switch.c| 24 
>  arch/arm64/kvm/reset.c  |  4 ---
>  13 files changed, 127 insertions(+), 60 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index f39568b28ec1..049428f1bf27 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1676,7 +1676,6 @@ endmenu
>  config ARM64_SVE
>   bool "ARM Scalable Vector Extension support"
>   default y
> - depends on !KVM || ARM64_VHE
>   help
> The Scalable Vector Extension (SVE) is an extension to the AArch64
> execution state which complements and extends the SIMD functionality
> @@ -1705,12 +1704,6 @@ config ARM64_SVE
> booting the kernel.  If unsure and you are not observing these
> symptoms, you should assume that it is safe to say Y.
>  
> -   CPUs that support SVE are architecturally required to support the
> -   Virtualization Host Extensions (VHE), so the kernel makes no
> -   provision for supporting SVE alongside KVM without VHE enabled.
> -   Thus, you will need to enable CONFIG_ARM64_VHE if you want to support
> -   KVM in the same kernel image.
> -
>  config ARM64_MODULE_PLTS
>   bool "Use PLTs to allow module memory to spill over into vmalloc area"
>   depends on MODULES
> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
> index a7f5a1bbc8ac..0207393e67c3 100644
> --- a/arch/arm64/include/asm/el2_setup.h
> +++ b/arch/arm64/include/asm/el2_setup.h
> @@ -133,7 +133,7 @@
>   bic x0, x0, #CPTR_EL2_TZ// Also disable SVE traps
>   msr cptr_el2, x0// Disable copro. traps to EL2
>   isb
> - mov x1, #ZCR_ELx_LEN_MASK   // SVE: Enable full vector
> + mov x1, #ZCR_EL2_LEN_HOST   // SVE: Enable full vector
>   msr_s   SYS_ZCR_EL2, x1 // length for EL1.
>  1:
>  .endm
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index bec5f14b622a..526d69f3eeb3 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -69,6 +69,12 @@ static inline void *sve_pffr(struct thread_struct *thread)
>  extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
>  unsigned long vq_minus_1);
> +/*
> + * sve_load_state_nvhe function for the hyp code where the SVE registers are
> + * handled from the EL2, vector length is governed by ZCR_EL2.
> + */
> +extern void sve_load_state_nvhe(void const *state, u32 const *pfpsr,
> +unsigned long vq_minus_1);
>  extern void sve_flush_live(void);
>  extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
>  unsigned long vq_minus_1);
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index af43367534c7..d309c6071bce 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -205,6 +205,17 @@
>  921:
>  .endm
>  
> +/* Update ZCR_EL2.LEN with the new VQ */
> +.macro sve_load_vq_nvhe xvqminus1, xtmp, xtmp2
> + mrs_s   \xtmp, SYS_ZCR_EL2
> + bic \xtmp2, \xtmp, ZCR_ELx_LEN_MASK
> + orr \xtmp2, \xtmp2, \xvqminus1
> + cmp \xtmp2, \xtmp
> + b.eq922f
> + msr_s   SYS_ZCR_EL2, \xtmp2 //self-synchronising
> +922:
> +.endm

No, please. There is strictly no need for a new macro that only
differs by the EL1 vs EL2 register. That's what we have alternatives
for. And actually, you should be able to use the ZCR_EL2 register at
all times, including on VHE.


Re: [PATCH 2/2] KVM: arm64: Skip the cache flush when coalescing tables into a block

2021-03-09 Thread wangyanan (Y)


On 2021/3/9 16:43, Marc Zyngier wrote:

On Tue, 09 Mar 2021 08:34:43 +,
"wangyanan (Y)"  wrote:


On 2021/3/9 0:34, Will Deacon wrote:

On Mon, Jan 25, 2021 at 10:10:44PM +0800, Yanan Wang wrote:

After dirty-logging is stopped for a VM configured with huge mappings,
KVM will recover the table mappings back to block mappings. As we only
replace the existing page tables with a block entry and the cacheability
has not been changed, the cache maintenance operations can be skipped.

Signed-off-by: Yanan Wang 
---
   arch/arm64/kvm/mmu.c | 12 +---
   1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8e8549ea1d70..37b427dcbc4f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -744,7 +744,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
   {
int ret = 0;
bool write_fault, writable, force_pte = false;
-   bool exec_fault;
+   bool exec_fault, adjust_hugepage;
bool device = false;
unsigned long mmu_seq;
struct kvm *kvm = vcpu->kvm;
@@ -872,12 +872,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
mark_page_dirty(kvm, gfn);
}
   -if (fault_status != FSC_PERM && !device)
+   /*
+* There is no necessity to perform cache maintenance operations if we
+* will only replace the existing table mappings with a block mapping.
+*/
+   adjust_hugepage = fault_granule < vma_pagesize ? true : false;

nit: you don't need the '? true : false' part

That said, your previous patch checks for 'fault_granule > vma_pagesize',
so I'm not sure the local variable helps all that much here because it
obscures the size checks in my opinion. It would be more straight-forward
if we could structure the logic as:


if (fault_granule < vma_pagesize) {

} else if (fault_granule > vma_pagesize) {

} else {

}

With some comments describing what we can infer about the memcache and cache
maintenance requirements for each case.

Thanks for your suggestion here, Will.
But I have resent another newer series [1] (KVM: arm64: Improve
efficiency of stage2 page table)
recently, which has the same theme but different solutions that I
think are better.
[1]
https://lore.kernel.org/lkml/20210208112250.163568-1-wangyana...@huawei.com/

Could you please comment on that series ?  I think it can be found in
your inbox :).

There were already a bunch of comments on that series, and I stopped
at the point where the cache maintenance was broken. Please respin
that series if you want further feedback on it.

Ok, I will send a new version later.


In the future, if you deprecate a series (which is completely
understandable), please leave a note on the list with a pointer to the
new series so that people don't waste time reviewing an obsolete
series. Or post the new series with a new version number so that it is
obvious that the original series has been superseded.

I apologize for this, I will be more careful in the future.

Thanks,

Yanan

Thanks,

M.


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Don't use cbz/adr with external symbols

2021-03-09 Thread Marc Zyngier
On Fri, 5 Mar 2021 12:21:24 -0800, Sami Tolvanen wrote:
> allmodconfig + CONFIG_LTO_CLANG_THIN=y fails to build due to following
> linker errors:
> 
>   ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21CC):
>   relocation R_AARCH64_CONDBR19 out of range: 2031220 is not in
>   [-1048576, 1048575]; references hyp_panic
>   >>> defined in vmlinux.o
> 
> [...]

Applied to next, thanks!

[1/1] KVM: arm64: Don't use cbz/adr with external symbols
  commit: dbaee836d60a8e1b03e7d53a37893235662ba124

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/2] KVM: arm64: Skip the cache flush when coalescing tables into a block

2021-03-09 Thread Marc Zyngier
On Tue, 09 Mar 2021 08:34:43 +,
"wangyanan (Y)"  wrote:
> 
> 
> On 2021/3/9 0:34, Will Deacon wrote:
> > On Mon, Jan 25, 2021 at 10:10:44PM +0800, Yanan Wang wrote:
> >> After dirty-logging is stopped for a VM configured with huge mappings,
> >> KVM will recover the table mappings back to block mappings. As we only
> >> replace the existing page tables with a block entry and the cacheability
> >> has not been changed, the cache maintenance operations can be skipped.
> >> 
> >> Signed-off-by: Yanan Wang 
> >> ---
> >>   arch/arm64/kvm/mmu.c | 12 +---
> >>   1 file changed, 9 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> >> index 8e8549ea1d70..37b427dcbc4f 100644
> >> --- a/arch/arm64/kvm/mmu.c
> >> +++ b/arch/arm64/kvm/mmu.c
> >> @@ -744,7 +744,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >> phys_addr_t fault_ipa,
> >>   {
> >>int ret = 0;
> >>bool write_fault, writable, force_pte = false;
> >> -  bool exec_fault;
> >> +  bool exec_fault, adjust_hugepage;
> >>bool device = false;
> >>unsigned long mmu_seq;
> >>struct kvm *kvm = vcpu->kvm;
> >> @@ -872,12 +872,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> >> phys_addr_t fault_ipa,
> >>mark_page_dirty(kvm, gfn);
> >>}
> >>   -if (fault_status != FSC_PERM && !device)
> >> +  /*
> >> +   * There is no necessity to perform cache maintenance operations if we
> >> +   * will only replace the existing table mappings with a block mapping.
> >> +   */
> >> +  adjust_hugepage = fault_granule < vma_pagesize ? true : false;
> > nit: you don't need the '? true : false' part
> > 
> > That said, your previous patch checks for 'fault_granule > vma_pagesize',
> > so I'm not sure the local variable helps all that much here because it
> > obscures the size checks in my opinion. It would be more straight-forward
> > if we could structure the logic as:
> > 
> > 
> > if (fault_granule < vma_pagesize) {
> > 
> > } else if (fault_granule > vma_pagesize) {
> > 
> > } else {
> > 
> > }
> > 
> > With some comments describing what we can infer about the memcache and cache
> > maintenance requirements for each case.
> Thanks for your suggestion here, Will.
> But I have resent another newer series [1] (KVM: arm64: Improve
> efficiency of stage2 page table)
> recently, which has the same theme but different solutions that I
> think are better.
> [1]
> https://lore.kernel.org/lkml/20210208112250.163568-1-wangyana...@huawei.com/
> 
> Could you please comment on that series ?  I think it can be found in
> your inbox :).

There were already a bunch of comments on that series, and I stopped
at the point where the cache maintenance was broken. Please respin
that series if you want further feedback on it.

In the future, if you deprecate a series (which is completely
understandable), please leave a note on the list with a pointer to the
new series so that people don't waste time reviewing an obsolete
series. Or post the new series with a new version number so that it is
obvious that the original series has been superseded.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/2] KVM: arm64: Skip the cache flush when coalescing tables into a block

2021-03-09 Thread wangyanan (Y)


On 2021/3/9 0:34, Will Deacon wrote:

On Mon, Jan 25, 2021 at 10:10:44PM +0800, Yanan Wang wrote:

After dirty-logging is stopped for a VM configured with huge mappings,
KVM will recover the table mappings back to block mappings. As we only
replace the existing page tables with a block entry and the cacheability
has not been changed, the cache maintenance operations can be skipped.

Signed-off-by: Yanan Wang 
---
  arch/arm64/kvm/mmu.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8e8549ea1d70..37b427dcbc4f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -744,7 +744,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
  {
int ret = 0;
bool write_fault, writable, force_pte = false;
-   bool exec_fault;
+   bool exec_fault, adjust_hugepage;
bool device = false;
unsigned long mmu_seq;
struct kvm *kvm = vcpu->kvm;
@@ -872,12 +872,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
mark_page_dirty(kvm, gfn);
}
  
-	if (fault_status != FSC_PERM && !device)

+   /*
+* There is no necessity to perform cache maintenance operations if we
+* will only replace the existing table mappings with a block mapping.
+*/
+   adjust_hugepage = fault_granule < vma_pagesize ? true : false;

nit: you don't need the '? true : false' part

That said, your previous patch checks for 'fault_granule > vma_pagesize',
so I'm not sure the local variable helps all that much here because it
obscures the size checks in my opinion. It would be more straight-forward
if we could structure the logic as:


if (fault_granule < vma_pagesize) {

} else if (fault_granule > vma_pagesize) {

} else {

}

With some comments describing what we can infer about the memcache and cache
maintenance requirements for each case.

Thanks for your suggestion here, Will.
But I have resent another newer series [1] (KVM: arm64: Improve 
efficiency of stage2 page table)
recently, which has the same theme but different solutions that I think 
are better.
[1] 
https://lore.kernel.org/lkml/20210208112250.163568-1-wangyana...@huawei.com/


Could you please comment on that series ?  I think it can be found in 
your inbox :).


Thanks,

Yanan


Will
.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm