Re: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO

2021-04-16 Thread Keqian Zhu
Hi Marc,

On 2021/4/16 22:44, Marc Zyngier wrote:
> On Thu, 15 Apr 2021 15:08:09 +0100,
> Keqian Zhu  wrote:
>>
>> Hi Marc,
>>
>> On 2021/4/15 22:03, Keqian Zhu wrote:
>>> The MMIO region of a device may be huge (GB level), so try to use
>>> block mapping in stage2 to speed up both map and unmap.
>>>
>>> Compared to normal memory mapping, we should consider two more
>>> points when trying block mapping for an MMIO region:
>>>
>>> 1. For normal memory mapping, the PA (host physical address) and
>>> HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
>>> the HVA to request a hugepage, so we don't need to consider PA
>>> alignment when verifying block mapping. But for device memory
>>> mapping, the PA and HVA may have different alignments.
>>>
>>> 2. For normal memory mapping, we are sure the hugepage size properly
>>> fits into the vma, so we don't check whether the mapping size exceeds
>>> the boundary of the vma. But for device memory mapping, we should pay
>>> attention to this.
>>>
>>> This adds get_vma_page_shift() to get the page shift for both normal
>>> memory and device MMIO regions, and checks these two points when
>>> selecting the block mapping size for an MMIO region.
>>>
>>> Signed-off-by: Keqian Zhu 
>>> ---
>>>  arch/arm64/kvm/mmu.c | 61 
>>>  1 file changed, 51 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index c59af5ca01b0..5a1cc7751e6d 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot 
>>> *memslot,
>>> return PAGE_SIZE;
>>>  }
>>>  
[...]

>>> +   /*
>>> +* logging_active is guaranteed to never be true for VM_PFNMAP
>>> +* memslots.
>>> +*/
>>> +   if (logging_active) {
>>> force_pte = true;
>>> vma_shift = PAGE_SHIFT;
>>> +   } else {
>>> +   vma_shift = get_vma_page_shift(vma, hva);
>>> }
>> I used an if/else style in v4, please check that. Thanks very much!
> 
> That's fine. However, it is getting a bit late for 5.13, and we don't
> have much time to let it simmer in -next. I'll probably wait until
> after the merge window to pick it up.
OK, no problem. Thanks! :)

BRs,
Keqian


[PATCH v11 2/6] arm64: kvm: Introduce MTE VM feature

2021-04-16 Thread Steven Price
Add a new VM feature 'KVM_CAP_ARM_MTE' which enables memory tagging
for a VM. This will expose the feature to the guest and automatically
tag memory pages touched by the VM as PG_mte_tagged (and clear the tag
storage) to ensure that the guest cannot see stale tags, and so that
the tags are correctly saved/restored across swap.

Actually exposing the new capability to user space happens in a later
patch.
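
For context, the guest notices this purely through the ID register change in
the sys_regs.c hunk below: with MTE enabled for the VM, ID_AA64PFR1_EL1.MTE
(bits [11:8]) reads back as non-zero. A minimal, hypothetical guest-side probe
(the helper name and the raw mrs read are illustrative, not part of this
patch) could look like:

  static inline bool guest_sees_mte(void)
  {
          unsigned long pfr1;

          /* ID_AA64PFR1_EL1.MTE >= 2 indicates full MTE (tag storage visible) */
          asm volatile("mrs %0, ID_AA64PFR1_EL1" : "=r" (pfr1));
          return ((pfr1 >> 8) & 0xf) >= 2;
  }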

Signed-off-by: Steven Price 
---
 arch/arm64/include/asm/kvm_emulate.h |  3 +++
 arch/arm64/include/asm/kvm_host.h|  3 +++
 arch/arm64/kvm/hyp/exception.c   |  3 ++-
 arch/arm64/kvm/mmu.c | 20 
 arch/arm64/kvm/sys_regs.c|  3 +++
 include/uapi/linux/kvm.h |  1 +
 6 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index f612c090f2e4..6bf776c2399c 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -84,6 +84,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
vcpu_el1_is_32bit(vcpu))
vcpu->arch.hcr_el2 |= HCR_TID2;
+
+   if (kvm_has_mte(vcpu->kvm))
+   vcpu->arch.hcr_el2 |= HCR_ATA;
 }
 
 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 3d10e6527f7d..1170ee137096 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -132,6 +132,8 @@ struct kvm_arch {
 
u8 pfr0_csv2;
u8 pfr0_csv3;
+   /* Memory Tagging Extension enabled for the guest */
+   bool mte_enabled;
 };
 
 struct kvm_vcpu_fault_info {
@@ -767,6 +769,7 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 #define kvm_arm_vcpu_sve_finalized(vcpu) \
((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
 
+#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
 #define kvm_vcpu_has_pmu(vcpu) \
(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
 
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 73629094f903..56426565600c 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -112,7 +112,8 @@ static void enter_exception64(struct kvm_vcpu *vcpu, 
unsigned long target_mode,
new |= (old & PSR_C_BIT);
new |= (old & PSR_V_BIT);
 
-   // TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
+   if (kvm_has_mte(vcpu->kvm))
+   new |= PSR_TCO_BIT;
 
new |= (old & PSR_DIT_BIT);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..5f8e165ea053 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -879,6 +879,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
if (vma_pagesize == PAGE_SIZE && !force_pte)
vma_pagesize = transparent_hugepage_adjust(memslot, hva,
   &pfn, &fault_ipa);
+
+   if (fault_status != FSC_PERM && kvm_has_mte(kvm) && !device &&
+   pfn_valid(pfn)) {
+   /*
+* VM will be able to see the page's tags, so we must ensure
+* they have been initialised. If PG_mte_tagged is set, tags
+* have already been initialised.
+*/
+   unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
+   struct page *page = pfn_to_online_page(pfn);
+
+   if (!page)
+   return -EFAULT;
+
+   for (i = 0; i < nr_pages; i++, page++) {
+   if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+   mte_clear_page_tags(page_address(page));
+   }
+   }
+
if (writable)
prot |= KVM_PGTABLE_PROT_W;
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4f2f1e3145de..18c87500a7a8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1047,6 +1047,9 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
break;
case SYS_ID_AA64PFR1_EL1:
val &= ~FEATURE(ID_AA64PFR1_MTE);
+   if (kvm_has_mte(vcpu->kvm))
+   val |= FIELD_PREP(FEATURE(ID_AA64PFR1_MTE),
+ ID_AA64PFR1_MTE);
break;
case SYS_ID_AA64ISAR1_EL1:
if (!vcpu_has_ptrauth(vcpu))
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620..6dc16c09a2d1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1078,6 +1078,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING 192
 #define KVM_CAP_X86_BUS_LOCK_EXIT 193
 #define KVM_CAP_PPC_DAWR1 194
+#define KVM_CAP_ARM_MTE 195
 
 #ifdef 

[PATCH v11 4/6] arm64: kvm: Expose KVM_ARM_CAP_MTE

2021-04-16 Thread Steven Price
It's now safe for the VMM to enable MTE in a guest, so expose the
capability to user space.
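
For illustration, a VMM is expected to probe the capability and enable it
before creating any vCPUs (matching the kvm->created_vcpus check below). A
minimal sketch, assuming vm_fd is a VM file descriptor obtained from
KVM_CREATE_VM:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int enable_mte(int vm_fd)
  {
          struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_MTE };

          /* Positive return means both kernel and hardware support MTE. */
          if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_MTE) <= 0)
                  return -1;

          /* Must happen before the first KVM_CREATE_VCPU. */
          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }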

Signed-off-by: Steven Price 
---
 arch/arm64/kvm/arm.c  | 9 +
 arch/arm64/kvm/sys_regs.c | 3 +++
 2 files changed, 12 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fc4c95dd2d26..46bf319f6cb7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -93,6 +93,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
r = 0;
kvm->arch.return_nisv_io_abort_to_user = true;
break;
+   case KVM_CAP_ARM_MTE:
+   if (!system_supports_mte() || kvm->created_vcpus)
+   return -EINVAL;
+   r = 0;
+   kvm->arch.mte_enabled = true;
+   break;
default:
r = -EINVAL;
break;
@@ -234,6 +240,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 */
r = 1;
break;
+   case KVM_CAP_ARM_MTE:
+   r = system_supports_mte();
+   break;
case KVM_CAP_STEAL_TIME:
r = kvm_arm_pvtime_supported();
break;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 377ae6efb0ef..46937bfaac8a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1306,6 +1306,9 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
 static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
   const struct sys_reg_desc *rd)
 {
+   if (kvm_has_mte(vcpu->kvm))
+   return 0;
+
return REG_HIDDEN;
 }
 
-- 
2.20.1



[PATCH v11 1/6] arm64: mte: Sync tags for pages where PTE is untagged

2021-04-16 Thread Steven Price
A KVM guest could store tags in a page even if the VMM hasn't mapped
the page with PROT_MTE. So when restoring pages from swap we will
need to check to see if there are any saved tags even if !pte_tagged().

However, don't check pages which are !pte_valid_user(), as these will
not have been swapped out.

Signed-off-by: Steven Price 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/arm64/kernel/mte.c  | 16 
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e17b96d0e4b5..cf4b52a33b3c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -312,7 +312,7 @@ static inline void set_pte_at(struct mm_struct *mm, 
unsigned long addr,
__sync_icache_dcache(pte);
 
if (system_supports_mte() &&
-   pte_present(pte) && pte_tagged(pte) && !pte_special(pte))
+   pte_present(pte) && (pte_val(pte) & PTE_USER) && !pte_special(pte))
mte_sync_tags(ptep, pte);
 
__check_racy_pte_update(mm, ptep, pte);
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index b3c70a612c7a..e016ab57ea36 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -26,17 +26,23 @@ u64 gcr_kernel_excl __ro_after_init;
 
 static bool report_fault_once = true;
 
-static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
+static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap,
+  bool pte_is_tagged)
 {
pte_t old_pte = READ_ONCE(*ptep);
 
if (check_swap && is_swap_pte(old_pte)) {
swp_entry_t entry = pte_to_swp_entry(old_pte);
 
-   if (!non_swap_entry(entry) && mte_restore_tags(entry, page))
+   if (!non_swap_entry(entry) && mte_restore_tags(entry, page)) {
+   set_bit(PG_mte_tagged, &page->flags);
return;
+   }
}
 
+   if (!pte_is_tagged || test_and_set_bit(PG_mte_tagged, &page->flags))
+   return;
+
page_kasan_tag_reset(page);
/*
 * We need smp_wmb() in between setting the flags and clearing the
@@ -54,11 +60,13 @@ void mte_sync_tags(pte_t *ptep, pte_t pte)
struct page *page = pte_page(pte);
long i, nr_pages = compound_nr(page);
bool check_swap = nr_pages == 1;
+   bool pte_is_tagged = pte_tagged(pte);
 
/* if PG_mte_tagged is set, tags have already been initialised */
for (i = 0; i < nr_pages; i++, page++) {
-   if (!test_and_set_bit(PG_mte_tagged, &page->flags))
-   mte_sync_page_tags(page, ptep, check_swap);
+   if (!test_bit(PG_mte_tagged, &page->flags))
+   mte_sync_page_tags(page, ptep, check_swap,
+  pte_is_tagged);
}
 }
 
-- 
2.20.1



[PATCH v11 3/6] arm64: kvm: Save/restore MTE registers

2021-04-16 Thread Steven Price
Define the new system registers that MTE introduces and context switch
them. The MTE feature is still hidden from the ID register as it isn't
supported in a VM yet.

Signed-off-by: Steven Price 
---
 arch/arm64/include/asm/kvm_host.h  |  6 ++
 arch/arm64/include/asm/kvm_mte.h   | 66 ++
 arch/arm64/include/asm/sysreg.h|  3 +-
 arch/arm64/kernel/asm-offsets.c|  3 +
 arch/arm64/kvm/hyp/entry.S |  7 +++
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 21 +++
 arch/arm64/kvm/sys_regs.c  | 22 ++--
 7 files changed, 123 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_mte.h

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 1170ee137096..d00cc3590f6e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -208,6 +208,12 @@ enum vcpu_sysreg {
CNTP_CVAL_EL0,
CNTP_CTL_EL0,
 
+   /* Memory Tagging Extension registers */
+   RGSR_EL1,   /* Random Allocation Tag Seed Register */
+   GCR_EL1,/* Tag Control Register */
+   TFSR_EL1,   /* Tag Fault Status Register (EL1) */
+   TFSRE0_EL1, /* Tag Fault Status Register (EL0) */
+
/* 32bit specific registers. Keep them at the end of the range */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
diff --git a/arch/arm64/include/asm/kvm_mte.h b/arch/arm64/include/asm/kvm_mte.h
new file mode 100644
index ..6541c7d6ce06
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_mte.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 ARM Ltd.
+ */
+#ifndef __ASM_KVM_MTE_H
+#define __ASM_KVM_MTE_H
+
+#ifdef __ASSEMBLY__
+
+#include 
+
+#ifdef CONFIG_ARM64_MTE
+
+.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
+alternative_if_not ARM64_MTE
+   b   .L__skip_switch\@
+alternative_else_nop_endif
+   mrs \reg1, hcr_el2
+   and \reg1, \reg1, #(HCR_ATA)
+   cbz \reg1, .L__skip_switch\@
+
+   mrs_s   \reg1, SYS_RGSR_EL1
+   str \reg1, [\h_ctxt, #CPU_RGSR_EL1]
+   mrs_s   \reg1, SYS_GCR_EL1
+   str \reg1, [\h_ctxt, #CPU_GCR_EL1]
+
+   ldr \reg1, [\g_ctxt, #CPU_RGSR_EL1]
+   msr_s   SYS_RGSR_EL1, \reg1
+   ldr \reg1, [\g_ctxt, #CPU_GCR_EL1]
+   msr_s   SYS_GCR_EL1, \reg1
+
+.L__skip_switch\@:
+.endm
+
+.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
+alternative_if_not ARM64_MTE
+   b   .L__skip_switch\@
+alternative_else_nop_endif
+   mrs \reg1, hcr_el2
+   and \reg1, \reg1, #(HCR_ATA)
+   cbz \reg1, .L__skip_switch\@
+
+   mrs_s   \reg1, SYS_RGSR_EL1
+   str \reg1, [\g_ctxt, #CPU_RGSR_EL1]
+   mrs_s   \reg1, SYS_GCR_EL1
+   str \reg1, [\g_ctxt, #CPU_GCR_EL1]
+
+   ldr \reg1, [\h_ctxt, #CPU_RGSR_EL1]
+   msr_s   SYS_RGSR_EL1, \reg1
+   ldr \reg1, [\h_ctxt, #CPU_GCR_EL1]
+   msr_s   SYS_GCR_EL1, \reg1
+
+.L__skip_switch\@:
+.endm
+
+#else /* CONFIG_ARM64_MTE */
+
+.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
+.endm
+
+.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
+.endm
+
+#endif /* CONFIG_ARM64_MTE */
+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_KVM_MTE_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index dfd4edbfe360..5424d195cf96 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -580,7 +580,8 @@
 #define SCTLR_ELx_M    (BIT(0))
 
 #define SCTLR_ELx_FLAGS    (SCTLR_ELx_M  | SCTLR_ELx_A | SCTLR_ELx_C | \
-                            SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB)
+                            SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \
+                            SCTLR_ELx_ITFSB)
 
 /* SCTLR_EL2 specific flags. */
 #define SCTLR_EL2_RES1 ((BIT(4))  | (BIT(5))  | (BIT(11)) | (BIT(16)) | \
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index a36e2fc330d4..944e4f1f45d9 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -108,6 +108,9 @@ int main(void)
   DEFINE(VCPU_WORKAROUND_FLAGS,offsetof(struct kvm_vcpu, 
arch.workaround_flags));
   DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_cpu_context, regs));
+  DEFINE(CPU_RGSR_EL1, offsetof(struct kvm_cpu_context, 
sys_regs[RGSR_EL1]));
+  DEFINE(CPU_GCR_EL1,  offsetof(struct kvm_cpu_context, 
sys_regs[GCR_EL1]));
+  DEFINE(CPU_TFSRE0_EL1,   offsetof(struct kvm_cpu_context, 
sys_regs[TFSRE0_EL1]));
   DEFINE(CPU_APIAKEYLO_EL1,offsetof(struct kvm_cpu_context, 
sys_regs[APIAKEYLO_EL1]));
   DEFINE(CPU_APIBKEYLO_EL1,offsetof(struct kvm_cpu_context, 
sys_regs[APIBKEYLO_EL1]));
   DEFINE(CPU_APDAKEYLO_EL1,offsetof(struct kvm_cpu_context, 
sys_regs[APDAKEYLO_EL1]));
diff 

[PATCH v11 6/6] KVM: arm64: Document MTE capability and ioctl

2021-04-16 Thread Steven Price
A new capability (KVM_CAP_ARM_MTE) identifies that the kernel supports
granting a guest access to the tags, and provides a mechanism for the
VMM to enable it.

A new ioctl (KVM_ARM_MTE_COPY_TAGS) provides a simple way for a VMM to
access the tags of a guest without having to maintain a PROT_MTE mapping
in userspace. The above capability gates access to the ioctl.

Signed-off-by: Steven Price 
---
 Documentation/virt/kvm/api.rst | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1a2b5210cdbf..ccc84f21ba5e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4938,6 +4938,40 @@ see KVM_XEN_VCPU_SET_ATTR above.
 The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
 with the KVM_XEN_VCPU_GET_ATTR ioctl.
 
+4.131 KVM_ARM_MTE_COPY_TAGS
+---------------------------
+
+:Capability: KVM_CAP_ARM_MTE
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: struct kvm_arm_copy_mte_tags
+:Returns: 0 on success, < 0 on error
+
+::
+
+  struct kvm_arm_copy_mte_tags {
+   __u64 guest_ipa;
+   __u64 length;
+   union {
+   void __user *addr;
+   __u64 padding;
+   };
+   __u64 flags;
+   __u64 reserved[2];
+  };
+
+Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
+``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
+field must point to a buffer which the tags will be copied to or from.
+
+``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
+``KVM_ARM_TAGS_FROM_GUEST``.
+
+The size of the buffer to store the tags is ``(length / MTE_GRANULE_SIZE)``
+bytes (i.e. 1/16th of the corresponding size). Each byte contains a single tag
+value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
+``PTRACE_POKEMTETAGS``.
+
 5. The kvm_run structure
 ========================
 
@@ -6227,6 +6261,25 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between 
them.
 This capability can be used to check / enable 2nd DAWR feature provided
 by POWER10 processor.
 
+7.23 KVM_CAP_ARM_MTE
+--------------------
+
+:Architectures: arm64
+:Parameters: none
+
+This capability indicates that KVM (and the hardware) supports exposing the
+Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
+VMM before the guest will be granted access.
+
+When enabled the guest is able to access tags associated with any memory given
+to the guest. KVM will ensure that the pages are flagged ``PG_mte_tagged`` so
+that the tags are maintained during swap or hibernation of the host; however
+the VMM needs to manually save/restore the tags as appropriate if the VM is
+migrated.
+
+When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
+perform a bulk copy of tags to/from the guest.
+
 8. Other capabilities.
 ======================
 
-- 
2.20.1



[PATCH v11 5/6] KVM: arm64: ioctl to fetch/store tags in a guest

2021-04-16 Thread Steven Price
The VMM may not wish to have its own mapping of guest memory mapped
with PROT_MTE because this causes problems if the VMM has tag checking
enabled (the guest controls the tags in physical RAM and it's unlikely
the tags are correct for the VMM).

Instead add a new ioctl which allows the VMM to easily read/write the
tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE
while the VMM can still read/write the tags for the purpose of
migration.
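
To make the migration flow concrete: the tag buffer holds one byte per MTE
granule, so a 4K page needs 4096 / 16 = 256 bytes. A minimal sketch of reading
the tags behind one page of guest IPA space, assuming vm_fd is the VM file
descriptor and defining MTE_GRANULE_SIZE locally (the kernel's definition is
not exported to user space):

  #include <sys/ioctl.h>
  #include <stdint.h>
  #include <linux/kvm.h>

  #define MTE_GRANULE_SIZE  16      /* assumption: arm64 MTE granule size */
  #define GUEST_PAGE_SIZE   4096

  /* buf must hold GUEST_PAGE_SIZE / MTE_GRANULE_SIZE = 256 bytes. */
  static int fetch_page_tags(int vm_fd, uint64_t guest_ipa, uint8_t *buf)
  {
          struct kvm_arm_copy_mte_tags copy = {
                  .guest_ipa = guest_ipa,          /* must be PAGE_SIZE aligned */
                  .length    = GUEST_PAGE_SIZE,
                  .addr      = buf,
                  .flags     = KVM_ARM_TAGS_FROM_GUEST,
          };

          return ioctl(vm_fd, KVM_ARM_MTE_COPY_TAGS, &copy);
  }

Writing tags back on the destination side is the same call with
KVM_ARM_TAGS_TO_GUEST.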

Signed-off-by: Steven Price 
---
 arch/arm64/include/uapi/asm/kvm.h | 14 +++
 arch/arm64/kvm/arm.c  | 69 +++
 include/uapi/linux/kvm.h  |  1 +
 3 files changed, 84 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index 24223adae150..2b85a047c37d 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -184,6 +184,20 @@ struct kvm_vcpu_events {
__u32 reserved[12];
 };
 
+struct kvm_arm_copy_mte_tags {
+   __u64 guest_ipa;
+   __u64 length;
+   union {
+   void __user *addr;
+   __u64 padding;
+   };
+   __u64 flags;
+   __u64 reserved[2];
+};
+
+#define KVM_ARM_TAGS_TO_GUEST  0
+#define KVM_ARM_TAGS_FROM_GUEST    1
+
 /* If you need to interpret the index values, here is the key: */
 #define KVM_REG_ARM_COPROC_MASK0x0FFF
 #define KVM_REG_ARM_COPROC_SHIFT   16
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 46bf319f6cb7..9a6b26d37236 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1297,6 +1297,65 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
}
 }
 
+static int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
+ struct kvm_arm_copy_mte_tags *copy_tags)
+{
+   gpa_t guest_ipa = copy_tags->guest_ipa;
+   size_t length = copy_tags->length;
+   void __user *tags = copy_tags->addr;
+   gpa_t gfn;
+   bool write = !(copy_tags->flags & KVM_ARM_TAGS_FROM_GUEST);
+   int ret = 0;
+
+   if (copy_tags->reserved[0] || copy_tags->reserved[1])
+   return -EINVAL;
+
+   if (copy_tags->flags & ~KVM_ARM_TAGS_FROM_GUEST)
+   return -EINVAL;
+
+   if (length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK)
+   return -EINVAL;
+
+   gfn = gpa_to_gfn(guest_ipa);
+
+   mutex_lock(&kvm->slots_lock);
+
+   while (length > 0) {
+   kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
+   void *maddr;
+   unsigned long num_tags = PAGE_SIZE / MTE_GRANULE_SIZE;
+
+   if (is_error_noslot_pfn(pfn)) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   maddr = page_address(pfn_to_page(pfn));
+
+   if (!write) {
+   num_tags = mte_copy_tags_to_user(tags, maddr, num_tags);
+   kvm_release_pfn_clean(pfn);
+   } else {
+   num_tags = mte_copy_tags_from_user(maddr, tags,
+  num_tags);
+   kvm_release_pfn_dirty(pfn);
+   }
+
+   if (num_tags != PAGE_SIZE / MTE_GRANULE_SIZE) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   gfn++;
+   tags += num_tags;
+   length -= PAGE_SIZE;
+   }
+
+out:
+   mutex_unlock(&kvm->slots_lock);
+   return ret;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -1333,6 +1392,16 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
return 0;
}
+   case KVM_ARM_MTE_COPY_TAGS: {
+   struct kvm_arm_copy_mte_tags copy_tags;
+
+   if (!kvm_has_mte(kvm))
+   return -EINVAL;
+
+   if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
+   return -EFAULT;
+   return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
+   }
default:
return -EINVAL;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6dc16c09a2d1..470c122f4c2d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1424,6 +1424,7 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_PMU_EVENT_FILTER */
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF  _IO(KVMIO,  0xb3)
+#define KVM_ARM_MTE_COPY_TAGS     _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE    _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.20.1



[PATCH v11 0/6] MTE support for KVM guest

2021-04-16 Thread Steven Price
I know it's likely to be the merge window next week, but since there
were a couple of changes from Catalin's review I thought I'd send
another version out - there are some minor conflicts with what's
currently in -next so I'll rebase after -rc1.

This series adds support for using the Arm Memory Tagging Extensions
(MTE) in a KVM guest.

This version is rebased on v5.12-rc2.

Changes since v10[1]:

 * Replace pte_valid_user() with (pte_val(pte) & PTE_USER) in
   set_pte_at() as the former has been removed with the EPAN patches.
 * Don't attempt to check tags on memory which is going to be mapped in
   stage 2 as DEVICE as the guest won't be able to see the tags.
 * Use pfn_to_online_page() instead of pfn_to_page() in user_mem_abort()
   to prevent ZONE_DEVICE memory being populated in stage 2 if tags are
   enabled.

[1] https://lore.kernel.org/r/20210312151902.17853-1-steven.price%40arm.com

Steven Price (6):
  arm64: mte: Sync tags for pages where PTE is untagged
  arm64: kvm: Introduce MTE VM feature
  arm64: kvm: Save/restore MTE registers
  arm64: kvm: Expose KVM_ARM_CAP_MTE
  KVM: arm64: ioctl to fetch/store tags in a guest
  KVM: arm64: Document MTE capability and ioctl

 Documentation/virt/kvm/api.rst | 53 +++
 arch/arm64/include/asm/kvm_emulate.h   |  3 +
 arch/arm64/include/asm/kvm_host.h  |  9 +++
 arch/arm64/include/asm/kvm_mte.h   | 66 ++
 arch/arm64/include/asm/pgtable.h   |  2 +-
 arch/arm64/include/asm/sysreg.h|  3 +-
 arch/arm64/include/uapi/asm/kvm.h  | 14 
 arch/arm64/kernel/asm-offsets.c|  3 +
 arch/arm64/kernel/mte.c| 16 +++--
 arch/arm64/kvm/arm.c   | 78 ++
 arch/arm64/kvm/hyp/entry.S |  7 ++
 arch/arm64/kvm/hyp/exception.c |  3 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 21 ++
 arch/arm64/kvm/mmu.c   | 20 ++
 arch/arm64/kvm/sys_regs.c  | 28 ++--
 include/uapi/linux/kvm.h   |  2 +
 16 files changed, 317 insertions(+), 11 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_mte.h

-- 
2.20.1



Re: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO

2021-04-16 Thread Marc Zyngier
On Thu, 15 Apr 2021 15:08:09 +0100,
Keqian Zhu  wrote:
> 
> Hi Marc,
> 
> On 2021/4/15 22:03, Keqian Zhu wrote:
> > The MMIO region of a device may be huge (GB level), so try to use
> > block mapping in stage2 to speed up both map and unmap.
> >
> > Compared to normal memory mapping, we should consider two more
> > points when trying block mapping for an MMIO region:
> >
> > 1. For normal memory mapping, the PA (host physical address) and
> > HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
> > the HVA to request a hugepage, so we don't need to consider PA
> > alignment when verifying block mapping. But for device memory
> > mapping, the PA and HVA may have different alignments.
> >
> > 2. For normal memory mapping, we are sure the hugepage size properly
> > fits into the vma, so we don't check whether the mapping size exceeds
> > the boundary of the vma. But for device memory mapping, we should pay
> > attention to this.
> >
> > This adds get_vma_page_shift() to get the page shift for both normal
> > memory and device MMIO regions, and checks these two points when
> > selecting the block mapping size for an MMIO region.
> > 
> > Signed-off-by: Keqian Zhu 
> > ---
> >  arch/arm64/kvm/mmu.c | 61 
> >  1 file changed, 51 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c59af5ca01b0..5a1cc7751e6d 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot 
> > *memslot,
> > return PAGE_SIZE;
> >  }
> >  
> > +static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long 
> > hva)
> > +{
> > +   unsigned long pa;
> > +
> > +   if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP))
> > +   return huge_page_shift(hstate_vma(vma));
> > +
> > +   if (!(vma->vm_flags & VM_PFNMAP))
> > +   return PAGE_SHIFT;
> > +
> > +   VM_BUG_ON(is_vm_hugetlb_page(vma));
> > +
> > +   pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
> > +
> > +#ifndef __PAGETABLE_PMD_FOLDED
> > +   if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
> > +   ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
> > +   ALIGN(hva, PUD_SIZE) <= vma->vm_end)
> > +   return PUD_SHIFT;
> > +#endif
> > +
> > +   if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
> > +   ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
> > +   ALIGN(hva, PMD_SIZE) <= vma->vm_end)
> > +   return PMD_SHIFT;
> > +
> > +   return PAGE_SHIFT;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >   struct kvm_memory_slot *memslot, unsigned long hva,
> >   unsigned long fault_status)
> > @@ -769,7 +798,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> > phys_addr_t fault_ipa,
> > return -EFAULT;
> > }
> >  
> > -   /* Let's check if we will get back a huge page backed by hugetlbfs */
> > +   /*
> > +* Let's check if we will get back a huge page backed by hugetlbfs, or
> > +* get block mapping for device MMIO region.
> > +*/
> > mmap_read_lock(current->mm);
> > vma = find_vma_intersection(current->mm, hva, hva + 1);
> > if (unlikely(!vma)) {
> > @@ -778,15 +810,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> > phys_addr_t fault_ipa,
> > return -EFAULT;
> > }
> >  
> > -   if (is_vm_hugetlb_page(vma))
> > -   vma_shift = huge_page_shift(hstate_vma(vma));
> > -   else
> > -   vma_shift = PAGE_SHIFT;
> > -
> > -   if (logging_active ||
> > -   (vma->vm_flags & VM_PFNMAP)) {
> > +   /*
> > +* logging_active is guaranteed to never be true for VM_PFNMAP
> > +* memslots.
> > +*/
> > +   if (logging_active) {
> > force_pte = true;
> > vma_shift = PAGE_SHIFT;
> > +   } else {
> > +   vma_shift = get_vma_page_shift(vma, hva);
> > }
> I used an if/else style in v4, please check that. Thanks very much!

That's fine. However, it is getting a bit late for 5.13, and we don't
have much time to let it simmer in -next. I'll probably wait until
after the merge window to pick it up.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages

2021-04-16 Thread David Hildenbrand

On 16.04.21 13:44, Mike Rapoport wrote:
> On Thu, Apr 15, 2021 at 11:30:12AM +0200, David Hildenbrand wrote:
>>> Not sure we really need a new pagetype here, PG_Reserved seems to be quite
>>> enough to say "don't touch this".  I generally agree that we could make
>>> PG_Reserved a PageType and then have several sub-types for reserved memory.
>>> This definitely will add clarity but I'm not sure that this justifies
>>> amount of churn and effort required to audit uses of PageResrved().
>>>> Then, we could mostly avoid having to query memblock at runtime to figure
>>>> out that this is special memory. This would obviously be an extension to
>>>> this series. Just a thought.
>>>
>>> Stop pushing memblock out of kernel! ;-)
>>
>> Can't stop. Won't stop. :D
>>
>> It's lovely for booting up a kernel until we have other data-structures in
>> place ;)
>
> A bit more seriously, we don't have any data structure that reliably
> represents physical memory layout and arch-independent fashion.
> memblock is probably the best starting point for eventually having one.

We have the (slowish) kernel resource tree after boot and the (faster)
memmap. I really don't see why we really need another slowish variant.

We might be better off to just extend and speed up the kernel resource tree.

Memblock as is is not a reasonable datastructure to keep around after
boot: for example, how we handle boottime allocations and reserve
regions both as reserved.


--
Thanks,

David / dhildenb



Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages

2021-04-16 Thread Mike Rapoport
On Thu, Apr 15, 2021 at 11:30:12AM +0200, David Hildenbrand wrote:
> > Not sure we really need a new pagetype here, PG_Reserved seems to be quite
> > enough to say "don't touch this".  I generally agree that we could make
> > PG_Reserved a PageType and then have several sub-types for reserved memory.
> > This definitely will add clarity but I'm not sure that this justifies
> > amount of churn and effort required to audit uses of PageResrved().
> > > Then, we could mostly avoid having to query memblock at runtime to figure
> > > out that this is special memory. This would obviously be an extension to
> > > this series. Just a thought.
> > 
> > Stop pushing memblock out of kernel! ;-)
> 
> Can't stop. Won't stop. :D
> 
> It's lovely for booting up a kernel until we have other data-structures in
> place ;)

A bit more seriously, we don't have any data structure that reliably
represents physical memory layout and arch-independent fashion. 
memblock is probably the best starting point for eventually having one.

-- 
Sincerely yours,
Mike.


Re: [RFC/RFT PATCH 2/3] arm64: decouple check whether pfn is normal memory from pfn_valid()

2021-04-16 Thread Mike Rapoport
On Thu, Apr 15, 2021 at 11:31:26AM +0200, David Hildenbrand wrote:
> On 14.04.21 22:29, Mike Rapoport wrote:
> > On Wed, Apr 14, 2021 at 05:58:26PM +0200, David Hildenbrand wrote:
> > > On 08.04.21 07:14, Anshuman Khandual wrote:
> > > > 
> > > > On 4/7/21 10:56 PM, Mike Rapoport wrote:
> > > > > From: Mike Rapoport 
> > > > > 
> > > > > The intended semantics of pfn_valid() is to verify whether there is a
> > > > > struct page for the pfn in question and nothing else.
> > > > 
> > > > Should there be a comment affirming this semantics interpretation, 
> > > > above the
> > > > generic pfn_valid() in include/linux/mmzone.h ?
> > > > 
> > > > > 
> > > > > Yet, on arm64 it is used to distinguish memory areas that are mapped 
> > > > > in the
> > > > > linear map vs those that require ioremap() to access them.
> > > > > 
> > > > > Introduce a dedicated pfn_is_memory() to perform such check and use it
> > > > > where appropriate.
> > > > > 
> > > > > Signed-off-by: Mike Rapoport 
> > > > > ---
> > > > >arch/arm64/include/asm/memory.h | 2 +-
> > > > >arch/arm64/include/asm/page.h   | 1 +
> > > > >arch/arm64/kvm/mmu.c| 2 +-
> > > > >arch/arm64/mm/init.c| 6 ++
> > > > >arch/arm64/mm/ioremap.c | 4 ++--
> > > > >arch/arm64/mm/mmu.c | 2 +-
> > > > >6 files changed, 12 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm64/include/asm/memory.h 
> > > > > b/arch/arm64/include/asm/memory.h
> > > > > index 0aabc3be9a75..7e77fdf71b9d 100644
> > > > > --- a/arch/arm64/include/asm/memory.h
> > > > > +++ b/arch/arm64/include/asm/memory.h
> > > > > @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
> > > > >#define virt_addr_valid(addr)  ({  
> > > > > \
> > > > >   __typeof__(addr) __addr = __tag_reset(addr);
> > > > > \
> > > > > - __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));  
> > > > > \
> > > > > + __is_lm_address(__addr) && pfn_is_memory(virt_to_pfn(__addr));  
> > > > > \
> > > > >})
> > > > >void dump_mem_limit(void);
> > > > > diff --git a/arch/arm64/include/asm/page.h 
> > > > > b/arch/arm64/include/asm/page.h
> > > > > index 012cffc574e8..32b485bcc6ff 100644
> > > > > --- a/arch/arm64/include/asm/page.h
> > > > > +++ b/arch/arm64/include/asm/page.h
> > > > > @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page 
> > > > > *from);
> > > > >typedef struct page *pgtable_t;
> > > > >extern int pfn_valid(unsigned long);
> > > > > +extern int pfn_is_memory(unsigned long);
> > > > >#include 
> > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > > index 8711894db8c2..ad2ea65a3937 100644
> > > > > --- a/arch/arm64/kvm/mmu.c
> > > > > +++ b/arch/arm64/kvm/mmu.c
> > > > > @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> > > > >static bool kvm_is_device_pfn(unsigned long pfn)
> > > > >{
> > > > > - return !pfn_valid(pfn);
> > > > > + return !pfn_is_memory(pfn);
> > > > >}
> > > > >/*
> > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > > index 3685e12aba9b..258b1905ed4a 100644
> > > > > --- a/arch/arm64/mm/init.c
> > > > > +++ b/arch/arm64/mm/init.c
> > > > > @@ -258,6 +258,12 @@ int pfn_valid(unsigned long pfn)
> > > > >}
> > > > >EXPORT_SYMBOL(pfn_valid);
> > > > > +int pfn_is_memory(unsigned long pfn)
> > > > > +{
> > > > > + return memblock_is_map_memory(PFN_PHYS(pfn));
> > > > > +}
> > > > > +EXPORT_SYMBOL(pfn_is_memory);
> > > > > +
> > > > 
> > > > Should not this be generic though ? There is nothing platform or arm64
> > > > specific in here. Wondering as pfn_is_memory() just indicates that the
> > > > pfn is linear mapped, should not it be renamed as pfn_is_linear_memory()
> > > > instead ? Regardless, it's fine either way.
> > > 
> > > TBH, I dislike (generic) pfn_is_memory(). It feels like we're mixing
> > > concepts.
> > 
> > Yeah, at the moment NOMAP is very much arm specific so I'd keep it this way
> > for now.
> > 
> > >   NOMAP memory vs !NOMAP memory; even NOMAP is some kind of memory
> > > after all. pfn_is_map_memory() would be more expressive, although still
> > > sub-optimal.
> > > 
> > > We'd actually want some kind of arm64-specific pfn_is_system_memory() or 
> > > the
> > > inverse pfn_is_device_memory() -- to be improved.
> > 
> > In my current version (to be posted soon) I've started with
> > pfn_lineary_mapped() but then ended up with pfn_mapped() to make it
> > "upward" compatible with architectures that use direct rather than linear
> > map :)
> 
> And even that is moot. It doesn't tell you if a PFN is *actually* mapped
> (hello secretmem).
> 
> I'd suggest to just use memblock_is_map_memory() in arch specific code. Then
> it's clear what we are querying exactly and what the semantics might be.

Ok, let's export memblock_is_map_memory() for the KEEP_MEMBLOCK case.
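
As an illustration of where that leads (a sketch only, not a patch from this
thread), the arm64 helper can then stay a thin wrapper around the exported
symbol; the pfn_is_map_memory() name is the one floated earlier in the thread,
and the final naming was still open:

  /* arch/arm64/mm/init.c -- sketch */
  int pfn_is_map_memory(unsigned long pfn)
  {
          return memblock_is_map_memory(PFN_PHYS(pfn));
  }
  EXPORT_SYMBOL(pfn_is_map_memory);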

-- 
Sincerely