Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
Hi Steven, Catalin,

On 18/11/2020 16:01, Steven Price wrote:
> On 17/11/2020 16:07, Catalin Marinas wrote:
>> On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 19aacc7d64de..38fe25310ca1 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>     if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>         vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>                                                    &pfn, &fault_ipa);
>>> +
>>> +   /*
>>> +    * The otherwise redundant test for system_supports_mte() allows the
>>> +    * code to be compiled out when CONFIG_ARM64_MTE is not present.
>>> +    */
>>> +   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
>>> +       /*
>>> +        * VM will be able to see the page's tags, so we must ensure
>>> +        * they have been initialised.
>>> +        */
>>> +       struct page *page = pfn_to_page(pfn);
>>> +       long i, nr_pages = compound_nr(page);
>>> +
>>> +       /* if PG_mte_tagged is set, tags have already been initialised */
>>> +       for (i = 0; i < nr_pages; i++, page++) {
>>> +           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>> +               mte_clear_page_tags(page_address(page));
>>> +       }
>>> +   }
>>
>> If this page was swapped out and mapped back in, where does the
>> restoring from swap happen?
>
> Restoring from swap happens above this in the call to gfn_to_pfn_prot()
>
>> I may have asked in the past, is user_mem_abort() the only path for
>> mapping Normal pages into stage 2?
>
> That is my understanding (and yes you asked before) and no one has
> corrected me! ;)

A recent discovery: Copy on write will cause kvm_set_spte_handler() to fix up
the mapping (instead of just invalidating it) on the assumption the guest is
going to read whatever was written. It's possible user_mem_abort() will go and
stomp on that mapping a second time, but if the VMM triggers this at stage 1,
you won't have a vcpu for the update.

Thanks,

James
Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
On 19/11/2020 16:24, Catalin Marinas wrote:

On Thu, Nov 19, 2020 at 12:45:52PM +, Steven Price wrote:

On 18/11/2020 17:05, Andrew Jones wrote:

On Wed, Nov 18, 2020 at 04:50:01PM +, Catalin Marinas wrote:

On Wed, Nov 18, 2020 at 04:01:20PM +, Steven Price wrote:

On 17/11/2020 16:07, Catalin Marinas wrote:

On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 19aacc7d64de..38fe25310ca1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
    if (vma_pagesize == PAGE_SIZE && !force_pte)
        vma_pagesize = transparent_hugepage_adjust(memslot, hva,
                                                   &pfn, &fault_ipa);
+
+   /*
+    * The otherwise redundant test for system_supports_mte() allows the
+    * code to be compiled out when CONFIG_ARM64_MTE is not present.
+    */
+   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
+       /*
+        * VM will be able to see the page's tags, so we must ensure
+        * they have been initialised.
+        */
+       struct page *page = pfn_to_page(pfn);
+       long i, nr_pages = compound_nr(page);
+
+       /* if PG_mte_tagged is set, tags have already been initialised */
+       for (i = 0; i < nr_pages; i++, page++) {
+           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+               mte_clear_page_tags(page_address(page));
+       }
+   }

If this page was swapped out and mapped back in, where does the restoring
from swap happen?

Restoring from swap happens above this in the call to gfn_to_pfn_prot()

Looking at the call chain, gfn_to_pfn_prot() ends up with get_user_pages()
using the current->mm (the VMM) and that does a set_pte_at(), presumably
restoring the tags. Does this mean that all memory mapped by the VMM in user
space should have PROT_MTE set? Otherwise we don't take the mte_sync_tags()
path in set_pte_at() and no tags restored from swap (we do save them since
when they were mapped, PG_mte_tagged was set).

So I think the code above should be similar to mte_sync_tags(), even calling
a common function, but I'm not sure where to get the swap pte from.

You're right - the code is broken as it stands. I've just been able to
reproduce the loss of tags due to swap.

The problem is that we also don't have a suitable pte to do the restore from
swap from. So either set_pte_at() would have to unconditionally check for MTE
tags for all previous swap entries as you suggest below. I had a quick go at
testing this and hit issues with the idle task getting killed during boot -
I fear there are some fun issues regarding initialisation order here.

My attempt here but not fully tested (just booted, no swap support):

Ah, very similar to what I had, just without the silly mistake... ;)

I just did a quick test with this and it seems to work. I obviously should
have looked harder before giving up on this approach. Thanks!

Steve

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b35833259f08..27d7fd336a16 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -304,7 +304,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
        __sync_icache_dcache(pte);

    if (system_supports_mte() &&
-       pte_present(pte) && pte_tagged(pte) && !pte_special(pte))
+       pte_present(pte) && pte_valid_user(pte) && !pte_special(pte))
        mte_sync_tags(ptep, pte);

    __check_racy_pte_update(mm, ptep, pte);
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 52a0638ed967..bbd6c56d33d9 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -20,18 +20,24 @@
 #include
 #include

-static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
+static void mte_sync_page_tags(struct page *page, pte_t *ptep, pte_t pte,
+                              bool check_swap)
 {
    pte_t old_pte = READ_ONCE(*ptep);

    if (check_swap && is_swap_pte(old_pte)) {
        swp_entry_t entry = pte_to_swp_entry(old_pte);

-       if (!non_swap_entry(entry) && mte_restore_tags(entry, page))
+       if (!non_swap_entry(entry) && mte_restore_tags(entry, page)) {
+           set_bit(PG_mte_tagged, &page->flags);
            return;
+       }
    }

-   mte_clear_page_tags(page_address(page));
+   if (pte_tagged(pte)) {
+       mte_clear_page_tags(page_address(page));
+       set_bit(PG_mte_tagged, &page->flags);
+   }
 }

 void mte_sync_tags(pte_t *ptep, pte_t pte)
@@ -42,8 +48,8 @@ void mte_sync_tags(pte_t *ptep, pte_t pte)
    /* if PG_mte_tagged is set, tags have already been
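If the KVM path does end up sharing logic with mte_sync_tags() as discussed
above, the factored-out helper could look roughly like the sketch below. This
is only an illustration of the idea, not part of the posted patches:
sanitise_page_tags() and sanitise_pfn_tags() are invented names, and the
swap-restore side is still assumed to happen via the VMM's set_pte_at().

/*
 * Sketch only: initialise a page's tags exactly once. If PG_mte_tagged is
 * already set (e.g. tags were restored from swap by set_pte_at()), leave
 * the tags alone; otherwise clear them so the guest cannot see stale tags.
 */
static void sanitise_page_tags(struct page *page)
{
    if (!test_and_set_bit(PG_mte_tagged, &page->flags))
        mte_clear_page_tags(page_address(page));
}

/* user_mem_abort() would then walk the (possibly compound) page: */
static void sanitise_pfn_tags(kvm_pfn_t pfn)
{
    struct page *page = pfn_to_page(pfn);
    long i, nr_pages = compound_nr(page);

    for (i = 0; i < nr_pages; i++, page++)
        sanitise_page_tags(page);
}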
Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
On Thu, Nov 19, 2020 at 12:45:52PM +, Steven Price wrote:
> On 18/11/2020 17:05, Andrew Jones wrote:
> > On Wed, Nov 18, 2020 at 04:50:01PM +, Catalin Marinas wrote:
> > > On Wed, Nov 18, 2020 at 04:01:20PM +, Steven Price wrote:
> > > > On 17/11/2020 16:07, Catalin Marinas wrote:
> > > > > On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:
> > > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > > > index 19aacc7d64de..38fe25310ca1 100644
> > > > > > --- a/arch/arm64/kvm/mmu.c
> > > > > > +++ b/arch/arm64/kvm/mmu.c
> > > > > > @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > > > >     if (vma_pagesize == PAGE_SIZE && !force_pte)
> > > > > >         vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> > > > > >                                                    &pfn, &fault_ipa);
> > > > > > +
> > > > > > +   /*
> > > > > > +    * The otherwise redundant test for system_supports_mte() allows the
> > > > > > +    * code to be compiled out when CONFIG_ARM64_MTE is not present.
> > > > > > +    */
> > > > > > +   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> > > > > > +       /*
> > > > > > +        * VM will be able to see the page's tags, so we must ensure
> > > > > > +        * they have been initialised.
> > > > > > +        */
> > > > > > +       struct page *page = pfn_to_page(pfn);
> > > > > > +       long i, nr_pages = compound_nr(page);
> > > > > > +
> > > > > > +       /* if PG_mte_tagged is set, tags have already been initialised */
> > > > > > +       for (i = 0; i < nr_pages; i++, page++) {
> > > > > > +           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> > > > > > +               mte_clear_page_tags(page_address(page));
> > > > > > +       }
> > > > > > +   }
> > > > >
> > > > > If this page was swapped out and mapped back in, where does the
> > > > > restoring from swap happen?
> > > >
> > > > Restoring from swap happens above this in the call to gfn_to_pfn_prot()
> > >
> > > Looking at the call chain, gfn_to_pfn_prot() ends up with
> > > get_user_pages() using the current->mm (the VMM) and that does a
> > > set_pte_at(), presumably restoring the tags. Does this mean that all
> > > memory mapped by the VMM in user space should have PROT_MTE set?
> > > Otherwise we don't take the mte_sync_tags() path in set_pte_at() and no
> > > tags restored from swap (we do save them since when they were mapped,
> > > PG_mte_tagged was set).
> > >
> > > So I think the code above should be similar to mte_sync_tags(), even
> > > calling a common function, but I'm not sure where to get the swap pte
> > > from.
>
> You're right - the code is broken as it stands. I've just been able to
> reproduce the loss of tags due to swap.
>
> The problem is that we also don't have a suitable pte to do the restore from
> swap from. So either set_pte_at() would have to unconditionally check for
> MTE tags for all previous swap entries as you suggest below. I had a quick
> go at testing this and hit issues with the idle task getting killed during
> boot - I fear there are some fun issues regarding initialisation order here.

My attempt here but not fully tested (just booted, no swap support):

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b35833259f08..27d7fd336a16 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -304,7 +304,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
        __sync_icache_dcache(pte);

    if (system_supports_mte() &&
-       pte_present(pte) && pte_tagged(pte) && !pte_special(pte))
+       pte_present(pte) && pte_valid_user(pte) && !pte_special(pte))
        mte_sync_tags(ptep, pte);

    __check_racy_pte_update(mm, ptep, pte);
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 52a0638ed967..bbd6c56d33d9 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -20,18 +20,24 @@
 #include
 #include

-static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
+static void mte_sync_page_tags(struct page *page, pte_t *ptep, pte_t pte,
+                              bool check_swap)
 {
    pte_t old_pte = READ_ONCE(*ptep);

    if (check_swap && is_swap_pte(old_pte)) {
        swp_entry_t entry = pte_to_swp_entry(old_pte);

-       if (!non_swap_entry(entry) && mte_restore_tags(entry, page))
+       if (!non_swap_entry(entry) && mte_restore_tags(entry, page)) {
+           set_bit(PG_mte_tagged, &page->flags);
            return;
+       }
    }

-   mte_clear_page_tags(page_address(page));
+   if (pte_tagged(pte)) {
+
Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
On 18/11/2020 17:05, Andrew Jones wrote:

On Wed, Nov 18, 2020 at 04:50:01PM +, Catalin Marinas wrote:

On Wed, Nov 18, 2020 at 04:01:20PM +, Steven Price wrote:

On 17/11/2020 16:07, Catalin Marinas wrote:

On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 19aacc7d64de..38fe25310ca1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
    if (vma_pagesize == PAGE_SIZE && !force_pte)
        vma_pagesize = transparent_hugepage_adjust(memslot, hva,
                                                   &pfn, &fault_ipa);
+
+   /*
+    * The otherwise redundant test for system_supports_mte() allows the
+    * code to be compiled out when CONFIG_ARM64_MTE is not present.
+    */
+   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
+       /*
+        * VM will be able to see the page's tags, so we must ensure
+        * they have been initialised.
+        */
+       struct page *page = pfn_to_page(pfn);
+       long i, nr_pages = compound_nr(page);
+
+       /* if PG_mte_tagged is set, tags have already been initialised */
+       for (i = 0; i < nr_pages; i++, page++) {
+           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+               mte_clear_page_tags(page_address(page));
+       }
+   }

If this page was swapped out and mapped back in, where does the restoring
from swap happen?

Restoring from swap happens above this in the call to gfn_to_pfn_prot()

Looking at the call chain, gfn_to_pfn_prot() ends up with get_user_pages()
using the current->mm (the VMM) and that does a set_pte_at(), presumably
restoring the tags. Does this mean that all memory mapped by the VMM in user
space should have PROT_MTE set? Otherwise we don't take the mte_sync_tags()
path in set_pte_at() and no tags restored from swap (we do save them since
when they were mapped, PG_mte_tagged was set).

So I think the code above should be similar to mte_sync_tags(), even calling
a common function, but I'm not sure where to get the swap pte from.

You're right - the code is broken as it stands. I've just been able to
reproduce the loss of tags due to swap.

The problem is that we also don't have a suitable pte to do the restore from
swap from. So either set_pte_at() would have to unconditionally check for MTE
tags for all previous swap entries as you suggest below. I had a quick go at
testing this and hit issues with the idle task getting killed during boot -
I fear there are some fun issues regarding initialisation order here.

Or we enforce PROT_MTE...

An alternative is to only enable HCR_EL2.ATA and MTE in guest if the vmm
mapped the memory with PROT_MTE.

This is a very reasonable alternative. The VMM must be aware of whether the
guest may use MTE anyway. Asking it to map the memory with PROT_MTE when it
wants to offer the guest that option is a reasonable requirement. If the
memory is not mapped as such, then the host kernel shouldn't assume MTE may
be used by the guest, and it should even enforce that it is not (by not
enabling the feature).

The main issue with this is that the VMM can change the mappings while the
guest is running, so the only place we can reliably check this is during
user_mem_abort(). So we can't just downgrade HCR_EL2.ATA. This makes the
error reporting not so great as the memory access is simply faulted. However
I do have this working and it's actually (slightly) less code.

Another drawback is that the VMM needs to be more careful with the tags -
e.g. for virtualised devices the VMM can't simply have a non-PROT_MTE mapping
and ignore what the guest is doing with tags.

Steve
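A fault-time check along the lines Steven describes could look roughly like
the fragment below. This is a sketch, not the posted series: kvm_has_mte() is
the predicate Marc asked for earlier in the thread, VM_MTE is the vma flag a
PROT_MTE mapping gets, and the exact placement and locking in
user_mem_abort() may well differ.

    /*
     * Sketch: in user_mem_abort(), with the vma for the faulting hva
     * already looked up under mmap_lock, refuse to map memory for an
     * MTE-enabled guest unless the VMM mapped it with PROT_MTE.
     */
    if (kvm_has_mte(vcpu->kvm) && pfn_valid(pfn) &&
        !(vma->vm_flags & VM_MTE))
        return -EFAULT; /* the guest access simply faults */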
Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
On Wed, Nov 18, 2020 at 04:50:01PM +, Catalin Marinas wrote:
> On Wed, Nov 18, 2020 at 04:01:20PM +, Steven Price wrote:
> > On 17/11/2020 16:07, Catalin Marinas wrote:
> > > On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:
> > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > index 19aacc7d64de..38fe25310ca1 100644
> > > > --- a/arch/arm64/kvm/mmu.c
> > > > +++ b/arch/arm64/kvm/mmu.c
> > > > @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > >     if (vma_pagesize == PAGE_SIZE && !force_pte)
> > > >         vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> > > >                                                    &pfn, &fault_ipa);
> > > > +
> > > > +   /*
> > > > +    * The otherwise redundant test for system_supports_mte() allows the
> > > > +    * code to be compiled out when CONFIG_ARM64_MTE is not present.
> > > > +    */
> > > > +   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> > > > +       /*
> > > > +        * VM will be able to see the page's tags, so we must ensure
> > > > +        * they have been initialised.
> > > > +        */
> > > > +       struct page *page = pfn_to_page(pfn);
> > > > +       long i, nr_pages = compound_nr(page);
> > > > +
> > > > +       /* if PG_mte_tagged is set, tags have already been initialised */
> > > > +       for (i = 0; i < nr_pages; i++, page++) {
> > > > +           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> > > > +               mte_clear_page_tags(page_address(page));
> > > > +       }
> > > > +   }
> > >
> > > If this page was swapped out and mapped back in, where does the
> > > restoring from swap happen?
> >
> > Restoring from swap happens above this in the call to gfn_to_pfn_prot()
>
> Looking at the call chain, gfn_to_pfn_prot() ends up with
> get_user_pages() using the current->mm (the VMM) and that does a
> set_pte_at(), presumably restoring the tags. Does this mean that all
> memory mapped by the VMM in user space should have PROT_MTE set?
> Otherwise we don't take the mte_sync_tags() path in set_pte_at() and no
> tags restored from swap (we do save them since when they were mapped,
> PG_mte_tagged was set).
>
> So I think the code above should be similar to mte_sync_tags(), even
> calling a common function, but I'm not sure where to get the swap pte
> from.
>
> An alternative is to only enable HCR_EL2.ATA and MTE in guest if the vmm
> mapped the memory with PROT_MTE.

This is a very reasonable alternative. The VMM must be aware of whether the
guest may use MTE anyway. Asking it to map the memory with PROT_MTE when it
wants to offer the guest that option is a reasonable requirement. If the
memory is not mapped as such, then the host kernel shouldn't assume MTE may
be used by the guest, and it should even enforce that it is not (by not
enabling the feature).

Thanks,

drew

> Yet another option is to always call mte_sync_tags() from set_pte_at()
> and defer the pte_tagged() or is_swap_pte() checks to the MTE code.
>
> --
> Catalin
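Concretely, the requirement on the VMM would be that guest RAM is mapped with
PROT_MTE before being handed to KVM, something along these lines (illustrative
userspace sketch only; PROT_MTE is the arm64 mman UAPI flag, while the helper
name and memslot layout are made up):

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>
#include <asm/mman.h>          /* PROT_MTE on arm64 */

/* Illustrative only: back a guest memslot with a tag-enabled mapping. */
static void *map_guest_ram(int vm_fd, uint64_t gpa, size_t size)
{
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_MTE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = gpa,
        .memory_size = size,
        .userspace_addr = (uint64_t)mem,
    };

    if (mem == MAP_FAILED ||
        ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0)
        return NULL;
    return mem;
}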
Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
On Wed, Nov 18, 2020 at 04:01:20PM +, Steven Price wrote:
> On 17/11/2020 16:07, Catalin Marinas wrote:
> > On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 19aacc7d64de..38fe25310ca1 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > >     if (vma_pagesize == PAGE_SIZE && !force_pte)
> > >         vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> > >                                                    &pfn, &fault_ipa);
> > > +
> > > +   /*
> > > +    * The otherwise redundant test for system_supports_mte() allows the
> > > +    * code to be compiled out when CONFIG_ARM64_MTE is not present.
> > > +    */
> > > +   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> > > +       /*
> > > +        * VM will be able to see the page's tags, so we must ensure
> > > +        * they have been initialised.
> > > +        */
> > > +       struct page *page = pfn_to_page(pfn);
> > > +       long i, nr_pages = compound_nr(page);
> > > +
> > > +       /* if PG_mte_tagged is set, tags have already been initialised */
> > > +       for (i = 0; i < nr_pages; i++, page++) {
> > > +           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> > > +               mte_clear_page_tags(page_address(page));
> > > +       }
> > > +   }
> >
> > If this page was swapped out and mapped back in, where does the
> > restoring from swap happen?
>
> Restoring from swap happens above this in the call to gfn_to_pfn_prot()

Looking at the call chain, gfn_to_pfn_prot() ends up with get_user_pages()
using the current->mm (the VMM) and that does a set_pte_at(), presumably
restoring the tags. Does this mean that all memory mapped by the VMM in user
space should have PROT_MTE set? Otherwise we don't take the mte_sync_tags()
path in set_pte_at() and no tags restored from swap (we do save them since
when they were mapped, PG_mte_tagged was set).

So I think the code above should be similar to mte_sync_tags(), even calling
a common function, but I'm not sure where to get the swap pte from.

An alternative is to only enable HCR_EL2.ATA and MTE in guest if the vmm
mapped the memory with PROT_MTE.

Yet another option is to always call mte_sync_tags() from set_pte_at() and
defer the pte_tagged() or is_swap_pte() checks to the MTE code.

--
Catalin
Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
On 17/11/2020 19:35, Marc Zyngier wrote:

Hi Steven,

On 2020-10-26 15:57, Steven Price wrote:

Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging for a VM.
This exposes the feature to the guest and automatically tags memory pages
touched by the VM as PG_mte_tagged (and clears the tags storage) to ensure
that the guest cannot see stale tags, and so that the tags are correctly
saved/restored across swap.

Signed-off-by: Steven Price
Reviewed-by: Andrew Jones
---
 arch/arm64/include/asm/kvm_emulate.h |  3 +++
 arch/arm64/include/asm/kvm_host.h    |  3 +++
 arch/arm64/kvm/arm.c                 |  9 +++++++++
 arch/arm64/kvm/mmu.c                 | 20 ++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            |  6 +++++-
 include/uapi/linux/kvm.h             |  1 +
 6 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5ef2669ccd6c..66c0d9e7c2b4 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
    if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
        vcpu_el1_is_32bit(vcpu))
        vcpu->arch.hcr_el2 |= HCR_TID2;
+
+   if (vcpu->kvm->arch.mte_enabled)

Please add a predicate (vcpu_has_mte() or kvm_has_mte()?) for this.

Sure

+       vcpu->arch.hcr_el2 |= HCR_ATA;
 }

 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 95ab7345dcc8..cd993aec0440 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -118,6 +118,9 @@ struct kvm_arch {
     */
    unsigned long *pmu_filter;
    unsigned int pmuver;
+
+   /* Memory Tagging Extension enabled for the guest */
+   bool mte_enabled;
 };

 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f56122eedffc..7ee93bcac017 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -89,6 +89,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
        r = 0;
        kvm->arch.return_nisv_io_abort_to_user = true;
        break;
+   case KVM_CAP_ARM_MTE:
+       if (!system_supports_mte() || kvm->created_vcpus)
+           return -EINVAL;

You also want to avoid 32bit guests.

Interesting point, however if I understand correctly the 32 bit flag is a
VCPU flag. And at this point kvm->created_vcpus==0, so I don't believe we
actually know whether the guest is 32 bit or not at the point of this test.
And since this is a per-VM flag it actually can make sense for a
heterogeneous VM if anyone is crazy enough to want such a thing.

Also, what is the rationale for this being a VM capability as opposed to a
CPU feature, similar to SVE and PMU?

v1/v2 actually had it as a CPU feature. However you need a per-VM flag to
enforce the use of tagged memory (the code in user_mem_abort() below).

+       r = 0;
+       kvm->arch.mte_enabled = true;
+       break;
    default:
        r = -EINVAL;
        break;
@@ -210,6 +216,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
     */
        r = 1;
        break;
+   case KVM_CAP_ARM_MTE:
+       r = system_supports_mte();

Same comment about 32bit.

As above, we don't know if we're launching a 32 bit guest or not.

+       break;
    case KVM_CAP_STEAL_TIME:
        r = kvm_arm_pvtime_supported();
        break;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 19aacc7d64de..38fe25310ca1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
    if (vma_pagesize == PAGE_SIZE && !force_pte)
        vma_pagesize = transparent_hugepage_adjust(memslot, hva,
                                                   &pfn, &fault_ipa);
+
+   /*
+    * The otherwise redundant test for system_supports_mte() allows the
+    * code to be compiled out when CONFIG_ARM64_MTE is not present.
+    */
+   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
+       /*
+        * VM will be able to see the page's tags, so we must ensure
+        * they have been initialised.
+        */
+       struct page *page = pfn_to_page(pfn);
+       long i, nr_pages = compound_nr(page);
+
+       /* if PG_mte_tagged is set, tags have already been initialised */
+       for (i = 0; i < nr_pages; i++, page++) {
+           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+               mte_clear_page_tags(page_address(page));
+       }
+   }

What are the visibility requirements for the tags, especially if the guest
has its MMU off? Is there any cache management that needs to occur?

If the guest has its MMU off then the memory would be treated as Untagged by
the architecture (the stage 1 page table must provide the 'tagged' flag).
Architecturally the tag bits handled the same
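For reference, opting in from the VMM side with the proposed per-VM
capability would be the usual KVM_ENABLE_CAP sequence, done before any vcpus
are created (sketch only; KVM_CAP_ARM_MTE is the uapi constant this patch
proposes and its number is not final):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: enable MTE for a VM; must happen before creating any vcpus. */
static int enable_vm_mte(int vm_fd)
{
    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_ARM_MTE,
    };

    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}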
Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
On 17/11/2020 16:07, Catalin Marinas wrote:

Hi Steven,

On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 19aacc7d64de..38fe25310ca1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
    if (vma_pagesize == PAGE_SIZE && !force_pte)
        vma_pagesize = transparent_hugepage_adjust(memslot, hva,
                                                   &pfn, &fault_ipa);
+
+   /*
+    * The otherwise redundant test for system_supports_mte() allows the
+    * code to be compiled out when CONFIG_ARM64_MTE is not present.
+    */
+   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
+       /*
+        * VM will be able to see the page's tags, so we must ensure
+        * they have been initialised.
+        */
+       struct page *page = pfn_to_page(pfn);
+       long i, nr_pages = compound_nr(page);
+
+       /* if PG_mte_tagged is set, tags have already been initialised */
+       for (i = 0; i < nr_pages; i++, page++) {
+           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+               mte_clear_page_tags(page_address(page));
+       }
+   }

If this page was swapped out and mapped back in, where does the restoring
from swap happen?

Restoring from swap happens above this in the call to gfn_to_pfn_prot()

I may have asked in the past, is user_mem_abort() the only path for mapping
Normal pages into stage 2?

That is my understanding (and yes you asked before) and no one has corrected
me! ;)

Steve
Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
Hi Steven,

On 2020-10-26 15:57, Steven Price wrote:

Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging for a VM.
This exposes the feature to the guest and automatically tags memory pages
touched by the VM as PG_mte_tagged (and clears the tags storage) to ensure
that the guest cannot see stale tags, and so that the tags are correctly
saved/restored across swap.

Signed-off-by: Steven Price
Reviewed-by: Andrew Jones
---
 arch/arm64/include/asm/kvm_emulate.h |  3 +++
 arch/arm64/include/asm/kvm_host.h    |  3 +++
 arch/arm64/kvm/arm.c                 |  9 +++++++++
 arch/arm64/kvm/mmu.c                 | 20 ++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            |  6 +++++-
 include/uapi/linux/kvm.h             |  1 +
 6 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5ef2669ccd6c..66c0d9e7c2b4 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
    if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
        vcpu_el1_is_32bit(vcpu))
        vcpu->arch.hcr_el2 |= HCR_TID2;
+
+   if (vcpu->kvm->arch.mte_enabled)

Please add a predicate (vcpu_has_mte() or kvm_has_mte()?) for this.

+       vcpu->arch.hcr_el2 |= HCR_ATA;
 }

 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 95ab7345dcc8..cd993aec0440 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -118,6 +118,9 @@ struct kvm_arch {
     */
    unsigned long *pmu_filter;
    unsigned int pmuver;
+
+   /* Memory Tagging Extension enabled for the guest */
+   bool mte_enabled;
 };

 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f56122eedffc..7ee93bcac017 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -89,6 +89,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
        r = 0;
        kvm->arch.return_nisv_io_abort_to_user = true;
        break;
+   case KVM_CAP_ARM_MTE:
+       if (!system_supports_mte() || kvm->created_vcpus)
+           return -EINVAL;

You also want to avoid 32bit guests. Also, what is the rationale for this
being a VM capability as opposed to a CPU feature, similar to SVE and PMU?

+       r = 0;
+       kvm->arch.mte_enabled = true;
+       break;
    default:
        r = -EINVAL;
        break;
@@ -210,6 +216,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
     */
        r = 1;
        break;
+   case KVM_CAP_ARM_MTE:
+       r = system_supports_mte();

Same comment about 32bit.

+       break;
    case KVM_CAP_STEAL_TIME:
        r = kvm_arm_pvtime_supported();
        break;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 19aacc7d64de..38fe25310ca1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
    if (vma_pagesize == PAGE_SIZE && !force_pte)
        vma_pagesize = transparent_hugepage_adjust(memslot, hva,
                                                   &pfn, &fault_ipa);
+
+   /*
+    * The otherwise redundant test for system_supports_mte() allows the
+    * code to be compiled out when CONFIG_ARM64_MTE is not present.
+    */
+   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
+       /*
+        * VM will be able to see the page's tags, so we must ensure
+        * they have been initialised.
+        */
+       struct page *page = pfn_to_page(pfn);
+       long i, nr_pages = compound_nr(page);
+
+       /* if PG_mte_tagged is set, tags have already been initialised */
+       for (i = 0; i < nr_pages; i++, page++) {
+           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+               mte_clear_page_tags(page_address(page));
+       }
+   }

What are the visibility requirements for the tags, especially if the guest
has its MMU off? Is there any cache management that needs to occur?

Another thing is device-like memory that is managed by userspace, such as the
QEMU emulated flash, for which there also might be tags. How is that dealt
with? In general, what are the expectations for tags on memory shared between
host and guest? Who owns them?

+
    if (writable) {
        prot |= KVM_PGTABLE_PROT_W;
        kvm_set_pfn_dirty(pfn);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 430e36e1a13d..35a3dc448231 100644
--- a/arch/arm64/kvm/sys_regs.c
+++
Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
Hi Steven,

On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 19aacc7d64de..38fe25310ca1 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>     if (vma_pagesize == PAGE_SIZE && !force_pte)
>         vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>                                                    &pfn, &fault_ipa);
> +
> +   /*
> +    * The otherwise redundant test for system_supports_mte() allows the
> +    * code to be compiled out when CONFIG_ARM64_MTE is not present.
> +    */
> +   if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> +       /*
> +        * VM will be able to see the page's tags, so we must ensure
> +        * they have been initialised.
> +        */
> +       struct page *page = pfn_to_page(pfn);
> +       long i, nr_pages = compound_nr(page);
> +
> +       /* if PG_mte_tagged is set, tags have already been initialised */
> +       for (i = 0; i < nr_pages; i++, page++) {
> +           if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> +               mte_clear_page_tags(page_address(page));
> +       }
> +   }

If this page was swapped out and mapped back in, where does the restoring
from swap happen?

I may have asked in the past, is user_mem_abort() the only path for mapping
Normal pages into stage 2?

--
Catalin