Re: [PATCH 2/5] KVM: arm64: Avoid mapping size adjustment on permission fault

2021-07-23 Thread Marc Zyngier
On Fri, 23 Jul 2021 16:55:39 +0100,
Alexandru Elisei  wrote:
> 
> Hi Marc,
> 
> On 7/17/21 10:55 AM, Marc Zyngier wrote:
> > Since we only support PMD-sized mappings for THP, getting
> > a permission fault on a level that results in a mapping
> > being larger than PAGE_SIZE is a sure indication that we have
> > already upgraded our mapping to a PMD.
> >
> > In this case, there is no need to try and parse userspace page
> > tables, as the fault information already tells us everything.
> >
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/kvm/mmu.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index db6314b93e99..c036a480ca27 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1088,9 +1088,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> > phys_addr_t fault_ipa,
> >  * If we are not forced to use page mapping, check if we are
> >  * backed by a THP and thus use block mapping if possible.
> >  */
> > -   if (vma_pagesize == PAGE_SIZE && !(force_pte || device))
> > -   vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva,
> > -  &pfn, &fault_ipa);
> > +   if (vma_pagesize == PAGE_SIZE && !force_pte) {
> 
> Looks like now it's possible to call transparent_hugepage_adjust()
> for devices (if fault_status != FSC_PERM). Commit 2aa53d68cee6
> ("KVM: arm64: Try stage2 block mapping for host device MMIO") makes
> a good case for the !device check. Was the check removed
> unintentionally?

That's what stupid bugs are made of. I must have resolved a rebase
conflict the wrong way, and lost this crucial bit. Thanks for spotting
this! Now fixed.
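
For reference, the fixed hunk should then read something like this (a
sketch reconstructed from the discussion, not copied from the final
commit):

	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
		if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
			vma_pagesize = fault_granule;
		else
			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
								   hva, &pfn,
								   &fault_ipa);
	}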

> 
> > +   if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
> > +   vma_pagesize = fault_granule;
> > +   else
> > +   vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
> > +  hva, &pfn,
> > +  &fault_ipa);
> > +   }
> 
> This change makes sense to me - we can only get stage 2 permission
> faults on a leaf entry since stage 2 tables don't have the
> APTable/XNTable/PXNTable bits. The biggest block mapping size that
> we support at stage 2 is PMD size (from
> transparent_hugepage_adjust()), therefore if fault_granule is larger
> than PAGE_SIZE, then it must be PMD_SIZE.

Yup, exactly.
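
(For context, fault_granule in this series is derived from the fault
level reported in ESR_EL2 -- a sketch of the relevant line, assuming the
helper introduced earlier in the series:)

	/*
	 * Stage 2 has no hierarchical permission bits, so a permission
	 * fault always comes from a leaf entry; a granule larger than
	 * PAGE_SIZE can only be the PMD block previously installed.
	 */
	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);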

Thanks again,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/5] KVM: arm64: Avoid mapping size adjustment on permission fault

2021-07-23 Thread Alexandru Elisei
Hi Marc,

On 7/17/21 10:55 AM, Marc Zyngier wrote:
> Since we only support PMD-sized mappings for THP, getting
> a permission fault on a level that results in a mapping
> being larger than PAGE_SIZE is a sure indication that we have
> already upgraded our mapping to a PMD.
>
> In this case, there is no need to try and parse userspace page
> tables, as the fault information already tells us everything.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/mmu.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index db6314b93e99..c036a480ca27 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1088,9 +1088,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>* If we are not forced to use page mapping, check if we are
>* backed by a THP and thus use block mapping if possible.
>*/
> - if (vma_pagesize == PAGE_SIZE && !(force_pte || device))
> - vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva,
> > -  &pfn, &fault_ipa);
> + if (vma_pagesize == PAGE_SIZE && !force_pte) {

Looks like now it's possible to call transparent_hugepage_adjust() for devices
(if fault_status != FSC_PERM). Commit 2aa53d68cee6 ("KVM: arm64: Try stage2 block
mapping for host device MMIO") makes a good case for the !device check. Was the
check removed unintentionally?

> + if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
> + vma_pagesize = fault_granule;
> + else
> + vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
> > +  hva, &pfn,
> > +  &fault_ipa);
> + }

This change makes sense to me - we can only get stage 2 permission faults on a
leaf entry since stage 2 tables don't have the APTable/XNTable/PXNTable bits.
The biggest block mapping size that we support at stage 2 is PMD size (from
transparent_hugepage_adjust()), therefore if fault_granule is larger than
PAGE_SIZE, then it must be PMD_SIZE.

Thanks,

Alex

>  
>   if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
>   /* Check the VMM hasn't introduced a new VM_SHARED VMA */


Re: [PATCH v2 3/3] kvm/arm: Align the VMID allocation with the arm64 ASID one

2021-07-23 Thread Vladimir Murzin
Hi Will,

On 7/22/21 4:38 PM, Will Deacon wrote:
> Hi Vladimir,
> 
> On Thu, Jul 22, 2021 at 04:22:26PM +0100, Vladimir Murzin wrote:
>> On 7/22/21 10:50 AM, Will Deacon wrote:
>>> As an aside: I'm more and more inclined to rip out the CnP stuff given
>>> that it doesn't appear to bring any benefits, but does have some clear
>>> downsides. Perhaps something for next week.
>>
>> Can you please clarify what you mean by "it doesn't appear to bring any
>> benefits"? IIRC, Cortex-A65 implements the CnP hint and I've heard that some
>> payloads have seen improvement...
> 
> Has anybody taped that out? I'd have thought building an SMT design in 2021
> is a reasonably courageous thing to do.

As you said, there can be a niche for that...

> 
> The issue I'm getting at is that modern cores seem to advertise CnP even
> if they ignore it internally, maybe because of some big/little worries?

Should we employ the CPU errata framework for such cores to demote CnP?

> That would be fine if it didn't introduce complexity and overhead to the
> kernel, but it does and therefore I think we should rip it out (or at
> least stick it behind a "default n" config option if there are some niche
> users).

"default n" still better then no code at all :)

Cheers
Vladimir

> 
> There are also open questions as to exactly what CnP does because the
> architecture is not clear at all (for example TTBRx_EL1.CnP is permitted
> to be cached in a TLB).
> 
Cheers,
> 
> Will
> 



Re: [PATCH 04/16] KVM: arm64: Add MMIO checking infrastructure

2021-07-23 Thread Quentin Perret
On Thursday 22 Jul 2021 at 19:04:59 (+0100), Marc Zyngier wrote:
> So FWIW, I've now pushed out an updated series for the THP changes[1],
> and you will find a similar patch at the base of the branch. Please
> have a look and let me know what you think!

I like the look of it! I'll pull this patch into my series and rebase on
top -- that should introduce three new users or so, and allow a few nice
cleanups.

Thanks!
Quentin


Re: [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature

2021-07-23 Thread Marc Zyngier

On 2021-07-23 14:38, Andrew Jones wrote:
> On Fri, Jul 23, 2021 at 02:30:13PM +0100, Marc Zyngier wrote:
> ...
> > > > +
> > > > +====
> > > > +Function ID:  (uint32)0xC604
> > > > +Arguments:(uint64)The base of the PG-sized IPA range
> > > > +  that is allowed to be accessed as
> > > > +   MMIO. Must aligned to the PG size (r1)
> > > 
> > > align
> > 
> > Hmmm. Ugly mix of tab and spaces. I have no idea what the norm
> > is here, so I'll just put spaces. I'm sure someone will let me
> > know if I'm wrong! ;-)
> 
> Actually, my comment wasn't regarding the alignment of the text. I was
> commenting that we should change 'aligned' to 'align' in the text. (Sorry,
> that was indeed ambiguous.) Hmm, it might be better to just add 'be', i.e.
> 'be aligned'.

*blink*. duh, of course.

> I'm not sure what to do about the tab/space mixing, but keeping it
> consistent is good enough for me.


Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature

2021-07-23 Thread Andrew Jones
On Fri, Jul 23, 2021 at 02:30:13PM +0100, Marc Zyngier wrote:
...
> > > +
> > > +====
> > > +Function ID:  (uint32)0xC604
> > > +Arguments:(uint64)The base of the PG-sized IPA range
> > > +  that is allowed to be accessed as
> > > +   MMIO. Must aligned to the PG size (r1)
> > 
> > align
> 
> Hmmm. Ugly mix of tab and spaces. I have no idea what the norm
> is here, so I'll just put spaces. I'm sure someone will let me
> know if I'm wrong! ;-)

Actually, my comment wasn't regarding the alignment of the text. I was
commenting that we should change 'aligned' to 'align' in the text. (Sorry,
that was indeed ambiguous.) Hmm, it might be better to just add 'be', i.e.
'be aligned'.
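
So the corrected row would read:

  Arguments:    (uint64)    The base of the PG-sized IPA range
                            that is allowed to be accessed as
                            MMIO. Must be aligned to the PG size (r1)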

I'm not sure what to do about the tab/space mixing, but keeping it
consistent is good enough for me.

Thanks,
drew



Re: [PATCH 10/16] KVM: arm64: Add some documentation for the MMIO guard feature

2021-07-23 Thread Marc Zyngier

On 2021-07-21 22:17, Andrew Jones wrote:
> On Thu, Jul 15, 2021 at 05:31:53PM +0100, Marc Zyngier wrote:
> > Document the hypercalls used for the MMIO guard infrastructure.
> > 
> > Signed-off-by: Marc Zyngier 
> > ---
> >  Documentation/virt/kvm/arm/index.rst  |  1 +
> >  Documentation/virt/kvm/arm/mmio-guard.rst | 73 +++
> >  2 files changed, 74 insertions(+)
> >  create mode 100644 Documentation/virt/kvm/arm/mmio-guard.rst
> > 
> > diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> > index 78a9b670aafe..e77a0ee2e2d4 100644
> > --- a/Documentation/virt/kvm/arm/index.rst
> > +++ b/Documentation/virt/kvm/arm/index.rst
> > @@ -11,3 +11,4 @@ ARM
> > psci
> > pvtime
> > ptp_kvm
> > +   mmio-guard
> > diff --git a/Documentation/virt/kvm/arm/mmio-guard.rst b/Documentation/virt/kvm/arm/mmio-guard.rst
> > new file mode 100644
> > index ..a5563a3e12cc
> > --- /dev/null
> > +++ b/Documentation/virt/kvm/arm/mmio-guard.rst
> > @@ -0,0 +1,73 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +==
> > +KVM MMIO guard
> > +==
> > +
> > +KVM implements device emulation by handling translation faults to any
> > +IPA range that is not contained a memory slot. Such translation fault
> 
>                                  ^ in                                 ^ a
> 
> > +is in most cases passed on to userspace (or in rare cases to the host
> > +kernel) with the address, size and possibly data of the access for
> > +emulation.
> > +
> > +Should the guest exit with an address that is not one that corresponds
> > +to an emulatable device, userspace may take measures that are not the
> > +most graceful as far as the guest is concerned (such as terminating it
> > +or delivering a fatal exception).
> > +
> > +There is also an element of trust: by forwarding the request to
> > +userspace, the kernel asumes that the guest trusts userspace to do the
> 
>                        assumes
> 
> > +right thing.
> > +
> > +The KVM MMIO guard offers a way to mitigate this last point: a guest
> > +can request that only certainly regions of the IPA space are valid as
> 
>                         certain

Thanks, all corrections applied.

> > +MMIO. Only these regions will be handled as an MMIO, and any other
> > +will result in an exception being delivered to the guest.
> > +
> > +This relies on a set of hypercalls defined in the KVM-specific range,
> > +using the HVC64 calling convention.
> > +
> > +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO
> > +
> > +==
> > +Function ID:  (uint32)0xC602
> > +Arguments:none
> > +Return Values:(int64) NOT_SUPPORTED(-1) on error, or
> > +  (uint64)Protection Granule (PG) size in
> > + bytes (r0)
> > +==
> > +
> > +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL
> > +
> > +====
> > +Function ID:  (uint32)0xC603
> > +Arguments:none
> > +Return Values:(int64) NOT_SUPPORTED(-1) on error, or
> > +  RET_SUCCESS(0) (r0)
> > +====
> > +
> > +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP
> > +
> > +====
> > +Function ID:  (uint32)0xC604
> > +Arguments:(uint64)The base of the PG-sized IPA range
> > +  that is allowed to be accessed as
> > + MMIO. Must aligned to the PG size (r1)
> 
> align

Hmmm. Ugly mix of tab and spaces. I have no idea what the norm
is here, so I'll just put spaces. I'm sure someone will let me
know if I'm wrong! ;-)

> > +  (uint64)Index in the MAIR_EL1 register
> > + providing the memory attribute that
> > + is used by the guest (r2)
> > +Return Values:(int64) NOT_SUPPORTED(-1) on error, or
> > +  RET_SUCCESS(0) (r0)
> > +====
> > +
> > +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP
> > +
> > +====
> > +Function ID:  (uint32)0xC604
> 
> copy+paste error, should be 0xC605

Gah, well spotted.

> > +Arguments:(uint64)The base of the PG-sized IPA range
> > +  that is forbidden to be accessed as
> 
> is now forbidden
> 
> or
> 
> was allowed
> 
> or just drop that part of the sentence because it's covered by the "and
> have been previously mapped" part. Something like
> 
> PG-sized IPA range aligned to the PG size which has been previously mapped
> (r1)

Picked the latter.
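
As an illustration of the calling convention these tables describe, a
guest-side sequence might look like the sketch below -- the macro names
are made up; only the function IDs and the HVC64 convention come from
the document itself:

	#include <linux/arm-smccc.h>

	/* Illustrative names; IDs as printed in the tables above. */
	#define MMIO_GUARD_INFO		0xC602
	#define MMIO_GUARD_ENROLL	0xC603
	#define MMIO_GUARD_MAP		0xC604

	/*
	 * Discover the protection granule, enroll into the MMIO guard,
	 * then allow MMIO accesses to one granule-aligned IPA range with
	 * the memory attribute at MAIR_EL1 index 'mair_idx'.
	 */
	static int mmio_guard_map_one(u64 ipa, u64 mair_idx)
	{
		struct arm_smccc_res res;
		u64 granule;

		arm_smccc_1_1_hvc(MMIO_GUARD_INFO, &res);
		if ((long)res.a0 < 0)		/* NOT_SUPPORTED(-1) */
			return -ENXIO;
		granule = res.a0;		/* PG size in bytes (r0) */

		arm_smccc_1_1_hvc(MMIO_GUARD_ENROLL, &res);
		if ((long)res.a0 < 0)
			return -ENXIO;

		arm_smccc_1_1_hvc(MMIO_GUARD_MAP, ALIGN_DOWN(ipa, granule),
				  mair_idx, &res);
		return (long)res.a0 < 0 ? -ENXIO : 0;
	}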

Thanks again,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v2 3/3] kvm/arm: Align the VMID allocation with the arm64 ASID one

2021-07-23 Thread Marco Elver
On Thu, Jul 22, 2021 at 10:11AM +0100, Quentin Perret wrote:
> On Thursday 22 Jul 2021 at 06:45:14 (+), Shameerali Kolothum Thodi wrote:
> > > From: Will Deacon [mailto:w...@kernel.org]
> > > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > > index 4b60c0056c04..a02c4877a055 100644
> > > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > > @@ -106,8 +106,7 @@ int kvm_host_prepare_stage2(void *mem_pgt_pool,
> > > void *dev_pgt_pool)
> > > > mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
> > > > mmu->arch = &host_kvm.arch;
> > > > mmu->pgt = &host_kvm.pgt;
> > > > -   mmu->vmid.vmid_gen = 0;
> > > > -   mmu->vmid.vmid = 0;
> > > > +   atomic64_set(&mmu->vmid.id, 0);
> > > 
> > > I think this is the first atomic64 use in the EL2 object, which may pull in
> > > some fatal KCSAN instrumentation. Quentin, have you run into this before?
> > > 
> > > Might be simple just to zero-initialise mmu for now, if it isn't already.
> > 
> > I will check that.
> 
> Yes I think what saves us here is that, AFAICT, arm64 doesn't support
> KCSAN yet. But the day it does, this should fail to link (hopefully)
> because of out-of-line calls into e.g. __kasan_check_write().
> 
> So yes, a simple zeroing here is probably preferable.

Note: Do not worry about hypothetically breaking with sanitizers here --
whether it's KASAN or KCSAN, they both instrument atomics. In files where
instrumentation is enabled but the atomic instrumentation should not be
pulled in, use the arch_ variants; that doesn't apply here, though, because
instrumentation shouldn't even be on.

The indicator that, once KCSAN is supported on arm64, the Makefile here
will just need KCSAN_SANITIZE := n is that all other instrumentation is
already killed entirely:

  $ grep -E "(PROFILE|SANITIZE|INSTRUMENT)" arch/arm64/kvm/hyp/nvhe/Makefile
  GCOV_PROFILE  := n
  KASAN_SANITIZE:= n
  UBSAN_SANITIZE:= n
  KCOV_INSTRUMENT   := n

KCSAN isn't supported on arm64 yet, and when it is, I believe Mark's
arm64 KCSAN series should take care of things like this.
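
In concrete terms that would presumably be a one-line addition next to
the existing knobs (a sketch only, pending actual arm64 KCSAN support):

  # arch/arm64/kvm/hyp/nvhe/Makefile
  GCOV_PROFILE    := n
  KASAN_SANITIZE  := n
  UBSAN_SANITIZE  := n
  KCSAN_SANITIZE  := n   # keep KCSAN instrumentation out of the EL2 object
  KCOV_INSTRUMENT := n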

Thanks,
-- Marco


Re: [PATCH 1/5] KVM: arm64: Walk userspace page tables to compute the THP mapping size

2021-07-23 Thread Marc Zyngier
On Tue, 20 Jul 2021 18:23:02 +0100,
Alexandru Elisei  wrote:
> 
> Hi Marc,
> 
> I just can't figure out why having the mmap lock is not needed to walk the
> userspace page tables. Any hints? Or am I not seeing where it's taken?

I trust Sean's explanation was complete enough!

> On 7/17/21 10:55 AM, Marc Zyngier wrote:
> > We currently rely on the kvm_is_transparent_hugepage() helper to
> > discover whether a given page has the potential to be mapped as
> > a block mapping.
> >
> > However, this API doesn't really give us everything we want:
> > - we don't get the size: this is not crucial today as we only
> >   support PMD-sized THPs, but we'd like to have larger sizes
> >   in the future
> > - we're the only user left of the API, and there is a will
> >   to remove it altogether
> >
> > To address the above, implement a simple walker using the existing
> > page table infrastructure, and plumb it into transparent_hugepage_adjust().
> > No new page sizes are supported in the process.
> >
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/kvm/mmu.c | 46 
> >  1 file changed, 42 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 3155c9e778f0..db6314b93e99 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -433,6 +433,44 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, 
> > size_t size,
> > return 0;
> >  }
> >  
> > +static struct kvm_pgtable_mm_ops kvm_user_mm_ops = {
> > +   /* We shouldn't need any other callback to walk the PT */
> > +   .phys_to_virt   = kvm_host_va,
> > +};
> > +
> > +struct user_walk_data {
> > +   u32 level;
> > +};
> > +
> > +static int user_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > +  enum kvm_pgtable_walk_flags flag, void * const arg)
> > +{
> > +   struct user_walk_data *data = arg;
> > +
> > +   data->level = level;
> > +   return 0;
> > +}
> > +
> > +static int get_user_mapping_size(struct kvm *kvm, u64 addr)
> > +{
> > +   struct user_walk_data data;
> > +   struct kvm_pgtable pgt = {
> > +   .pgd= (kvm_pte_t *)kvm->mm->pgd,
> > +   .ia_bits= VA_BITS,
> > +   .start_level= 4 - CONFIG_PGTABLE_LEVELS,
> > +   .mm_ops = &kvm_user_mm_ops,
> > +   };
> > +   struct kvm_pgtable_walker walker = {
> > +   .cb = user_walker,
> > +   .flags  = KVM_PGTABLE_WALK_LEAF,
> > +   .arg= &data,
> > +   };
> > +
> > +   kvm_pgtable_walk(&pgt, ALIGN_DOWN(addr, PAGE_SIZE), PAGE_SIZE, &walker);
> 
> I take it that it is guaranteed that kvm_pgtable_walk() will never
> fail? For example, I can see it failing if someone messes with
> KVM_PGTABLE_MAX_LEVELS.

But that's an architectural constant. How could it be messed with?
When we introduce 5 levels of page tables, we'll have to check all
this anyway.

> To be honest, I would rather have a check here instead of
> potentially feeding a bogus value to ARM64_HW_PGTABLE_LEVEL_SHIFT.
> It could be a VM_WARN_ON, so there's no runtime overhead unless
> CONFIG_DEBUG_VM.

Fair enough. That's easy enough to check.
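
Presumably something like this in get_user_mapping_size() (a sketch):

	int ret;

	ret = kvm_pgtable_walk(&pgt, ALIGN_DOWN(addr, PAGE_SIZE),
			       PAGE_SIZE, &walker);
	VM_WARN_ON(ret);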

> The patch looks good to me so far, but I want to give it another
> look (or two) after I figure out why the mmap semaphore is not
> needed.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


RE: [Linuxarm] Re: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM

2021-07-23 Thread Shameerali Kolothum Thodi



> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
> Sent: 22 July 2021 17:46
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; io...@lists.linux-foundation.org;
> kvmarm@lists.cs.columbia.edu; m...@kernel.org;
> alex.william...@redhat.com; eric.au...@redhat.com;
> zhangfei@linaro.org; Jonathan Cameron
> ; Zengtao (B) ;
> linux...@openeuler.org; Linuxarm 
> Subject: [Linuxarm] Re: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned
> VMID for NESTED stage with BTM
> 
> Hi Shameer,
> 
> On Wed, Jul 21, 2021 at 08:54:00AM +, Shameerali Kolothum Thodi wrote:
> > > More generally I think this pinned VMID set conflicts with that of
> > > stage-2-only domains (which is the default state until a guest attaches a
> > > PASID table). Say you have one guest using DOMAIN_NESTED without PASID
> > > table, just DMA to IPA using VMID 0x8000. Now another guest attaches a
> > > PASID table and obtains the same VMID from KVM. The stage-2 translation
> > > might use TLB entries from the other guest, no?  They'll both create
> > > stage-2 TLB entries with {StreamWorld=NS-EL1, VMID=0x8000}
> >
> > Now that we are trying to align the KVM VMID allocation algorithm similar to
> > that of the ASID allocator [1], I attempted to use that for the SMMU pinned
> > VMID allocation. But the issue you have mentioned above is still valid.
> >
> > And as a solution what I have tried now is follow what pinned ASID is doing
> > in SVA,
> >  -Use xarray for private VMIDs
> >  -Get pinned VMID from KVM for DOMAIN_NESTED with PASID table
> >  -If the new pinned VMID is in use by private, then update the private
> >   VMID (VMID update to a live STE).
> >
> > This seems to work, but still need to run more tests with this though.
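
As an illustration of that scheme -- a sketch in which every helper name
is hypothetical, not code from any actual posting:

	#include <linux/xarray.h>

	/* Private (stage-2-only) VMIDs, indexed by VMID value. */
	static DEFINE_XARRAY(private_vmids);

	static int adopt_pinned_vmid(struct arm_smmu_domain *nested, u16 vmid)
	{
		struct arm_smmu_domain *victim = xa_load(&private_vmids, vmid);

		if (victim) {
			/*
			 * Collision: move the private domain to a fresh VMID,
			 * which means rewriting its (live) STE.
			 */
			int new_vmid = smmu_alloc_private_vmid();	/* hypothetical */

			if (new_vmid < 0)
				return new_vmid;
			xa_erase(&private_vmids, vmid);
			xa_store(&private_vmids, new_vmid, victim, GFP_KERNEL);
			smmu_rewrite_ste_vmid(victim, new_vmid);	/* hypothetical */
		}
		nested->s2_cfg.vmid = vmid;
		return 0;
	}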
> >
> > > It's tempting to allocate all VMIDs through KVM instead, but that will
> > > force a dependency on KVM to use VFIO_TYPE1_NESTING_IOMMU and might
> > > break existing users of that extension (though I'm not sure there are any).
> > > Instead we might need to restrict the SMMU VMID bitmap to match the
> > > private VMID set in KVM.
> >
> > Another solution I have in mind is to make the new KVM VMID allocator common
> > between SMMUv3 and KVM. This will help to avoid all the private and shared
> > VMID splitting, and there is also no need for live updates to the STE VMID.
> > One possible drawback is a smaller number of available KVM VMIDs, but with a
> > 16 bit VMID space I am not sure how much of a concern that is.
> 
> Yes I think that works too. In practice there shouldn't be many VMIDs on
> the SMMU side, the feature's only enabled when a user wants to assign
> devices with nesting translation (unlike ASIDs where each device in the
> system gets a private ASID by default).

Ok. What about implementations that support only stage 2? Do we
need a private VMID allocator for those, or can we use the same common
KVM VMID allocator?

> Note that you still need to pin all VMIDs used by the SMMU, otherwise
> you'll have to update the STE after rollover.

Sure.

> The problem we have with VFIO_TYPE1_NESTING_IOMMU might be solved by the
> upcoming deprecation of VFIO_*_IOMMU [2]. We need a specific sequence from
> userspace:
> 1. Attach VFIO group to KVM (KVM_DEV_VFIO_GROUP_ADD)
> 2. Create nesting IOMMU domain and attach the group to it
>    (VFIO_GROUP_SET_CONTAINER, VFIO_SET_IOMMU becomes IOMMU_IOASID_ALLOC,
>    VFIO_DEVICE_ATTACH_IOASID)
> Currently QEMU does 2 then 1, which would cause the SMMU to allocate a
> separate VMID.

Yes. I have observed this with my current implementation. I have a check
to see if the private S2 config VMID belongs to the same domain s2_cfg, and
then skip the live update to the STE VMID.

> If we wanted to extend VFIO_TYPE1_NESTING_IOMMU with PASID
> tables we'd need to mandate 1-2 and may break existing users. In the new
> design we can require from the start that creating a nesting IOMMU
> container through /dev/iommu *must* come with a KVM context, that way
> we're sure to reuse the existing VMID.

Ok. That helps.

Thanks,
Shameer
 