RE: [PATCH v10 0/5] NVIDIA ARM SMMU Implementation

2020-07-17 Thread Krishna Reddy
>On Mon, Jul 13, 2020 at 02:50:20PM +0100, Will Deacon wrote:
>> On Tue, Jul 07, 2020 at 10:00:12PM -0700, Krishna Reddy wrote:
>> > Changes in v10:
>> > Perform SMMU base ioremap before calling implementation init.
>> > Check for Global faults across both ARM MMU-500s during global interrupt.
>> > Check for context faults across all contexts of both ARM MMU-500s during 
>> > context fault interrupt.
>> > Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
>>
>> Please repost based on my SMMU queue, as this doesn't currently apply.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

>Any update on this, please? It would be a shame to miss 5.9 now that the 
>patches look alright.

Thanks for pinging.
I have been out of the office this week. Just started working on reposting 
patches.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 4/5] iommu/uapi: Handle data and argsz filled by users

2020-07-17 Thread Alex Williamson
On Thu, 16 Jul 2020 11:45:16 -0700
Jacob Pan  wrote:

> IOMMU UAPI data has a user filled argsz field which indicates the data
> length comes with the API call. User data is not trusted, argsz must be
> validated based on the current kernel data size, mandatory data size,
> and feature flags.
> 
> User data may also be extended, results in possible argsz increase.
> Backward compatibility is ensured based on size and flags checking.
> 
> This patch adds sanity checks in the IOMMU layer. In addition to argsz,
> reserved/unused fields in padding, flags, and version are also checked.
> Details are documented in Documentation/userspace-api/iommu.rst
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/iommu.c | 192 
> --
>  include/linux/iommu.h |  20 --
>  2 files changed, 200 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d43120eb1dc5..fe30a940d19e 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1950,36 +1950,214 @@ int iommu_attach_device(struct iommu_domain *domain, 
> struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +/*
> + * Check flags and other user provided data for valid combinations. We also
> + * make sure no reserved fields or unused flags are set. This is to ensure
> + * not breaking userspace in the future when these fields or flags are used.
> + */
> +static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info 
> *info)
> +{
> + u32 mask;
> +
> + if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + return -EINVAL;
> +
> + mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1;
> + if (info->cache & ~mask)
> + return -EINVAL;
> +
> + if (info->granularity >= IOMMU_INV_GRANU_NR)
> + return -EINVAL;
> +
> + switch (info->granularity) {
> + case IOMMU_INV_GRANU_ADDR:
> + mask = IOMMU_INV_ADDR_FLAGS_PASID |
> + IOMMU_INV_ADDR_FLAGS_ARCHID |
> + IOMMU_INV_ADDR_FLAGS_LEAF;
> +
> + if (info->granu.addr_info.flags & ~mask)
> + return -EINVAL;
> + break;
> + case IOMMU_INV_GRANU_PASID:
> + mask = IOMMU_INV_PASID_FLAGS_PASID |
> + IOMMU_INV_PASID_FLAGS_ARCHID;
> + if (info->granu.pasid_info.flags & ~mask)
> + return -EINVAL;
> +
> + break;
> + case IOMMU_INV_GRANU_DOMAIN:
> + /* No flags to check */
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + if (info->padding[0] || info->padding[1])
> + return -EINVAL;
> +
> + return 0;
> +}
> +
>  int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> -struct iommu_cache_invalidate_info *inv_info)
> +void __user *uinfo)
>  {
> + struct iommu_cache_invalidate_info inv_info = { 0 };
> + u32 minsz, maxsz;
> + int ret = 0;
> +
>   if (unlikely(!domain->ops->cache_invalidate))
>   return -ENODEV;
>  
> - return domain->ops->cache_invalidate(domain, dev, inv_info);
> + /* Current kernel data size is the max to be copied from user */
> + maxsz = sizeof(struct iommu_cache_invalidate_info);
> +
> + /*
> +  * No new spaces can be added before the variable sized union, the
> +  * minimum size is the offset to the union.
> +  */
> + minsz = offsetof(struct iommu_cache_invalidate_info, granu);
> +
> + /* Copy minsz from user to get flags and argsz */
> + if (copy_from_user(&inv_info, uinfo, minsz))
> + return -EFAULT;
> +
> + /* Fields before variable size union is mandatory */
> + if (inv_info.argsz < minsz)
> + return -EINVAL;
> +
> + /* PASID and address granu requires additional info beyond minsz */
> + if (inv_info.argsz == minsz &&
> + ((inv_info.granularity == IOMMU_INV_GRANU_PASID) ||
> + (inv_info.granularity == IOMMU_INV_GRANU_ADDR)))
> + return -EINVAL;
> + /*
> +  * User might be using a newer UAPI header which has a larger data
> +  * size, we shall support the existing flags within the current
> +  * size. Copy the remaining user data _after_ minsz but not more
> +  * than the current kernel supported size.
> +  */
> + if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
> + min(inv_info.argsz, maxsz) - minsz))
> + return -EFAULT;

To further clarify previous comments about maxsz usage:

s/maxsz/sizeof(inv_info)/

> +
> + /* Now the argsz is validated, check the content */
> + ret = iommu_check_cache_invl_data(&inv_info);
> + if (ret)
> + return ret;
> +
> + return domain->ops->cache_invalidate(domain, dev, &inv_info);
>  }
>  

Re: [PATCH v5 1/5] docs: IOMMU user API

2020-07-17 Thread Alex Williamson
On Thu, 16 Jul 2020 11:45:13 -0700
Jacob Pan  wrote:

> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There has been lots of discussions on how
> it should work with VFIO UAPI and userspace in general.
> 
> This document is intended to clarify the UAPI design and usage. The
> mechanics of how future extensions should be achieved are also covered
> in this documentation.
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  Documentation/userspace-api/iommu.rst | 339 
> ++
>  1 file changed, 339 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommu.rst
> 
> diff --git a/Documentation/userspace-api/iommu.rst 
> b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index ..efc3e1838235
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,339 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +===================
> +IOMMU Userspace API
> +===================
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For native
> +usage, IOMMU is a system device which does not need to communicate
> +with user space directly.
> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU) is
> +required to communicate with the physical IOMMU in the host.
> +
> +.. contents:: :local:
> +
> +Functionalities
> +===============
> +Communications of user and kernel involve both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> +4. Invalidate IOMMU caches
> +5. Service IO page faults (page request and response)
> +
> +Requirements
> +============
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> +3. Extensions to the UAPI shall not break existing user space
> +
> +Interfaces
> +==========
> +Although the data structures defined in IOMMU UAPI are self-contained,
> +there is no user API functions introduced. Instead, IOMMU UAPI is
> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-----------------------------
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase in size.
> +
> +No new fields can be added *after* the variable sized union, since that
> +would break backward compatibility when the offset moves. In both cases, a
> +new flag must be accompanied with a new field such that the IOMMU
> +driver can process the data based on the new flag. Version field is
> +only reserved for the unlikely event of UAPI upgrade at its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +Though at the same time, argsz is user provided data which is not
> +trusted. The argsz field allows the user to indicate how much data
> +they're providing; it's still the kernel's responsibility to validate
> +whether it's correct and sufficient for the requested operation.
> +
> +Compatibility Checking
> +----------------------
> +When IOMMU UAPI extension results in size increase, IOMMU UAPI core
> +and vendor driver shall handle the following cases:
> +
> +1. User and kernel have an exact size match
> +2. An older user with older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with newer kernel header (larger UAPI size) running
> +   on an older kernel.
> +4. A malicious/misbehaving user passes an illegal/invalid size but within
> +   range. The data may contain garbage.
> +
> +Feature Checking
> +----------------
> +While launching a guest with vIOMMU, it is important to ensure that host
> +can support the UAPI data structures to be used for vIOMMU-pIOMMU
> +communications. Without upfront compatibility checking, future faults
> +are difficult to report even in normal conditions. For example, TLB
> +invalidations should always succeed. There is no architectural way to
> +report back to the vIOMMU if the UAPI data is incompatible. If that
> +happens, in order to protect the IOMMU isolation guarantee, we have to
> +resort to not giving completion status in vIOMMU. This may result in
> +VM hang.
> +
> +For this reason the following IOMMU UAPIs cannot fail:
> +
> +1. Free PASID
> +2. Unbind guest PASID
> +3. Unbind guest PASID table (SMMU)
> +4. Cache invalidate
> +
> +User applications such as QEMU are expected to import kernel UAPI
> +headers. Backward 

Re: [PATCH v5 04/15] vfio/type1: Report iommu nesting info to userspace

2020-07-17 Thread Auger Eric
Yi,

On 7/12/20 1:20 PM, Liu Yi L wrote:
> This patch exports iommu nesting capability info to user space through
> VFIO. User space is expected to check this info for supported uAPIs (e.g.
It is not only to check the supported uAPIs but rather to know which
callbacks it must call upon vIOMMU events and which features are
supported by the physical IOMMU.
> PASID alloc/free, bind page table, and cache invalidation) and the vendor
> specific format information for first level/stage page table that will be
> bound to.
> 
> The nesting info is available only after the nesting iommu type is set
> for a container.
to NESTED type
> Current implementation imposes one limitation - one
> nesting container should include at most one group. The philosophy of
> vfio container is having all groups/devices within the container share
> the same IOMMU context. When vSVA is enabled, one IOMMU context could
> include one 2nd-level address space and multiple 1st-level address spaces.
> While the 2nd-level address space is reasonably sharable by multiple groups
> , blindly sharing 1st-level address spaces across all groups within the
> container might instead break the guest expectation. In the future sub/
> super container concept might be introduced to allow partial address space
> sharing within an IOMMU context. But for now let's go with this restriction
> by requiring singleton container for using nesting iommu features. Below
> link has the related discussion about this decision.

Maybe add a note about SMMU related changes spotted by Jean.
> 
> https://lkml.org/lkml/2020/5/15/1028
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Signed-off-by: Liu Yi L 
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> *) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
>cap is much "cheap", if needs extension in future, just define another cap.
>https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/
> 
> v3 -> v4:
> *) address comments against v3.
> 
> v1 -> v2:
> *) added in v2
> ---
>  drivers/vfio/vfio_iommu_type1.c | 102 
> +++-
>  include/uapi/linux/vfio.h   |  19 
>  2 files changed, 109 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 3bd70ff..ed80104 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
>"Maximum number of user DMA mappings per container (65535).");
>  
>  struct vfio_iommu {
> - struct list_headdomain_list;
> - struct list_headiova_list;
> - struct vfio_domain  *external_domain; /* domain for external user */
> - struct mutexlock;
> - struct rb_root  dma_list;
> - struct blocking_notifier_head notifier;
> - unsigned intdma_avail;
> - uint64_tpgsize_bitmap;
> - boolv2;
> - boolnesting;
> - booldirty_page_tracking;
> - boolpinned_page_dirty_scope;
> + struct list_headdomain_list;
> + struct list_headiova_list;
> + /* domain for external user */
> + struct vfio_domain  *external_domain;
> + struct mutexlock;
> + struct rb_root  dma_list;
> + struct blocking_notifier_head   notifier;
> + unsigned intdma_avail;
> + uint64_tpgsize_bitmap;
> + boolv2;
> + boolnesting;
> + booldirty_page_tracking;
> + boolpinned_page_dirty_scope;
> + struct iommu_nesting_info   *nesting_info;
>  };
>  
>  struct vfio_domain {
> @@ -130,6 +132,9 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&(iommu)->domain_list))
>  
> +#define CONTAINER_HAS_DOMAIN(iommu)  (((iommu)->external_domain) || \
> +  (!list_empty(&(iommu)->domain_list)))
> +
>  #define DIRTY_BITMAP_BYTES(n)(ALIGN(n, BITS_PER_TYPE(u64)) / 
> BITS_PER_BYTE)
>  
>  /*
> @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct 
> vfio_iommu *iommu,
>  
>   list_splice_tail(iova_copy, iova);
>  }
> +
> +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> +{
> + kfree(iommu->nesting_info);
> + iommu->nesting_info = NULL;
> +}
> +
>  static int vfio_iommu_type1_attach_group(void *iommu_data,
>struct iommu_group *iommu_group)
>  {
> @@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void 
> *iommu_data,
> 

Re: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info

2020-07-17 Thread Auger Eric
Yi,

On 7/12/20 1:20 PM, Liu Yi L wrote:
> This patch is added as instead of returning a boolean for DOMAIN_ATTR_NESTING,
> iommu_domain_get_attr() should return an iommu_nesting_info handle.

you may add in the commit message you return an empty nesting info
struct for now as true nesting is not yet supported by the SMMUs.

Besides:
Reviewed-by: Eric Auger 

Thanks

Eric
> 
> Cc: Will Deacon 
> Cc: Robin Murphy 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Suggested-by: Jean-Philippe Brucker 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> ---
>  drivers/iommu/arm-smmu-v3.c | 29 +++--
>  drivers/iommu/arm-smmu.c| 29 +++--
>  2 files changed, 54 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index f578677..ec815d7 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -3019,6 +3019,32 @@ static struct iommu_group 
> *arm_smmu_device_group(struct device *dev)
>   return group;
>  }
>  
> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
> + void *data)
> +{
> + struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> + unsigned int size;
> +
> + if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> + return -ENODEV;
> +
> + size = sizeof(struct iommu_nesting_info);
> +
> + /*
> +  * if provided buffer size is smaller than expected, should
> +  * return 0 and also the expected buffer size to caller.
> +  */
> + if (info->size < size) {
> + info->size = size;
> + return 0;
> + }
> +
> + /* report an empty iommu_nesting_info for now */
> + memset(info, 0x0, size);
> + info->size = size;
> + return 0;
> +}
> +
>  static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>   enum iommu_attr attr, void *data)
>  {
> @@ -3028,8 +3054,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
> *domain,
>   case IOMMU_DOMAIN_UNMANAGED:
>   switch (attr) {
>   case DOMAIN_ATTR_NESTING:
> - *(int *)data = (smmu_domain->stage == 
> ARM_SMMU_DOMAIN_NESTED);
> - return 0;
> + return arm_smmu_domain_nesting_info(smmu_domain, data);
>   default:
>   return -ENODEV;
>   }
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 243bc4c..09e2f1b 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1506,6 +1506,32 @@ static struct iommu_group 
> *arm_smmu_device_group(struct device *dev)
>   return group;
>  }
>  
> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
> + void *data)
> +{
> + struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> + unsigned int size;
> +
> + if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> + return -ENODEV;
> +
> + size = sizeof(struct iommu_nesting_info);
> +
> + /*
> +  * if provided buffer size is smaller than expected, should
> +  * return 0 and also the expected buffer size to caller.
> +  */
> + if (info->size < size) {
> + info->size = size;
> + return 0;
> + }
> +
> + /* report an empty iommu_nesting_info for now */
> + memset(info, 0x0, size);
> + info->size = size;
> + return 0;
> +}
> +
>  static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>   enum iommu_attr attr, void *data)
>  {
> @@ -1515,8 +1541,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain 
> *domain,
>   case IOMMU_DOMAIN_UNMANAGED:
>   switch (attr) {
>   case DOMAIN_ATTR_NESTING:
> - *(int *)data = (smmu_domain->stage == 
> ARM_SMMU_DOMAIN_NESTED);
> - return 0;
> + return arm_smmu_domain_nesting_info(smmu_domain, data);
>   default:
>   return -ENODEV;
>   }
> 



Re: [PATCH v5 15/15] iommu/vt-d: Support reporting nesting capability info

2020-07-17 Thread Auger Eric
Hi Yi,

Missing a proper commit message. You can comment on the fact you only
support the case where all the physical IOMMUs have the same CAP/ECAP masks.

On 7/12/20 1:21 PM, Liu Yi L wrote:
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
> v2 -> v3:
> *) remove cap/ecap_mask in iommu_nesting_info.
> ---
>  drivers/iommu/intel/iommu.c | 81 
> +++--
>  include/linux/intel-iommu.h | 16 +
>  2 files changed, 95 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index a9504cb..9f7ad1a 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5659,12 +5659,16 @@ static inline bool iommu_pasid_support(void)
>  static inline bool nested_mode_support(void)
>  {
>   struct dmar_drhd_unit *drhd;
> - struct intel_iommu *iommu;
> + struct intel_iommu *iommu, *prev = NULL;
>   bool ret = true;
>  
>   rcu_read_lock();
>   for_each_active_iommu(iommu, drhd) {
> - if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
> + if (!prev)
> + prev = iommu;
> + if (!sm_supported(iommu) || !ecap_nest(iommu->ecap) ||
> + (VTD_CAP_MASK & (iommu->cap ^ prev->cap)) ||
> + (VTD_ECAP_MASK & (iommu->ecap ^ prev->ecap))) {
>   ret = false;
>   break;
>   }
> @@ -6079,6 +6083,78 @@ intel_iommu_domain_set_attr(struct iommu_domain 
> *domain,
>   return ret;
>  }
>  
> +static int intel_iommu_get_nesting_info(struct iommu_domain *domain,
> + struct iommu_nesting_info *info)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + u64 cap = VTD_CAP_MASK, ecap = VTD_ECAP_MASK;
> + struct device_domain_info *domain_info;
> + struct iommu_nesting_info_vtd vtd;
> + unsigned long flags;
> + unsigned int size;
> +
> + if (domain->type != IOMMU_DOMAIN_UNMANAGED ||
> + !(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
> + return -ENODEV;
> +
> + if (!info)
> + return -EINVAL;
> +
> + size = sizeof(struct iommu_nesting_info) +
> + sizeof(struct iommu_nesting_info_vtd);
> + /*
> +  * if provided buffer size is smaller than expected, should
> +  * return 0 and also the expected buffer size to caller.
> +  */
> + if (info->size < size) {
> + info->size = size;
> + return 0;
> + }
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + /*
> +  * arbitrary select the first domain_info as all nesting
> +  * related capabilities should be consistent across iommu
> +  * units.
> +  */
> + domain_info = list_first_entry(&dmar_domain->devices,
> +struct device_domain_info, link);
> + cap &= domain_info->iommu->cap;
> + ecap &= domain_info->iommu->ecap;
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> + info->format = IOMMU_PASID_FORMAT_INTEL_VTD;
> + info->features = IOMMU_NESTING_FEAT_SYSWIDE_PASID |
> +  IOMMU_NESTING_FEAT_BIND_PGTBL |
> +  IOMMU_NESTING_FEAT_CACHE_INVLD;
> + info->addr_width = dmar_domain->gaw;
> + info->pasid_bits = ilog2(intel_pasid_max_id);
> + info->padding = 0;
> + vtd.flags = 0;
> + vtd.padding = 0;
> + vtd.cap_reg = cap;
> + vtd.ecap_reg = ecap;
> +
> + memcpy(info->data, &vtd, sizeof(vtd));
> + return 0;
> +}
> +
> +static int intel_iommu_domain_get_attr(struct iommu_domain *domain,
> +enum iommu_attr attr, void *data)
> +{
> + switch (attr) {
> + case DOMAIN_ATTR_NESTING:
> + {
> + struct iommu_nesting_info *info =
> + (struct iommu_nesting_info *)data;
> +
> + return intel_iommu_get_nesting_info(domain, info);
> + }
> + default:
> + return -ENODEV;
-ENOENT?
> + }
> +}
> +
>  /*
>   * Check that the device does not live on an external facing PCI port that is
>   * marked as untrusted. Such devices should not be able to apply quirks and
> @@ -6101,6 +6177,7 @@ const struct iommu_ops intel_iommu_ops = {
>   .domain_alloc   = intel_iommu_domain_alloc,
>   .domain_free= intel_iommu_domain_free,
>   .domain_set_attr= intel_iommu_domain_set_attr,
> + .domain_get_attr= intel_iommu_domain_get_attr,
>   .attach_dev = intel_iommu_attach_device,
>   .detach_dev = intel_iommu_detach_device,
>   .aux_attach_dev = intel_iommu_aux_attach_device,
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 18f292e..c4ed0d4 100644
> --- 

Re: [PATCH v5 02/15] iommu: Report domain nesting info

2020-07-17 Thread Auger Eric
Hi Yi,

On 7/12/20 1:20 PM, Liu Yi L wrote:
> IOMMUs that support nesting translation needs report the capability info
s/needs/need to report
> to userspace, e.g. the format of first level/stage paging structures.
It gives information about requirements the userspace needs to implement
plus other features characterizing the physical implementation.
> 
> This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
> nesting info after setting DOMAIN_ATTR_NESTING.
I guess you meant after selecting VFIO_TYPE1_NESTING_IOMMU?
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
> 
> v3 -> v4:
> *) split the SMMU driver changes to be a separate patch
> *) move the @addr_width and @pasid_bits from vendor specific
>part to generic part.
> *) tweak the description for the @features field of struct
>iommu_nesting_info.
> *) add description on the @data[] field of struct iommu_nesting_info
> 
> v2 -> v3:
> *) remove cap/ecap_mask in iommu_nesting_info.
> *) reuse DOMAIN_ATTR_NESTING to get nesting info.
> *) return an empty iommu_nesting_info for SMMU drivers per Jean'
>suggestion.
> ---
>  include/uapi/linux/iommu.h | 77 
> ++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 1afc661..d2a47c4 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
>   } vendor;
>  };
>  
> +/*
> + * struct iommu_nesting_info - Information for nesting-capable IOMMU.
> + *  user space should check it before using
> + *  nesting capability.
> + *
> + * @size:size of the whole structure
> + * @format:  PASID table entry format, the same definition as struct
> + *   iommu_gpasid_bind_data @format.
> + * @features:supported nesting features.
> + * @flags:   currently reserved for future extension.
> + * @addr_width:  The output addr width of first level/stage translation
> + * @pasid_bits:  Maximum supported PASID bits, 0 represents no PASID
> + *   support.
> + * @data:vendor specific cap info. data[] structure type can be deduced
> + *   from @format field.
> + *
> + * +===============+======================================================+
> + * | feature       |  Notes                                               |
> + * +===============+======================================================+
> + * | SYSWIDE_PASID |  PASIDs are managed in system-wide, instead of per   |
s/in system-wide/system-wide ?
> + * |               |  device. When a device is assigned to userspace or   |
> + * |               |  VM, proper uAPI (userspace driver framework uAPI,   |
> + * |               |  e.g. VFIO) must be used to allocate/free PASIDs for |
> + * |               |  the assigned device.                                |
Isn't it possible to be more explicit, something like:
  |
System-wide PASID management is mandated by the physical IOMMU. All
PASIDs allocation must be mediated through the TBD API.
> + * +---------------+------------------------------------------------------+
> + * | BIND_PGTBL    |  The owner of the first level/stage page table must  |
> + * |               |  explicitly bind the page table to associated PASID  |
> + * |               |  (either the one specified in bind request or the    |
> + * |               |  default PASID of iommu domain), through userspace   |
> + * |               |  driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
As per your answer in https://lkml.org/lkml/2020/7/6/383, I now
understand ARM would not expose that BIND_PGTBL nesting feature, I still
think the above wording is a bit confusing. Maybe you may explicitly
talk about the PASID *entry* that needs to be passed from guest to host.
On ARM we directly pass the PASID table but when reading the above
description I fail to determine if this does not fit that description.
> + * +---------------+------------------------------------------------------+
> + * | CACHE_INVLD   |  The owner of the first level/stage page table must  |
> + * |               |  explicitly invalidate the IOMMU cache through uAPI  |
> + * |               |  provided by userspace driver framework (e.g. VFIO)  |
> + * |               |  according to vendor-specific requirement when       |
> + * |               |  changing the page table.                            |
> + * +---------------+------------------------------------------------------+

instead of using the "uAPI provided by userspace driver framework (e.g.
VFIO)", can't we use the so-called IOMMU UAPI terminology which now has
a userspace documentation?

> + *
> + * @data[] types defined for @format:
> + * 

Re: [PATCH v5 5/5] iommu/vt-d: Check UAPI data processed by IOMMU core

2020-07-17 Thread Auger Eric
Hi Jacob,

On 7/16/20 8:45 PM, Jacob Pan wrote:
> IOMMU generic layer already does sanity checks UAPI data for version
> match and argsz range under generic information.
> Remove the redundant version check from VT-d driver and check for vendor
> specific data size.
> 
> Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  drivers/iommu/intel/iommu.c | 3 +--
>  drivers/iommu/intel/svm.c   | 7 +--
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index f3a6ca88cf95..5e80484f0537 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5383,8 +5383,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
> struct device *dev,
>   int ret = 0;
>   u64 size = 0;
>  
> - if (!inv_info || !dmar_domain ||
> - inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + if (!inv_info || !dmar_domain)
>   return -EINVAL;
>  
>   if (!dev || !dev_is_pci(dev))
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 713b3a218483..55ea11e9c0f5 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -240,8 +240,11 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
> struct device *dev,
>   if (WARN_ON(!iommu) || !data)
>   return -EINVAL;
>  
> - if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> - data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> + if (data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> + return -EINVAL;
> +
> + /* IOMMU core ensures argsz is more than the start of the union */
> + if (data->argsz < offsetofend(struct iommu_gpasid_bind_data, 
> vendor.vtd))
>   return -EINVAL;
>  
>   if (!dev_is_pci(dev))
> 



Re: [PATCH v5 4/5] iommu/uapi: Handle data and argsz filled by users

2020-07-17 Thread Auger Eric
Hi Jacob,
On 7/16/20 8:45 PM, Jacob Pan wrote:

Could you share a branch? I was not able to apply this on either
iommu/next or master?

> IOMMU UAPI data has a user filled argsz field which indicates the data
> length comes with the API call.
s/ comes with the API call/ of the structure
> User data is not trusted, argsz must be
> validated based on the current kernel data size, mandatory data size,
> and feature flags.
> 
> User data may also be extended, results in possible argsz increase.
s/results/resulting
> Backward compatibility is ensured based on size and flags checking.
flags is missing in iommu_cache_invalidate_info.
> 
> This patch adds sanity checks in the IOMMU layer. In addition to argsz,
> reserved/unused fields in padding, flags, and version are also checked.
> Details are documented in Documentation/userspace-api/iommu.rst
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/iommu.c | 192 
> --
>  include/linux/iommu.h |  20 --
>  2 files changed, 200 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d43120eb1dc5..fe30a940d19e 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1950,36 +1950,214 @@ int iommu_attach_device(struct iommu_domain *domain, 
> struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +/*
> + * Check flags and other user privided data for valid combinations. We also
s/privided/provided
> + * make sure no reserved fields or unused flags are not set. This is to 
> ensure
s/not// ?
> + * not breaking userspace in the future when these fields or flags are used.
> + */
> +static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info 
> *info)
> +{
> + u32 mask;
> +
> + if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
increased version number?
> + return -EINVAL;
> +
> + mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1;
> + if (info->cache & ~mask)
> + return -EINVAL;
> +
> + if (info->granularity >= IOMMU_INV_GRANU_NR)
> + return -EINVAL;
> +
> + switch (info->granularity) {
> + case IOMMU_INV_GRANU_ADDR:
> + mask = IOMMU_INV_ADDR_FLAGS_PASID |
> + IOMMU_INV_ADDR_FLAGS_ARCHID |
> + IOMMU_INV_ADDR_FLAGS_LEAF;
> +
> + if (info->granu.addr_info.flags & ~mask)
> + return -EINVAL;
> + break;
> + case IOMMU_INV_GRANU_PASID:
> + mask = IOMMU_INV_PASID_FLAGS_PASID |
> + IOMMU_INV_PASID_FLAGS_ARCHID;
> + if (info->granu.pasid_info.flags & ~mask)
> + return -EINVAL;
> +
> + break;
> + case IOMMU_INV_GRANU_DOMAIN:
> + /* No flags to check */
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + if (info->padding[0] || info->padding[1])
padding has become "__u8padding[6];" in 2/5

> + return -EINVAL;
> +
> + return 0;
> +}
> +
>  int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> -struct iommu_cache_invalidate_info *inv_info)
> +void __user *uinfo)
>  {
> + struct iommu_cache_invalidate_info inv_info = { 0 };
> + u32 minsz, maxsz;
> + int ret = 0;
> +
>   if (unlikely(!domain->ops->cache_invalidate))
>   return -ENODEV;
>  
> - return domain->ops->cache_invalidate(domain, dev, inv_info);
> + /* Current kernel data size is the max to be copied from user */
> + maxsz = sizeof(struct iommu_cache_invalidate_info);
> +
> + /*
> +  * No new spaces can be added before the variable sized union, the
> +  * minimum size is the offset to the union.
> +  */
> + minsz = offsetof(struct iommu_cache_invalidate_info, granu);
> +
> + /* Copy minsz from user to get flags and argsz */
> + if (copy_from_user(&inv_info, uinfo, minsz))
> + return -EFAULT;
> +
> + /* Fields before variable size union is mandatory */
> + if (inv_info.argsz < minsz)
> + return -EINVAL;
> +
> + /* PASID and address granu requires additional info beyond minsz */
s/requires/require
> + if (inv_info.argsz == minsz &&
> + ((inv_info.granularity == IOMMU_INV_GRANU_PASID) ||
> + (inv_info.granularity == IOMMU_INV_GRANU_ADDR)))
> + return -EINVAL;
> + /*
> +  * User might be using a newer UAPI header which has a larger data
> +  * size, we shall support the existing flags within the current
> +  * size. Copy the remaining user data _after_ minsz but not more
> +  * than the current kernel supported size.
> +  */
> + if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
> + min(inv_info.argsz, maxsz) - minsz))
> + return -EFAULT;
> +
> + /* 
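
The two-pass copy above — minsz first for argsz and flags, then the rest capped
at the kernel's maxsz — can be modeled in plain userspace C. This is only an
illustrative sketch: the MINSZ/MAXSZ values and the copy_args name are invented
here, and memcpy stands in for copy_from_user.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

enum { MINSZ = 16, MAXSZ = 32 };  /* offset of the union vs. full struct size */

/* Validate a user-supplied argsz and copy at most the kernel-known size. */
static int copy_args(uint8_t *dst, const uint8_t *user, uint32_t user_argsz)
{
    uint32_t n;

    if (user_argsz < MINSZ)
        return -EINVAL;            /* fields before the union are mandatory */
    memcpy(dst, user, MINSZ);      /* first pass: argsz, flags, etc. */
    /* Second pass: remaining data, capped at the current kernel size so a
     * newer user passing a larger struct still works on this kernel. */
    n = user_argsz < MAXSZ ? user_argsz : MAXSZ;
    memcpy(dst + MINSZ, user + MINSZ, n - MINSZ);
    return 0;
}
```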

Re: [PATCH v2 0/3] iommu/amd: I/O VA address limits

2020-07-17 Thread Sironi, Filippo via iommu
On Fri, 2020-07-17 at 15:36 +0100, Robin Murphy wrote:
> On 2020-07-17 14:22, Sironi, Filippo wrote:
> > On Fri, 2020-07-17 at 10:47 +0100, Robin Murphy wrote:
> > > 
> > > On 2020-07-17 10:20, Sebastian Ott via iommu wrote:
> > > > Hello Joerg,
> > > > 
> > > > On 2020-07-10 14:31, Joerg Roedel wrote:
> > > > > On Wed, Jul 01, 2020 at 12:46:31AM +0200, Sebastian Ott wrote:
> > > > > > The IVRS ACPI table specifies maximum address sizes for I/O
> > > > > > virtual
> > > > > > addresses that can be handled by the IOMMUs in the system.
> > > > > > Parse
> > > > > > that
> > > > > > data from the IVRS header to provide aperture information
> > > > > > for
> > > > > > DMA
> > > > > > mappings and users of the iommu API.
> > > > > > 
> > > > > > Changes for V2:
> > > > > >- use limits in iommu_setup_dma_ops()
> > > > > >- rebased to current upstream
> > > > > > 
> > > > > > Sebastian Ott (3):
> > > > > > iommu/amd: Parse supported address sizes from IVRS
> > > > > > iommu/amd: Restrict aperture for domains to conform with
> > > > > > IVRS
> > > > > > iommu/amd: Actually enforce geometry aperture
> > > > > 
> > > > > Thanks for the changes. May I ask what the reason for those
> > > > > changes are?
> > > > > AFAIK all AMD IOMMU implementations (in hardware) support full
> > > > > 64bit
> > > > > address spaces, and the IVRS table might actually be wrong,
> > > > > limiting the
> > > > > address space in the worst case to only 32 bit.
> > > > 
> > > > It's not the IOMMU, but we've encountered devices that are
> > > > capable
> > > > of
> > > > more than
> > > > 32- but less than 64- bit IOVA, and there's no way to express
> > > > that
> > > > to
> > > > the IOVA
> > > > allocator in the PCIe spec. Our solution was to have our
> > > > platforms
> > > > express an
> > > > IVRS entry that says the IOMMU is capable of 48-bit, which these
> > > > devices
> > > > can generate.
> > > > 48 bits is plenty of address space in this generation for the
> > > > application we have.
> > > 
> > > Hmm, for constraints of individual devices, it should really be
> > > their
> > > drivers' job to call dma_set_mask*() with the appropriate value in
> > > the
> > > first place rather than just assuming that 64 means anything >32.
> > > If
> > > it's a case where the device itself technically is 64-bit capable,
> > > but
> > > an upstream bridge is constraining it, then that limit can also be
> > > described either by dedicated firmware properties (e.g. ACPI _DMA)
> > > or
> > > with a quirk like via_no_dac().
> > > 
> > > Robin.
> > 
> > You cannot rely on the device driver only because the device driver
> > attach might be a generic one like vfio-pci, for instance, that
> > doesn't
> > have any device specific knowledge.
> 
> Indeed, but on the other hand a generic driver that doesn't know the
> device is highly unlikely to set up any DMA transactions by itself
> either. In the case of VFIO, it would then be the guest/userspace
> driver's responsibility to take the equivalent action to avoid
> allocating addresses the hardware can't actually use.

I don't believe that we want to trust a userspace driver here, this may
result in hosts becoming unstable because devices are asked to do things
they aren't meant to do (e.g., DMA beyond 48 bits).

> I'm mostly just wary that trying to fake up a per-device restriction
> as
> a global one is a bit crude, and has the inherent problem that
> whatever
> you think the lowest common denominator is, there's the potential for
> some device to be hotplugged in later and break the assumption you've
> already had to commit to.

I agree, if the BIOS sets up an IVRS table with aperture of 48 bits and
all of a sudden we hotplug a device that only supports 36 bits, we're in
a bad place.

> And of course I am taking a bit of a DMA-API-centric viewpoint here -
> I
> think exposing per-device properties like bus_dma_limit that aren't
> easily identifiable for VFIO users to take into account is still
> rather
> an open problem.
> 
> Robin.

The use of ACPI _DMA that you suggest looks interesting and would allow
the kernel to prevent a dumb userspace driver using VFIO from doing
damage, I think.

It doesn't look like there's much support for ACPI _DMA though. Are you
aware of existing efforts on this front?

Filippo





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/3] iommu/amd: I/O VA address limits

2020-07-17 Thread Robin Murphy

On 2020-07-17 14:22, Sironi, Filippo wrote:

On Fri, 2020-07-17 at 10:47 +0100, Robin Murphy wrote:


On 2020-07-17 10:20, Sebastian Ott via iommu wrote:

Hello Joerg,

On 2020-07-10 14:31, Joerg Roedel wrote:

On Wed, Jul 01, 2020 at 12:46:31AM +0200, Sebastian Ott wrote:

The IVRS ACPI table specifies maximum address sizes for I/O
virtual
addresses that can be handled by the IOMMUs in the system. Parse
that
data from the IVRS header to provide aperture information for
DMA
mappings and users of the iommu API.

Changes for V2:
   - use limits in iommu_setup_dma_ops()
   - rebased to current upstream

Sebastian Ott (3):
iommu/amd: Parse supported address sizes from IVRS
iommu/amd: Restrict aperture for domains to conform with IVRS
iommu/amd: Actually enforce geometry aperture


Thanks for the changes. May I ask what the reason for those
changes are?
AFAIK all AMD IOMMU implementations (in hardware) support full
64bit
address spaces, and the IVRS table might actually be wrong,
limiting the
address space in the worst case to only 32 bit.


It's not the IOMMU, but we've encountered devices that are capable
of
more than
32- but less than 64- bit IOVA, and there's no way to express that
to
the IOVA
allocator in the PCIe spec. Our solution was to have our platforms
express an
IVRS entry that says the IOMMU is capable of 48-bit, which these
devices
can generate.
48 bits is plenty of address space in this generation for the
application we have.


Hmm, for constraints of individual devices, it should really be their
drivers' job to call dma_set_mask*() with the appropriate value in the
first place rather than just assuming that 64 means anything >32. If
it's a case where the device itself technically is 64-bit capable, but
an upstream bridge is constraining it, then that limit can also be
described either by dedicated firmware properties (e.g. ACPI _DMA) or
with a quirk like via_no_dac().

Robin.


You cannot rely on the device driver only because the device driver
attach might be a generic one like vfio-pci, for instance, that doesn't
have any device specific knowledge.


Indeed, but on the other hand a generic driver that doesn't know the 
device is highly unlikely to set up any DMA transactions by itself 
either. In the case of VFIO, it would then be the guest/userspace 
driver's responsibility to take the equivalent action to avoid 
allocating addresses the hardware can't actually use.


I'm mostly just wary that trying to fake up a per-device restriction as 
a global one is a bit crude, and has the inherent problem that whatever 
you think the lowest common denominator is, there's the potential for 
some device to be hotplugged in later and break the assumption you've 
already had to commit to.


And of course I am taking a bit of a DMA-API-centric viewpoint here - I 
think exposing per-device properties like bus_dma_limit that aren't 
easily identifiable for VFIO users to take into account is still rather 
an open problem.


Robin.
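
For the driver-side fix Robin describes, the usual pattern is a
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48)) call at probe time rather
than defaulting to 64 bits. The mask arithmetic itself can be sketched in
plain C; the helper name addr_within_mask is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(n); the n == 64 special case
 * avoids undefined behaviour from a full-width 64-bit shift. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Can a device limited to `bits`-bit IOVA address `addr`? */
static int addr_within_mask(uint64_t addr, unsigned int bits)
{
    return (addr & ~DMA_BIT_MASK(bits)) == 0;
}
```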


Re: [PATCH v5 3/5] iommu/uapi: Use named union for user data

2020-07-17 Thread Auger Eric
Hi Jacob,

On 7/16/20 8:45 PM, Jacob Pan wrote:
> IOMMU UAPI data size is filled by the user space which must be validated
> by ther kernel. To ensure backward compatibility, user data can only be
s/ther/the
> extended by either re-purpose padding bytes or extend the variable sized
> union at the end. No size change is allowed before the union. Therefore,
> the minimum size is the offset of the union.
> 
> To use offsetof() on the union, we must make it named.
> 
> Link: https://lore.kernel.org/linux-iommu/20200611145518.0c281...@x1.home/
> Signed-off-by: Jacob Pan 
> Reviewed-by: Lu Baolu 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  drivers/iommu/intel/iommu.c | 24 
>  drivers/iommu/intel/svm.c   |  2 +-
>  include/uapi/linux/iommu.h  |  4 ++--
>  3 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 6ad8b6f20235..f3a6ca88cf95 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5409,8 +5409,8 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
> struct device *dev,
>  
>   /* Size is only valid in address selective invalidation */
>   if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
> - size = to_vtd_size(inv_info->addr_info.granule_size,
> -inv_info->addr_info.nb_granules);
> + size = to_vtd_size(inv_info->granu.addr_info.granule_size,
> +inv_info->granu.addr_info.nb_granules);
>  
>   for_each_set_bit(cache_type,
>(unsigned long *)&inv_info->cache,
> @@ -5432,20 +5432,20 @@ intel_iommu_sva_invalidate(struct iommu_domain 
> *domain, struct device *dev,
>* granularity.
>*/
>   if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
> - (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
> - pasid = inv_info->pasid_info.pasid;
> + (inv_info->granu.pasid_info.flags & 
> IOMMU_INV_PASID_FLAGS_PASID))
> + pasid = inv_info->granu.pasid_info.pasid;
>   else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> -  (inv_info->addr_info.flags & 
> IOMMU_INV_ADDR_FLAGS_PASID))
> - pasid = inv_info->addr_info.pasid;
> +  (inv_info->granu.addr_info.flags & 
> IOMMU_INV_ADDR_FLAGS_PASID))
> + pasid = inv_info->granu.addr_info.pasid;
>  
>   switch (BIT(cache_type)) {
>   case IOMMU_CACHE_INV_TYPE_IOTLB:
>   /* HW will ignore LSB bits based on address mask */
>   if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
>   size &&
> - (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + 
> size)) - 1))) {
> + (inv_info->granu.addr_info.addr & 
> ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
>   WARN_ONCE(1, "Address out of range, 0x%llx, 
> size order %llu\n",
> -   inv_info->addr_info.addr, size);
> +   inv_info->granu.addr_info.addr, size);
>   }
>  
>   /*
> @@ -5453,9 +5453,9 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
> struct device *dev,
>* We use npages = -1 to indicate that.
>*/
>   qi_flush_piotlb(iommu, did, pasid,
> - mm_to_dma_pfn(inv_info->addr_info.addr),
> + 
> mm_to_dma_pfn(inv_info->granu.addr_info.addr),
>   (granu == QI_GRAN_NONG_PASID) ? -1 : 1 
> << size,
> - inv_info->addr_info.flags & 
> IOMMU_INV_ADDR_FLAGS_LEAF);
> + inv_info->granu.addr_info.flags & 
> IOMMU_INV_ADDR_FLAGS_LEAF);
>  
>   if (!info->ats_enabled)
>   break;
> @@ -5476,13 +5476,13 @@ intel_iommu_sva_invalidate(struct iommu_domain 
> *domain, struct device *dev,
>   size = 64 - VTD_PAGE_SHIFT;
>   addr = 0;
>   } else if (inv_info->granularity == 
> IOMMU_INV_GRANU_ADDR)
> - addr = inv_info->addr_info.addr;
> + addr = inv_info->granu.addr_info.addr;
>  
>   if (info->ats_enabled)
>   qi_flush_dev_iotlb_pasid(iommu, sid,
>   info->pfsid, pasid,
>   info->ats_qdep,
> - inv_info->addr_info.addr,
> + inv_info->granu.addr_info.addr,
>  
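
The reason the union must be named, as the commit message says, is that
offsetof() needs a member name to compute the minimum (pre-union) size. A
minimal stand-alone illustration — the struct below is a simplified stand-in,
not the real iommu_cache_invalidate_info layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct invalidate_info {
    uint32_t argsz;
    uint32_t version;
    uint8_t  cache;
    uint8_t  granularity;
    uint8_t  padding[6];
    union {                   /* named 'granu' so offsetof() can reference it */
        uint64_t pasid_info;
        uint64_t addr_info;
    } granu;
};

/* Mandatory minimum a user must supply: everything before the union. */
static size_t min_size(void)
{
    return offsetof(struct invalidate_info, granu);
}
```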

Re: [PATCH v5 2/5] iommu/uapi: Add argsz for user filled data

2020-07-17 Thread Auger Eric
Hi Jacob,

On 7/16/20 8:45 PM, Jacob Pan wrote:
> As IOMMU UAPI gets extended, user data size may increase. To support
> backward compatibiliy, this patch introduces a size field to each UAPI
> data structures. It is *always* the responsibility for the user to fill in
> the correct size. Padding fields are adjusted to ensure 8 byte alignment.
> 
> Specific scenarios for user data handling are documented in:
> Documentation/userspace-api/iommu.rst
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  include/uapi/linux/iommu.h | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index e907b7091a46..d5e9014f690e 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -135,6 +135,7 @@ enum iommu_page_response_code {
>  
>  /**
>   * struct iommu_page_response - Generic page response information
> + * @argsz: User filled size of this data
>   * @version: API version of this structure
>   * @flags: encodes whether the corresponding fields are valid
>   * (IOMMU_FAULT_PAGE_RESPONSE_* values)
> @@ -143,6 +144,7 @@ enum iommu_page_response_code {
>   * @code: response code from  iommu_page_response_code
>   */
>  struct iommu_page_response {
> + __u32   argsz;
>  #define IOMMU_PAGE_RESP_VERSION_1  1
Don't you need to incr the version for all the modified structs?
>   __u32   version;
>  #define IOMMU_PAGE_RESP_PASID_VALID  (1 << 0)
> @@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
>  /**
>   * struct iommu_cache_invalidate_info - First level/stage invalidation
>   * information
> + * @argsz: User filled size of this data
>   * @version: API version of this structure
>   * @cache: bitfield that allows to select which caches to invalidate
>   * @granularity: defines the lowest granularity used for the invalidation:
> @@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
>   * must support the used granularity.
>   */
>  struct iommu_cache_invalidate_info {
> + __u32   argsz;
>  #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>   __u32   version;
so there is no "flags" field in this struct. Is it OK?
>  /* IOMMU paging structure cache */
> @@ -255,7 +259,7 @@ struct iommu_cache_invalidate_info {
>  #define IOMMU_CACHE_INV_TYPE_NR  (3)
>   __u8    cache;
>   __u8    granularity;
> - __u8    padding[2];
> + __u8    padding[6];
>   union {
>   struct iommu_inv_pasid_info pasid_info;
>   struct iommu_inv_addr_info addr_info;
> @@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
>  
>  /**
>   * struct iommu_gpasid_bind_data - Information about device and guest PASID 
> binding
> + * @argsz:   User filled size of this data
>   * @version: Version of this data structure
>   * @format:  PASID table entry format
>   * @flags:   Additional information on guest bind request
> @@ -309,17 +314,18 @@ struct iommu_gpasid_bind_data_vtd {
>   * PASID to host PASID based on this bind data.
>   */
>  struct iommu_gpasid_bind_data {
> + __u32 argsz;
>  #define IOMMU_GPASID_BIND_VERSION_1  1
>   __u32 version;
>  #define IOMMU_PASID_FORMAT_INTEL_VTD 1
>   __u32 format;
> + __u32 addr_width;
>  #define IOMMU_SVA_GPASID_VAL (1 << 0) /* guest PASID valid */
>   __u64 flags;
>   __u64 gpgd;
>   __u64 hpasid;
>   __u64 gpasid;
> - __u32 addr_width;
> - __u8  padding[12];
> + __u8  padding[8];
>   /* Vendor specific data */
>   union {
>   struct iommu_gpasid_bind_data_vtd vtd;
> 
Thanks

Eric
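
The 8-byte-alignment requirement in the commit message (and the addr_width
move in the hunk above) can be checked at compile time. The struct below is a
simplified stand-in for iommu_gpasid_bind_data, not the real UAPI layout; the
real header remains the authority on field order and sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bind_data {
    uint32_t argsz;
    uint32_t version;
    uint32_t format;
    uint32_t addr_width;   /* placed here so the 64-bit fields stay aligned */
    uint64_t flags;
    uint64_t gpgd;
    uint64_t hpasid;
    uint64_t gpasid;
    uint8_t  padding[8];
    union {                /* vendor-specific, variable-sized tail */
        uint64_t vtd[2];
    } vendor;
};

_Static_assert(offsetof(struct bind_data, flags) % 8 == 0,
               "64-bit fields must be 8-byte aligned");
_Static_assert(offsetof(struct bind_data, vendor) % 8 == 0,
               "variable-sized union must start 8-byte aligned");
```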



Re: [PATCH v5 1/5] docs: IOMMU user API

2020-07-17 Thread Auger Eric
Hi Jacob,

On 7/16/20 8:45 PM, Jacob Pan wrote:
> IOMMU UAPI is newly introduced to support communications between guest
> virtual IOMMU and host IOMMU. There has been lots of discussions on how
> it should work with VFIO UAPI and userspace in general.
> 
> This document is indended to clarify the UAPI design and usage. The
intended
> mechanics of how future extensions should be achieved are also covered
> in this documentation.
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  Documentation/userspace-api/iommu.rst | 339 
> ++
>  1 file changed, 339 insertions(+)
>  create mode 100644 Documentation/userspace-api/iommu.rst
> 
> diff --git a/Documentation/userspace-api/iommu.rst 
> b/Documentation/userspace-api/iommu.rst
> new file mode 100644
> index ..efc3e1838235
> --- /dev/null
> +++ b/Documentation/userspace-api/iommu.rst
> @@ -0,0 +1,339 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. iommu:
> +
> +=
> +IOMMU Userspace API
> +=
> +
> +IOMMU UAPI is used for virtualization cases where communications are
> +needed between physical and virtual IOMMU drivers. For native
s/native/baremetal?
> +usage, IOMMU is a system device which does not need to communicate
the IOMMU?
> +with user space directly.
> +
> +The primary use cases are guest Shared Virtual Address (SVA) and
> +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU) is
> +required to communicate with the physical IOMMU in the host.
wherin the vIOMMU implementation relies on the physical IOMMU and for
this reason requires interactions with the host driver.

> +
> +.. contents:: :local:
> +
> +Functionalities
> +===
> +Communications of user and kernel involve both directions. The
> +supported user-kernel APIs are as follows:
> +
> +1. Alloc/Free PASID
> +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
s/sMMU/SMMU
> +4. Invalidate IOMMU caches
> +5. Service IO page faults (page request and response)
Report errors to the guest and serve page requests?
> +
> +Requirements
> +
> +The IOMMU UAPIs are generic and extensible to meet the following
> +requirements:
> +
> +1. Emulated and para-virtualised vIOMMUs
> +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
SMMU
> +3. Extensions to the UAPI shall not break existing user space
> +
> +Interfaces
> +==
> +Although the data structures defined in IOMMU UAPI are self-contained,
> +there is no user API functions introduced. Instead, IOMMU UAPI is
> +designed to work with existing user driver frameworks such as VFIO.
> +
> +Extension Rules & Precautions
> +-
> +When IOMMU UAPI gets extended, the data structures can *only* be
> +modified in two ways:
> +
> +1. Adding new fields by re-purposing the padding[] field. No size change.
> +2. Adding new union members at the end. May increase in size.
May increase the structure sizes
> +
> +No new fields can be added *after* the variable sized union in that it
> +will break backward compatibility when offset moves. In both cases, a
> +new flag must be accompanied with a new field such that the IOMMU
a new flag must be introduced whenever a change affects the structure
using either method?
> +driver can process the data based on the new flag. Version field is
> +only reserved for the unlikely event of UAPI upgrade at its entirety.
> +
> +It's *always* the caller's responsibility to indicate the size of the
> +structure passed by setting argsz appropriately.
> +Though at the same time, argsz is user provided data which is not
> +trusted. The argsz field allows the user to indicate how much data
> +they're providing, it's still the kernel's responsibility to validate
he is providing
> +whether it's correct and sufficient for the requested operation.
> +
> +Compatibility Checking
> +--
> +When IOMMU UAPI extension results in size increase, IOMMU UAPI core
in some structure size increase, the IOMMU UAPI code
> +and vendor driver shall handle the following cases:
> +
> +1. User and kernel has exact size match
> +2. An older user with older kernel header (smaller UAPI size) running on a
> +   newer kernel (larger UAPI size)
> +3. A newer user with newer kernel header (larger UAPI size) running
> +   on an older kernel.
> +4. A malicious/misbehaving user pass illegal/invalid size but within
> +   range. The data may contain garbage.
> +
> +Feature Checking
> +
> +While launching a guest with vIOMMU, it is important to ensure that host
> +can support the UAPI data structures to be used for vIOMMU-pIOMMU
> +communications. Without upfront compatibility checking, future faults
s/faults/failures?
> +are difficult to report even in normal conditions. For example, TLB
> +invalidations should always succeed. There is no architectural way to
> +report back to the vIOMMU if the UAPI data is 

Re: [PATCH v2 0/3] iommu/amd: I/O VA address limits

2020-07-17 Thread Sironi, Filippo via iommu
On Fri, 2020-07-17 at 10:47 +0100, Robin Murphy wrote:
> 
> On 2020-07-17 10:20, Sebastian Ott via iommu wrote:
> > Hello Joerg,
> > 
> > On 2020-07-10 14:31, Joerg Roedel wrote:
> > > On Wed, Jul 01, 2020 at 12:46:31AM +0200, Sebastian Ott wrote:
> > > > The IVRS ACPI table specifies maximum address sizes for I/O
> > > > virtual
> > > > addresses that can be handled by the IOMMUs in the system. Parse
> > > > that
> > > > data from the IVRS header to provide aperture information for
> > > > DMA
> > > > mappings and users of the iommu API.
> > > > 
> > > > Changes for V2:
> > > >   - use limits in iommu_setup_dma_ops()
> > > >   - rebased to current upstream
> > > > 
> > > > Sebastian Ott (3):
> > > >iommu/amd: Parse supported address sizes from IVRS
> > > >iommu/amd: Restrict aperture for domains to conform with IVRS
> > > >iommu/amd: Actually enforce geometry aperture
> > > 
> > > Thanks for the changes. May I ask what the reason for those
> > > changes are?
> > > AFAIK all AMD IOMMU implementations (in hardware) support full
> > > 64bit
> > > address spaces, and the IVRS table might actually be wrong,
> > > limiting the
> > > address space in the worst case to only 32 bit.
> > 
> > It's not the IOMMU, but we've encountered devices that are capable
> > of
> > more than
> > 32- but less than 64- bit IOVA, and there's no way to express that
> > to
> > the IOVA
> > allocator in the PCIe spec. Our solution was to have our platforms
> > express an
> > IVRS entry that says the IOMMU is capable of 48-bit, which these
> > devices
> > can generate.
> > 48 bits is plenty of address space in this generation for the
> > application we have.
> 
> Hmm, for constraints of individual devices, it should really be their
> drivers' job to call dma_set_mask*() with the appropriate value in the
> first place rather than just assuming that 64 means anything >32. If
> it's a case where the device itself technically is 64-bit capable, but
> an upstream bridge is constraining it, then that limit can also be
> described either by dedicated firmware properties (e.g. ACPI _DMA) or
> with a quirk like via_no_dac().
> 
> Robin.

You cannot rely on the device driver only because the device driver
attach might be a generic one like vfio-pci, for instance, that doesn't
have any device specific knowledge.

Filippo









Re: [PATCH v10 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-17 Thread Robin Murphy

On 2020-07-13 14:44, Will Deacon wrote:

On Tue, Jul 07, 2020 at 10:00:17PM -0700, Krishna Reddy wrote:

Add global/context fault hooks to allow vendor specific implementations
override default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Signed-off-by: Krishna Reddy 
---
  drivers/iommu/arm-smmu-nvidia.c | 99 +
  drivers/iommu/arm-smmu.c| 17 +-
  drivers/iommu/arm-smmu.h|  3 +
  3 files changed, 117 insertions(+), 2 deletions(-)


Given that faults shouldn't occur during normal operation, is this patch
actually necessary?


Indeed they shouldn't, but if something *does* happen to go wrong then I 
think it's worth having proper handling in place, since the consequences 
otherwise include a screaming "spurious" fault or just silently losing 
some transactions and possibly locking up part of the system altogether 
(depending on HUPCF at least - I recall MMU-500 also behaving funnily 
WRT TLB maintenance while an IRQ is outstanding, but that was long 
enough ago that it might have been related to the old CFCFG behaviour).


Until we sort out the reserved memory regions thing (the new IORT spec 
is due Real Soon Now(TM)...) some systems are going to keep suffering 
transient context faults during boot - those may make the display 
unhappy until it gets reset, but we certainly don't want to invite the 
possibility of them wedging the SMMU itself.


Robin.


Re: [PATCH v10 0/5] NVIDIA ARM SMMU Implementation

2020-07-17 Thread Will Deacon
On Mon, Jul 13, 2020 at 02:50:20PM +0100, Will Deacon wrote:
> On Tue, Jul 07, 2020 at 10:00:12PM -0700, Krishna Reddy wrote:
> > Changes in v10:
> > Perform SMMU base ioremap before calling implementation init.
> > Check for Global faults across both ARM MMU-500s during global interrupt.
> > Check for context faults across all contexts of both ARM MMU-500s during 
> > context fault interrupt.
> > Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
> 
> Please repost based on my SMMU queue, as this doesn't currently apply.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

Any update on this, please? It would be a shame to miss 5.9 now that the
patches look alright.

Will


Re: [PATCH v2 0/3] iommu/amd: I/O VA address limits

2020-07-17 Thread Robin Murphy

On 2020-07-17 10:20, Sebastian Ott via iommu wrote:

Hello Joerg,

On 2020-07-10 14:31, Joerg Roedel wrote:

On Wed, Jul 01, 2020 at 12:46:31AM +0200, Sebastian Ott wrote:

The IVRS ACPI table specifies maximum address sizes for I/O virtual
addresses that can be handled by the IOMMUs in the system. Parse that
data from the IVRS header to provide aperture information for DMA
mappings and users of the iommu API.

Changes for V2:
  - use limits in iommu_setup_dma_ops()
  - rebased to current upstream

Sebastian Ott (3):
   iommu/amd: Parse supported address sizes from IVRS
   iommu/amd: Restrict aperture for domains to conform with IVRS
   iommu/amd: Actually enforce geometry aperture

Thanks for the changes. May I ask what the reason for those changes are?
AFAIK all AMD IOMMU implementations (in hardware) support full 64bit
address spaces, and the IVRS table might actually be wrong, limiting the
address space in the worst case to only 32 bit.


It's not the IOMMU, but we've encountered devices that are capable of 
more than
32- but less than 64- bit IOVA, and there's no way to express that to 
the IOVA
allocator in the PCIe spec. Our solution was to have our platforms 
express an
IVRS entry that says the IOMMU is capable of 48-bit, which these devices 
can generate.
48 bits is plenty of address space in this generation for the 
application we have.


Hmm, for constraints of individual devices, it should really be their 
drivers' job to call dma_set_mask*() with the appropriate value in the 
first place rather than just assuming that 64 means anything >32. If 
it's a case where the device itself technically is 64-bit capable, but 
an upstream bridge is constraining it, then that limit can also be 
described either by dedicated firmware properties (e.g. ACPI _DMA) or 
with a quirk like via_no_dac().


Robin.

Re: [PATCH v2 0/3] iommu/amd: I/O VA address limits

2020-07-17 Thread Sebastian Ott via iommu

Hello Joerg,

On 2020-07-10 14:31, Joerg Roedel wrote:

On Wed, Jul 01, 2020 at 12:46:31AM +0200, Sebastian Ott wrote:

The IVRS ACPI table specifies maximum address sizes for I/O virtual
addresses that can be handled by the IOMMUs in the system. Parse that
data from the IVRS header to provide aperture information for DMA
mappings and users of the iommu API.

Changes for V2:
  - use limits in iommu_setup_dma_ops()
  - rebased to current upstream

Sebastian Ott (3):
   iommu/amd: Parse supported address sizes from IVRS
   iommu/amd: Restrict aperture for domains to conform with IVRS
   iommu/amd: Actually enforce geometry aperture

Thanks for the changes. May I ask what the reason for those changes are?
AFAIK all AMD IOMMU implementations (in hardware) support full 64bit
address spaces, and the IVRS table might actually be wrong, limiting the
address space in the worst case to only 32 bit.


It's not the IOMMU, but we've encountered devices that are capable of 
more than
32- but less than 64- bit IOVA, and there's no way to express that to 
the IOVA
allocator in the PCIe spec. Our solution was to have our platforms 
express an
IVRS entry that says the IOMMU is capable of 48-bit, which these devices 
can generate.
48 bits is plenty of address space in this generation for the 
application we have.


Sebastian








Re: [PATCH v5 03/15] iommu/smmu: Report empty domain nesting info

2020-07-17 Thread Jean-Philippe Brucker
On Thu, Jul 16, 2020 at 10:38:17PM +0200, Auger Eric wrote:
> Hi Jean,
> 
> On 7/16/20 5:39 PM, Jean-Philippe Brucker wrote:
> > On Tue, Jul 14, 2020 at 10:12:49AM +, Liu, Yi L wrote:
> >>> Have you verified that this doesn't break the existing usage of
> >>> DOMAIN_ATTR_NESTING in drivers/vfio/vfio_iommu_type1.c?
> >>
> >> I didn't have ARM machine on my hand. But I contacted with Jean
> >> Philippe, he confirmed no compiling issue. I didn't see any code
> >> getting DOMAIN_ATTR_NESTING attr in current 
> >> drivers/vfio/vfio_iommu_type1.c.
> >> What I'm adding is to call iommu_domai_get_attr(, DOMAIN_ATTR_NESTIN)
> >> and won't fail if the iommu_domai_get_attr() returns 0. This patch
> >> returns an empty nesting info for DOMAIN_ATTR_NESTIN and return
> >> value is 0 if no error. So I guess it won't fail nesting for ARM.
> > 
> > I confirm that this series doesn't break the current support for
> > VFIO_IOMMU_TYPE1_NESTING with an SMMUv3. That said...
> > 
> > If the SMMU does not support stage-2 then there is a change in behavior
> > (untested): after the domain is silently switched to stage-1 by the SMMU
> > driver, VFIO will now query nesting info and obtain -ENODEV. Instead of
> > succeeding as before, the VFIO ioctl will now fail. I believe that's a fix
> > rather than a regression, it should have been like this since the
> > beginning. No known userspace has been using VFIO_IOMMU_TYPE1_NESTING so
> > far, so I don't think it should be a concern.
> But as Yi mentioned earlier, in the current vfio code there is no
> DOMAIN_ATTR_NESTING query yet.

That's why something that would have succeeded before will now fail:
Before this series, if user asked for a VFIO_IOMMU_TYPE1_NESTING, it would
have succeeded even if the SMMU didn't support stage-2, as the driver
would have silently fallen back on stage-1 mappings (which work exactly
the same as stage-2-only since there was no nesting supported). After the
series, we do check for DOMAIN_ATTR_NESTING so if user asks for
VFIO_IOMMU_TYPE1_NESTING and the SMMU doesn't support stage-2, the ioctl
fails.

I believe it's a good fix and completely harmless, but wanted to make sure
no one objects because it's an ABI change.

Thanks,
Jean

> In my SMMUV3 nested stage series, I added
> such a query in vfio-pci.c to detect if I need to expose a fault region
> but I already test both the returned value and the output arg. So to me
> there is no issue with that change.
> > 
> > And if userspace queries the nesting properties using the new ABI
> > introduced in this patchset, it will obtain an empty struct. I think
> > that's acceptable, but it may be better to avoid adding the nesting cap if
> > @format is 0?
> agreed
> 
> Thanks
> 
> Eric
> > 
> > Thanks,
> > Jean
> > 
> >>
> >> @Eric, how about your opinion? your dual-stage vSMMU support may
> >> also share the vfio_iommu_type1.c code.
> >>
> >> Regards,
> >> Yi Liu
> >>
> >>> Will
> > 
> 


RE: [PATCH] iommu/arm-smmu-v3: remove the approach of MSI polling for CMD SYNC

2020-07-17 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Friday, July 17, 2020 8:55 PM
> To: Song Bao Hua (Barry Song) ; w...@kernel.org;
> j...@8bytes.org
> Cc: linux-ker...@vger.kernel.org; Linuxarm ;
> linux-arm-ker...@lists.infradead.org; iommu@lists.linux-foundation.org;
> Zengtao (B) 
> Subject: Re: [PATCH] iommu/arm-smmu-v3: remove the approach of MSI
> polling for CMD SYNC
> 
> On 2020-07-17 00:07, Barry Song wrote:
> > Before commit 587e6c10a7ce ("iommu/arm-smmu-v3: Reduce contention
> during
> > command-queue insertion"), msi polling perhaps performed better since
> > it could run outside the spin_lock_irqsave() while the code polling cons
> > reg was running in the lock.
> >
> > But after the great reorganization of smmu queue, neither of these two
> > polling methods are running in a spinlock. And real tests show polling
> > cons reg via sev means smaller latency. It is probably because polling
> > by msi will ask hardware to write memory but sev polling depends on the
> > update of register only.
> >
> > Using 16 threads to run netperf on hns3 100G NIC with UDP packet size
> > in 32768bytes and set iommu to strict, TX throughput can improve from
> > 25227.74Mbps to 27145.59Mbps by this patch. In this case, SMMU is super
> > busy as hns3 sends map/unmap requests extremely frequently.
> 
> How many different systems and SMMU implementations are those numbers
> representative of? Given that we may have cases where the SMMU can use
> MSIs but can't use SEV, so would have to fall back to inefficient
> busy-polling, I'd be wary of removing this entirely. Allowing particular
> platforms or SMMU implementations to suppress MSI functionality if they
> know for sure it makes sense seems like a safer bet.
> 
Hello Robin,

Thanks for taking a look. Actually, I was really struggling to find a good
way to make every platform happy, and I don't have other platforms on which
to check whether they run better with SEV polling. Even if two platforms have
exactly the same SMMU features, it is still possible that they behave
differently. So I simply sent this patch to get the discussion started and
gather opinions.

At first, I wanted to have a module parameter so users could decide whether
MSI polling should be disabled, but a module parameter might be totally
ignored by a Linux distro.

--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -418,6 +418,11 @@ module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
+static bool disable_msipolling = 1;
+module_param_named(disable_msipolling, disable_msipolling, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_msipolling,
+	"Don't use MSI to poll the completion of CMD_SYNC if it is slower than SEV");
+
 enum pri_resp {
 	PRI_RESP_DENY = 0,
 	PRI_RESP_FAIL = 1,
@@ -992,7 +997,7 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
 	 * Beware that Hi16xx adds an extra 32 bits of goodness to its MSI
 	 * payload, so the write will zero the entire command on that platform.
 	 */
-	if (smmu->features & ARM_SMMU_FEAT_MSI &&
+	if (!disable_msipolling && smmu->features & ARM_SMMU_FEAT_MSI &&
 	    smmu->features & ARM_SMMU_FEAT_COHERENCY) {
 		ent.sync.msiaddr = q->base_dma + Q_IDX(&q->llq, prod) *
 				   q->ent_dwords * 8;
@@ -1332,7 +1337,7 @@ static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
 static int arm_smmu_cmdq_poll_until_sync(struct arm_smmu_device *smmu,
 					 struct arm_smmu_ll_queue *llq)
 {
-	if (smmu->features & ARM_SMMU_FEAT_MSI &&
+	if (!disable_msipolling && smmu->features & ARM_SMMU_FEAT_MSI &&
 	    smmu->features & ARM_SMMU_FEAT_COHERENCY)
 		return __arm_smmu_cmdq_poll_until_msi(smmu, llq);


Another option is to not use a module parameter at all and instead check the
vendor/chip ID: if a chip has better performance with SEV polling, it can set
disable_msipolling to true.

You are very welcome to give your suggestions.

Thanks
Barry


Re: [PATCH] iommu/arm-smmu-v3: remove the approach of MSI polling for CMD SYNC

2020-07-17 Thread Robin Murphy

On 2020-07-17 00:07, Barry Song wrote:

Before commit 587e6c10a7ce ("iommu/arm-smmu-v3: Reduce contention during
command-queue insertion"), msi polling perhaps performed better since
it could run outside the spin_lock_irqsave() while the code polling cons
reg was running in the lock.

But after the great reorganization of smmu queue, neither of these two
polling methods are running in a spinlock. And real tests show polling
cons reg via sev means smaller latency. It is probably because polling
by msi will ask hardware to write memory but sev polling depends on the
update of register only.

Using 16 threads to run netperf on hns3 100G NIC with UDP packet size
in 32768bytes and set iommu to strict, TX throughput can improve from
25227.74Mbps to 27145.59Mbps by this patch. In this case, SMMU is super
busy as hns3 sends map/unmap requests extremely frequently.


How many different systems and SMMU implementations are those numbers 
representative of? Given that we may have cases where the SMMU can use 
MSIs but can't use SEV, so would have to fall back to inefficient 
busy-polling, I'd be wary of removing this entirely. Allowing particular 
platforms or SMMU implementations to suppress MSI functionality if they 
know for sure it makes sense seems like a safer bet.


Robin.


Cc: Prime Zeng 
Signed-off-by: Barry Song 
---
  drivers/iommu/arm-smmu-v3.c | 46 +
  1 file changed, 1 insertion(+), 45 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f578677a5c41..e55282a636c8 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -964,12 +964,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
 		break;
 	case CMDQ_OP_CMD_SYNC:
-		if (ent->sync.msiaddr) {
-			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
-			cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
-		} else {
-			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
-		}
+		cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
 		cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
 		cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
 		break;
@@ -983,21 +978,10 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
 					 u32 prod)
 {
-	struct arm_smmu_queue *q = &smmu->cmdq.q;
 	struct arm_smmu_cmdq_ent ent = {
 		.opcode = CMDQ_OP_CMD_SYNC,
 	};
 
-	/*
-	 * Beware that Hi16xx adds an extra 32 bits of goodness to its MSI
-	 * payload, so the write will zero the entire command on that platform.
-	 */
-	if (smmu->features & ARM_SMMU_FEAT_MSI &&
-	    smmu->features & ARM_SMMU_FEAT_COHERENCY) {
-		ent.sync.msiaddr = q->base_dma + Q_IDX(&q->llq, prod) *
-				   q->ent_dwords * 8;
-	}
-
 	arm_smmu_cmdq_build_cmd(cmd, &ent);
 }
 
@@ -1251,30 +1235,6 @@ static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
 	return ret;
 }
 
-/*
- * Wait until the SMMU signals a CMD_SYNC completion MSI.
- * Must be called with the cmdq lock held in some capacity.
- */
-static int __arm_smmu_cmdq_poll_until_msi(struct arm_smmu_device *smmu,
-					  struct arm_smmu_ll_queue *llq)
-{
-	int ret = 0;
-	struct arm_smmu_queue_poll qp;
-	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
-	u32 *cmd = (u32 *)(Q_ENT(&cmdq->q, llq->prod));
-
-	queue_poll_init(smmu, &qp);
-
-	/*
-	 * The MSI won't generate an event, since it's being written back
-	 * into the command queue.
-	 */
-	qp.wfe = false;
-	smp_cond_load_relaxed(cmd, !VAL || (ret = queue_poll(&qp)));
-	llq->cons = ret ? llq->prod : queue_inc_prod_n(llq, 1);
-	return ret;
-}
-
 /*
  * Wait until the SMMU cons index passes llq->prod.
  * Must be called with the cmdq lock held in some capacity.
@@ -1332,10 +1292,6 @@ static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
 static int arm_smmu_cmdq_poll_until_sync(struct arm_smmu_device *smmu,
 					 struct arm_smmu_ll_queue *llq)
 {
-	if (smmu->features & ARM_SMMU_FEAT_MSI &&
-	    smmu->features & ARM_SMMU_FEAT_COHERENCY)
-		return __arm_smmu_cmdq_poll_until_msi(smmu, llq);
-
 	return __arm_smmu_cmdq_poll_until_consumed(smmu, llq);
 }
  




Re: [PATCH 4/4] iommu/mediatek: check 4GB mode by reading infracfg

2020-07-17 Thread Miles Chen
On Wed, 2020-07-15 at 23:05 +0200, Matthias Brugger wrote:
> 
> On 02/07/2020 11:37, Miles Chen wrote:
> > In previous disscusion [1] and [2], we found that it is risky to
> > use max_pfn or totalram_pages to tell if 4GB mode is enabled.
> > 
> > Check 4GB mode by reading infracfg register, remove the usage
> > of the unexported symbol max_pfn.
> > 
> > [1] 
> > https://urldefense.com/v3/__https://lkml.org/lkml/2020/6/3/733__;!!CTRNKA9wMg0ARbw!16gAfVnSY87W4t5kE4iw20QPxBgS_SHBvPKlePKU7CGIb18nUzuRUjHumcf4oYVhIQ$
> >  
> > [2] 
> > https://urldefense.com/v3/__https://lkml.org/lkml/2020/6/4/136__;!!CTRNKA9wMg0ARbw!16gAfVnSY87W4t5kE4iw20QPxBgS_SHBvPKlePKU7CGIb18nUzuRUjHumcfr4i9p5g$
> >  
> > 
> > Cc: Mike Rapoport 
> > Cc: David Hildenbrand 
> > Cc: Yong Wu 
> > Cc: Yingjoe Chen 
> > Cc: Christoph Hellwig 
> > Signed-off-by: Miles Chen 
> > ---
> >   drivers/iommu/mtk_iommu.c | 22 ++
> >   1 file changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 2be96f1cdbd2..09be57bd8d74 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -3,7 +3,6 @@
> >* Copyright (c) 2015-2016 MediaTek Inc.
> >* Author: Yong Wu 
> >*/
> > -#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -15,11 +14,13 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -91,6 +92,9 @@
> >   #define F_MMU_INT_ID_LARB_ID(a)   (((a) >> 7) & 0x7)
> >   #define F_MMU_INT_ID_PORT_ID(a)   (((a) >> 2) & 0x1f)
> >   
> > +#define REG_INFRA_MISC 0xf00
> > +#define F_DDR_4GB_SUPPORT_EN   BIT(13)
> > +
> 
> As this is used for infracfg, I think it would be good to add it to 
> include/linux/soc/mediatek/infracfg.h and include that file here.
Thanks for your comment.

ok. I'll do this in next version.
> 
> >   #define MTK_PROTECT_PA_ALIGN  128
> >   
> >   /*
> > @@ -599,8 +603,10 @@ static int mtk_iommu_probe(struct platform_device 
> > *pdev)
> > struct resource *res;
> > resource_size_t ioaddr;
> > struct component_match  *match = NULL;
> > +   struct regmap   *infracfg_regmap;
> 
> Maybe call it just infracfg.

ok. I'll do this in next version. 
> 
> > void*protect;
> > int i, larb_nr, ret;
> > +   u32 val;
> >   
> > data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> > if (!data)
> > @@ -614,10 +620,18 @@ static int mtk_iommu_probe(struct platform_device 
> > *pdev)
> > return -ENOMEM;
> > data->protect_base = ALIGN(virt_to_phys(protect), MTK_PROTECT_PA_ALIGN);
> >   
> > -   /* Whether the current dram is over 4GB */
> > -   data->enable_4GB = !!(max_pfn > (BIT_ULL(32) >> PAGE_SHIFT));
> > -   if (!data->plat_data->has_4gb_mode)
> > +   if (data->plat_data->has_4gb_mode) {
> > +   infracfg_regmap = syscon_regmap_lookup_by_phandle(dev->of_node,
> > +   "mediatek,infracfg");
> > +   if (IS_ERR(infracfg_regmap))
> > +   return PTR_ERR(infracfg_regmap);
> 
> Do we need to error out, or could we be conservative and set endable_4GB = 
> false?

We have to error out in this case because the 4gb_mode setting must be
consistent with the h/w setting.

> 
> > +   ret = regmap_read(infracfg_regmap, REG_INFRA_MISC, &val);
> > +   if (ret)
> > +   return ret;
> > +   data->enable_4GB = !!(val & F_DDR_4GB_SUPPORT_EN);
> > +   } else {
> > data->enable_4GB = false;
> 
> Move that before the if() and update enable_4GB only in case of has_4gb_mode.

ok. I'll do this in next version.

Miles
> 
> > +   }
> >   
> > res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > data->base = devm_ioremap_resource(dev, res);
> > 



Re: [PATCH 1/4] dt-bindings: mediatek: add mediatek,infracfg phandle

2020-07-17 Thread Miles Chen
On Wed, 2020-07-15 at 14:51 -0600, Rob Herring wrote:
> On Thu, Jul 02, 2020 at 05:37:17PM +0800, Miles Chen wrote:
> > Add a description for mediatek,infracfg. We can check if 4GB mode
> > is enable by reading it instead of checking the unexported
> > symbol "max_pfn".
> > 
> > This is a step towards building mtk_iommu as a kernel module.
> 
> You determined this before without DT, so it is an OS problem and 
> shouldn't need a DT update.

Thanks for your comment.

The old way of determining this (using max_pfn) is risky because
max_pfn may be lower than 4GB if reserved memory regions occupy memory
higher than 4GB.

So the better way to do this is by reading the register from the H/W.
> 
> I'd assume there's only one instance of the node mediatek,infracfg 
> points to, so just search for it if you want to get the info from DT.
> 
I can do syscon_regmap_lookup_by_compatible() to search for it. However,
the compatibles are different in mt2712e.dtsi and mt8173.dtsi. so I have
to search "mediatek,mt2712-infracfg" and "mediatek,mt8173-infracfg"
respectively.

Using mediatek,infracfg phandle can make the code easier to read.
Is it possible to reconsider the phandle approach, please?


arch/arm64/boot/dts/mediatek/mt2712e.dtsi:253:
infracfg: syscon@10001000 {
compatible = "mediatek,mt2712-infracfg", "syscon";
arch/arm64/boot/dts/mediatek/mt8173.dtsi:363:   
infracfg: power-controller@10001000 {
compatible = "mediatek,mt8173-infracfg", "syscon";



> 
> > 
> > Cc: Yong Wu 
> > Signed-off-by: Miles Chen 
> > ---
> >  Documentation/devicetree/bindings/iommu/mediatek,iommu.txt | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt 
> > b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> > index ce59a505f5a4..a7881deabcca 100644
> > --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> > +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> > @@ -74,6 +74,8 @@ Required properties:
> >  - mediatek,larbs : List of phandle to the local arbiters in the current 
> > Socs.
> > Refer to bindings/memory-controllers/mediatek,smi-larb.txt. It must sort
> > according to the local arbiter index, like larb0, larb1, larb2...
> > +- mediatek,infracfg: a phandle to infracfg. It is used to confirm whether
> > +  4GB mode is set.
> > +  This is an optional property; add it when the SoC has 4GB mode.
> >  - #iommu-cells : must be 1. This is the mtk_m4u_id according to the HW.
> > Specifies the mtk_m4u_id as defined in
> > dt-binding/memory/mt2701-larb-port.h for mt2701, mt7623
> > -- 
> > 2.18.0
