Re: [PATCH] swiotlb: Allow swiotlb to live at pre-defined address

2020-03-28 Thread Dave Young
On 03/26/20 at 05:29pm, Alexander Graf wrote:
> The swiotlb is a very convenient fallback mechanism for bounce buffering of
> DMAable data. It is usually used for the compatibility case where devices
> can only DMA to a "low region".
> 
> However, in some scenarios this "low region" may be bound even more
> heavily. For example, there are embedded system where only an SRAM region
> is shared between device and CPU. There are also heterogeneous computing
> scenarios where only a subset of RAM is cache coherent between the
> components of the system. There are partitioning hypervisors, where
> a "control VM" that implements device emulation has limited view into a
> partition's memory for DMA capabilities due to safety concerns.
> 
> This patch adds a command line driven mechanism to move all DMA memory into
> a predefined shared memory region which may or may not be part of the
> physical address layout of the Operating System.
> 
> Ideally, the typical path to set this configuration would be through Device
> Tree or ACPI, but neither of the two mechanisms is standardized yet. Also,
> in the x86 MicroVM use case, we have neither ACPI nor Device Tree, but
> instead configure the system purely through kernel command line options.
> 
> I'm sure other people will find the functionality useful going forward
> though and extend it to be triggered by DT/ACPI in the future.

Hmm, we have a use case for kdump; this may be useful.  For example,
swiotlb is enabled by default if AMD SME/SEV is active, and in the kdump
kernel we have to increase the crashkernel reserved size for the extra
swiotlb requirement.  I wonder if we can just reuse the old kernel's
swiotlb region and pass the address to the kdump kernel.
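
For illustration (purely hypothetical values): if the first kernel placed
its swiotlb at 0x78000000 with the default 32768 slabs, the kdump kernel
could be pointed at the same region with the option this patch adds:

    swiotlb=32768,addr=0x78000000

and the crashkernel reservation would no longer need to grow by the
swiotlb size.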

> 
> Signed-off-by: Alexander Graf 
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  3 +-
>  Documentation/x86/x86_64/boot-options.rst   |  4 ++-
>  kernel/dma/swiotlb.c| 46 ++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..d085d55c3cbe 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4785,11 +4785,12 @@
>   it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
>  
>   swiotlb=[ARM,IA-64,PPC,MIPS,X86]
> - Format: { <int> | force | noforce }
> + Format: { <int> | force | noforce | addr=<addr> }
>  <int> -- Number of I/O TLB slabs
>   force -- force using of bounce buffers even if they
>   wouldn't be automatically used by the kernel
>   noforce -- Never use bounce buffers (for debugging)
> + addr=<addr> -- Try to allocate SWIOTLB at defined address
>  
>   switches=   [HW,M68k]
>  
> diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
> index 2b98efb5ba7f..ca46c57b68c9 100644
> --- a/Documentation/x86/x86_64/boot-options.rst
> +++ b/Documentation/x86/x86_64/boot-options.rst
> @@ -297,11 +297,13 @@ iommu options only relevant to the AMD GART hardware IOMMU:
>  iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
>  implementation:
>  
> -swiotlb=<int>[,force]
> +swiotlb=<int>[,force][,addr=<addr>]
>    <int>
>  Prereserve that many 128K pages for the software IO bounce buffering.
>    force
>  Force all IO through the software TLB.
> +  addr=<addr>
> +Try to allocate SWIOTLB at defined address
>  
>  Settings for the IBM Calgary hardware IOMMU currently found in IBM
>  pSeries and xSeries machines
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c19379fabd20..83da0caa2f93 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -46,6 +46,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define CREATE_TRACE_POINTS
>  #include 
> @@ -102,6 +103,12 @@ unsigned int max_segment;
>  #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
>  static phys_addr_t *io_tlb_orig_addr;
>  
> +/*
> + * The TLB phys addr may be defined on the command line. Store it here if it is.
> + */
> +static phys_addr_t io_tlb_addr = INVALID_PHYS_ADDR;
> +
> +
>  /*
>   * Protect the above data structures in the map and unmap calls
>   */
> @@ -119,11 +126,23 @@ setup_io_tlb_npages(char *str)
>   }
>   if (*str == ',')
>   ++str;
> - if (!strcmp(str, "force")) {
> + if (!strncmp(str, "force", 5)) {
>   swiotlb_force = SWIOTLB_FORCE;
> - } else if (!strcmp(str, "noforce")) {
> + str += 5;
> + } else if (!strncmp(str, "noforce", 7)) {
>   swiotlb_force = SWIOTLB_NO_FORCE;
>   io_tlb_nslabs = 1;
> + str += 7;
> + }
> +
> + if (*str == ',')
> + 
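
Presumably the rest of the hunk parses the new addr= option; a minimal
sketch of how such parsing is commonly done (my guess, not necessarily
the patch's exact code):

    /* after consuming "force"/"noforce", look for the addr= suffix */
    if (!strncmp(str, "addr=", 5))
        io_tlb_addr = memparse(str + 5, &str);

memparse() accepts the usual K/M/G suffixes, which is handy for options
like this.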

RE: [PATCH V10 11/11] iommu/vt-d: Add custom allocator for IOASID

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 21, 2020 7:28 AM
> 
> When VT-d driver runs in the guest, PASID allocation must be
> performed via virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray based allocator. The resulting IOASID allocation will always
> come from the host. This ensures that PASID namespace is system-
> wide.
> 
> Signed-off-by: Lu Baolu 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-iommu.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |  2 ++
>  2 files changed, 86 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index a76afb0fd51a..c1c0b0fb93c3 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1757,6 +1757,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>   if (ecap_prs(iommu->ecap))
>   intel_svm_finish_prq(iommu);
>   }
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> + ioasid_unregister_allocator(&iommu->pasid_allocator);
> +
>  #endif
>  }
> 
> @@ -3291,6 +3294,84 @@ static int copy_translation_tables(struct intel_iommu *iommu)
>   return ret;
>  }
> 
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)

the name is too generic... can we add vcmd in the name to clarify
its purpose, e.g. intel_vcmd_ioasid_alloc?

> +{
> + struct intel_iommu *iommu = data;
> + ioasid_t ioasid;
> +
> + if (!iommu)
> + return INVALID_IOASID;
> + /*
> +  * VT-d virtual command interface always uses the full 20 bit
> +  * PASID range. Host can partition guest PASID range based on
> +  * policies but it is out of guest's control.
> +  */
> + if (min < PASID_MIN || max > intel_pasid_max_id)
> + return INVALID_IOASID;
> +
> + if (vcmd_alloc_pasid(iommu, &ioasid))
> + return INVALID_IOASID;
> +
> + return ioasid;
> +}
> +
> +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> + struct intel_iommu *iommu = data;
> +
> + if (!iommu)
> + return;
> + /*
> +  * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> +  * We can only free the PASID when all the devices are unbound.
> +  */
> + if (ioasid_find(NULL, ioasid, NULL)) {
> + pr_alert("Cannot free active IOASID %d\n", ioasid);
> + return;
> + }

However, the sanity check is not done in default_free. Is there a reason
why using vcmd adds such a new requirement?

> + vcmd_free_pasid(iommu, ioasid);
> +}
> +
> +static void register_pasid_allocator(struct intel_iommu *iommu)
> +{
> + /*
> +  * If we are running in the host, no need for custom allocator
> +  * in that PASIDs are allocated from the host system-wide.
> +  */
> + if (!cap_caching_mode(iommu->cap))
> + return;

is it more accurate to check against vcmd capability?

> +
> + if (!sm_supported(iommu)) {
> + pr_warn("VT-d Scalable Mode not enabled, no PASID
> allocation\n");
> + return;
> + }
> +
> + /*
> +  * Register a custom PASID allocator if we are running in a guest,
> +  * guest PASID must be obtained via virtual command interface.
> +  * There can be multiple vIOMMUs in each guest but only one allocator
> +  * is active. All vIOMMU allocators will eventually be calling the same

which one? the first or last?

> +  * host allocator.
> +  */
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
> + pr_info("Register custom PASID allocator\n");
> + iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> + iommu->pasid_allocator.free = intel_ioasid_free;
> + iommu->pasid_allocator.pdata = (void *)iommu;
> + if (ioasid_register_allocator(&iommu->pasid_allocator)) {
> + pr_warn("Custom PASID allocator failed, scalable mode disabled\n");
> + /*
> +  * Disable scalable mode on this IOMMU if there
> +  * is no custom allocator. Mixing SM capable vIOMMU
> +  * and non-SM vIOMMU are not supported.
> +  */
> + intel_iommu_sm = 0;

since you register an allocator for every vIOMMU, does that mean
previously registered allocators should also be unregistered here?

> + }
> + }
> +}
> +#endif
> +
>  static int __init init_dmars(void)
>  {
>   struct dmar_drhd_unit *drhd;
> @@ -3408,6 +3489,9 @@ static int __init init_dmars(void)
>*/
>   for_each_active_iommu(iommu, drhd) {
>   iommu_flush_write_buffer(iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + register_pasid_allocator(iommu);
> +#endif
>   iommu_set_root_entry(iommu);
>  

RE: [PATCH V10 10/11] iommu/vt-d: Enlightened PASID allocation

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 21, 2020 7:28 AM
> 
> From: Lu Baolu 
> 
> Enabling IOMMU in a guest requires communication with the host
> driver for certain aspects. Use of PASIDs to enable Shared Virtual
> Addressing (SVA) requires managing PASIDs in the host. VT-d 3.0 spec
> provides a Virtual Command Register (VCMD) to facilitate this.
> Writes to this register in the guest are trapped by QEMU which
> proxies the call to the host driver.

Qemu -> vIOMMU

> 
> This virtual command interface consists of a capability register,
> a virtual command register, and a virtual response register. Refer
> to section 10.4.42, 10.4.43, 10.4.44 for more information.
> 
> This patch adds the enlightened PASID allocation/free interfaces
> via the virtual command interface.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Lu Baolu 
> Signed-off-by: Jacob Pan 
> Reviewed-by: Eric Auger 
> ---
>  drivers/iommu/intel-pasid.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/intel-pasid.h | 13 ++-
>  include/linux/intel-iommu.h |  1 +
>  3 files changed, 70 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 9f6d07410722..e87ad67aad36 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -27,6 +27,63 @@
>  static DEFINE_SPINLOCK(pasid_lock);
>  u32 intel_pasid_max_id = PASID_MAX;
> 
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> +{
> + unsigned long flags;
> + u8 status_code;
> + int ret = 0;
> + u64 res;
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +   !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + *pasid = VCMD_VRSP_RESULT_PASID(res);
> + break;
> + case VCMD_VRSP_SC_NO_PASID_AVAIL:
> + pr_info("IOMMU: %s: No PASID available\n", iommu->name);
> + ret = -ENOSPC;
> + break;
> + default:
> + ret = -ENODEV;
> + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> + iommu->name, status_code);
> + }
> +
> + return ret;
> +}
> +
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> +{
> + unsigned long flags;
> + u8 status_code;
> + u64 res;
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG,
> + VCMD_CMD_OPERAND(pasid) | VCMD_CMD_FREE);
> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +   !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + break;
> + case VCMD_VRSP_SC_INVALID_PASID:
> + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> + break;
> + default:
> + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> + iommu->name, status_code);
> + }
> +}
> +
>  /*
>   * Per device pasid table management:
>   */
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 698015ee3f04..cd3d63f3e936 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -23,6 +23,16 @@
>  #define is_pasid_enabled(entry)  (((entry)->lo >> 3) & 0x1)
>  #define get_pasid_dir_size(entry)(1 << ((((entry)->lo >> 9) & 0x7) + 7))
> 
> +/* Virtual command interface for enlightened pasid management. */
> +#define VCMD_CMD_ALLOC   0x1
> +#define VCMD_CMD_FREE0x2
> +#define VCMD_VRSP_IP 0x1
> +#define VCMD_VRSP_SC(e)  (((e) >> 1) & 0x3)
> +#define VCMD_VRSP_SC_SUCCESS 0
> +#define VCMD_VRSP_SC_NO_PASID_AVAIL  1
> +#define VCMD_VRSP_SC_INVALID_PASID   1
> +#define VCMD_VRSP_RESULT_PASID(e)(((e) >> 8) & 0xfffff)
> +#define VCMD_CMD_OPERAND(e)  ((e) << 8)
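
(A quick worked example, just to sanity-check my reading of the encoding:
a successful allocation response carrying PASID 5 would be res = 5 << 8 =
0x500; then VCMD_VRSP_IP is clear, VCMD_VRSP_SC(res) ==
VCMD_VRSP_SC_SUCCESS, and VCMD_VRSP_RESULT_PASID(res) == 5.)
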
>  /*
>   * Domain ID reserved for pasid entries programmed for first-level
>   * only and pass-through transfer modes.
> @@ -113,5 +123,6 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu,
>   int addr_width);
>  void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>struct device *dev, int pasid);
> -
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
>  #endif /* __INTEL_PASID_H */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ccbf164fb711..9cbf5357138b 

RE: [PATCH V10 09/11] iommu/vt-d: Cache virtual command capability register

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 21, 2020 7:28 AM
> 
> Virtual command registers are used in the guest only. To avoid the
> vmexit cost, we cache the capability register during initialization.
> 
> Signed-off-by: Jacob Pan 
> Reviewed-by: Eric Auger 
> Reviewed-by: Lu Baolu 
> 
> ---
> v7 Reviewed by Eric & Baolu
> ---
> ---
>  drivers/iommu/dmar.c| 1 +
>  include/linux/intel-iommu.h | 5 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 4d6b7b5b37ee..3b36491c8bbb 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -963,6 +963,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr)
>   warn_invalid_dmar(phys_addr, " returns all ones");
>   goto unmap;
>   }
> + iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
> 
>   /* the registers might be more than one page */
>   map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 43539713b3b3..ccbf164fb711 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -194,6 +194,9 @@
>  #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
>  #define ecap_sc_support(e)   ((e >> 7) & 0x1) /* Snooping Control */
> 
> +/* Virtual command interface capabilities */

capabilities -> capability

Reviewed-by: Kevin Tian 

> +#define vccap_pasid(v)   ((v & DMA_VCS_PAS)) /* PASID allocation */
> +
>  /* IOTLB_REG */
>  #define DMA_TLB_FLUSH_GRANU_OFFSET  60
>  #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
> @@ -287,6 +290,7 @@
> 
>  /* PRS_REG */
>  #define DMA_PRS_PPR  ((u32)1)
> +#define DMA_VCS_PAS  ((u64)1)
> 
>  #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
>  do { \
> @@ -537,6 +541,7 @@ struct intel_iommu {
>   u64 reg_size; /* size of hw register set */
>   u64 cap;
>   u64 ecap;
> + u64 vccap;
>   u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
>   raw_spinlock_t  register_lock; /* protect register handling */
>   int seq_id; /* sequence id of the iommu */
> --
> 2.7.4



RE: [PATCH V10 08/11] iommu/vt-d: Add svm/sva invalidate function

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 21, 2020 7:28 AM
> 
> When Shared Virtual Address (SVA) is enabled for a guest OS via
> vIOMMU, we need to provide invalidation support at IOMMU API and driver
> level. This patch adds Intel VT-d specific function to implement
> iommu passdown invalidate API for shared virtual address.
> 
> The use case is for supporting caching structure invalidation
> of assigned SVM capable devices. Emulated IOMMU exposes queue

emulated IOMMU -> vIOMMU, since virtio-iommu could use the
interface as well.

> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.
> 
> The assumption is that guest to host device ID mapping should be
> resolved prior to calling IOMMU driver. Based on the device handle,
> host IOMMU driver can replace certain fields before submit to the
> invalidation queue.
> 
> ---
> v7 review fixed in v10
> ---
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Liu, Yi L 
> ---
>  drivers/iommu/intel-iommu.c | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 182 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index b1477cd423dd..a76afb0fd51a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5619,6 +5619,187 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
>   aux_domain_remove_dev(to_dmar_domain(domain), dev);
>  }
> 
> +/*
> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> + * VT-d granularity. Invalidation is typically included in the unmap operation
> + * as a result of DMA or VFIO unmap. However, for assigned devices guest
> + * owns the first level page tables. Invalidations of translation caches in the
> + * guest are trapped and passed down to the host.
> + *
> + * vIOMMU in the guest will only expose first level page tables, therefore
> + * we do not include IOTLB granularity for request without PASID (second level).

I would revise above as "We do not support IOTLB granularity for request 
without PASID (second level), therefore any vIOMMU implementation that
exposes the SVA capability to the guest should only expose the first level
page tables, implying all invalidation requests from the guest will include
a valid PASID"

> + *
> + * For example, to find the VT-d granularity encoding for IOTLB
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + *
> + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
> + *
> + */
> +const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> + /*
> +  * PASID based IOTLB invalidation: PASID selective (per PASID),
> +  * page selective (address granularity)
> +  */
> + {0, 1, 1},
> + /* PASID based dev TLBs, only support all PASIDs or single PASID */
> + {1, 1, 0},

Is this combination correct? When a single PASID is being specified, it is
essentially a page-selective invalidation, since you need to provide
Address and Size.

> + /* PASID cache */

PASID cache is fully managed by the host. Guest PASID cache invalidation
is interpreted by vIOMMU for bind and unbind operations. I don't think
we should accept any PASID cache invalidation from userspace or guest.

> + {1, 1, 0}
> +};
> +
> +const static int inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> + /* PASID based IOTLB */
> + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> + /* PASID based dev TLBs */
> + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> + /* PASID cache */
> + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu, int *vtd_granu)
> +{
> + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
> + !inv_type_granu_map[type][granu])
> + return -EINVAL;
> +
> + *vtd_granu = inv_type_granu_table[type][granu];
> +

btw do we really need both map and table here? Can't we just
use one table with unsupported granularity marked as a special
value?
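
Something along these lines, as an illustrative sketch only (assuming all
valid VT-d granularity encodings are non-negative, so a negative entry can
mark "unsupported"):

    const static int inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
        /* PASID based IOTLB */
        {-EINVAL, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
        /* PASID based dev TLBs */
        {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, -EINVAL},
        /* PASID cache */
        {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, -EINVAL},
    };

    static inline int to_vtd_granularity(int type, int granu, int *vtd_granu)
    {
        /* a single bounds + validity check replaces the separate map */
        if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
            inv_type_granu_table[type][granu] < 0)
            return -EINVAL;

        *vtd_granu = inv_type_granu_table[type][granu];
        return 0;
    }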

> + return 0;
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> +  * IOMMU cache invalidate API passes granu_size in bytes, and number of
> +  * granu size in contiguous memory.
> +  */
> + return order_base_2(nr_pages);
> +}
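
(Worked example of the encoding: granu_size = 4K and nr_granules = 512
gives nr_pages = 512, and order_base_2(512) = 9, i.e. the 2MB encoding
mentioned in the comment above.)
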
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> + struct device *dev, struct iommu_cache_invalidate_info *inv_info)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +   

RE: [PATCH V10 06/11] iommu/vt-d: Add bind guest PASID support

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 21, 2020 7:28 AM
> 
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
> 
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
> 
> .-------------.  .---------------------------.
> |   vIOMMU    |  | Guest process CR3, FL only|
> |             |  '---------------------------'
> .----------------/
> | PASID Entry |--- PASID cache flush -
> '-------------'   |
> |             |   V
> |             |CR3 in GPA
> '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
> .-------------.  .----------------------.
> |   pIOMMU    |  | Bind FL for GVA-GPA  |
> |             |  '----------------------'
> .----------------/  |
> | PASID Entry | V (Nested xlate)
> '----------------\.------------------------------.
> |             |   |SL for GPA-HPA, default domain|
> |             |   '------------------------------'
> '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> ---
>  drivers/iommu/intel-iommu.c |   4 +
>  drivers/iommu/intel-svm.c   | 224 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |   8 +-
>  include/linux/intel-svm.h   |  17 
>  4 files changed, 252 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index e599b2537b1c..b1477cd423dd 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6203,6 +6203,10 @@ const struct iommu_ops intel_iommu_ops = {
>   .dev_disable_feat   = intel_iommu_dev_disable_feat,
>   .is_attach_deferred = intel_iommu_is_attach_deferred,
>   .pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + .sva_bind_gpasid= intel_svm_bind_gpasid,
> + .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
> +#endif
>  };
> 
>  static void quirk_iommu_igfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index d7f2a5358900..47c0deb5ae56 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -226,6 +226,230 @@ static LIST_HEAD(global_svm_list);
>   list_for_each_entry((sdev), &(svm)->devs, list) \
>   if ((d) != (sdev)->dev) {} else
> 
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> + struct device *dev,
> + struct iommu_gpasid_bind_data *data)
> +{
> + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> + struct dmar_domain *ddomain;

What about the full name, e.g. dmar_domain? Though a bit longer, it is
clearer than ddomain.

> + struct intel_svm_dev *sdev;
> + struct intel_svm *svm;
> + int ret = 0;
> +
> + if (WARN_ON(!iommu) || !data)
> + return -EINVAL;
> +
> + if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> + data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> + return -EINVAL;
> +
> + if (dev_is_pci(dev)) {
> + /* VT-d supports devices with full 20 bit PASIDs only */
> + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> + return -EINVAL;
> + } else {
> + return -ENOTSUPP;
> + }
> +
> + /*
> +  * We only check host PASID range, we have no knowledge to check
> +  * guest PASID range nor do we use the guest PASID.
> +  */
> + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> + return -EINVAL;
> +
> + ddomain = to_dmar_domain(domain);
> +
> + /* Sanity check paging mode support match between host and guest */
> + if (data->addr_width == ADDR_WIDTH_5LEVEL &&
> + !cap_5lp_support(iommu->cap)) {
> + pr_err("Cannot support 5 level paging requested by
> guest!\n");
> + return -EINVAL;
> + }

-ENOTSUPP?

> +
> + mutex_lock(&pasid_mutex);
> + svm = ioasid_find(NULL, data->hpasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> + goto out;
> + }
> +
> + if (svm) {
> + /*
> +  * If we found svm for the PASID, there must be at
> +  * least one device bond, otherwise svm should be freed.
> +  */
> + if (WARN_ON(list_empty(&svm->devs))) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (svm->mm == get_task_mm(current) &&
> + data->hpasid == svm->pasid &&
> +   

Re: [PATCH] iommu/arm-smmu: Demote error messages to debug in shutdown callback

2020-03-28 Thread Sai Prakash Ranjan

Hi Robin,

On 2020-03-28 00:32, Robin Murphy wrote:

On 2020-03-27 3:09 pm, Sai Prakash Ranjan wrote:

Imagine your network driver doesn't implement a .shutdown method (so
the hardware is still active regardless of device links), happens to
have an Rx buffer or descriptor ring DMA-mapped at an IOVA that looks
like the physical address of the memory containing some part of the
kernel text lower down that call stack, and the MAC receives a
broadcast IP packet at about the point arm_smmu_device_shutdown() is
returning. Enjoy debugging that ;)

And if coincidental memory corruption seems too far-fetched for your
liking, other fun alternatives might include "display tries to scan
out from powered-off device, deadlocks interconnect and prevents
anything else making progress", or "access to TZC-protected physical
address triggers interrupt and over-eager Secure firmware resets
system before orderly poweroff has a chance to finish".

Of course the fact that in practice we'll *always* see the warning
because there's no way to tear down the default DMA domains, and even
if all devices *have* been nicely quiesced there's no way to tell, is
certainly less than ideal. Like I say, it's not entirely clear-cut
either way...



Thanks for these examples, good to know these scenarios in case we come
across them. However, if these error/warning messages appear every time,
what will be their credibility? We will just ignore them when the issues
you mention actually appear, because we see them on every reboot or
shutdown. So doesn't it make sense to enable these only when we are
debugging? One could argue that we would not know whether an issue is
related to the SMMU, but that's the case even now.

The reason why this came up was that we had a NOC (interconnect) error,
which does have logging, at least on QCOM platforms, from the secure side
(it prints these on the console) after the SMMU error messages, and there
was confusion over whether it was related to these messages. However, the
NOC error messages did identify the issue with the USB, and it was solved
later. So these SMMU error/warning messages could be misleading, as in the
above case, almost every time.

The probability of the issues you mentioned occurring is much lower than
that of ordinary reboot/shutdown scenarios; for example, we run reboot
stress tests thousands of times, and these messages add nothing special
when an issue does occur, because they are seen every time.

Thanks,
Sai
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


RE: [PATCH 09/10] iommu/ioasid: Support ioasid_set quota adjustment

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 28, 2020 7:31 AM
> 
> On Fri, 27 Mar 2020 10:09:04 +
> "Tian, Kevin"  wrote:
> 
> > > From: Jacob Pan 
> > > Sent: Thursday, March 26, 2020 1:56 AM
> > >
> > > IOASID set is allocated with an initial quota, at runtime there may
> > > be needs to balance IOASID resources among different VMs/sets.
> > >
> >
> > I may have overlooked previous patches, but I didn't see any place
> > setting the initial quota...
> >
> Initial quota is in place when the ioasid_set is allocated.
> 
> > > This patch adds a new API to adjust per set quota.
> >
> > since this is purely an internal kernel API, implies that the
> > publisher (e.g. VFIO) is responsible for exposing its own uAPI to set
> > the quota?
> >
> yes, VFIO will do the adjustment. I think Alex suggested module
> parameters.

ok, I remember that.
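
Something like this on the VFIO side, I suppose (hypothetical parameter
name, just to illustrate the plumbing):

    static int pasid_quota = 1000;
    module_param(pasid_quota, int, 0644);
    MODULE_PARM_DESC(pasid_quota, "Number of PASIDs a VM may allocate");

    /* then, when the admin changes the knob, for each affected set: */
    ret = ioasid_adjust_set(sid, pasid_quota);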

> 
> > >
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/iommu/ioasid.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> > >  include/linux/ioasid.h |  6 ++
> > >  2 files changed, 50 insertions(+)
> > >
> > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > > index 27dce2cb5af2..5ac28862a1db 100644
> > > --- a/drivers/iommu/ioasid.c
> > > +++ b/drivers/iommu/ioasid.c
> > > @@ -578,6 +578,50 @@ void ioasid_free_set(int sid, bool destroy_set)
> > >  }
> > >  EXPORT_SYMBOL_GPL(ioasid_free_set);
> > >
> > > +/**
> > > + * ioasid_adjust_set - Adjust the quota of an IOASID set
> > > + * @quota:   Quota allowed in this set
> > > + * @sid: IOASID set ID to be assigned
> > > + *
> > > + * Return 0 on success. If the new quota is smaller than the number of
> > > + * IOASIDs already allocated, -EINVAL will be returned. No change will
> > > + * be made to the existing quota.
> > > + */
> > > +int ioasid_adjust_set(int sid, int quota)
> > > +{
> > > + struct ioasid_set_data *sdata;
> > > + int ret = 0;
> > > +
> > > + mutex_lock(&ioasid_allocator_lock);
> > > + sdata = xa_load(&ioasid_sets, sid);
> > > + if (!sdata || sdata->nr_ioasids > quota) {
> > > + pr_err("Failed to adjust IOASID set %d quota %d\n",
> > > + sid, quota);
> > > + ret = -EINVAL;
> > > + goto done_unlock;
> > > + }
> > > +
> > > + if (quota >= ioasid_capacity_avail) {
> > > + ret = -ENOSPC;
> > > + goto done_unlock;
> > > + }
> > > +
> > > + /* Return the delta back to system pool */
> > > + ioasid_capacity_avail += sdata->size - quota;
> > > +
> > > + /*
> > > +  * May have a policy to prevent giving all available IOASIDs
> > > +  * to one set. But we don't enforce here, it should be in the
> > > +  * upper layers.
> > > +  */
> > > + sdata->size = quota;
> > > +
> > > +done_unlock:
> > > + mutex_unlock(&ioasid_allocator_lock);
> > > +
> > > + return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(ioasid_adjust_set);
> > >
> > >  /**
> > >   * ioasid_find - Find IOASID data
> > > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > > index 32d032913828..6e7de6fb91bf 100644
> > > --- a/include/linux/ioasid.h
> > > +++ b/include/linux/ioasid.h
> > > @@ -73,6 +73,7 @@ int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> > >  void ioasid_free_set(int sid, bool destroy_set);
> > >  int ioasid_find_sid(ioasid_t ioasid);
> > >  int ioasid_notify(ioasid_t id, enum ioasid_notify_val cmd);
> > > +int ioasid_adjust_set(int sid, int quota);
> > >
> > >  #else /* !CONFIG_IOASID */
> > >  static inline ioasid_t ioasid_alloc(int sid, ioasid_t min,
> > > @@ -136,5 +137,10 @@ static inline int ioasid_alloc_system_set(int quota)
> > >  return -ENOTSUPP;
> > >  }
> > >
> > > +static inline int ioasid_adjust_set(int sid, int quota)
> > > +{
> > > + return -ENOTSUPP;
> > > +}
> > > +
> > >  #endif /* CONFIG_IOASID */
> > >  #endif /* __LINUX_IOASID_H */
> > > --
> > > 2.7.4
> >
> 
> [Jacob Pan]


RE: [PATCH 08/10] iommu/ioasid: Introduce notifier APIs

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 28, 2020 2:37 AM
> 
> On Fri, 27 Mar 2020 10:03:26 +
> "Tian, Kevin"  wrote:
> 
> > > From: Jacob Pan 
> > > Sent: Thursday, March 26, 2020 1:55 AM
> > >
> > > IOASID users fit into the publisher-subscriber pattern, a system
> > > wide blocking notifier chain can be used to inform subscribers of
> > > state changes. Notifier mechanism also abstracts publisher from
> > > knowing the private context each subscriber may have.
> > >
> > > This patch adds APIs and a global notifier chain; a further
> > > optimization might be a per-set notifier for ioasid_set aware users.
> > >
> > > Usage example:
> > > KVM register notifier block such that it can keep its guest-host
> > > PASID translation table in sync with any IOASID updates.
> > >
> > > VFIO publish IOASID change by performing alloc/free, bind/unbind
> > > operations.
> > >
> > > IOMMU driver gets notified when IOASID is freed by VFIO or core mm
> > > code such that PASID context can be cleaned up.
> >
> > the above example looks mixed. You have KVM register the notifier but
> > finally the IOMMU driver gets notified...
> >
> Right, felt like a tale of two subscribers got mixed. I meant to list a
> few use cases with publisher and subscriber roles separate.
> I will change that to "Usage examples", and explicitly state each role.
> 
> > >
> > > Signed-off-by: Liu Yi L 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/iommu/ioasid.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/linux/ioasid.h | 40 +
> > >  2 files changed, 101 insertions(+)
> > >
> > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > > index 8612fe6477dc..27dce2cb5af2 100644
> > > --- a/drivers/iommu/ioasid.c
> > > +++ b/drivers/iommu/ioasid.c
> > > @@ -11,6 +11,22 @@
> > >  #include 
> > >
> > >  static DEFINE_XARRAY_ALLOC(ioasid_sets);
> > > +/*
> > > + * An IOASID could have multiple consumers. When a status change occurs,
> > > + * this notifier chain is used to keep them in sync. Each consumer of the
> > > + * IOASID service must register notifier block early to ensure no events
> > > + * are missed.
> > > + *
> > > + * This is a publisher-subscriber pattern where publisher can change the
> > > + * state of each IOASID, e.g. alloc/free, bind IOASID to a device and mm.
> > > + * On the other hand, subscribers get notified for the state change and
> > > + * keep local states in sync.
> > > + *
> > > + * Currently, the notifier is global. A further optimization could be per
> > > + * IOASID set notifier chain.
> > > + */
> > > +static BLOCKING_NOTIFIER_HEAD(ioasid_chain);
> > > +
> > >  /**
> > >   * struct ioasid_set_data - Meta data about ioasid_set
> > >   *
> > > @@ -408,6 +424,7 @@ static void ioasid_free_locked(ioasid_t ioasid)
> > >  {
> > >   struct ioasid_data *ioasid_data;
> > >   struct ioasid_set_data *sdata;
> > > + struct ioasid_nb_args args;
> > >
> > >   ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > >   if (!ioasid_data) {
> > > @@ -415,6 +432,13 @@ static void ioasid_free_locked(ioasid_t ioasid)
> > >   return;
> > >   }
> > >
> > > + args.id = ioasid;
> > > + args.sid = ioasid_data->sdata->sid;
> > > + args.pdata = ioasid_data->private;
> > > + args.set_token = ioasid_data->sdata->token;
> > > +
> > > + /* Notify all users that this IOASID is being freed */
> > > + blocking_notifier_call_chain(&ioasid_chain, IOASID_FREE, &args);
> > >   active_allocator->ops->free(ioasid, active_allocator->ops->pdata);
> > >   /* Custom allocator needs additional steps to free the xa element */
> > >   if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
> > > @@ -624,6 +648,43 @@ int ioasid_find_sid(ioasid_t ioasid)
> > >  }
> > >  EXPORT_SYMBOL_GPL(ioasid_find_sid);
> > >
> > > +int ioasid_add_notifier(struct notifier_block *nb)
> > > +{
> > > + return blocking_notifier_chain_register(&ioasid_chain, nb);
> > > +}
> > > +EXPORT_SYMBOL_GPL(ioasid_add_notifier);
> > > +
> > > +void ioasid_remove_notifier(struct notifier_block *nb)
> > > +{
> > > + blocking_notifier_chain_unregister(&ioasid_chain, nb);
> > > +}
> > > +EXPORT_SYMBOL_GPL(ioasid_remove_notifier);
> >
> > register/unregister
> >
> Sounds good.
> 
> > > +
> > > +int ioasid_notify(ioasid_t ioasid, enum ioasid_notify_val cmd)
> >
> > add a comment on when this function should be used?
> >
> Sure, how about:
> /**
>  * ioasid_notify - Send notification on a given IOASID for status change.
>  * Used by publishers when the status change may affect
>  * subscriber's internal state.
>  *
>  * @ioasid:   The IOASID on which the notification will be sent
>  * @cmd:  The notification event
>  *
>  */

looks good.
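
And just to double-check my understanding of the intended flow, a
subscriber would look roughly like this (sketch only, based on the APIs
in this patch):

    static int kvm_ioasid_event(struct notifier_block *nb,
                                unsigned long cmd, void *data)
    {
        struct ioasid_nb_args *args = data;

        if (cmd == IOASID_FREE) {
            /* drop the guest-host PASID translation for args->id */
        }
        return NOTIFY_OK;
    }

    static struct notifier_block kvm_ioasid_nb = {
        .notifier_call = kvm_ioasid_event,
    };

    /* at init / teardown */
    ioasid_add_notifier(&kvm_ioasid_nb);
    ioasid_remove_notifier(&kvm_ioasid_nb);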

> 
> > > +{
> > > + struct ioasid_data *ioasid_data;
> > > + struct ioasid_nb_args args;
> > > + int ret = 0;
> > > +
> > > + mutex_lock(&ioasid_allocator_lock);
> > > + ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > > + if (!ioasid_data) {

RE: [PATCH 06/10] iommu/ioasid: Convert to set aware allocations

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 28, 2020 1:42 AM
> 
> On Fri, 27 Mar 2020 09:54:11 +
> "Tian, Kevin"  wrote:
> 
> > > From: Jacob Pan 
> > > Sent: Thursday, March 26, 2020 1:55 AM
> > >
> > > The current ioasid_alloc function takes a token/ioasid_set then
> > > records it on the IOASID being allocated. There is no alloc/free on
> > > the ioasid_set.
> > >
> > > With the IOASID set APIs, callers must allocate an ioasid_set before
> > > allocating IOASIDs within the set. Quota and other ioasid_set level
> > > activities can then be enforced.
> > >
> > > This patch converts existing API to the new ioasid_set model.
> > >
> > > Signed-off-by: Liu Yi L 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/iommu/intel-iommu.c | 10 +++---
> > >  drivers/iommu/intel-svm.c   | 10 +++---
> > >  drivers/iommu/ioasid.c  | 78 +++++++++++++++++++++++++++++++++++------------
> > >  include/linux/ioasid.h  | 11 +++
> > >  4 files changed, 72 insertions(+), 37 deletions(-)
> > >
> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > index af7a1ef7b31e..c571cc8d9e57 100644
> > > --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -3323,11 +3323,11 @@ static void intel_ioasid_free(ioasid_t ioasid, void *data)
> > >   if (!iommu)
> > >   return;
> > >   /*
> > > -  * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> > > -  * We can only free the PASID when all the devices are unbound.
> > > +  * In the guest, all IOASIDs belong to the system_ioasid set.
> > > +  * Sanity check against the system set.
> >
> > below code has nothing to deal with guest, then why putting the
> > comment specifically for guest?
> >
> intel_ioasid_alloc/free() is the custom IOASID allocator only
> registered when running in the guest.

in that case, maybe renaming the functions to intel_guest_ioasid_alloc/free
would avoid confusion similar to what I had?

> 
> The custom allocator calls virtual command. Since we don't support
> nested guest, all IOASIDs belong to the system ioasid_set.

could you note the lack of nested guest support in the comment, so that
later, when people want to add nested support, they will know some
additional work is required here?

> 
> > >*/
> > > - if (ioasid_find(NULL, ioasid, NULL)) {
> > > - pr_alert("Cannot free active IOASID %d\n", ioasid);
> > > + if (IS_ERR(ioasid_find(system_ioasid_sid, ioasid, NULL))) {
> > > + pr_err("Cannot free IOASID %d, not in system
> > > set\n", ioasid); return;
> > >   }
> > >   vcmd_free_pasid(iommu, ioasid);
> > > @@ -5541,7 +5541,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
> > >   int pasid;
> > >
> > >   /* No private data needed for the default pasid */
> > > - pasid = ioasid_alloc(NULL, PASID_MIN,
> > > + pasid = ioasid_alloc(system_ioasid_sid, PASID_MIN,
> > >  pci_max_pasids(to_pci_dev(dev)) - 1, NULL);
> > >   if (pasid == INVALID_IOASID) {
> > > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > > index 1991587fd3fd..f511855d187b 100644
> > > --- a/drivers/iommu/intel-svm.c
> > > +++ b/drivers/iommu/intel-svm.c
> > > @@ -268,7 +268,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > >   }
> > >
> > >   mutex_lock(&pasid_mutex);
> > > - svm = ioasid_find(NULL, data->hpasid, NULL);
> > > + svm = ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);
> > >   if (IS_ERR(svm)) {
> > >   ret = PTR_ERR(svm);
> > >   goto out;
> > > @@ -401,7 +401,7 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> > >   return -EINVAL;
> > >
> > >   mutex_lock(&pasid_mutex);
> > > - svm = ioasid_find(NULL, pasid, NULL);
> > > + svm = ioasid_find(INVALID_IOASID_SET, pasid, NULL);
> > >   if (!svm) {
> > >   ret = -EINVAL;
> > >   goto out;
> > > @@ -559,7 +559,7 @@ static int intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *
> > >   pasid_max = intel_pasid_max_id;
> > >
> > >   /* Do not use PASID 0, reserved for RID to PASID */
> > > - svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> > > + svm->pasid = ioasid_alloc(system_ioasid_sid, PASID_MIN,
> > >  pasid_max - 1, svm);
> > >   if (svm->pasid == INVALID_IOASID) {
> > >   kfree(svm);
> > > @@ -642,7 +642,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
> > >   if (!iommu)
> > >   goto out;
> > >
> > > - svm = ioasid_find(NULL, pasid, NULL);
> > > + svm = ioasid_find(system_ioasid_sid, pasid, NULL);
> > >   if (!svm)
> > >   goto out;
> > >
> > > @@ -778,7 +778,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> > >
> > >   if (!svm || svm->pasid != req->pasid) {
> > >   rcu_read_lock();
> > > - svm = ioasid_find(NULL, req->pasid, NULL);
> > > + svm 

RE: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 28, 2020 1:29 AM
> 
> On Fri, 27 Mar 2020 09:41:55 +
> "Tian, Kevin"  wrote:
> 
> > > From: Jacob Pan 
> > > Sent: Thursday, March 26, 2020 1:55 AM
> > >
> > > Bare metal SVA allocates IOASIDs for native process addresses. This
> > > should be separated from VM allocated IOASIDs thus under its own
> > > set.
> >
> > A curious question. Now bare metal SVA uses the system set and guest
> > SVA uses dynamically-created set. Then do we still allow ioasid_alloc
> > with a NULL set pointer?
> >
> Good point! There shouldn't be NULL set. That was one of the sticky
> point in the previous allocation API. I will add a check in
> ioasid_alloc_set().
> 
> However, there is still need for global search, e.g. PRS.
> https://lore.kernel.org/linux-arm-kernel/1d62b2e1-fe8c-066d-34e0-f7929f6a7...@arm.com/#t
> 
> In that case, use INVALID_IOASID_SET to indicate the search is global.
> e.g.
> ioasid_find(INVALID_IOASID_SET, data->hpasid, NULL);

ok, it makes sense.

> 
> > >
> > > This patch creates a system IOASID set with its quota set to
> > > PID_MAX. This is a reasonable default in that SVM capable devices
> > > can only bind to limited user processes.
> > >
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/iommu/intel-iommu.c | 8 +++-
> > >  drivers/iommu/ioasid.c  | 9 +
> > >  include/linux/ioasid.h  | 9 +
> > >  3 files changed, 25 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > index ec3fc121744a..af7a1ef7b31e 100644
> > > --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
> > >   goto free_iommu;
> > >
> > >   /* PASID is needed for scalable mode irrespective to SVM */
> > > - if (intel_iommu_sm)
> > > + if (intel_iommu_sm) {
> > >   ioasid_install_capacity(intel_pasid_max_id);
> > > + /* We should not run out of IOASIDs at boot */
> > > + if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
> > > + pr_err("Failed to enable host PASID
> > > allocator\n");
> > > + intel_iommu_sm = 0;
> > > + }
> > > + }
> > >
> > >   /*
> > >* for each drhd
> > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > > index 6265d2dbbced..9135af171a7c 100644
> > > --- a/drivers/iommu/ioasid.c
> > > +++ b/drivers/iommu/ioasid.c
> > > @@ -39,6 +39,9 @@ struct ioasid_data {
> > >  static ioasid_t ioasid_capacity;
> > >  static ioasid_t ioasid_capacity_avail;
> > >
> > > +int system_ioasid_sid;
> > > +static DECLARE_IOASID_SET(system_ioasid);
> > > +
> > >  /* System capacity can only be set once */
> > >  void ioasid_install_capacity(ioasid_t total)
> > >  {
> > > @@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
> > >  }
> > >  EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> > >
> > > +int ioasid_alloc_system_set(int quota)
> > > +{
> > > + return ioasid_alloc_set(&system_ioasid, quota, &system_ioasid_sid);
> > > +}
> > > +EXPORT_SYMBOL_GPL(ioasid_alloc_system_set);
> > > +
> > >  /*
> > >   * struct ioasid_allocator_data - Internal data structure to hold information
> > >   * about an allocator. There are two types of allocators:
> > > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > > index 8c82d2625671..097b1cc043a3 100644
> > > --- a/include/linux/ioasid.h
> > > +++ b/include/linux/ioasid.h
> > > @@ -29,6 +29,9 @@ struct ioasid_allocator_ops {
> > >   void *pdata;
> > >  };
> > >
> > > +/* Shared IOASID set for reserved for host system use */
> > > +extern int system_ioasid_sid;
> > > +
> > >  #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
> > >
> > >  #if IS_ENABLED(CONFIG_IOASID)
> > > @@ -41,6 +44,7 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> > >  void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> > >  int ioasid_attach_data(ioasid_t ioasid, void *data);
> > >  void ioasid_install_capacity(ioasid_t total);
> > > +int ioasid_alloc_system_set(int quota);
> > >  int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid);
> > >  void ioasid_free_set(int sid, bool destroy_set);
> > >  int ioasid_find_sid(ioasid_t ioasid);
> > > @@ -88,5 +92,10 @@ static inline void ioasid_install_capacity(ioasid_t total)
> > >  {
> > >  }
> > >
> > > +static inline int ioasid_alloc_system_set(int quota)
> > > +{
> > > + return -ENOTSUPP;
> > > +}
> > > +
> > >  #endif /* CONFIG_IOASID */
> > >  #endif /* __LINUX_IOASID_H */
> > > --
> > > 2.7.4
> >
> 
> [Jacob Pan]


RE: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

2020-03-28 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Saturday, March 28, 2020 12:59 AM
> 
> On Fri, 27 Mar 2020 08:38:44 +
> "Tian, Kevin"  wrote:
> 
> > > From: Jacob Pan 
> > > Sent: Thursday, March 26, 2020 1:55 AM
> > >
> > > IOASID set defines a group of IDs that share the same token. The
> > > ioasid_set concept helps to do permission checking among users as
> > > in the current code.
> > >
> > > With guest SVA usage, each VM has its own IOASID set. More
> > > functionalities are needed:
> > > 1. Enforce quota, each guest may be assigned limited quota such
> > > that one guest cannot abuse all the system resource.
> > > 2. Stores IOASID mapping between guest and host IOASIDs
> > > 3. Per set operations, e.g. free the entire set
> > >
> > > For each ioasid_set token, a unique set ID is assigned. This makes
> > > reference of the set and data lookup much easier to implement.
> > >
> > > Signed-off-by: Liu Yi L 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/iommu/ioasid.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/linux/ioasid.h |  13 +
> > >  2 files changed, 160 insertions(+)
> > >
> > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > > index 4026e52855b9..27ee57f7079b 100644
> > > --- a/drivers/iommu/ioasid.c
> > > +++ b/drivers/iommu/ioasid.c
> > > @@ -10,6 +10,25 @@
> > >  #include 
> > >  #include 
> > >
> > > +static DEFINE_XARRAY_ALLOC(ioasid_sets);
> > > +/**
> > > + * struct ioasid_set_data - Meta data about ioasid_set
> > > + *
> > > + * @token:   Unique to identify an IOASID set
> > > + * @xa:  XArray to store subset ID and IOASID mapping
> >
> > what is a subset? is it a different thing from set?
> >
> Subset is a set, but a subset ID is an ID only valid within the set.
> When we have non-identity Guest-Host PASID mapping, Subset ID is
> the Guest PASID but in more general terms. Or call it "Set Private ID"
> 
> This can be confusing, perhaps I rephrase it as:
> "XArray to store ioasid_set private ID to system-wide IOASID mapping"
> 
> 
> > > + * @size: Max number of IOASIDs can be allocated within the set
> >
> > 'size' reads more like 'current size' instead of 'max size'. maybe
> > call it 'max_ioasids' to align with 'nr_ioasids'? or simplify both as
> > 'max' and 'nr'?
> >
> Right, how about max_id and nr_id?

sounds good.

> 
> > > + * @nr_ioasids   Number of IOASIDs allocated in the set
> > > + * @sid  ID of the set
> > > + */
> > > +struct ioasid_set_data {
> > > + struct ioasid_set *token;
> > > + struct xarray xa;
> > > + int size;
> > > + int nr_ioasids;
> > > + int sid;
> > > + struct rcu_head rcu;
> > > +};
> > > +
> > >  struct ioasid_data {
> > >   ioasid_t id;
> > >   struct ioasid_set *set;
> > > @@ -388,6 +407,111 @@ void ioasid_free(ioasid_t ioasid)
> > >  EXPORT_SYMBOL_GPL(ioasid_free);
> > >
> > >  /**
> > > + * ioasid_alloc_set - Allocate a set of IOASIDs
> >
> > 'a set of IOASIDs' sounds like 'many IOASIDs'. Just saying 'allocate
> > an IOASID set' is clearer.
> >
> Make sense
> 
> > > + * @token:   Unique token of the IOASID set
> > > + * @quota:   Quota allowed in this set
> > > + * @sid: IOASID set ID to be assigned
> > > + *
> > > + * Return 0 upon success. Token will be stored internally for lookup,
> > > + * IOASID allocation within the set and other per set operations will
> > > + * use the @sid assigned.
> > > + *
> > > + */
> > > +int ioasid_alloc_set(struct ioasid_set *token, ioasid_t quota, int *sid)
> > > +{
> > > + struct ioasid_set_data *sdata;
> > > + ioasid_t id;
> > > + int ret = 0;
> > > +
> > > + if (quota > ioasid_capacity_avail) {
> > > + pr_warn("Out of IOASID capacity! ask %d, avail %d\n",
> > > + quota, ioasid_capacity_avail);
> > > + return -ENOSPC;
> > > + }
> > > +
> > > + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> > > + if (!sdata)
> > > + return -ENOMEM;
> > > +
> > > + spin_lock(&ioasid_allocator_lock);
> > > +
> > > + ret = xa_alloc(&ioasid_sets, &id, sdata,
> > > +XA_LIMIT(0, ioasid_capacity_avail - quota),
> > > +GFP_KERNEL);
> >
> > Interestingly, I didn't find the definition of ioasid_sets, and it is
> > not in an existing file.
> >
> It is at the beginning of this file
> +static DEFINE_XARRAY_ALLOC(ioasid_sets);

How did I overlook it after several checks... 

> 
> > I'm not sure how many sets can be created, but anyway the set
> > namespace is different from the ioasid namespace. Then why do we
> > use the ioasid capacity as the limit for allocating set IDs here?
> >
> I am assuming the worst case scenario, which is one IOASID per set; that
> is why the number of sets is limited by the number of system IOASIDs.

I feel using a static max is simpler and clearer here. Anyway the set id
is never used on hardware, so it is not necessary to tie it to the dynamic
IOASID numbers.
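
e.g. just (illustrative sketch, arbitrary cap):

    #define IOASID_SET_MAX 1024  /* static cap on the number of sets */

    ret = xa_alloc(&ioasid_sets, &id, sdata,
                   XA_LIMIT(0, IOASID_SET_MAX - 1), GFP_KERNEL);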

> 
> > > + if (ret) {
> > > + kfree(sdata);
> > > + goto error;
> > > + }
> > >