Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function

2019-05-21 Thread Jacob Pan
On Tue, 21 May 2019 17:09:40 +0100
Jean-Philippe Brucker  wrote:

> On 20/05/2019 20:22, Jacob Pan wrote:
> > On Thu, 16 May 2019 09:14:29 -0700
> > Jacob Pan  wrote:
> >   
> >> On Thu, 16 May 2019 15:14:40 +0100
> >> Jean-Philippe Brucker  wrote:
> >>  
> >>> Hi Jacob,
> >>>
> >>> On 03/05/2019 23:32, Jacob Pan wrote:
>  +/**
>  + * struct gpasid_bind_data - Information about device and guest
>  PASID binding
>  + * @gcr3:   Guest CR3 value from guest mm
>  + * @pasid:  Process address space ID used for the guest mm
>  + * @addr_width: Guest address width. Paging mode can also
>  be derived.
>  + */
>  +struct gpasid_bind_data {
>  +__u64 gcr3;
>  +__u32 pasid;
>  +__u32 addr_width;
>  +__u32 flags;
>  +#define IOMMU_SVA_GPASID_SRE	BIT(0) /* supervisor request */
>  +__u8 padding[4];
>  +};  
> >>>
> >>> Could you wrap this structure into a generic one like we now do
> >>> for bind_pasid_table? It would make the API easier to extend,
> >>> because if we ever add individual PASID bind on Arm (something
> >>> I'd like to do for virtio-iommu, eventually) it will have
> >>> different parameters, as our PASID table entry has a lot of
> >>> fields describing the page table format.
> >>>
> >>> Maybe something like the following would do?
> >>>
> >>> struct gpasid_bind_data {
> >>> #define IOMMU_GPASID_BIND_VERSION_1 1
> >>>   __u32 version;
> >>> #define IOMMU_GPASID_BIND_FORMAT_INTEL_VTD	1
> >>>   __u32 format;
> >>>   union {
> >>>   // the current gpasid_bind_data:
> >>>   struct gpasid_bind_intel_vtd vtd;
> >>>   };
> >>> };
> >>> 
> > 
> > Could you review the struct below? I am trying to extract the
> > common fields as much as possible. I didn't do it exactly as you
> > suggested, but vendor-specific data is kept in a separate struct
> > under the same union.  
> 
> Thanks, it looks good and I think we can reuse it for SMMUv2 and v3.
> Some comments below.
> 
> > 
> > Also, can you review the v3 ioasid allocator common code patches? I
> > am hoping we can get the common code in v5.3 so that we can focus
> > on the vendor specific part. The common code should include
> > bind_guest_pasid and ioasid allocator.
> > https://lkml.org/lkml/2019/5/3/787
> > https://lkml.org/lkml/2019/5/3/780
> > 
> > Thanks,
> > 
> > Jacob
> > 
> > 
> > /**
> >  * struct gpasid_bind_data_vtd - Intel VT-d specific data on device
> > and guest
> >  * SVA binding.
> >  *
> >  * @flags:  VT-d PASID table entry attributes
> >  * @pat:	Page attribute table data to compute effective
> > memory type
> >  * @emt:	Extended memory type
> >  *
> >  * Only guest vIOMMU selectable and effective options are passed
> > down to
> >  * the host IOMMU.
> >  */
> > struct gpasid_bind_data_vtd {
> > #define IOMMU_SVA_VTD_GPASID_SRE	BIT(0) /* supervisor request */
> > #define IOMMU_SVA_VTD_GPASID_EAFE	BIT(1) /* extended access enable */
> > #define IOMMU_SVA_VTD_GPASID_PCD	BIT(2) /* page-level cache disable */
> > #define IOMMU_SVA_VTD_GPASID_PWT	BIT(3) /* page-level write through */
> > #define IOMMU_SVA_VTD_GPASID_EMTE	BIT(4) /* extended memory type enable */
> > #define IOMMU_SVA_VTD_GPASID_CD	BIT(5) /* PASID-level cache disable */
> 
> It doesn't seem like the BIT() macro is exported to userspace, so we
> can't use it here
> 
good point, will avoid BIT()
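For reference, a minimal sketch of what the UAPI-safe version could look
like, with the masks open-coded instead of using BIT() (macro names taken
from the draft above):

#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0)	/* supervisor request */
#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1)	/* extended access enable */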
> > __u64 flags;
> > __u32 pat;
> > __u32 emt;
> > };
> > 
> > /**
> >  * struct gpasid_bind_data - Information about device and guest
> > PASID binding
> >  * @version:	Version of this data structure
> >  * @format: PASID table entry format
> >  * @flags:  Additional information on guest bind request
> >  * @gpgd:   Guest page directory base of the guest mm to bind
> >  * @hpasid: Process address space ID used for the guest mm
> > in host IOMMU
> >  * @gpasid: Process address space ID used for the guest mm
> > in guest IOMMU  
> 
> Trying to understand the full flow:
> * @gpasid is the one allocated by the guest using a virtual command.
> The guest writes @gpgd into the virtual PASID table at index @gpasid,
> then sends an invalidate command to QEMU.
yes
> * QEMU issues a gpasid_bind ioctl (on the mdev or its container?).
> VFIO forwards. The IOMMU driver installs @gpgd into the PASID table
> using @hpasid, which is associated with the auxiliary domain.
> 
> But why do we need the @hpasid field here? Does userspace know about
> it at all, and does VFIO need to pass it to the IOMMU driver?
> 
We need to support two guest-host PASID mappings through this API. Idea
comes from Kevin & Yi.
1. identity mapping between host and guest PASID
2. guest owns its own pasid space

For option 1, which we plan to support first in this series, there is
no need for the gpasid field since gpasid == hpasid. The guest
allocates a PASID using the virtual command interface, which gets a
host PASID. Then PASID cache 
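For reference, a rough sketch of the complete structure under discussion,
reconstructed from the kernel-doc above (field sizes and layout are not
final):

struct gpasid_bind_data {
	__u32 version;	/* IOMMU_GPASID_BIND_VERSION_1 */
	__u32 format;	/* PASID table entry format */
	__u64 flags;	/* additional information on the bind request */
	__u64 gpgd;	/* guest page directory base of the guest mm */
	__u64 hpasid;	/* PASID used for the guest mm in the host IOMMU */
	__u64 gpasid;	/* PASID used for the guest mm in the guest IOMMU */
	/* Vendor-specific data */
	union {
		struct gpasid_bind_data_vtd vtd;
	};
};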

Re: [PATCH v2 01/15] iommu/arm-smmu: Allow IOMMU enabled devices to skip DMA domains

2019-05-21 Thread Jordan Crouse
On Tue, May 21, 2019 at 06:43:34PM +0100, Robin Murphy wrote:
> On 21/05/2019 17:13, Jordan Crouse wrote:
> >Allow IOMMU enabled devices specified on an opt-in list to create a
> >default identity domain for a new IOMMU group and bypass the DMA
> >domain created by the IOMMU core. This allows the group to be properly
> >set up but otherwise skips touching the hardware until the client
> >device attaches an unmanaged domain of its own.
> 
> All the cool kids are using iommu_request_dm_for_dev() to force an identity
> domain for particular devices, won't that suffice for this case too? There
> is definite scope for improvement in this area, so I'd really like to keep
> things as consistent as possible to make that easier in future.

I initially rejected iommu_request_dm_for_dev() since it still allowed the DMA
domain to consume the context bank, but now that I look at it again, as long as
the domain free returns the context bank to the pool it might work. Let me give
it a shot and see if it does what we need.

Jordan
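For context, a minimal sketch of the approach Robin suggests;
iommu_request_dm_for_dev() takes only the device, and the error handling
here is illustrative:

	/* Ask the IOMMU core to replace the group's default DMA domain
	 * with an identity (direct mapped) domain for this device. */
	ret = iommu_request_dm_for_dev(dev);
	if (ret)
		dev_warn(dev, "failed to request an identity domain\n");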

> >Signed-off-by: Jordan Crouse 
> >---
> >
> >  drivers/iommu/arm-smmu.c | 42 ++
> >  drivers/iommu/iommu.c| 29 +++--
> >  include/linux/iommu.h|  3 +++
> >  3 files changed, 68 insertions(+), 6 deletions(-)
> >
> >diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> >index 5e54cc0..a795ada 100644
> >--- a/drivers/iommu/arm-smmu.c
> >+++ b/drivers/iommu/arm-smmu.c
> >@@ -1235,6 +1235,35 @@ static int arm_smmu_domain_add_master(struct 
> >arm_smmu_domain *smmu_domain,
> > return 0;
> >  }
> >+struct arm_smmu_client_match_data {
> >+bool use_identity_domain;
> >+};
> >+
> >+static const struct arm_smmu_client_match_data qcom_adreno = {
> >+.use_identity_domain = true,
> >+};
> >+
> >+static const struct arm_smmu_client_match_data qcom_mdss = {
> >+.use_identity_domain = true,
> >+};
> >+
> >+static const struct of_device_id arm_smmu_client_of_match[] = {
> >+{ .compatible = "qcom,adreno", .data = _adreno },
> >+{ .compatible = "qcom,mdp4", .data = _mdss },
> >+{ .compatible = "qcom,mdss", .data = _mdss },
> >+{ .compatible = "qcom,sdm845-mdss", .data = _mdss },
> >+{},
> >+};
> >+
> >+static const struct arm_smmu_client_match_data *
> >+arm_smmu_client_data(struct device *dev)
> >+{
> >+const struct of_device_id *match =
> >+of_match_device(arm_smmu_client_of_match, dev);
> >+
> >+return match ? match->data : NULL;
> >+}
> >+
> >  static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device 
> > *dev)
> >  {
> > int ret;
> >@@ -1552,6 +1581,7 @@ static struct iommu_group 
> >*arm_smmu_device_group(struct device *dev)
> >  {
> > struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> > struct arm_smmu_device *smmu = fwspec_smmu(fwspec);
> >+const struct arm_smmu_client_match_data *client;
> > struct iommu_group *group = NULL;
> > int i, idx;
> >@@ -1573,6 +1603,18 @@ static struct iommu_group 
> >*arm_smmu_device_group(struct device *dev)
> > else
> > group = generic_device_group(dev);
> >+client = arm_smmu_client_data(dev);
> >+
> >+/*
> >+ * If the client chooses to bypass the DMA domain, create an identity
> >+ * domain as a default placeholder. This will give the device a
> >+ * default domain but skip DMA operations and not consume a context
> >+ * bank.
> >+ */
> >+if (client && client->use_identity_domain)
> >+iommu_group_set_default_domain(group, dev,
> >+IOMMU_DOMAIN_IDENTITY);
> >+
> > return group;
> >  }
> >diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >index 67ee662..af3e1ed 100644
> >--- a/drivers/iommu/iommu.c
> >+++ b/drivers/iommu/iommu.c
> >@@ -1062,6 +1062,24 @@ struct iommu_group *fsl_mc_device_group(struct device 
> >*dev)
> > return group;
> >  }
> >+struct iommu_domain *iommu_group_set_default_domain(struct iommu_group 
> >*group,
> >+struct device *dev, unsigned int type)
> >+{
> >+struct iommu_domain *dom;
> >+
> >+dom = __iommu_domain_alloc(dev->bus, type);
> >+if (!dom)
> >+return NULL;
> >+
> >+/* FIXME: Error if the default domain is already set? */
> >+group->default_domain = dom;
> >+if (!group->domain)
> >+group->domain = dom;
> >+
> >+return dom;
> >+}
> >+EXPORT_SYMBOL_GPL(iommu_group_set_default_domain);
> >+
> >  /**
> >   * iommu_group_get_for_dev - Find or create the IOMMU group for a device
> >   * @dev: target device
> >@@ -1099,9 +1117,12 @@ struct iommu_group *iommu_group_get_for_dev(struct 
> >device *dev)
> > if (!group->default_domain) {
> > struct iommu_domain *dom;
> >-dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
> >+dom = iommu_group_set_default_domain(group, dev,
> >+iommu_def_domain_type);
> >+
> > if (!dom && 

Re: [PATCH v2 03/15] iommu/arm-smmu: Add split pagetable support for arm-smmu-v2

2019-05-21 Thread Robin Murphy

On 21/05/2019 17:13, Jordan Crouse wrote:

Add support for a split pagetable (TTBR0/TTBR1) scheme for arm-smmu-v2.
If split pagetables are enabled, create a pagetable for TTBR1 and set
up the sign extension bit so that all IOVAs with that bit set are mapped
and translated from the TTBR1 pagetable.

Signed-off-by: Jordan Crouse 
---

  drivers/iommu/arm-smmu-regs.h  |  19 +
  drivers/iommu/arm-smmu.c   | 179 ++---
  drivers/iommu/io-pgtable-arm.c |   3 +-
  3 files changed, 186 insertions(+), 15 deletions(-)
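A minimal sketch of the addressing rule the commit message describes,
assuming the sign-extension bit sits at bit (va_size - 1); the helper name
is made up:

static bool iova_uses_ttbr1(unsigned long iova, unsigned int va_size)
{
	/* IOVAs with the SEP bit (and everything above it) set are
	 * translated through TTBR1; everything else through TTBR0. */
	unsigned long mask = ~0UL << (va_size - 1);

	return (iova & mask) == mask;
}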

diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
index e9132a9..23f27c2 100644
--- a/drivers/iommu/arm-smmu-regs.h
+++ b/drivers/iommu/arm-smmu-regs.h
@@ -195,7 +195,26 @@ enum arm_smmu_s2cr_privcfg {
  #define RESUME_RETRY  (0 << 0)
  #define RESUME_TERMINATE  (1 << 0)
  
+#define TTBCR_EPD1			(1 << 23)

+#define TTBCR_T0SZ_SHIFT   0
+#define TTBCR_T1SZ_SHIFT   16
+#define TTBCR_IRGN1_SHIFT  24
+#define TTBCR_ORGN1_SHIFT  26
+#define TTBCR_RGN_WBWA 1
+#define TTBCR_SH1_SHIFT28
+#define TTBCR_SH_IS3
+
+#define TTBCR_TG1_16K  (1 << 30)
+#define TTBCR_TG1_4K   (2 << 30)
+#define TTBCR_TG1_64K  (3 << 30)
+
  #define TTBCR2_SEP_SHIFT  15
+#define TTBCR2_SEP_31  (0x0 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_35  (0x1 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_39  (0x2 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_41  (0x3 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_43  (0x4 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_47  (0x5 << TTBCR2_SEP_SHIFT)
  #define TTBCR2_SEP_UPSTREAM   (0x7 << TTBCR2_SEP_SHIFT)
  #define TTBCR2_AS (1 << 4)
  
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c

index a795ada..e09c0e6 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -152,6 +152,7 @@ struct arm_smmu_cb {
u32 tcr[2];
u32 mair[2];
struct arm_smmu_cfg *cfg;
+   unsigned long   split_table_mask;
  };
  
  struct arm_smmu_master_cfg {

@@ -253,13 +254,14 @@ enum arm_smmu_domain_stage {
  
  struct arm_smmu_domain {

struct arm_smmu_device  *smmu;
-   struct io_pgtable_ops   *pgtbl_ops;
+   struct io_pgtable_ops   *pgtbl_ops[2];


This seems a bit off - surely the primary domain and aux domain only 
ever need one set of tables each, but either way there's definitely 
unnecessary redundancy in having four sets of io_pgtable_ops between them.



const struct iommu_gather_ops   *tlb_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
boolnon_strict;
struct mutexinit_mutex; /* Protects smmu pointer */
spinlock_t  cb_lock; /* Serialises ATS1* ops and 
TLB syncs */
+   u32 attributes;
struct iommu_domain domain;
  };
  
@@ -621,6 +623,85 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)

return IRQ_HANDLED;
  }
  
+/* Adjust the context bank settings to support TTBR1 */

+static void arm_smmu_init_ttbr1(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
+   int pgsize = 1 << __ffs(pgtbl_cfg->pgsize_bitmap);
+
+   /* Enable speculative walks through the TTBR1 */
+   cb->tcr[0] &= ~TTBCR_EPD1;
+
+   cb->tcr[0] |= TTBCR_SH_IS << TTBCR_SH1_SHIFT;
+   cb->tcr[0] |= TTBCR_RGN_WBWA << TTBCR_IRGN1_SHIFT;
+   cb->tcr[0] |= TTBCR_RGN_WBWA << TTBCR_ORGN1_SHIFT;
+
+   switch (pgsize) {
+   case SZ_4K:
+   cb->tcr[0] |= TTBCR_TG1_4K;
+   break;
+   case SZ_16K:
+   cb->tcr[0] |= TTBCR_TG1_16K;
+   break;
+   case SZ_64K:
+   cb->tcr[0] |= TTBCR_TG1_64K;
+   break;
+   }
+
+   /*
+* Outside of the special 49 bit UBS case that has a dedicated sign
+* extension bit, setting the SEP for any other va_size will force us to
+* shrink the size of the T0/T1 regions by one bit to accommodate the
+* SEP
+*/
+   if (smmu->va_size != 48) {
+   /* Replace the T0 size */
+   cb->tcr[0] &= ~(0x3f << TTBCR_T0SZ_SHIFT);
+   cb->tcr[0] |= (64ULL - smmu->va_size - 1) << TTBCR_T0SZ_SHIFT;
+   /* Set the T1 size */
+   cb->tcr[0] |= (64ULL - smmu->va_size - 1) << TTBCR_T1SZ_SHIFT;
+   } 

Re: [PATCH v2 01/15] iommu/arm-smmu: Allow IOMMU enabled devices to skip DMA domains

2019-05-21 Thread Robin Murphy

On 21/05/2019 17:13, Jordan Crouse wrote:

Allow IOMMU enabled devices specified on an opt-in list to create a
default identity domain for a new IOMMU group and bypass the DMA
domain created by the IOMMU core. This allows the group to be properly
set up but otherwise skips touching the hardware until the client
device attaches an unmanaged domain of its own.


All the cool kids are using iommu_request_dm_for_dev() to force an 
identity domain for particular devices, won't that suffice for this case 
too? There is definite scope for improvement in this area, so I'd really 
like to keep things as consistent as possible to make that easier in future.


Robin.


Signed-off-by: Jordan Crouse 
---

  drivers/iommu/arm-smmu.c | 42 ++
  drivers/iommu/iommu.c| 29 +++--
  include/linux/iommu.h|  3 +++
  3 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5e54cc0..a795ada 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1235,6 +1235,35 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
return 0;
  }
  
+struct arm_smmu_client_match_data {

+   bool use_identity_domain;
+};
+
+static const struct arm_smmu_client_match_data qcom_adreno = {
+   .use_identity_domain = true,
+};
+
+static const struct arm_smmu_client_match_data qcom_mdss = {
+   .use_identity_domain = true,
+};
+
+static const struct of_device_id arm_smmu_client_of_match[] = {
+   { .compatible = "qcom,adreno", .data = _adreno },
+   { .compatible = "qcom,mdp4", .data = _mdss },
+   { .compatible = "qcom,mdss", .data = _mdss },
+   { .compatible = "qcom,sdm845-mdss", .data = _mdss },
+   {},
+};
+
+static const struct arm_smmu_client_match_data *
+arm_smmu_client_data(struct device *dev)
+{
+   const struct of_device_id *match =
+   of_match_device(arm_smmu_client_of_match, dev);
+
+   return match ? match->data : NULL;
+}
+
  static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device 
*dev)
  {
int ret;
@@ -1552,6 +1581,7 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
  {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_device *smmu = fwspec_smmu(fwspec);
+   const struct arm_smmu_client_match_data *client;
struct iommu_group *group = NULL;
int i, idx;
  
@@ -1573,6 +1603,18 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)

else
group = generic_device_group(dev);
  
+	client = arm_smmu_client_data(dev);

+
+   /*
+* If the client chooses to bypass the DMA domain, create an identity
+* domain as a default placeholder. This will give the device a
+* default domain but skip DMA operations and not consume a context
+* bank.
+*/
+   if (client && client->use_identity_domain)
+   iommu_group_set_default_domain(group, dev,
+   IOMMU_DOMAIN_IDENTITY);
+
return group;
  }
  
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c

index 67ee662..af3e1ed 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1062,6 +1062,24 @@ struct iommu_group *fsl_mc_device_group(struct device 
*dev)
return group;
  }
  
+struct iommu_domain *iommu_group_set_default_domain(struct iommu_group *group,

+   struct device *dev, unsigned int type)
+{
+   struct iommu_domain *dom;
+
+   dom = __iommu_domain_alloc(dev->bus, type);
+   if (!dom)
+   return NULL;
+
+   /* FIXME: Error if the default domain is already set? */
+   group->default_domain = dom;
+   if (!group->domain)
+   group->domain = dom;
+
+   return dom;
+}
+EXPORT_SYMBOL_GPL(iommu_group_set_default_domain);
+
  /**
   * iommu_group_get_for_dev - Find or create the IOMMU group for a device
   * @dev: target device
@@ -1099,9 +1117,12 @@ struct iommu_group *iommu_group_get_for_dev(struct 
device *dev)
if (!group->default_domain) {
struct iommu_domain *dom;
  
-		dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);

+   dom = iommu_group_set_default_domain(group, dev,
+   iommu_def_domain_type);
+
if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
-   dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
+   dom = iommu_group_set_default_domain(group, dev,
+   IOMMU_DOMAIN_DMA);
if (dom) {
dev_warn(dev,
 "failed to allocate default IOMMU domain of 
type %u; falling back to IOMMU_DOMAIN_DMA",
@@ -1109,10 +1130,6 @@ struct iommu_group *iommu_group_get_for_dev(struct 
device *dev)
}
   

Re: Device obligation to write into a DMA_FROM_DEVICE streaming DMA mapping

2019-05-21 Thread Robin Murphy

On 21/05/2019 18:14, Horia Geanta wrote:

Hi,

Is it mandatory for a device to write data into an area DMA mapped 
DMA_FROM_DEVICE?
Can't the device just "ignore" that mapping - i.e. not write anything - with
the driver expecting the original data to be found in that location (since it
was not touched / written to by the device)?
[Let's leave cache coherency aside, and consider "original data" to be in RAM.]

I am asking this since I am seeing what seems to be an inconsistent behavior /
semantics between cases when swiotlb bouncing is used and when it's not.

Specifically, the context is:
1. driver prepares a scatterlist with several entries and performs a
dma_map_sg() with direction FROM_DEVICE
2. device decides there's no need to write into the buffer pointed by first
scatterlist entry and skips it (writing into subsequent buffers)
3. driver is notified the device finished processing and dma unmaps the 
scatterlist

When swiotlb bounce is used, the buffer pointed to by first scatterlist entry is
corrupted. That's because swiotlb implementation expects the device to write
something into that buffer, however the device logic is "whatever was previously
in that buffer should be used" (2. above).

For FROM_DEVICE direction:
-swiotlb_tbl_map_single() does not copy data from original location to swiotlb
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
-swiotlb_tbl_unmap_single() copies data from swiotlb to original location
if (orig_addr != INVALID_PHYS_ADDR &&
!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);
and when device did not write anything (as in current situation), it overwrites
original data with zeros

In case swiotlb bounce is not used and device does not write into the
FROM_DEVICE streaming DMA mapping, the original data is available.

Could you please clarify whether:
-I am missing something obvious OR
-the DMA API documentation should be updated - to mandate device writes into
FROM_DEVICE mappings - OR
-the swiotlb implementation should be updated - to copy data from original
location irrespective of DMA mapping direction?


Hmm, that certainly feels like a bug in SWIOTLB - it seems reasonable in 
principle for a device to only partially update a mapped buffer before a 
sync/unmap, so I'd say it probably should be filling the bounce buffer 
with the original data at the start, regardless of direction.


Robin.
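A sketch of the fix Robin suggests, i.e. making swiotlb_tbl_map_single()
bounce the original data for every direction, so that bytes the device never
writes survive the copy-back on unmap (illustrative only):

	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
	    (dir == DMA_TO_DEVICE || dir == DMA_FROM_DEVICE ||
	     dir == DMA_BIDIRECTIONAL))
		swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);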


Re: [PATCH v3 03/16] iommu: Add I/O ASID allocator

2019-05-21 Thread Jacob Pan
On Tue, 21 May 2019 11:41:52 +0200
Auger Eric  wrote:

> Hi,
> 
> On 5/4/19 12:32 AM, Jacob Pan wrote:
> > From: Jean-Philippe Brucker 
> > 
> > Some devices might support multiple DMA address spaces, in
> > particular those that have the PCI PASID feature. PASID (Process
> > Address Space ID) allows sharing process address spaces with
> > devices (SVA), partitioning a device into VM-assignable entities (VFIO
> > mdev) or simply providing multiple DMA address spaces to kernel
> > drivers. Add a global PASID allocator usable by different drivers
> > at the same time. Name it I/O ASID to avoid confusion with ASIDs
> > allocated by arch code, which are usually a separate ID space.
> > 
> > The IOASID space is global. Each device can have its own PASID
> > space, but by convention the IOMMU ended up having a global PASID
> > space, so that with SVA, each mm_struct is associated to a single
> > PASID.
> > 
> > The allocator is primarily used by IOMMU subsystem but in rare
> > occasions drivers would like to allocate PASIDs for devices that
> > aren't managed by an IOMMU, using the same ID space as IOMMU.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> > Signed-off-by: Jacob Pan 
> > Link: https://lkml.org/lkml/2019/4/26/462
> > ---
> >  drivers/iommu/Kconfig  |   6 +++
> >  drivers/iommu/Makefile |   1 +
> >  drivers/iommu/ioasid.c | 140
> > +
> > include/linux/ioasid.h |  67 +++ 4 files
> > changed, 214 insertions(+) create mode 100644 drivers/iommu/ioasid.c
> >  create mode 100644 include/linux/ioasid.h
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 6f07f3b..75e7f97 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -2,6 +2,12 @@
> >  config IOMMU_IOVA
> > tristate
> >  
> > +config IOASID
> > +   bool
> > +   help
> > + Enable the I/O Address Space ID allocator. A single ID
> > space shared
> > + between different users.
> > +
> >  # IOMMU_API always gets selected by whoever wants it.
> >  config IOMMU_API
> > bool
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index 8c71a15..0efac6f 100644
> > --- a/drivers/iommu/Makefile
> > +++ b/drivers/iommu/Makefile
> > @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> > +obj-$(CONFIG_IOASID) += ioasid.o
> >  obj-$(CONFIG_IOMMU_IOVA) += iova.o
> >  obj-$(CONFIG_OF_IOMMU) += of_iommu.o
> >  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > new file mode 100644
> > index 000..99f5e0a
> > --- /dev/null
> > +++ b/drivers/iommu/ioasid.c
> > @@ -0,0 +1,140 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * I/O Address Space ID allocator. There is one global IOASID
> > space, split into
> > + * subsets. Users create a subset with DECLARE_IOASID_SET, then
> > allocate and
> > + * free IOASIDs with ioasid_alloc and ioasid_free.
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +struct ioasid_data {
> > +   ioasid_t id;
> > +   struct ioasid_set *set;
> > +   void *private;
> > +   struct rcu_head rcu;
> > +};
> > +
> > +static DEFINE_XARRAY_ALLOC(ioasid_xa);
> > +
> > +/**
> > + * ioasid_set_data - Set private data for an allocated ioasid
> > + * @ioasid: the ID to set data
> > + * @data:   the private data
> > + *
> > + * For IOASID that is already allocated, private data can be set
> > + * via this API. Future lookup can be done via ioasid_find.
> > + */
> > +int ioasid_set_data(ioasid_t ioasid, void *data)
> > +{
> > +   struct ioasid_data *ioasid_data;
> > +   int ret = 0;
> > +
> > +   ioasid_data = xa_load(&ioasid_xa, ioasid);
> > +   if (ioasid_data)
> > +   ioasid_data->private = data;
> > +   else
> > +   ret = -ENOENT;
> > +
> > +   /* getter may use the private data */
> > +   synchronize_rcu();
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_set_data);
> > +
> > +/**
> > + * ioasid_alloc - Allocate an IOASID
> > + * @set: the IOASID set
> > + * @min: the minimum ID (inclusive)
> > + * @max: the maximum ID (inclusive)
> > + * @private: data private to the caller
> > + *
> > + * Allocate an ID between @min and @max (or %0 and %INT_MAX).
> > Return the
> > + * allocated ID on success, or INVALID_IOASID on failure. The
> > @private pointer
> > + * is stored internally and can be retrieved with ioasid_find().
> > + */
> > +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> > ioasid_t max,
> > + void *private)
> > +{
> > +   int id = INVALID_IOASID;
> > +   struct ioasid_data *data;
> > +
> > +   data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +   if (!data)
> > +   return INVALID_IOASID;
> > +
> > +   data->set = set;
> > +   data->private = private;
> > +
> > +   if 
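A minimal usage sketch of the allocator as posted; the DECLARE_IOASID_SET()
argument and the ID range are assumptions, not taken from the patch:

DECLARE_IOASID_SET(my_ioasid_set);

static int example_alloc(void *priv)
{
	ioasid_t pasid;

	/* Allocate an ID in [1, 1024] and attach private data to it */
	pasid = ioasid_alloc(&my_ioasid_set, 1, 1024, priv);
	if (pasid == INVALID_IOASID)
		return -ENOSPC;

	/* ... hand the PASID to the device, look it up with ioasid_find() ... */

	ioasid_free(pasid);
	return 0;
}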

Re: [PATCH v3 03/16] iommu: Add I/O ASID allocator

2019-05-21 Thread Jacob Pan
On Tue, 21 May 2019 10:21:55 +0200
Auger Eric  wrote:

> Hi,
> 
> On 5/4/19 12:32 AM, Jacob Pan wrote:
> > From: Jean-Philippe Brucker 
> > 
> > Some devices might support multiple DMA address spaces, in
> > particular those that have the PCI PASID feature. PASID (Process
> > Address Space ID) allows sharing process address spaces with
> > devices (SVA), partitioning a device into VM-assignable entities (VFIO
> > mdev) or simply providing multiple DMA address spaces to kernel
> > drivers. Add a global PASID allocator usable by different drivers
> > at the same time. Name it I/O ASID to avoid confusion with ASIDs
> > allocated by arch code, which are usually a separate ID space.
> > 
> > The IOASID space is global. Each device can have its own PASID
> > space, but by convention the IOMMU ended up having a global PASID
> > space, so that with SVA, each mm_struct is associated to a single
> > PASID.
> > 
> > The allocator is primarily used by IOMMU subsystem but in rare
> > occasions drivers would like to allocate PASIDs for devices that
> > aren't managed by an IOMMU, using the same ID space as IOMMU.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> > Signed-off-by: Jacob Pan 
> > Link: https://lkml.org/lkml/2019/4/26/462
> > ---
> >  drivers/iommu/Kconfig  |   6 +++
> >  drivers/iommu/Makefile |   1 +
> >  drivers/iommu/ioasid.c | 140
> > +
> > include/linux/ioasid.h |  67 +++ 4 files
> > changed, 214 insertions(+) create mode 100644 drivers/iommu/ioasid.c
> >  create mode 100644 include/linux/ioasid.h
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 6f07f3b..75e7f97 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -2,6 +2,12 @@
> >  config IOMMU_IOVA
> > tristate
> >  
> > +config IOASID
> > +   bool  
> don't we want a tristate here too?
> 
> Also refering to the past discussions we could add "# The IOASID
> library may also be used by non-IOMMU_API users"
I agree. For device driver modules to use ioasid w/o iommu, this does
not have to be built-in.
Jean, would you agree?
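Presumably the agreed change would end up looking something like this (a
sketch only, extending the hunk quoted below):

config IOASID
	tristate
	# The IOASID library may also be used by non-IOMMU_API users
	help
	  Enable the I/O Address Space ID allocator. A single ID space
	  shared between different users.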

> > +   help
> > + Enable the I/O Address Space ID allocator. A single ID
> > space shared
> > + between different users.
> > +
> >  # IOMMU_API always gets selected by whoever wants it.
> >  config IOMMU_API
> > bool
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index 8c71a15..0efac6f 100644
> > --- a/drivers/iommu/Makefile
> > +++ b/drivers/iommu/Makefile
> > @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> > +obj-$(CONFIG_IOASID) += ioasid.o
> >  obj-$(CONFIG_IOMMU_IOVA) += iova.o
> >  obj-$(CONFIG_OF_IOMMU) += of_iommu.o
> >  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > new file mode 100644
> > index 000..99f5e0a
> > --- /dev/null
> > +++ b/drivers/iommu/ioasid.c
> > @@ -0,0 +1,140 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * I/O Address Space ID allocator. There is one global IOASID
> > space, split into
> > + * subsets. Users create a subset with DECLARE_IOASID_SET, then
> > allocate and
> > + * free IOASIDs with ioasid_alloc and ioasid_free.
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +struct ioasid_data {
> > +   ioasid_t id;
> > +   struct ioasid_set *set;
> > +   void *private;
> > +   struct rcu_head rcu;
> > +};
> > +
> > +static DEFINE_XARRAY_ALLOC(ioasid_xa);
> > +
> > +/**
> > + * ioasid_set_data - Set private data for an allocated ioasid
> > + * @ioasid: the ID to set data
> > + * @data:   the private data
> > + *
> > + * For IOASID that is already allocated, private data can be set
> > + * via this API. Future lookup can be done via ioasid_find.
> > + */
> > +int ioasid_set_data(ioasid_t ioasid, void *data)
> > +{
> > +   struct ioasid_data *ioasid_data;
> > +   int ret = 0;
> > +
> > +   ioasid_data = xa_load(&ioasid_xa, ioasid);
> > +   if (ioasid_data)
> > +   ioasid_data->private = data;
> > +   else
> > +   ret = -ENOENT;
> > +
> > +   /* getter may use the private data */
> > +   synchronize_rcu();
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_set_data);
> > +
> > +/**
> > + * ioasid_alloc - Allocate an IOASID
> > + * @set: the IOASID set
> > + * @min: the minimum ID (inclusive)
> > + * @max: the maximum ID (inclusive)
> > + * @private: data private to the caller
> > + *
> > + * Allocate an ID between @min and @max (or %0 and %INT_MAX).  
> I think we agreed to drop (or %0 and %INT_MAX)
>  Return the
sorry I don't recall but works for me.

> > + * allocated ID on success, or INVALID_IOASID on failure. The
> > @private pointer
> > + * is stored internally and can be retrieved with ioasid_find().
> > + */
> > +ioasid_t 

[PATCH v2 01/15] iommu/arm-smmu: Allow IOMMU enabled devices to skip DMA domains

2019-05-21 Thread Jordan Crouse
Allow IOMMU enabled devices specified on an opt-in list to create a
default identity domain for a new IOMMU group and bypass the DMA
domain created by the IOMMU core. This allows the group to be properly
set up but otherwise skips touching the hardware until the client
device attaches an unmanaged domain of its own.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 42 ++
 drivers/iommu/iommu.c| 29 +++--
 include/linux/iommu.h|  3 +++
 3 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5e54cc0..a795ada 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1235,6 +1235,35 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
return 0;
 }
 
+struct arm_smmu_client_match_data {
+   bool use_identity_domain;
+};
+
+static const struct arm_smmu_client_match_data qcom_adreno = {
+   .use_identity_domain = true,
+};
+
+static const struct arm_smmu_client_match_data qcom_mdss = {
+   .use_identity_domain = true,
+};
+
+static const struct of_device_id arm_smmu_client_of_match[] = {
+   { .compatible = "qcom,adreno", .data = _adreno },
+   { .compatible = "qcom,mdp4", .data = _mdss },
+   { .compatible = "qcom,mdss", .data = _mdss },
+   { .compatible = "qcom,sdm845-mdss", .data = _mdss },
+   {},
+};
+
+static const struct arm_smmu_client_match_data *
+arm_smmu_client_data(struct device *dev)
+{
+   const struct of_device_id *match =
+   of_match_device(arm_smmu_client_of_match, dev);
+
+   return match ? match->data : NULL;
+}
+
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
int ret;
@@ -1552,6 +1581,7 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_device *smmu = fwspec_smmu(fwspec);
+   const struct arm_smmu_client_match_data *client;
struct iommu_group *group = NULL;
int i, idx;
 
@@ -1573,6 +1603,18 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
else
group = generic_device_group(dev);
 
+   client = arm_smmu_client_data(dev);
+
+   /*
+* If the client chooses to bypass the DMA domain, create an identity
+* domain as a default placeholder. This will give the device a
+* default domain but skip DMA operations and not consume a context
+* bank.
+*/
+   if (client && client->use_identity_domain)
+   iommu_group_set_default_domain(group, dev,
+   IOMMU_DOMAIN_IDENTITY);
+
return group;
 }
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee662..af3e1ed 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1062,6 +1062,24 @@ struct iommu_group *fsl_mc_device_group(struct device 
*dev)
return group;
 }
 
+struct iommu_domain *iommu_group_set_default_domain(struct iommu_group *group,
+   struct device *dev, unsigned int type)
+{
+   struct iommu_domain *dom;
+
+   dom = __iommu_domain_alloc(dev->bus, type);
+   if (!dom)
+   return NULL;
+
+   /* FIXME: Error if the default domain is already set? */
+   group->default_domain = dom;
+   if (!group->domain)
+   group->domain = dom;
+
+   return dom;
+}
+EXPORT_SYMBOL_GPL(iommu_group_set_default_domain);
+
 /**
  * iommu_group_get_for_dev - Find or create the IOMMU group for a device
  * @dev: target device
@@ -1099,9 +1117,12 @@ struct iommu_group *iommu_group_get_for_dev(struct 
device *dev)
if (!group->default_domain) {
struct iommu_domain *dom;
 
-   dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
+   dom = iommu_group_set_default_domain(group, dev,
+   iommu_def_domain_type);
+
if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
-   dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
+   dom = iommu_group_set_default_domain(group, dev,
+   IOMMU_DOMAIN_DMA);
if (dom) {
dev_warn(dev,
 "failed to allocate default IOMMU 
domain of type %u; falling back to IOMMU_DOMAIN_DMA",
@@ -1109,10 +1130,6 @@ struct iommu_group *iommu_group_get_for_dev(struct 
device *dev)
}
}
 
-   group->default_domain = dom;
-   if (!group->domain)
-   group->domain = dom;
-
if (dom && !iommu_dma_strict) {
int attr = 1;
iommu_domain_set_attr(dom,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 

[PATCH v2 05/15] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2

2019-05-21 Thread Jordan Crouse
Support auxiliary domains for arm-smmu-v2 to initialize and support
multiple pagetables for a single SMMU context bank. Since the smmu-v2
hardware doesn't have any built in support for switching the pagetable
it is left as an exercise to the caller to actually use the pagetable;
aux domains in the IOMMU driver are only preoccupied with creating and
managing the pagetable memory.

The following is a pseudo-code example of how a domain can be created:

 /* Check to see if aux domains are supported */
 if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
 domain = iommu_domain_alloc(...);

 if (iommu_aux_attach_device(domain, dev))
 return FAIL;

 /* Save the base address of the pagetable for use by the driver */
 iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
 }

Then 'domain' can be used like any other iommu domain to map and
unmap iova addresses in the pagetable. The driver/hardware is used
to switch the pagetable according to its own specific implementation.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 133 +--
 1 file changed, 117 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e09c0e6..27ff554 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -263,6 +263,8 @@ struct arm_smmu_domain {
spinlock_t  cb_lock; /* Serialises ATS1* ops and 
TLB syncs */
u32 attributes;
struct iommu_domain domain;
+   boolis_aux;
+   u64 ttbr0;
 };
 
 struct arm_smmu_option_prop {
@@ -892,6 +894,12 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
+   /* Aux domains can only be created for stage-1 tables */
+   if (smmu_domain->is_aux && smmu_domain->stage != ARM_SMMU_DOMAIN_S1) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
/*
 * Choosing a suitable context format is even more fiddly. Until we
 * grow some way for the caller to express a preference, and/or move
@@ -942,6 +950,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
ias = min(ias, 32UL);
oas = min(oas, 32UL);
}
+
smmu_domain->tlb_ops = &arm_smmu_s1_tlb_ops;
break;
case ARM_SMMU_DOMAIN_NESTED:
@@ -961,6 +970,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
ias = min(ias, 40UL);
oas = min(oas, 40UL);
}
+
if (smmu->version == ARM_SMMU_V2)
smmu_domain->tlb_ops = &arm_smmu_s2_tlb_ops_v2;
else
@@ -970,23 +980,30 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
ret = -EINVAL;
goto out_unlock;
}
-   ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
- smmu->num_context_banks);
-   if (ret < 0)
-   goto out_unlock;
 
-   cfg->cbndx = ret;
-   if (smmu->version < ARM_SMMU_V2) {
cfg->irptndx = atomic_inc_return(&smmu->irptndx);
-   cfg->irptndx %= smmu->num_context_irqs;
-   } else {
-   cfg->irptndx = cfg->cbndx;
-   }
+   /*
+* Aux domains will use the same context bank assigned to the master
+* domain for the device
+*/
+   if (!smmu_domain->is_aux) {
+   ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
+ smmu->num_context_banks);
+   if (ret < 0)
+   goto out_unlock;
 
-   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2)
-   cfg->vmid = cfg->cbndx + 1 + smmu->cavium_id_base;
-   else
-   cfg->asid = cfg->cbndx + smmu->cavium_id_base;
+   cfg->cbndx = ret;
+   if (smmu->version < ARM_SMMU_V2) {
+   cfg->irptndx = atomic_inc_return(&smmu->irptndx);
+   cfg->irptndx %= smmu->num_context_irqs;
+   } else {
+   cfg->irptndx = cfg->cbndx;
+   }
+
+   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2)
+   cfg->vmid = cfg->cbndx + 1 + smmu->cavium_id_base;
+   else
+   cfg->asid = cfg->cbndx + smmu->cavium_id_base;
+   }
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -1009,11 +1026,21 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
goto out_clear_smmu;
}
 
+   /* Cache the TTBR0 for the aux domain */
+   smmu_domain->ttbr0 = 

[PATCH v2 04/15] iommu: Add DOMAIN_ATTR_PTBASE

2019-05-21 Thread Jordan Crouse
Add an attribute to return the base address of the pagetable. This is used
by auxiliary domains from arm-smmu to return the address of the pagetable
to the leaf driver so that it can set the appropriate pagetable through
its own means.

Signed-off-by: Jordan Crouse 
---

 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 204acd8..2252edc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -129,6 +129,7 @@ enum iommu_attr {
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
DOMAIN_ATTR_SPLIT_TABLES,
+   DOMAIN_ATTR_PTBASE,
DOMAIN_ATTR_MAX,
 };
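A sketch of how a leaf driver might consume the new attribute; the u64 type
for the base address is an assumption:

	u64 ptbase;

	/* Fetch the aux domain's pagetable base, then program it through
	 * the device's own pagetable-switch mechanism. */
	if (iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase))
		return -ENODEV;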
 
-- 
2.7.4



[PATCH v2 02/15] iommu: Add DOMAIN_ATTR_SPLIT_TABLES

2019-05-21 Thread Jordan Crouse
Add a new domain attribute to enable split pagetable support for devices
that support it.

Signed-off-by: Jordan Crouse 
---

 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4ef8bd5..204acd8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -128,6 +128,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_SPLIT_TABLES,
DOMAIN_ATTR_MAX,
 };
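A sketch of how a client might opt in; the attribute value type is an
assumption (patch 3 consumes the attribute in arm-smmu):

	int split = 1;

	/* Request TTBR0/TTBR1 split pagetables before attaching */
	if (iommu_domain_set_attr(domain, DOMAIN_ATTR_SPLIT_TABLES, &split))
		dev_warn(dev, "split pagetables not supported\n");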
 
-- 
2.7.4



[PATCH v2 03/15] iommu/arm-smmu: Add split pagetable support for arm-smmu-v2

2019-05-21 Thread Jordan Crouse
Add support for a split pagetable (TTBR0/TTBR1) scheme for arm-smmu-v2.
If split pagetables are enabled, create a pagetable for TTBR1 and set
up the sign extension bit so that all IOVAs with that bit set are mapped
and translated from the TTBR1 pagetable.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-regs.h  |  19 +
 drivers/iommu/arm-smmu.c   | 179 ++---
 drivers/iommu/io-pgtable-arm.c |   3 +-
 3 files changed, 186 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
index e9132a9..23f27c2 100644
--- a/drivers/iommu/arm-smmu-regs.h
+++ b/drivers/iommu/arm-smmu-regs.h
@@ -195,7 +195,26 @@ enum arm_smmu_s2cr_privcfg {
 #define RESUME_RETRY   (0 << 0)
 #define RESUME_TERMINATE   (1 << 0)
 
+#define TTBCR_EPD1 (1 << 23)
+#define TTBCR_T0SZ_SHIFT   0
+#define TTBCR_T1SZ_SHIFT   16
+#define TTBCR_IRGN1_SHIFT  24
+#define TTBCR_ORGN1_SHIFT  26
+#define TTBCR_RGN_WBWA 1
+#define TTBCR_SH1_SHIFT28
+#define TTBCR_SH_IS3
+
+#define TTBCR_TG1_16K  (1 << 30)
+#define TTBCR_TG1_4K   (2 << 30)
+#define TTBCR_TG1_64K  (3 << 30)
+
 #define TTBCR2_SEP_SHIFT   15
+#define TTBCR2_SEP_31  (0x0 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_35  (0x1 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_39  (0x2 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_41  (0x3 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_43  (0x4 << TTBCR2_SEP_SHIFT)
+#define TTBCR2_SEP_47  (0x5 << TTBCR2_SEP_SHIFT)
 #define TTBCR2_SEP_UPSTREAM(0x7 << TTBCR2_SEP_SHIFT)
 #define TTBCR2_AS  (1 << 4)
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a795ada..e09c0e6 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -152,6 +152,7 @@ struct arm_smmu_cb {
u32 tcr[2];
u32 mair[2];
struct arm_smmu_cfg *cfg;
+   unsigned long   split_table_mask;
 };
 
 struct arm_smmu_master_cfg {
@@ -253,13 +254,14 @@ enum arm_smmu_domain_stage {
 
 struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
-   struct io_pgtable_ops   *pgtbl_ops;
+   struct io_pgtable_ops   *pgtbl_ops[2];
const struct iommu_gather_ops   *tlb_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
boolnon_strict;
struct mutexinit_mutex; /* Protects smmu pointer */
spinlock_t  cb_lock; /* Serialises ATS1* ops and 
TLB syncs */
+   u32 attributes;
struct iommu_domain domain;
 };
 
@@ -621,6 +623,85 @@ static irqreturn_t arm_smmu_global_fault(int irq, void 
*dev)
return IRQ_HANDLED;
 }
 
+/* Adjust the context bank settings to support TTBR1 */
+static void arm_smmu_init_ttbr1(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
+   int pgsize = 1 << __ffs(pgtbl_cfg->pgsize_bitmap);
+
+   /* Enable speculative walks through the TTBR1 */
+   cb->tcr[0] &= ~TTBCR_EPD1;
+
+   cb->tcr[0] |= TTBCR_SH_IS << TTBCR_SH1_SHIFT;
+   cb->tcr[0] |= TTBCR_RGN_WBWA << TTBCR_IRGN1_SHIFT;
+   cb->tcr[0] |= TTBCR_RGN_WBWA << TTBCR_ORGN1_SHIFT;
+
+   switch (pgsize) {
+   case SZ_4K:
+   cb->tcr[0] |= TTBCR_TG1_4K;
+   break;
+   case SZ_16K:
+   cb->tcr[0] |= TTBCR_TG1_16K;
+   break;
+   case SZ_64K:
+   cb->tcr[0] |= TTBCR_TG1_64K;
+   break;
+   }
+
+   /*
+* Outside of the special 49 bit UBS case that has a dedicated sign
+* extension bit, setting the SEP for any other va_size will force us to
+* shrink the size of the T0/T1 regions by one bit to accommodate the
+* SEP
+*/
+   if (smmu->va_size != 48) {
+   /* Replace the T0 size */
+   cb->tcr[0] &= ~(0x3f << TTBCR_T0SZ_SHIFT);
+   cb->tcr[0] |= (64ULL - smmu->va_size - 1) << TTBCR_T0SZ_SHIFT;
+   /* Set the T1 size */
+   cb->tcr[0] |= (64ULL - smmu->va_size - 1) << TTBCR_T1SZ_SHIFT;
+   } else {
+   /* Set the T1 size to the full available UBS */
+   cb->tcr[0] |= (64ULL - smmu->va_size) << TTBCR_T1SZ_SHIFT;
+   }
+
+   /* Clear the existing SEP configuration */
+   cb->tcr[1] &= ~TTBCR2_SEP_UPSTREAM;
+
+   /* 

[PATCH v2 00/15] drm/msm: Per-instance pagetable support

2019-05-21 Thread Jordan Crouse
This is a refresh of the per-instance pagetable support for arm-smmu-v2 and the
MSM GPU driver. I think this is pretty mature at this point, so I've dropped the
RFC designation and ask for consideration for 5.3.

Per-instance pagetables allow the target GPU driver to create and manage
an individual pagetable for each file descriptor instance and switch
between them asynchronously using the GPU to reprogram the pagetable
registers on the fly.

Most of the heavy lifting for this is done in the arm-smmu-v2 driver by
taking advantage of the newly added multiple domain API. The first patch in the
series allows opted-in clients to create a default identity domain when the
IOMMU group for the SMMU device is created. This bypasses the DMA domain
creation in the IOMMU core which serves several purposes for the GPU by skipping
the otherwise unused DMA domain and also keeping context bank 0 unused on the
hardware (for better or worse, the GPU is hardcoded to only use context bank 0
for switching).

The next two patches enable split pagetable support. This is used to map
global buffers for the GPU so we can safely switch the TTBR0 pagetable for the
instance.

The last two arm-smmu-v2 patches enable auxiliary domain support. Again the
SMMU client can opt-in to allow auxiliary domains, and if enabled will create
a pagetable but not otherwise touch the hardware. The client can get the address
of the pagetable through an attribute to perform its own switching.

After the arm-smmu-v2 patches come several msm/gpu patches to allow
for target specific address spaces, enable 64 bit virtual addressing and
implement the mechanics of pagetable switching.

For the purposes of merging all the patches between

drm/msm/adreno: Enable 64 bit mode by default on a5xx and a6xx targets

and

drm/msm: Add support to create target specific address spaces

can be merged to the msm-next tree without dependencies on the IOMMU changes.
Only the last three patches will require coordination between the two areas.

Jordan Crouse (15):
  iommu/arm-smmu: Allow IOMMU enabled devices to skip DMA domains
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Add split pagetable support for arm-smmu-v2
  iommu: Add DOMAIN_ATTR_PTBASE
  iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
  drm/msm/adreno: Enable 64 bit mode by default on a5xx and a6xx targets
  drm/msm: Print all 64 bits of the faulting IOMMU address
  drm/msm: Pass the MMU domain index in struct msm_file_private
  drm/msm/gpu: Move address space setup to the GPU targets
  drm/msm: Add a helper function for a per-instance address space
  drm/msm/gpu: Add ttbr0 to the memptrs
  drm/msm: Add support to create target specific address spaces
  drm/msm: Add support for IOMMU auxiliary domains
  drm/msm/a6xx: Support per-instance pagetables
  drm/msm/a5xx: Support per-instance pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c |  37 +++-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c |  50 +++--
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c |  51 +++--
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 163 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h |  19 ++
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c |  70 --
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 166 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |   1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |   7 -
 drivers/gpu/drm/msm/msm_drv.c |  25 ++-
 drivers/gpu/drm/msm/msm_drv.h |   5 +
 drivers/gpu/drm/msm/msm_gem.h |   2 +
 drivers/gpu/drm/msm/msm_gem_submit.c  |  13 +-
 drivers/gpu/drm/msm/msm_gem_vma.c |  53 +++--
 drivers/gpu/drm/msm/msm_gpu.c |  59 +
 drivers/gpu/drm/msm/msm_gpu.h |   3 +
 drivers/gpu/drm/msm/msm_iommu.c   |  99 -
 drivers/gpu/drm/msm/msm_mmu.h |   4 +
 drivers/gpu/drm/msm/msm_ringbuffer.h  |   1 +
 drivers/iommu/arm-smmu-regs.h |  19 ++
 drivers/iommu/arm-smmu.c  | 352 +++---
 drivers/iommu/io-pgtable-arm.c|   3 +-
 drivers/iommu/iommu.c |  29 ++-
 include/linux/iommu.h |   5 +
 24 files changed, 1052 insertions(+), 184 deletions(-)

-- 
2.7.4



[PATCH v6 3/6] dt-bindings: gpu: add bus clock for Mali Midgard GPUs

2019-05-21 Thread Clément Péron
From: Icenowy Zheng 

Some SoCs add a bus clock gate to the Mali Midgard GPU.

Add the binding for the bus clock.

Signed-off-by: Icenowy Zheng 
Signed-off-by: Clément Péron 
Reviewed-by: Rob Herring 
---
 Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt 
b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt
index 1b1a74129141..2e8bbce35695 100644
--- a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt
+++ b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt
@@ -31,6 +31,12 @@ Optional properties:
 
 - clocks : Phandle to clock for the Mali Midgard device.
 
+- clock-names : Specify the names of the clocks specified in clocks
+  when multiple clocks are present.
+* core: clock driving the GPU itself (When only one clock is present,
+  assume it's this clock.)
+* bus: bus clock for the GPU
+
 - mali-supply : Phandle to regulator for the Mali device. Refer to
   Documentation/devicetree/bindings/regulator/regulator.txt for details.
 
-- 
2.17.1


[PATCH v6 5/6] arm64: dts: allwinner: Add ARM Mali GPU node for H6

2019-05-21 Thread Clément Péron
Add the Mali GPU node to the H6 device-tree.

Signed-off-by: Clément Péron 
---
 arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi 
b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
index 16c5c3d0fd81..6aad06095c40 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
@@ -157,6 +157,20 @@
allwinner,sram = <&ve_sram 1>;
};
 
+   gpu: gpu@1800000 {
+   compatible = "allwinner,sun50i-h6-mali",
+"arm,mali-t720";
+   reg = <0x01800000 0x4000>;
+   interrupts = <GIC_SPI 84 IRQ_TYPE_LEVEL_HIGH>,
+<GIC_SPI 85 IRQ_TYPE_LEVEL_HIGH>,
+<GIC_SPI 83 IRQ_TYPE_LEVEL_HIGH>;
+   interrupt-names = "job", "mmu", "gpu";
+   clocks = <&ccu CLK_GPU>, <&ccu CLK_BUS_GPU>;
+   clock-names = "core", "bus";
+   resets = <&ccu RST_BUS_GPU>;
+   status = "disabled";
+   };
+
syscon: syscon@3000000 {
compatible = "allwinner,sun50i-h6-system-control",
 "allwinner,sun50i-a64-system-control";
-- 
2.17.1


[PATCH v6 6/6] arm64: dts: allwinner: Add mali GPU supply for H6 boards

2019-05-21 Thread Clément Péron
Enable and add supply to the Mali GPU node on all the
H6 boards.

According to the datasheet, the maximum time for the supply to reach
its voltage is 32ms.

Signed-off-by: Clément Péron 
---
 arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts | 6 ++
 arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi-3.dts  | 6 ++
 arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi.dtsi   | 6 ++
 arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts| 6 ++
 4 files changed, 24 insertions(+)

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts 
b/arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts
index 0dc33c90dd60..fe36c6588d8e 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h6-beelink-gs1.dts
@@ -70,6 +70,11 @@
status = "okay";
 };
 
+&gpu {
+   mali-supply = <&reg_dcdcc>;
+   status = "okay";
+};
+
  {
status = "okay";
 };
@@ -206,6 +211,7 @@
};
 
reg_dcdcc: dcdcc {
+   regulator-enable-ramp-delay = <32000>;
regulator-min-microvolt = <810000>;
regulator-max-microvolt = <1080000>;
regulator-name = "vdd-gpu";
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi-3.dts 
b/arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi-3.dts
index 17d496990108..ea4866b0fa7a 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi-3.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi-3.dts
@@ -58,6 +58,11 @@
status = "okay";
 };
 
+&gpu {
+   mali-supply = <&reg_dcdcc>;
+   status = "okay";
+};
+
&mmc0 {
vmmc-supply = <&reg_cldo1>;
cd-gpios = <&pio 5 6 GPIO_ACTIVE_LOW>; /* PF6 */
@@ -165,6 +170,7 @@
};
 
reg_dcdcc: dcdcc {
+   regulator-enable-ramp-delay = <32000>;
regulator-min-microvolt = <810000>;
regulator-max-microvolt = <1080000>;
regulator-name = "vdd-gpu";
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi.dtsi 
b/arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi.dtsi
index 62e27948a3fa..ec770f07aa82 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h6-orangepi.dtsi
@@ -55,6 +55,11 @@
status = "okay";
 };
 
+&gpu {
+   mali-supply = <&reg_dcdcc>;
+   status = "okay";
+};
+
&mmc0 {
vmmc-supply = <&reg_cldo1>;
cd-gpios = <&pio 5 6 GPIO_ACTIVE_LOW>;
@@ -163,6 +168,7 @@
};
 
reg_dcdcc: dcdcc {
+   regulator-enable-ramp-delay = <32000>;
regulator-min-microvolt = <810000>;
regulator-max-microvolt = <1080000>;
regulator-name = "vdd-gpu";
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts 
b/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts
index 4802902e128f..625a29a25c52 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h6-pine-h64.dts
@@ -85,6 +85,11 @@
status = "okay";
 };
 
+&gpu {
+   mali-supply = <&reg_dcdcc>;
+   status = "okay";
+};
+
  {
status = "okay";
 };
@@ -215,6 +220,7 @@
};
 
reg_dcdcc: dcdcc {
+   regulator-enable-ramp-delay = <32000>;
regulator-min-microvolt = <810000>;
regulator-max-microvolt = <1080000>;
regulator-name = "vdd-gpu";
-- 
2.17.1


[PATCH v6 4/6] dt-bindings: gpu: mali-midgard: Add H6 mali gpu compatible

2019-05-21 Thread Clément Péron
This adds the H6 Mali compatible to the dt-bindings, to later support
the specific implementation.

Signed-off-by: Clément Péron 
Reviewed-by: Rob Herring 
---
 .../devicetree/bindings/gpu/arm,mali-midgard.txt | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt 
b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt
index 2e8bbce35695..4bf17e1cf555 100644
--- a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt
+++ b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt
@@ -15,6 +15,7 @@ Required properties:
 + "arm,mali-t860"
 + "arm,mali-t880"
   * which must be preceded by one of the following vendor specifics:
++ "allwinner,sun50i-h6-mali"
 + "amlogic,meson-gxm-mali"
 + "rockchip,rk3288-mali"
 + "rockchip,rk3399-mali"
@@ -49,9 +50,15 @@ Vendor-specific bindings
 
 
 The Mali GPU is integrated very differently from one SoC to
-another. In order to accomodate those differences, you have the option
+another. In order to accommodate those differences, you have the option
 to specify one more vendor-specific compatible, among:
 
+- "allwinner,sun50i-h6-mali"
+  Required properties:
+  - clocks : phandles to core and bus clocks
+  - clock-names : must contain "core" and "bus"
+  - resets: phandle to GPU reset line
+
 - "amlogic,meson-gxm-mali"
   Required properties:
   - resets : Should contain phandles of :
-- 
2.17.1


[PATCH v6 2/6] iommu: io-pgtable: fix sanity check for non 48-bit mali iommu

2019-05-21 Thread Clément Péron
The Allwinner H6 SoC has a Mali-T720 MP2 with a 33-bit IOMMU, which
trips the sanity check during the allocation of the pgtable.

Change the sanity check to allow non-48-bit configurations.

Suggested-by: Robin Murphy 
Signed-off-by: Clément Péron 
---
 drivers/iommu/io-pgtable-arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 4e21efbc4459..74f2ce802e6f 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -1016,7 +1016,7 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, 
void *cookie)
 {
struct io_pgtable *iop;
 
-   if (cfg->ias != 48 || cfg->oas > 40)
+   if (cfg->ias > 48 || cfg->oas > 40)
return NULL;
 
cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
-- 
2.17.1


[PATCH v6 0/6] Allwinner H6 Mali GPU support

2019-05-21 Thread Clément Péron
Hi,

The Allwinner H6 has a Mali-T720 MP2 which should be supported by
the new panfrost driver. This series fixes two issues and introduces the
dt-bindings, but a simple benchmark shows that it's still NOT WORKING.

I'm pushing it in case someone wants to continue the work.

This has been tested with Mesa3D 19.1.0-RC2 and a GPU bitness patch[1].

One patch is from Icenowy Zheng where I changed the order as required
by Rob Herring[2].

Thanks,
Clement

[1] https://gitlab.freedesktop.org/kszaq/mesa/tree/panfrost_64_32
[2] https://patchwork.kernel.org/patch/10699829/


[  345.204813] panfrost 1800000.gpu: mmu irq status=1
[  345.209617] panfrost 1800000.gpu: Unhandled Page fault in AS0 at VA
0x02400400
[  345.209617] Reason: TODO
[  345.209617] raw fault status: 0x82C1
[  345.209617] decoded fault status: SLAVE FAULT
[  345.209617] exception type 0xC1: TRANSLATION_FAULT_LEVEL1
[  345.209617] access type 0x2: READ
[  345.209617] source id 0x8000
[  345.729957] panfrost 1800000.gpu: gpu sched timeout, js=0,
status=0x8, head=0x2400400, tail=0x2400400, sched_job=9e204de9
[  346.055876] panfrost 1800000.gpu: mmu irq status=1
[  346.060680] panfrost 1800000.gpu: Unhandled Page fault in AS0 at VA
0x02C00A00
[  346.060680] Reason: TODO
[  346.060680] raw fault status: 0x810002C1
[  346.060680] decoded fault status: SLAVE FAULT
[  346.060680] exception type 0xC1: TRANSLATION_FAULT_LEVEL1
[  346.060680] access type 0x2: READ
[  346.060680] source id 0x8100
[  346.561955] panfrost 1800000.gpu: gpu sched timeout, js=1,
status=0x8, head=0x2c00a00, tail=0x2c00a00, sched_job=b55a9a85
[  346.573913] panfrost 1800000.gpu: mmu irq status=1
[  346.578707] panfrost 1800000.gpu: Unhandled Page fault in AS0 at VA
0x02C00B80

Changes in v5:
 - Removed the indentation fix

Changes in v4:
 - Add bus_clock probe
 - Fix sanity check in io-pgtable
 - Add vramp-delay
 - Merge all boards into one patch
 - Remove upstreamed Neil A. patch

Changes in v3 (Thanks to Maxime Ripard):
 - Reauthor Icenowy for her patch

Changes in v2 (Thanks to Maxime Ripard):
 - Drop GPU OPP Table
 - Add clocks and clock-names in required

Clément Péron (5):
  drm: panfrost: add optional bus_clock
  iommu: io-pgtable: fix sanity check for non 48-bit mali iommu
  dt-bindings: gpu: mali-midgard: Add H6 mali gpu compatible
  arm64: dts: allwinner: Add ARM Mali GPU node for H6
  arm64: dts: allwinner: Add mali GPU supply for H6 boards

Icenowy Zheng (1):
  dt-bindings: gpu: add bus clock for Mali Midgard GPUs

 .../bindings/gpu/arm,mali-midgard.txt | 15 -
 .../dts/allwinner/sun50i-h6-beelink-gs1.dts   |  6 +
 .../dts/allwinner/sun50i-h6-orangepi-3.dts|  6 +
 .../dts/allwinner/sun50i-h6-orangepi.dtsi |  6 +
 .../boot/dts/allwinner/sun50i-h6-pine-h64.dts |  6 +
 arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi  | 14 
 drivers/gpu/drm/panfrost/panfrost_device.c| 22 +++
 drivers/gpu/drm/panfrost/panfrost_device.h|  1 +
 drivers/iommu/io-pgtable-arm.c|  2 +-
 9 files changed, 76 insertions(+), 2 deletions(-)

-- 
2.17.1


[PATCH v6 1/6] drm: panfrost: add optional bus_clock

2019-05-21 Thread Clément Péron
The Allwinner H6 has an ARM Mali-T720 MP2 which requires a bus clock.

Add an optional bus_clock to the init of the panfrost driver.

Signed-off-by: Clément Péron 
---
 drivers/gpu/drm/panfrost/panfrost_device.c | 22 ++
 drivers/gpu/drm/panfrost/panfrost_device.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
b/drivers/gpu/drm/panfrost/panfrost_device.c
index 3b2bced1b015..ccb8eb2a518c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.c
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -55,11 +55,33 @@ static int panfrost_clk_init(struct panfrost_device *pfdev)
if (err)
return err;
 
+   pfdev->bus_clock = devm_clk_get_optional(pfdev->dev, "bus");
+   if (IS_ERR(pfdev->bus_clock)) {
+   dev_err(pfdev->dev, "get bus_clock failed %ld\n",
+   PTR_ERR(pfdev->bus_clock));
+   return PTR_ERR(pfdev->bus_clock);
+   }
+
+   if (pfdev->bus_clock) {
+   rate = clk_get_rate(pfdev->bus_clock);
+   dev_info(pfdev->dev, "bus_clock rate = %lu\n", rate);
+
+   err = clk_prepare_enable(pfdev->bus_clock);
+   if (err)
+   goto disable_clock;
+   }
+
return 0;
+
+disable_clock:
+   clk_disable_unprepare(pfdev->clock);
+
+   return err;
 }
 
 static void panfrost_clk_fini(struct panfrost_device *pfdev)
 {
+   clk_disable_unprepare(pfdev->bus_clock);
clk_disable_unprepare(pfdev->clock);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index 56f452dfb490..8074f221034b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -66,6 +66,7 @@ struct panfrost_device {
 
void __iomem *iomem;
struct clk *clock;
+   struct clk *bus_clock;
struct regulator *regulator;
struct reset_control *rstc;
 
-- 
2.17.1
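(A general clk API note on the pattern above, not specific to this
patch: devm_clk_get_optional() returns NULL rather than an error when
the clock is simply absent, and the clk core treats a NULL clk as a
no-op, so the minimal shape of the pattern is:)

	clk = devm_clk_get_optional(dev, "bus");
	if (IS_ERR(clk))
		return PTR_ERR(clk);	/* real failure, e.g. -EPROBE_DEFER */
	err = clk_prepare_enable(clk);	/* NULL clk is a harmless no-op */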


Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function

2019-05-21 Thread Jean-Philippe Brucker
On 20/05/2019 20:22, Jacob Pan wrote:
> On Thu, 16 May 2019 09:14:29 -0700
> Jacob Pan  wrote:
> 
>> On Thu, 16 May 2019 15:14:40 +0100
>> Jean-Philippe Brucker  wrote:
>>
>>> Hi Jacob,
>>>
>>> On 03/05/2019 23:32, Jacob Pan wrote:  
 +/**
 + * struct gpasid_bind_data - Information about device and guest
 PASID binding
 + * @gcr3: Guest CR3 value from guest mm
+ * @pasid: Process address space ID used for the guest mm
 + * @addr_width:   Guest address width. Paging mode can also
 be derived.
 + */
 +struct gpasid_bind_data {
 +  __u64 gcr3;
 +  __u32 pasid;
 +  __u32 addr_width;
 +  __u32 flags;
+#define IOMMU_SVA_GPASID_SRE  BIT(0) /* supervisor
 request */
 +  __u8 padding[4];
 +};
>>>
>>> Could you wrap this structure into a generic one like we now do for
>>> bind_pasid_table? It would make the API easier to extend, because if
>>> we ever add individual PASID bind on Arm (something I'd like to do
>>> for virtio-iommu, eventually) it will have different parameters, as
>>> our PASID table entry has a lot of fields describing the page table
>>> format.
>>>
>>> Maybe something like the following would do?
>>>
>>> struct gpasid_bind_data {
>>> #define IOMMU_GPASID_BIND_VERSION_1 1
>>> __u32 version;
>>> #define IOMMU_GPASID_BIND_FORMAT_INTEL_VTD  1
>>> __u32 format;
>>> union {
>>> // the current gpasid_bind_data:
>>> struct gpasid_bind_intel_vtd vtd;
>>> };
>>> };
>>>   
> 
> Could you review the struct below? I am trying to extract the
> common fields as much as possible. I didn't do exactly as you suggested,
> so as to keep the vendor specific data in a separate struct under the
> same union.

Thanks, it looks good and I think we can reuse it for SMMUv2 and v3.
Some comments below.

> 
> Also, can you review the v3 ioasid allocator common code patches? I am
> hoping we can get the common code in v5.3 so that we can focus on the
> vendor specific part. The common code should include bind_guest_pasid
> and ioasid allocator.
> https://lkml.org/lkml/2019/5/3/787
> https://lkml.org/lkml/2019/5/3/780
> 
> Thanks,
> 
> Jacob
> 
> 
> /**
>  * struct gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
>  * SVA binding.
>  *
>  * @flags: VT-d PASID table entry attributes
>  * @pat:  Page attribute table data to compute effective memory type
>  * @emt:  Extended memory type
>  *
>  * Only guest vIOMMU selectable and effective options are passed down to
>  * the host IOMMU.
>  */
> struct gpasid_bind_data_vtd {
> #define IOMMU_SVA_VTD_GPASID_SRE  BIT(0) /* supervisor request */
> #define IOMMU_SVA_VTD_GPASID_EAFE BIT(1) /* extended access enable */
> #define IOMMU_SVA_VTD_GPASID_PCD  BIT(2) /* page-level cache disable */
> #define IOMMU_SVA_VTD_GPASID_PWT  BIT(3) /* page-level write through */
> #define IOMMU_SVA_VTD_GPASID_EMTE BIT(4) /* extended memory type enable */
> #define IOMMU_SVA_VTD_GPASID_CD   BIT(5) /* PASID-level cache disable */

It doesn't seem like the BIT() macro is exported to userspace, so we
can't use it here
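For reference, a minimal sketch of the usual userspace-safe spelling,
using plain shift constants instead of BIT() (same bit values as above):

	#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0)	/* supervisor request */
	#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1)	/* extended access enable */
	#define IOMMU_SVA_VTD_GPASID_PCD	(1 << 2)	/* page-level cache disable */
	#define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3)	/* page-level write through */
	#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4)	/* extended memory type enable */
	#define IOMMU_SVA_VTD_GPASID_CD		(1 << 5)	/* PASID-level cache disable */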

>   __u64 flags;
>   __u32 pat;
>   __u32 emt;
> };
> 
> /**
>  * struct gpasid_bind_data - Information about device and guest PASID binding
>  * @version:  Version of this data structure
>  * @format:   PASID table entry format
>  * @flags: Additional information on guest bind request
>  * @gpgd: Guest page directory base of the guest mm to bind
>  * @hpasid:   Process address space ID used for the guest mm in host IOMMU
>  * @gpasid:   Process address space ID used for the guest mm in guest IOMMU

Trying to understand the full flow:
* @gpasid is the one allocated by the guest using a virtual command. The
guest writes @gpgd into the virtual PASID table at index @gpasid, then
sends an invalidate command to QEMU.
* QEMU issues a gpasid_bind ioctl (on the mdev or its container?). VFIO
forwards. The IOMMU driver installs @gpgd into the PASID table using
@hpasid, which is associated with the auxiliary domain.

But why do we need the @hpasid field here? Does userspace know about it
at all, and does VFIO need to pass it to the IOMMU driver?
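For concreteness, the QEMU-side bind for the flow above might look
roughly like this sketch (field names come from the struct quoted
below; the ioctl name and whether it targets the device or the
container are exactly the open questions, so both are placeholders):

	struct gpasid_bind_data bind = {
		.version    = IOMMU_GPASID_BIND_VERSION_1,
		.format     = IOMMU_PASID_FORMAT_INTEL_VTD,
		.flags      = IOMMU_SVA_GPASID_VAL,
		.gpgd       = guest_pgd,	/* from the virtual PASID table */
		.gpasid     = guest_pasid,	/* allocated via virtual command */
		/* .hpasid deliberately omitted -- see the question above */
		.addr_width = 48,
	};
	ioctl(fd, VFIO_IOMMU_BIND_GPASID, &bind);	/* placeholder ioctl */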

>  * @addr_width:   Guest address width. Paging mode can also be derived.

What does the last sentence mean? @addr_width should probably be in @vtd
if it provides implicit information.

>  * @vtd:  Intel VT-d specific data
>  */
> struct gpasid_bind_data {
> #define IOMMU_GPASID_BIND_VERSION_1   1
>   __u32 version;
> #define IOMMU_PASID_FORMAT_INTEL_VTD  1
>   __u32 format;
> #define IOMMU_SVA_GPASID_VAL  BIT(1) /* guest PASID valid */

(There are tabs between define and name here, as well as in the VT-d
specific data)

>   __u64 flags;
>   __u64 gpgd;
>   __u64 hpasid;
>   __u64 gpasid;
>   __u32 addr_width;

I think the union has to be aligned on 64-bit, otherwise a compiler may
insert padding before it and the layout will differ between architectures.
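Spelled out, the concern looks like this (a hypothetical layout, not
part of the posted patch; explicit padding keeps the union on a 64-bit
boundary for both 32-bit and 64-bit userspace):

	struct gpasid_bind_data {
		__u32 version;
		__u32 format;
		__u64 flags;
		__u64 gpgd;
		__u64 hpasid;
		__u64 gpasid;
		__u32 addr_width;
		__u8  padding[4];	/* keep the union 64-bit aligned */
		union {
			struct gpasid_bind_data_vtd vtd;
		};
	};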

Re: [RFC/PATCH 0/4] Initial support for modular IOMMU drivers

2019-05-21 Thread Robin Murphy

On 17/05/2019 19:47, Isaac J. Manjarres wrote:

This series adds initial support for being able to use the ARM
SMMU driver as a loadable kernel module. The series also adds support
to the IOMMU framework, so that it can defer probing for devices
that depend on an IOMMU driver that may be a loadable module.

The primary reason behind these changes is that having the ARM
SMMU driver as a module allows for the same kernel image to be
used across different platforms. For example, if one platform
contains an IOMMU that implements one version of the ARM SMMU
specification, and another platform simply does not have an
IOMMU, the only way that these platforms can share the same
kernel image is if the ARM SMMU driver is compiled into the
kernel image.

This solution is not scalable, as it will lead to bloating the
kernel image with support for several future versions of the
SMMU specification to maintain a common kernel image that works
across all platforms.


There are currently two versions of the SMMU spec: v2 (which forms a 
superset of v1), and v3 which is the current architecture. Given how 
closely I work with the SMMU architecture team, I'm particularly 
interested to hear more about these "future versions"... :)



Having the ARM SMMU driver as a module allows
for a common kernel image to be supported across all platforms,
while yielding a smaller kernel image size, since the correct
SMMU driver can be loaded at runtime, if necessary.


arm-smmu and arm-smmu-v3 aren't *all* that much bigger than any of the 
other IOMMU drivers that are also present in a multiplatform build, and 
already share quite a bit of common code, so while I can guess at what 
you might really mean, it's a pretty weak argument as stated.



Patchset Summary:

1. Since the ARM SMMU driver depends on symbols being exported from
several subsystems, the first three patches are dedicated to exporting
the necessary symbols.

2. Similar to how the pinctrl framework handles deferring probes,
the subsequent patch makes it so that the IOMMU framework will defer
probes indefinitely if there is a chance that the IOMMU driver that a
device is waiting for is a module. Otherwise, it upholds the current
behavior of stopping probe deferrals once all of the builtin drivers
have finished probing.

The ARM SMMU driver currently has support for the deprecated
"mmu-masters" binding, which relies on the notion of initcall
ordering for setting the bus ops to ensure that all SMMU devices
have been bound to the driver. This poses a problem with
making the driver a module, as there is no such notion with
loadable modules. Will support for this be completely deprecated?
If not, might it be useful to leverage the device tree ordering,
and assign a property to the last SMMU device, and set the bus ops
at that point? Or perhaps have some deferred timer based approach
to know when to set the bus ops?


Unfortunately, I believe the old binding is still deployed in production 
firmwares which may well never get updated, and thus needs to remain 
functional (I've already had one report of the default bypass behaviour 
breaking it in 5.2 which I need to fix somehow...)


Rather than just the trivial "export a bunch of symbols which won't 
actually be needed yet", from the title I was hoping to see some patches 
really making drivers modular and proposing solutions to those difficult 
problems of making it work robustly. It's very easy to make it 'work' as 
a proof-of-concept (locally I still have a patch dated 2016 based on the 
original probe-deferral work), but those questions really want answering 
to some degree before it's worth doing any of this in mainline.


Robin.

(now starting to wonder whether this might be my own fault for 
mentioning it at LPC... :P)




Thanks,
Isaac

Isaac J. Manjarres (4):
   of: Export of_phandle_iterator_args() to modules
   PCI: Export PCI ACS and DMA searching functions to modules
   iommu: Export core IOMMU functions to kernel modules
   iommu: Add probe deferral support for IOMMU kernel modules

  drivers/iommu/iommu-sysfs.c | 3 +++
  drivers/iommu/iommu.c   | 6 ++
  drivers/iommu/of_iommu.c| 8 ++--
  drivers/of/base.c   | 1 +
  drivers/pci/pci.c   | 1 +
  drivers/pci/search.c| 1 +
  6 files changed, 18 insertions(+), 2 deletions(-)




Re: [RFC/PATCH 0/4] Initial support for modular IOMMU drivers

2019-05-21 Thread Jean-Philippe Brucker
Hi Isaac,

On 17/05/2019 19:47, Isaac J. Manjarres wrote:
> This series adds initial support for being able to use the ARM
> SMMU driver as a loadable kernel module. The series also adds support
> to the IOMMU framework, so that it can defer probing for devices
> that depend on an IOMMU driver that may be a loadable module.
> 
> The primary reason behind these changes is that having the ARM
> SMMU driver as a module allows for the same kernel image to be
> used across different platforms. For example, if one platform
> contains an IOMMU that implements one version of the ARM SMMU
> specification, and another platform simply does not have an
> IOMMU, the only way that these platforms can share the same
> kernel image is if the ARM SMMU driver is compiled into the
> kernel image.
> 
> This solution is not scalable, as it will lead to bloating the
> kernel image with support for several future versions of the
> SMMU specification to maintain a common kernel image that works
> across all platforms. Having the ARM SMMU driver as a module allows
> for a common kernel image to be supported across all platforms,
> while yielding a smaller kernel image size, since the correct
> SMMU driver can be loaded at runtime, if necessary.

It can also be useful if IOMMU drivers want to rely on components that
distros usually build as modules. I have that problem with virtio-iommu,
where the whole virtio transport is usually modular.

> Patchset Summary:
> 
> 1. Since the ARM SMMU driver depends on symbols being exported from
> several subsystems, the first three patches are dedicated to exporting
> the necessary symbols.
> 
> 2. Similar to how the pinctrl framework handles deferring probes,
> the subsequent patch makes it so that the IOMMU framework will defer
> probes indefinitely if there is a chance that the IOMMU driver that a
> device is waiting for is a module. Otherwise, it upholds the current
> behavior of stopping probe deferrals once all of the builtin drivers
> have finished probing.
> 
> The ARM SMMU driver currently has support for the deprecated
> "mmu-masters" binding, which relies on the notion of initcall
> ordering for setting the bus ops to ensure that all SMMU devices
> have been bound to the driver. This poses a problem with
> making the driver a module, as there is no such notion with
> loadable modules. Will support for this be completely deprecated?
> If not, might it be useful to leverage the device tree ordering,
> and assign a property to the last SMMU device, and set the bus ops
> at that point? Or perhaps have some deferred timer based approach
> to know when to set the bus ops? 

Another problem is module unloading: if the user calls rmmod on an IOMMU
module, we have to ensure that endpoints aren't performing DMA anymore.
It could be solved by declaring consumers of an IOMMU with
device_link_add(), so that device drivers are unbound before the IOMMU
module is unloaded.
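A minimal sketch of that idea, assuming the IOMMU driver creates the
link when it first sees the endpoint (DL_FLAG_AUTOREMOVE_CONSUMER makes
the link go away again when the consumer unbinds):

	/* The endpoint (consumer) now depends on the IOMMU (supplier), so
	 * unbinding the IOMMU driver unbinds the endpoint drivers first. */
	if (!device_link_add(endpoint_dev, iommu_dev,
			     DL_FLAG_AUTOREMOVE_CONSUMER))
		dev_warn(endpoint_dev, "failed to link to IOMMU\n");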

Thanks,
Jean

> 
> Thanks,
> Isaac
> 
> Isaac J. Manjarres (4):
>   of: Export of_phandle_iterator_args() to modules
>   PCI: Export PCI ACS and DMA searching functions to modules
>   iommu: Export core IOMMU functions to kernel modules
>   iommu: Add probe deferral support for IOMMU kernel modules
> 
>  drivers/iommu/iommu-sysfs.c | 3 +++
>  drivers/iommu/iommu.c   | 6 ++
>  drivers/iommu/of_iommu.c| 8 ++--
>  drivers/of/base.c   | 1 +
>  drivers/pci/pci.c   | 1 +
>  drivers/pci/search.c| 1 +
>  6 files changed, 18 insertions(+), 2 deletions(-)
> 



Re: [PATCH v2 4/4] vfio: vfio_iommu_type1: implement VFIO_IOMMU_INFO_CAPABILITIES

2019-05-21 Thread Pierre Morel

On 21/05/2019 16:59, Alex Williamson wrote:

On Tue, 21 May 2019 11:14:38 +0200
Pierre Morel  wrote:


On 20/05/2019 20:23, Alex Williamson wrote:

On Mon, 20 May 2019 18:31:08 +0200
Pierre Morel  wrote:
   

On 20/05/2019 16:27, Cornelia Huck wrote:

On Mon, 20 May 2019 13:19:23 +0200
Pierre Morel  wrote:
  

On 17/05/2019 20:04, Pierre Morel wrote:

On 17/05/2019 18:41, Alex Williamson wrote:

On Fri, 17 May 2019 18:16:50 +0200
Pierre Morel  wrote:


We implement the capability interface for VFIO_IOMMU_GET_INFO.

When calling the ioctl, the user must specify
VFIO_IOMMU_INFO_CAPABILITIES to retrieve the capabilities and
must check in the answer if capabilities are supported.

The iommu get_attr callback will be used to retrieve the specific
attributes and fill the capabilities.

Currently two Z-PCI specific capabilities will be queried and
filled by the underlying Z specific s390_iommu:
VFIO_IOMMU_INFO_CAP_QFN for the PCI query function attributes
and
VFIO_IOMMU_INFO_CAP_QGRP for the PCI query function group.

Other architectures may add new capabilities in the same way
after enhancing the architecture specific IOMMU driver.

Signed-off-by: Pierre Morel 
---
     drivers/vfio/vfio_iommu_type1.c | 122
+++-
     1 file changed, 121 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c
b/drivers/vfio/vfio_iommu_type1.c
index d0f731c..9435647 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1658,6 +1658,97 @@ static int
vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
     return ret;
     }
+static int vfio_iommu_type1_zpci_fn(struct iommu_domain *domain,
+    struct vfio_info_cap *caps, size_t size)
+{
+    struct vfio_iommu_type1_info_pcifn *info_fn;
+    int ret;
+
+    info_fn = kzalloc(size, GFP_KERNEL);
+    if (!info_fn)
+    return -ENOMEM;
+
+    ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_ZPCI_FN,
+    &info_fn->response);


What ensures that the 'struct clp_rsp_query_pci' returned from this
get_attr remains consistent with a 'struct vfio_iommu_pci_function'?
Why does the latter contains so many reserved fields (beyond simply
alignment) for a user API?  What fields of these structures are
actually useful to userspace?  Should any fields not be exposed to the
user?  Aren't BAR sizes redundant to what's available through the vfio
PCI API?  I'm afraid that simply redefining an internal structure as
the API leaves a lot to be desired too.  Thanks,

Alex


Hi Alex,

I simply used the structure returned by the firmware to be sure to be
consistent with future evolutions and to facilitate the copy from CLP
to userland.

If you prefer, and I understand that this is the case, I can define a
specific VFIO_IOMMU structure with only the fields relevant to the user,
leaving future enhancement of the user's interface being implemented in
another kernel patch when the time has come.


TBH, I had no idea that CLP is an s390 firmware interface and this is
just dumping that to userspace.  The cover letter says:

Using the PCI VFIO interface allows userland, a.k.a. QEMU, to
retrieve ZPCI specific information without knowing Z specific
identifiers like the function ID or the function handle of the zPCI
function hidden behind the PCI interface.

But what does this allow userland to do and what specific pieces of
information do they need?  We do have a case already where Intel
graphics devices have a table (OpRegion) living in host system memory
that we expose via a vfio region, so it wouldn't be unprecedented to do
something like this, but as Connie suggests, if we knew what was being
consumed here and why, maybe we could generalize it into something
useful for others.


OK, sorry, I will try to explain better.

1) A short description of zPCI functions and groups

In Z, PCI cards live behind an adapter between subchannels and PCI.
We access PCI cards in 2 ways:
- dedicated PCI instructions (pci_load/pci_store/pci_store_block)
- DMA
We receive events through
- Adapter interrupts
- CHSC events

The adapter provides an IOMMU to protect the DMA,
and the interrupt handling goes through an MSI-X like interface handled by
the adapter.

The architecture specific PCI code does the interface between the standard PCI
level and the zPCI function (PCI + DMA/IOMMU/Interrupt).

To handle the communication through the "zPCI way", the CLP interface
provides instructions to retrieve information from the adapters.

There are different groups of functions having the same functionalities.

clp_list gives us a list of zPCI functions
clp_query_pci_function returns information specific to a function
clp_query_group returns information on a function group


2) Why do we need it in the guest

We need to provide the guest with information on the adapters and zPCI
functions returned by the clp_query instruction so that the guest's
driver gets the right information on how the way to the zPCI function
has been built in the host.
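For orientation, a sketch of how userspace might consume the capability
chain this series adds to VFIO_IOMMU_GET_INFO (the capability ID is
taken from the patch; the cap_offset field and the vfio_info_cap_header
walk follow the usual VFIO capability convention and are assumptions
here):

	static struct vfio_info_cap_header *
	find_zpci_fn_cap(struct vfio_iommu_type1_info *info)
	{
		struct vfio_info_cap_header *hdr;
		__u32 off;

		for (off = info->cap_offset; off; off = hdr->next) {
			hdr = (struct vfio_info_cap_header *)((char *)info + off);
			if (hdr->id == VFIO_IOMMU_INFO_CAP_QFN)
				return hdr;	/* zPCI function attributes follow */
		}
		return NULL;
	}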

Re: [PATCH v2 4/4] vfio: vfio_iommu_type1: implement VFIO_IOMMU_INFO_CAPABILITIES

2019-05-21 Thread Alex Williamson
On Tue, 21 May 2019 11:14:38 +0200
Pierre Morel  wrote:

> On 20/05/2019 20:23, Alex Williamson wrote:
> > On Mon, 20 May 2019 18:31:08 +0200
> > Pierre Morel  wrote:
> >   
> >> On 20/05/2019 16:27, Cornelia Huck wrote:  
> >>> On Mon, 20 May 2019 13:19:23 +0200
> >>> Pierre Morel  wrote:
> >>>  
>  On 17/05/2019 20:04, Pierre Morel wrote:  
> > On 17/05/2019 18:41, Alex Williamson wrote:  
> >> On Fri, 17 May 2019 18:16:50 +0200
> >> Pierre Morel  wrote:
> >>
> >>> We implement the capability interface for VFIO_IOMMU_GET_INFO.
> >>>
> >>> When calling the ioctl, the user must specify
> >>> VFIO_IOMMU_INFO_CAPABILITIES to retrieve the capabilities and
> >>> must check in the answer if capabilities are supported.
> >>>
> >>> The iommu get_attr callback will be used to retrieve the specific
> >>> attributes and fill the capabilities.
> >>>
> >>> Currently two Z-PCI specific capabilities will be queried and
> >>> filled by the underlying Z specific s390_iommu:
> >>> VFIO_IOMMU_INFO_CAP_QFN for the PCI query function attributes
> >>> and
> >>> VFIO_IOMMU_INFO_CAP_QGRP for the PCI query function group.
> >>>
> >>> Other architectures may add new capabilities in the same way
> >>> after enhancing the architecture specific IOMMU driver.
> >>>
> >>> Signed-off-by: Pierre Morel 
> >>> ---
> >>>     drivers/vfio/vfio_iommu_type1.c | 122
> >>> +++-
> >>>     1 file changed, 121 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c
> >>> b/drivers/vfio/vfio_iommu_type1.c
> >>> index d0f731c..9435647 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -1658,6 +1658,97 @@ static int
> >>> vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> >>>     return ret;
> >>>     }
> >>> +static int vfio_iommu_type1_zpci_fn(struct iommu_domain *domain,
> >>> +    struct vfio_info_cap *caps, size_t size)
> >>> +{
> >>> +    struct vfio_iommu_type1_info_pcifn *info_fn;
> >>> +    int ret;
> >>> +
> >>> +    info_fn = kzalloc(size, GFP_KERNEL);
> >>> +    if (!info_fn)
> >>> +    return -ENOMEM;
> >>> +
> >>> +    ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_ZPCI_FN,
> >>> +    &info_fn->response);  
> >>
> >> What ensures that the 'struct clp_rsp_query_pci' returned from this
> >> get_attr remains consistent with a 'struct vfio_iommu_pci_function'?
> >> Why does the latter contains so many reserved fields (beyond simply
> >> alignment) for a user API?  What fields of these structures are
> >> actually useful to userspace?  Should any fields not be exposed to the
> >> user?  Aren't BAR sizes redundant to what's available through the vfio
> >> PCI API?  I'm afraid that simply redefining an internal structure as
> >> the API leaves a lot to be desired too.  Thanks,
> >>
> >> Alex
> >>
> > Hi Alex,
> >
> > I simply used the structure returned by the firmware to be sure to be
> > consistent with future evolutions and to facilitate the copy from CLP
> > to userland.  
> >
> > If you prefer, and I understand that this is the case, I can define a
> > specific VFIO_IOMMU structure with only the fields relevant to the user,
> > leaving future enhancement of the user's interface being implemented in
> > another kernel patch when the time has come.  
> > 
> > TBH, I had no idea that CLP is an s390 firmware interface and this is
> > just dumping that to userspace.  The cover letter says:
> > 
> >Using the PCI VFIO interface allows userland, a.k.a. QEMU, to
> >retrieve ZPCI specific information without knowing Z specific
> >identifiers like the function ID or the function handle of the zPCI
> >function hidden behind the PCI interface.
> > 
> > But what does this allow userland to do and what specific pieces of
> > information do they need?  We do have a case already where Intel
> > graphics devices have a table (OpRegion) living in host system memory
> > that we expose via a vfio region, so it wouldn't be unprecedented to do
> > something like this, but as Connie suggests, if we knew what was being
> > consumed here and why, maybe we could generalize it into something
> > useful for others.  
> 
> OK, sorry, I will try to explain better.
> 
> 1) A short description of zPCI functions and groups
> 
> In Z, PCI cards live behind an adapter between subchannels and PCI.
> We access PCI cards in 2 ways:
> - dedicated PCI instructions (pci_load/pci_store/pci_store_block)
> - DMA
> We receive events through
> - Adapter interrupts
> - CHSC events
> 
> The adapter provides an IOMMU to protect the DMA,
> and the interrupt handling goes through an MSI-X like interface handled by 
> the adapter.

Re: [PATCH 1/2] dma-mapping: truncate dma masks to what dma_addr_t can hold

2019-05-21 Thread Christoph Hellwig
On Tue, May 21, 2019 at 02:04:37PM +0100, Russell King - ARM Linux admin wrote:
> So how does the driver negotiation for >32bit addresses work if we don't
> fail for large masks?
> 
> I'm thinking about all those PCI drivers that need DAC cycles for >32bit
> addresses, such as e1000, which negotiate via (eg):
> 
> /* there is a workaround being applied below that limits
>  * 64-bit DMA addresses to 64-bit hardware.  There are some
>  * 32-bit adapters that Tx hang when given 64-bit DMA addresses
>  */
> pci_using_dac = 0;
> if ((hw->bus_type == e1000_bus_type_pcix) &&
> !dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) {
> pci_using_dac = 1;
> } else {
> err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> if (err) {
> pr_err("No usable DMA config, aborting\n");
> goto err_dma;
> }
> }
> 
> and similar.  If we blindly truncate the 64-bit to 32-bit, aren't we
> going to end up with PCI cards using DAC cycles to a host bridge that
> do not support DAC cycles?

In general PCI devices just use DAC cycles when they need them.  I only
know of a handful of devices that need to negotiate their
addressing mode, and those already use the proper API for that, which
is dma_get_required_mask.
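As an illustration of that negotiation (a sketch, not taken from any
in-tree driver):

	/* Only bother with 64-bit addressing if installed memory actually
	 * requires it and the device/bridge combination supports it. */
	if (dma_get_required_mask(&pdev->dev) > DMA_BIT_MASK(32) &&
	    !dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
		use_dac = true;
	else if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)))
		return -EIO;	/* no usable DMA configuration */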

The e1000 example is a good case of how the old API confused people.
First it only sets the 64-bit mask for devices which can support it,
which is good.  But then it sets the NETIF_F_HIGHDMA flag only if we
set a 64-bit mask, even though that flag is completely unrelated to the
DMA mask: it just means the driver can handle sk_buff fragments that do
not have a kernel mapping, which really is a driver and not a hardware
issue.

So what this driver really should do is something like:


diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 551de8c2fef2..d9236083da94 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -925,7 +925,7 @@ static int e1000_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 
static int cards_found;
static int global_quad_port_a; /* global ksp3 port a indication */
-   int i, err, pci_using_dac;
+   int i, err;
u16 eeprom_data = 0;
u16 tmp = 0;
u16 eeprom_apme_mask = E1000_EEPROM_APME;
@@ -996,16 +996,11 @@ static int e1000_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 * 64-bit DMA addresses to 64-bit hardware.  There are some
 * 32-bit adapters that Tx hang when given 64-bit DMA addresses
 */
-   pci_using_dac = 0;
-   if ((hw->bus_type == e1000_bus_type_pcix) &&
-   !dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) {
-   pci_using_dac = 1;
-   } else {
-   err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
-   if (err) {
-   pr_err("No usable DMA config, aborting\n");
-   goto err_dma;
-   }
+   err = dma_set_mask_and_coherent(&pdev->dev,
+   DMA_BIT_MASK(hw->bus_type == e1000_bus_type_pcix ? 64 : 32));
+   if (err) {
+   pr_err("No usable DMA config, aborting\n");
+   goto err_dma;
}
 
netdev->netdev_ops = _netdev_ops;
@@ -1047,19 +1042,15 @@ static int e1000_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
 
netdev->priv_flags |= IFF_SUPP_NOFCS;
 
-   netdev->features |= netdev->hw_features;
+   netdev->features |= netdev->hw_features | NETIF_F_HIGHDMA;
netdev->hw_features |= (NETIF_F_RXCSUM |
NETIF_F_RXALL |
NETIF_F_RXFCS);
 
-   if (pci_using_dac) {
-   netdev->features |= NETIF_F_HIGHDMA;
-   netdev->vlan_features |= NETIF_F_HIGHDMA;
-   }
-
netdev->vlan_features |= (NETIF_F_TSO |
  NETIF_F_HW_CSUM |
- NETIF_F_SG);
+ NETIF_F_SG |
+ NETIF_F_HIGHDMA);
 
/* Do not set IFF_UNICAST_FLT for VMWare's 82545EM */
if (hw->device_id != E1000_DEV_ID_82545EM_COPPER ||



Re: [PATCH 2/2] ARM: dma-mapping: allow larger DMA mask than supported

2019-05-21 Thread Russell King - ARM Linux admin
On Tue, May 21, 2019 at 02:47:29PM +0200, Christoph Hellwig wrote:
> Since Linux 5.1 we allow drivers to just set the largest DMA mask they
> support instead of falling back to smaller ones.

This doesn't make sense.  "they" is confusing - why would a driver set
a DMA mask larger than the driver supports?  Or is "they" not
referring to the drivers (in which case, what is it referring to?)

> When fixing up all the dma ops instances to allow for this behavior
> the arm direct mapping code was missed.  Fix it up by removing the
> sanity check, as all the actual mapping code handles this case just
> fine.
> 
> Fixes: 9eb9e96e97b3 ("Documentation/DMA-API-HOWTO: update dma_mask sections")
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm/mm/dma-mapping.c | 20 +---
>  1 file changed, 1 insertion(+), 19 deletions(-)
> 
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 0a75058c11f3..bdf0d236aaee 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -219,25 +219,7 @@ EXPORT_SYMBOL(arm_coherent_dma_ops);
>  
>  static int __dma_supported(struct device *dev, u64 mask, bool warn)
>  {
> - unsigned long max_dma_pfn;
> -
> - /*
> -  * If the mask allows for more memory than we can address,
> -  * and we actually have that much memory, then we must
> -  * indicate that DMA to this device is not supported.
> -  */
> - if (sizeof(mask) != sizeof(dma_addr_t) &&
> - mask > (dma_addr_t)~0 &&
> - dma_to_pfn(dev, ~0) < max_pfn - 1) {
> - if (warn) {
> - dev_warn(dev, "Coherent DMA mask %#llx is larger than 
> dma_addr_t allows\n",
> -  mask);
> - dev_warn(dev, "Driver did not use or check the return 
> value from dma_set_coherent_mask()?\n");
> - }
> - return 0;
> - }

The point of this check is to trap the case where we have, for example,
8GB of memory, but dma_addr_t is 32-bit.  We can allocate in the high
4GB, but we can't represent the address in a dma_addr_t.

> -
> - max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);
> + unsigned long max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);
>  
>   /*
>* Translate the device's DMA mask to a PFN limit.  This
> -- 
> 2.20.1
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 1/2] dma-mapping: truncate dma masks to what dma_addr_t can hold

2019-05-21 Thread Russell King - ARM Linux admin
On Tue, May 21, 2019 at 02:47:28PM +0200, Christoph Hellwig wrote:
> The dma masks in struct device are always 64-bits wide.  But for builds
> using a 32-bit dma_addr_t we need to ensure we don't store an
> unsupportable value.  Before Linux 5.0 this was handled at least by
> the ARM dma mapping code by never allowing a larger dma_mask to be set,
> but these days we allow the driver to just set the largest supported
> value and never fall back to a smaller one.  Ensure this always works
> by truncating the value.

So how does the driver negotiation for >32bit addresses work if we don't
fail for large masks?

I'm thinking about all those PCI drivers that need DAC cycles for >32bit
addresses, such as e1000, which negotiate via (eg):

/* there is a workaround being applied below that limits
 * 64-bit DMA addresses to 64-bit hardware.  There are some
 * 32-bit adapters that Tx hang when given 64-bit DMA addresses
 */
pci_using_dac = 0;
if ((hw->bus_type == e1000_bus_type_pcix) &&
!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) {
pci_using_dac = 1;
} else {
err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
pr_err("No usable DMA config, aborting\n");
goto err_dma;
}
}

and similar.  If we blindly truncate the 64-bit to 32-bit, aren't we
going to end up with PCI cards using DAC cycles to a host bridge that
do not support DAC cycles?

> 
> Fixes: 9eb9e96e97b3 ("Documentation/DMA-API-HOWTO: update dma_mask sections")
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/mapping.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index f7afdadb6770..1f628e7ac709 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -317,6 +317,12 @@ void arch_dma_set_mask(struct device *dev, u64 mask);
>  
>  int dma_set_mask(struct device *dev, u64 mask)
>  {
> + /*
> +  * Truncate the mask to the actually supported dma_addr_t width to
> +  * avoid generating unsupportable addresses.
> +  */
> + mask = (dma_addr_t)mask;
> +
>   if (!dev->dma_mask || !dma_supported(dev, mask))
>   return -EIO;
>  
> @@ -330,6 +336,12 @@ EXPORT_SYMBOL(dma_set_mask);
>  #ifndef CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK
>  int dma_set_coherent_mask(struct device *dev, u64 mask)
>  {
> + /*
> +  * Truncate the mask to the actually supported dma_addr_t width to
> +  * avoid generating unsupportable addresses.
> +  */
> + mask = (dma_addr_t)mask;
> +
>   if (!dma_supported(dev, mask))
>   return -EIO;
>  
> -- 
> 2.20.1
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 2/2] ARM: dma-mapping: allow larger DMA mask than supported

2019-05-21 Thread Christoph Hellwig
On Tue, May 21, 2019 at 02:00:47PM +0100, Russell King - ARM Linux admin wrote:
> On Tue, May 21, 2019 at 02:47:29PM +0200, Christoph Hellwig wrote:
> > Since Linux 5.1 we allow drivers to just set the largest DMA mask they
> > support instead of falling back to smaller ones.
> 
> This doesn't make sense.  "they" is confusing - why would a driver set
> a DMA mask larger than the driver supports?  Or is "they" not
> referring to the drivers (in which case, what is it referring to?)

The current plan is:

 - the driver sets whatever the device supports, and unless that mask
   is too small to be supportable it will always succeed

Which replaces the previous scheme of:

 - the driver tries to set whatever the device supports.  If there are
   addressing limitation outside the device (e.g. the PCIe root port,
   or the AXI interconnect) that will fail, and the device will set
   a 32-bit mask instead which it assumes will generally work.

> The point of this check is to trap the case where we have, for example,
> 8GB of memory, but dma_addr_t is 32-bit.  We can allocate in the high
> 4GB, but we can't represent the address in a dma_addr_t.

Yep, and that is what patch 1/2 should handle by truncating the
dma mask to something that can work.  I don't actually have hardware
I could test this scenario on, though.


Re: [PATCH v7 1/1] iommu: enhance IOMMU dma mode build options

2019-05-21 Thread John Garry

On 20/05/2019 14:59, Zhen Lei wrote:

First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as the default at build time. Then put
the three config options in a choice, so that people can only choose one of
the three at a time.

The default IOMMU DMA mode on each arch is unchanged.

Signed-off-by: Zhen Lei 


Apart from more minor comments, FWIW:

Reviewed-by: John Garry 


---
 arch/ia64/kernel/pci-dma.c|  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
 arch/s390/pci/pci_dma.c   |  2 +-
 arch/x86/kernel/pci-dma.c |  7 ++---
 drivers/iommu/Kconfig | 44 ++-
 drivers/iommu/amd_iommu_init.c|  3 ++-
 drivers/iommu/intel-iommu.c   |  2 +-
 drivers/iommu/iommu.c |  3 ++-
 8 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index fe988c49f01ce6a..655511dbf3c3b34 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -22,7 +22,7 @@
 int force_iommu __read_mostly;
 #endif

-int iommu_pass_through;
+int iommu_pass_through = IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);


As commented privately, I could never see this set for ia64, and it 
seems to exist just to keep the linker happy. Anyway, I am not sure it 
is ever suitable to be set.




 static int __init pci_iommu_init(void)
 {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3ead4c237ed0ec9..383e082a9bb985c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -85,7 +85,8 @@ void pe_level_printk(const struct pnv_ioda_pe *pe, const char 
*level,
va_end(args);
 }

-static bool pnv_iommu_bypass_disabled __read_mostly;
+static bool pnv_iommu_bypass_disabled __read_mostly =
+   !IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
 static bool pci_reset_phbs __read_mostly;

 static int __init iommu_setup(char *str)
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 9e52d1527f71495..784ad1e0acecfb1 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -17,7 +17,7 @@

 static struct kmem_cache *dma_region_table_cache;
 static struct kmem_cache *dma_page_table_cache;
-static int s390_iommu_strict;
+static int s390_iommu_strict = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);

 static int zpci_refresh_global(struct zpci_dev *zdev)
 {
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d460998ae828514..fb2bab42a0a3173 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -43,11 +43,8 @@
  * It is also possible to disable by default in kernel config, and enable with
  * iommu=nopt at boot time.
  */
-#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
-int iommu_pass_through __read_mostly = 1;
-#else
-int iommu_pass_through __read_mostly;
-#endif
+int iommu_pass_through __read_mostly =
+   IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);

 extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6f07f3b21816c64..8a1f1793cde76b4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -74,17 +74,47 @@ config IOMMU_DEBUGFS
  debug/iommu directory, and then populate a subdirectory with
  entries as required.

-config IOMMU_DEFAULT_PASSTHROUGH
-   bool "IOMMU passthrough by default"
+choice
+   prompt "IOMMU default DMA mode"
depends on IOMMU_API
-help
- Enable passthrough by default, removing the need to pass in
- iommu.passthrough=on or iommu=pt through command line. If this
- is enabled, you can still disable with iommu.passthrough=off
- or iommu=nopt depending on the architecture.
+   default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI)
+   default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU || S390_IOMMU)
+   default IOMMU_DEFAULT_STRICT
+   help
+ This option allows IOMMU DMA mode to be chose at build time, to


I'd say /s/allows IOMMU/allows an IOMMU/, /s/chose/chosen/


+ override the default DMA mode of each ARCHs, removing the need to


ARCHs should be singular


+ pass in kernel parameters through command line. You can still use
+ ARCHs specific boot options to override this option again.
+
+config IOMMU_DEFAULT_PASSTHROUGH
+   bool "passthrough"
+   help
+ In this mode, the DMA access through IOMMU without any addresses
+ translation. That means, the wrong or illegal DMA access can not
+ be caught, no error information will be reported.

  If unsure, say N here.

+config IOMMU_DEFAULT_LAZY
+   bool "lazy"
+   help
+ Support lazy mode, where for every IOMMU DMA unmap operation, the
+ flush operation of IOTLB and the free operation of IOVA are deferred.
+ They are only guaranteed to be done before the related IOVA is reused.

[PATCH 1/2] dma-mapping: truncate dma masks to what dma_addr_t can hold

2019-05-21 Thread Christoph Hellwig
The dma masks in struct device are always 64-bits wide.  But for builds
using a 32-bit dma_addr_t we need to ensure we don't store an
unsupportable value.  Before Linux 5.0 this was handled at least by
the ARM dma mapping code by never allowing a larger dma_mask to be set,
but these days we allow the driver to just set the largest supported
value and never fall back to a smaller one.  Ensure this always works
by truncating the value.

Fixes: 9eb9e96e97b3 ("Documentation/DMA-API-HOWTO: update dma_mask sections")
Signed-off-by: Christoph Hellwig 
---
 kernel/dma/mapping.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index f7afdadb6770..1f628e7ac709 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -317,6 +317,12 @@ void arch_dma_set_mask(struct device *dev, u64 mask);
 
 int dma_set_mask(struct device *dev, u64 mask)
 {
+   /*
+* Truncate the mask to the actually supported dma_addr_t width to
+* avoid generating unsupportable addresses.
+*/
+   mask = (dma_addr_t)mask;
+
if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
 
@@ -330,6 +336,12 @@ EXPORT_SYMBOL(dma_set_mask);
 #ifndef CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK
 int dma_set_coherent_mask(struct device *dev, u64 mask)
 {
+   /*
+* Truncate the mask to the actually supported dma_addr_t width to
+* avoid generating unsupportable addresses.
+*/
+   mask = (dma_addr_t)mask;
+
if (!dma_supported(dev, mask))
return -EIO;
 
-- 
2.20.1



fixups for the dma_set_mask behavior change in 5.1

2019-05-21 Thread Christoph Hellwig
Hi all,

in 5.1 I fixed up the DMA mapping API to always accept a larger than
required DMA mask.  Except that I forgot about a check in the arm
code that is not required any more, and about the case where an
architecture only supports a 32-bit dma mask, but could potentially
generate large addresses due to offsets for the DMA address.

These two patches should fix up these issues (which were only found by
code inspection).


[PATCH 2/2] ARM: dma-mapping: allow larger DMA mask than supported

2019-05-21 Thread Christoph Hellwig
Since Linux 5.1 we allow drivers to just set the largest DMA mask they
support instead of falling back to smaller ones.

When fixing up all the dma ops instances to allow for this behavior
the arm direct mapping code was missed.  Fix it up by removing the
sanity check, as all the actual mapping code handles this case just
fine.

Fixes: 9eb9e96e97b3 ("Documentation/DMA-API-HOWTO: update dma_mask sections")
Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 20 +---
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0a75058c11f3..bdf0d236aaee 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -219,25 +219,7 @@ EXPORT_SYMBOL(arm_coherent_dma_ops);
 
 static int __dma_supported(struct device *dev, u64 mask, bool warn)
 {
-   unsigned long max_dma_pfn;
-
-   /*
-* If the mask allows for more memory than we can address,
-* and we actually have that much memory, then we must
-* indicate that DMA to this device is not supported.
-*/
-   if (sizeof(mask) != sizeof(dma_addr_t) &&
-   mask > (dma_addr_t)~0 &&
-   dma_to_pfn(dev, ~0) < max_pfn - 1) {
-   if (warn) {
-   dev_warn(dev, "Coherent DMA mask %#llx is larger than 
dma_addr_t allows\n",
-mask);
-   dev_warn(dev, "Driver did not use or check the return 
value from dma_set_coherent_mask()?\n");
-   }
-   return 0;
-   }
-
-   max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);
+   unsigned long max_dma_pfn = min(max_pfn, arm_dma_pfn_limit);
 
/*
 * Translate the device's DMA mask to a PFN limit.  This
-- 
2.20.1



Re: [PATCH v2 4/4] vfio: vfio_iommu_type1: implement VFIO_IOMMU_INFO_CAPABILITIES

2019-05-21 Thread Pierre Morel

On 21/05/2019 13:11, Cornelia Huck wrote:

On Tue, 21 May 2019 11:14:38 +0200
Pierre Morel  wrote:


1) A short description of zPCI functions and groups

In Z, PCI cards live behind an adapter between subchannels and PCI.
We access PCI cards in 2 ways:
- dedicated PCI instructions (pci_load/pci_store/pci_store_block)
- DMA


Quick question: What about the new pci instructions? Anything that
needs to be considered there?


No and yes.

No, because they should be used when pci_{load,store,store_block} are 
interpreted, AFAIU.

And currently we only use interception.

Yes, because the CLP part, used to set up the translations IIUC (do not 
ask me for details now), will need to be re-issued by the kernel after 
some modifications, and this will also need a way from QEMU S390 PCI down 
to the ZPCI driver.

That is the way I try to set up with this patch.

So the answer is not now, but we should keep in mind that we will 
definitely need a way down to the zPCI low level in the host.





We receive events through
- Adapter interrupts


Note for the non-s390 folks: These are (I/O) interrupts that are not
tied to a specific device. MSI-X is mapped to this.


- CHSC events


Another note for the non-s390 folks: This is a notification mechanism
that is using machine check interrupts; more information is retrieved
via a special instruction (chsc).



thanks, yes, it is better to explain it like this :)



The adapter provides an IOMMU to protect the DMA,
and the interrupt handling goes through an MSI-X like interface handled by
the adapter.

The architecture specific PCI code does the interface between the standard PCI
level and the zPCI function (PCI + DMA/IOMMU/Interrupt).

To handle the communication through the "zPCI way", the CLP interface
provides instructions to retrieve information from the adapters.

There are different groups of functions having the same functionalities.

clp_list gives us a list of zPCI functions
clp_query_pci_function returns information specific to a function
clp_query_group returns information on a function group


2) Why do we need it in the guest

We need to provide the guest with information on the adapters and zPCI
functions returned by the clp_query instruction so that the guest's
driver gets the right information on how the way to the zPCI function
has been built in the host.


When a guest issues the CLP instructions we intercept the clp command in
QEMU and we need to feed the response with the right values for the guest.
The "right" values are not the raw CLP response values:

- some identifier must be virtualized, like UID and FID,

- some values must match what the host received from the CLP response,
like the size of the transmitted blocks, the DMA Address Space Mask,
number of interrupts, MSIA

- some others must match what the host handled with the adapter and
function, like the start and end of DMA,

- some what the host IOMMU driver supports (frame size),



3) We have three different way to get This information:

The PCI Linux interface is a standard PCI interface and some Z specific
information is available in sysfs.
Not all the information needed to be returned inside the CLP response is
available.
So we can not use the sysfs interface to get all the information.

There is a CLP ioctl interface but this interface is not secure in that
it returns the information for all adapters in the system.

The VFIO interface offers the advantage to point to a single PCI
function, so more secure than the clp ioctl interface.
Coupled with the s390_iommu we get access to the zPCI CLP instruction
and to the values handled by the zPCI driver.


4) Until now we used to fill the CLP response to the guest inside QEMU
with fixed values corresponding to the only PCI card we supported.
To support new cards we need to get the right values from the kernel out.


IIRC, the current code fills in values that make sense for one specific
type of card only, right?


yes, right


We also use the same values for emulated
cards (virtio); I assume that they are not completely weird for that
case...



No they are not.

For emulated cards, everything is done inside QEMU; we do not need kernel 
access. The emulated cards get a specific emulation function and group 
assigned with pre-defined values.


I sent a QEMU patch related to this.
Even if the kernel interface changes with the changes in the kernel 
patch, the emulation should continue in this way.


Regards,
Pierre

--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



Re: [PATCH v2 4/4] vfio: vfio_iommu_type1: implement VFIO_IOMMU_INFO_CAPABILITIES

2019-05-21 Thread Cornelia Huck
On Tue, 21 May 2019 11:14:38 +0200
Pierre Morel  wrote:

> 1) A short description of zPCI functions and groups
> 
> In Z, PCI cards live behind an adapter between subchannels and PCI.
> We access PCI cards in 2 ways:
> - dedicated PCI instructions (pci_load/pci_store/pci_store_block)
> - DMA

Quick question: What about the new pci instructions? Anything that
needs to be considered there?

> We receive events through
> - Adapter interrupts

Note for the non-s390 folks: These are (I/O) interrupts that are not
tied to a specific device. MSI-X is mapped to this.

> - CHSC events

Another note for the non-s390 folks: This is a notification mechanism
that is using machine check interrupts; more information is retrieved
via a special instruction (chsc).

> 
> The adapter provides an IOMMU to protect the DMA,
> and the interrupt handling goes through an MSI-X like interface handled by 
> the adapter.
> 
> The architecture specific PCI code does the interface between the standard PCI 
> level and the zPCI function (PCI + DMA/IOMMU/Interrupt).
> 
> To handle the communication through the "zPCI way", the CLP interface 
> provides instructions to retrieve information from the adapters.
> 
> There are different groups of functions having the same functionalities.
> 
> clp_list gives us a list of zPCI functions
> clp_query_pci_function returns information specific to a function
> clp_query_group returns information on a function group
> 
> 
> 2) Why do we need it in the guest
> 
> We need to provide the guest with information on the adapters and zPCI 
> functions returned by the clp_query instruction so that the guest's 
> driver gets the right information on how the way to the zPCI function 
> has been built in the host.
> 
> 
> When a guest issues the CLP instructions we intercept the clp command in 
> QEMU and we need to feed the response with the right values for the guest.
> The "right" values are not the raw CLP response values:
> 
> - some identifiers must be virtualized, like UID and FID,
> 
> - some values must match what the host received from the CLP response, 
> like the size of the transmitted blocks, the DMA Address Space Mask, 
> number of interrupts, MSIA
> 
> - some others must match what the host handled with the adapter and 
> function, like the start and end of DMA,
> 
> - some must match what the host IOMMU driver supports (frame size),
> 
> 
> 
> 3) We have three different ways to get this information:
> 
> The PCI Linux interface is a standard PCI interface and some Z specific 
> information is available in sysfs.
> Not all the information that needs to be returned inside the CLP response is 
> available there.
> So we cannot use the sysfs interface to get all the information.
> 
> There is a CLP ioctl interface but this interface is not secure in that 
> it returns the information for all adapters in the system.
> 
> The VFIO interface offers the advantage of pointing to a single PCI 
> function, so it is more secure than the clp ioctl interface.
> Coupled with the s390_iommu we get access to the zPCI CLP instruction 
> and to the values handled by the zPCI driver.
> 
> 
> 4) Until now we used to fill the CLP response to the guest inside QEMU 
> with fixed values corresponding to the only PCI card we supported.
> To support new cards we need to get the right values out of the kernel.

IIRC, the current code fills in values that make sense for one specific
type of card only, right? We also use the same values for emulated
cards (virtio); I assume that they are not completely weird for that
case...


Re: [PATCH v3 04/16] ioasid: Add custom IOASID allocator

2019-05-21 Thread Auger Eric
Hi Jacob,

On 5/4/19 12:32 AM, Jacob Pan wrote:
> Sometimes, IOASID allocation must be handled by platform specific
> code. The use cases are guest vIOMMU and pvIOMMU where IOASIDs need
> to be allocated by the host via enlightened or paravirt interfaces.
> 
> This patch adds an extension to the IOASID allocator APIs such that
> platform drivers can register a custom allocator, possibly at boot
> time, to take over the allocation. Xarray is still used for tracking
> and searching purposes internal to the IOASID code. Private data of
> an IOASID can also be set after the allocation.
> 
> There can be multiple custom allocators registered but only one is
> used at a time. In case of hot removal of devices that provide the
> allocator, all IOASIDs must be freed prior to unregistering the
> allocator. Default XArray based allocator cannot be mixed with
> custom allocators, i.e. custom allocators will not be used if there
> are outstanding IOASIDs allocated by the default XA allocator.
> 
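As an aside for readers, a minimal sketch of how a platform driver might
plug into this API. The callback fields of struct ioasid_allocator and the
paravirt helpers below are assumptions for illustration, not taken from
this patch:

/* Sketch only: the .alloc/.free fields and the hypercall helpers are assumed. */
static ioasid_t my_pv_alloc(ioasid_t min, ioasid_t max, void *pdata)
{
	/* Ask the host for an IOASID over a paravirt channel (hypothetical). */
	return my_hypercall_alloc_pasid(min, max);
}

static void my_pv_free(ioasid_t ioasid, void *pdata)
{
	my_hypercall_free_pasid(ioasid);	/* hypothetical helper */
}

static struct ioasid_allocator my_pv_allocator = {
	.alloc = my_pv_alloc,
	.free  = my_pv_free,
};

static int __init my_viommu_init(void)
{
	/* Takes precedence over the default XArray-based allocator. */
	return ioasid_register_allocator(&my_pv_allocator);
}

static void __exit my_viommu_exit(void)
{
	/* All IOASIDs must have been freed before this point. */
	ioasid_unregister_allocator(&my_pv_allocator);
}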
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/ioasid.c | 125 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 125 insertions(+)
> 
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 99f5e0a..ed2915a 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -17,6 +17,100 @@ struct ioasid_data {
>  };
>  
>  static DEFINE_XARRAY_ALLOC(ioasid_xa);
> +static DEFINE_MUTEX(ioasid_allocator_lock);
> +static struct ioasid_allocator *active_custom_allocator;
> +
> +static LIST_HEAD(custom_allocators);
> +/*
> + * A flag to track if the ioasid default allocator is in use; this will
> + * prevent a custom allocator from being used. The reason is that a custom
> + * allocator must have unadulterated space to track private data with the
> + * xarray; there cannot be a mix between default and custom allocated IOASIDs.
> + */
> +static int default_allocator_active;
> +
> +/**
> + * ioasid_register_allocator - register a custom allocator
> + * @allocator: the custom allocator to be registered
> + *
> + * Custom allocators take precedence over the default xarray based allocator.
> + * Private data associated with the ASID are managed by ASID common code
> + * similar to data stored in xa.
> + *
> + * There can be multiple allocators registered but only one is active. In case
> + * of runtime removal of a custom allocator, the next one is activated based
> + * on the registration ordering.
> + */
> +int ioasid_register_allocator(struct ioasid_allocator *allocator)
> +{
> + struct ioasid_allocator *pallocator;
> + int ret = 0;
> +
> + if (!allocator)
> + return -EINVAL;
Is it really necessary? Isn't it the caller's responsibility?
> +
> + mutex_lock(&ioasid_allocator_lock);
> + /*
> +  * No particular preference since all custom allocators end up calling
> +  * the host to allocate IOASIDs. We activate the first one and keep
> +  * the later registered allocators in a list in case the first one gets
> +  * removed due to hotplug.
> +  */
> + if (list_empty(&custom_allocators))
> + active_custom_allocator = allocator;
> + else {
> + /* Check if the allocator is already registered */
> + list_for_each_entry(pallocator, &custom_allocators, list) {
> + if (pallocator == allocator) {
> + pr_err("IOASID allocator already registered\n");
> + ret = -EEXIST;
> + goto out_unlock;
> + }
> + }
> + }
> + list_add_tail(&allocator->list, &custom_allocators);
> +
> +out_unlock:
> + mutex_unlock(&ioasid_allocator_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_register_allocator);
> +
> +/**
> + * ioasid_unregister_allocator - Remove a custom IOASID allocator
> + * @allocator: the custom allocator to be removed
> + *
> + * Remove an allocator from the list, activate the next allocator in
> + * the order it was registered.
> + */
> +void ioasid_unregister_allocator(struct ioasid_allocator *allocator)
> +{
> + if (!allocator)
> + return;
Is it really necessary?
> +
> + if (list_empty(&custom_allocators)) {
> + pr_warn("No custom IOASID allocators active!\n");
> + return;
> + }
> +
> + mutex_lock(&ioasid_allocator_lock);
> + list_del(&allocator->list);
> + if (list_empty(&custom_allocators)) {
> + pr_info("No custom IOASID allocators\n");
> + /*
> +  * All IOASIDs should have been freed before the last custom
> +  * allocator is unregistered. Unless default allocator is in
> +  * use.
> +  */
> + BUG_ON(!xa_empty(&ioasid_xa) && !default_allocator_active);
> + active_custom_allocator = NULL;
> + } else if (allocator == active_custom_allocator) {
In case you are removing the active custom allocator, don't you also need
to check that all ioasids were freed? Otherwise you are likely to switch
to 

Re: [PATCH v3 03/16] iommu: Add I/O ASID allocator

2019-05-21 Thread Auger Eric
Hi,

On 5/4/19 12:32 AM, Jacob Pan wrote:
> From: Jean-Philippe Brucker 
> 
> Some devices might support multiple DMA address spaces, in particular
> those that have the PCI PASID feature. PASID (Process Address Space ID)
> allows sharing process address spaces with devices (SVA), partitioning a
> device into VM-assignable entities (VFIO mdev), or simply providing
> multiple DMA address spaces to kernel drivers. Add a global PASID
> allocator usable by different drivers at the same time. Name it I/O ASID
> to avoid confusion with ASIDs allocated by arch code, which are usually
> a separate ID space.
> 
> The IOASID space is global. Each device can have its own PASID space,
> but by convention the IOMMU ended up having a global PASID space, so
> that with SVA, each mm_struct is associated to a single PASID.
> 
> The allocator is primarily used by the IOMMU subsystem, but on rare occasions
> drivers would like to allocate PASIDs for devices that aren't managed by
> an IOMMU, using the same ID space as the IOMMU.
> 
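For illustration, a minimal usage sketch based on the DECLARE_IOASID_SET /
ioasid_alloc / ioasid_free flow described in the file-header comment below;
the device context structure and the ID range are assumptions:

/* Sketch: allocate a PASID for a (hypothetical) device context. */
DECLARE_IOASID_SET(my_set);

struct my_dev_ctx {
	ioasid_t pasid;
	/* ... device state ... */
};

static int my_ctx_setup(struct my_dev_ctx *ctx)
{
	/* Reserve an ID in [1, 1024] and attach ctx as private data,
	 * retrievable later with ioasid_find(). */
	ctx->pasid = ioasid_alloc(&my_set, 1, 1024, ctx);
	if (ctx->pasid == INVALID_IOASID)
		return -ENOSPC;
	return 0;
}

static void my_ctx_teardown(struct my_dev_ctx *ctx)
{
	ioasid_free(ctx->pasid);
}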
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Jacob Pan 
> Link: https://lkml.org/lkml/2019/4/26/462
> ---
>  drivers/iommu/Kconfig  |   6 +++
>  drivers/iommu/Makefile |   1 +
>  drivers/iommu/ioasid.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/ioasid.h |  67 ++++++++++++++++++++++++
>  4 files changed, 214 insertions(+)
>  create mode 100644 drivers/iommu/ioasid.c
>  create mode 100644 include/linux/ioasid.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 6f07f3b..75e7f97 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -2,6 +2,12 @@
>  config IOMMU_IOVA
>   tristate
>  
> +config IOASID
> + bool
> + help
> +   Enable the I/O Address Space ID allocator. A single ID space shared
> +   between different users.
> +
>  # IOMMU_API always gets selected by whoever wants it.
>  config IOMMU_API
>   bool
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 8c71a15..0efac6f 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> +obj-$(CONFIG_IOASID) += ioasid.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>  obj-$(CONFIG_OF_IOMMU)   += of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> new file mode 100644
> index 000..99f5e0a
> --- /dev/null
> +++ b/drivers/iommu/ioasid.c
> @@ -0,0 +1,140 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * I/O Address Space ID allocator. There is one global IOASID space, split into
> + * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
> + * free IOASIDs with ioasid_alloc and ioasid_free.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct ioasid_data {
> + ioasid_t id;
> + struct ioasid_set *set;
> + void *private;
> + struct rcu_head rcu;
> +};
> +
> +static DEFINE_XARRAY_ALLOC(ioasid_xa);
> +
> +/**
> + * ioasid_set_data - Set private data for an allocated ioasid
> + * @ioasid: the ID to set data
> + * @data:   the private data
> + *
> + * For IOASID that is already allocated, private data can be set
> + * via this API. Future lookup can be done via ioasid_find.
> + */
> +int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> + struct ioasid_data *ioasid_data;
> + int ret = 0;
> +
> + ioasid_data = xa_load(&ioasid_xa, ioasid);
> + if (ioasid_data)
> + ioasid_data->private = data;
> + else
> + ret = -ENOENT;
> +
> + /* getter may use the private data */
> + synchronize_rcu();
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_data);
> +
> +/**
> + * ioasid_alloc - Allocate an IOASID
> + * @set: the IOASID set
> + * @min: the minimum ID (inclusive)
> + * @max: the maximum ID (inclusive)
> + * @private: data private to the caller
> + *
> + * Allocate an ID between @min and @max (or %0 and %INT_MAX). Return the
> + * allocated ID on success, or INVALID_IOASID on failure. The @private pointer
> + * is stored internally and can be retrieved with ioasid_find().
> + */
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
> +   void *private)
> +{
> + int id = INVALID_IOASID;
> + struct ioasid_data *data;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data)
> + return INVALID_IOASID;
> +
> + data->set = set;
> + data->private = private;
> +
> + if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
> + pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
> + goto exit_free;
> + }
> + data->id = id;
> +
> +exit_free:
> + if (id < 0 || id == INVALID_IOASID) {
> + kfree(data);
> 

Re: [PATCH v2 4/4] vfio: vfio_iommu_type1: implement VFIO_IOMMU_INFO_CAPABILITIES

2019-05-21 Thread Pierre Morel

On 20/05/2019 20:23, Alex Williamson wrote:

On Mon, 20 May 2019 18:31:08 +0200
Pierre Morel  wrote:


On 20/05/2019 16:27, Cornelia Huck wrote:

On Mon, 20 May 2019 13:19:23 +0200
Pierre Morel  wrote:
   

On 17/05/2019 20:04, Pierre Morel wrote:

On 17/05/2019 18:41, Alex Williamson wrote:

On Fri, 17 May 2019 18:16:50 +0200
Pierre Morel  wrote:
 

We implement the capability interface for VFIO_IOMMU_GET_INFO.

When calling the ioctl, the user must specify
VFIO_IOMMU_INFO_CAPABILITIES to retrieve the capabilities and
must check in the answer if capabilities are supported.

The iommu get_attr callback will be used to retrieve the specific
attributes and fill the capabilities.

Currently two Z-PCI specific capabilities will be queried and
filled by the underlying Z specific s390_iommu:
VFIO_IOMMU_INFO_CAP_QFN for the PCI query function attributes
and
VFIO_IOMMU_INFO_CAP_QGRP for the PCI query function group.

Other architectures may add new capabilities in the same way
after enhancing the architecture specific IOMMU driver.
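
For context, a hypothetical userspace sketch of the flow described above.
The exact layout of the returned capability chain (in particular the
cap_offset field) is an assumption modelled on other VFIO info ioctls,
not something this patch spells out:

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static void query_zpci_caps(int container_fd)
{
	size_t argsz = 4096;	/* oversized buffer to leave room for the caps */
	struct vfio_iommu_type1_info *info = calloc(1, argsz);

	info->argsz = argsz;
	info->flags = VFIO_IOMMU_INFO_CAPABILITIES;	/* flag proposed here */

	if (!ioctl(container_fd, VFIO_IOMMU_GET_INFO, info) &&
	    (info->flags & VFIO_IOMMU_INFO_CAPABILITIES)) {
		__u32 off = info->cap_offset;	/* assumed field name */

		while (off) {
			struct vfio_info_cap_header *hdr =
				(struct vfio_info_cap_header *)((char *)info + off);

			if (hdr->id == VFIO_IOMMU_INFO_CAP_QFN)
				;	/* parse the zPCI function attributes */
			else if (hdr->id == VFIO_IOMMU_INFO_CAP_QGRP)
				;	/* parse the zPCI function group info */
			off = hdr->next;
		}
	}
	free(info);
}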

Signed-off-by: Pierre Morel 
---
    drivers/vfio/vfio_iommu_type1.c | 122 +++++++++++++++++++++++++++++++++++++++-
    1 file changed, 121 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c
b/drivers/vfio/vfio_iommu_type1.c
index d0f731c..9435647 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1658,6 +1658,97 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
    return ret;
    }
+static int vfio_iommu_type1_zpci_fn(struct iommu_domain *domain,
+    struct vfio_info_cap *caps, size_t size)
+{
+    struct vfio_iommu_type1_info_pcifn *info_fn;
+    int ret;
+
+    info_fn = kzalloc(size, GFP_KERNEL);
+    if (!info_fn)
+    return -ENOMEM;
+
+    ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_ZPCI_FN,
+    &info_fn->response);


What ensures that the 'struct clp_rsp_query_pci' returned from this
get_attr remains consistent with a 'struct vfio_iommu_pci_function'?
Why does the latter contains so many reserved fields (beyond simply
alignment) for a user API?  What fields of these structures are
actually useful to userspace?  Should any fields not be exposed to the
user?  Aren't BAR sizes redundant to what's available through the vfio
PCI API?  I'm afraid that simply redefining an internal structure as
the API leaves a lot to be desired too.  Thanks,

Alex
 

Hi Alex,

I simply used the structure returned by the firmware to be sure to stay
consistent with future evolutions and to facilitate the copy from CLP
to userland.

If you prefer, and I understand that this is the case, I can define a
specific VFIO_IOMMU structure with only the fields relevant to the user,
leaving future enhancements of the user interface to be implemented in
another kernel patch when the time comes.
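
To make that concrete, a hypothetical sketch of such a reduced structure.
Every field name is illustrative, drawn only from the values discussed in
this thread (UID/FID, DMA range and mask, interrupt count, frame size),
not from an actual patch:

/* Hypothetical user-facing reply; all names here are illustrative. */
struct vfio_iommu_zpci_fn {
	__u32 uid;		/* virtualized user identifier */
	__u32 fid;		/* virtualized function identifier */
	__u64 sdma;		/* start of the DMA address space */
	__u64 edma;		/* end of the DMA address space */
	__u64 dasm;		/* DMA address space mask */
	__u32 msi_count;	/* number of supported MSI interrupts */
	__u32 frame_size;	/* IOMMU frame size supported by the host */
};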


TBH, I had no idea that CLP is an s390 firmware interface and this is
just dumping that to userspace.  The cover letter says:

   Using the PCI VFIO interface allows userland, a.k.a. QEMU, to
   retrieve ZPCI specific information without knowing Z specific
   identifiers like the function ID or the function handle of the zPCI
   function hidden behind the PCI interface.

But what does this allow userland to do and what specific pieces of
information do they need?  We do have a case already where Intel
graphics devices have a table (OpRegion) living in host system memory
that we expose via a vfio region, so it wouldn't be unprecedented to do
something like this, but as Connie suggests, if we knew what was being
consumed here and why, maybe we could generalize it into something
useful for others.


OK, sorry I try to explain better.

1) A short description, of zPCI functions and groups

On Z, PCI cards live behind an adapter between subchannels and PCI.
We access PCI cards through 2 ways:
- dedicated PCI instructions (pci_load/pci_store/pci_store_block)
- DMA
We receive events through
- Adapter interrupts
- CHSC events

The adapter provides an IOMMU to protect the DMA
and the interrupt handling goes through an MSI-X-like interface handled by 
the adapter.


The architecture-specific PCI code provides the interface between the standard PCI 
level and the zPCI function (PCI + DMA/IOMMU/Interrupt)


To handle the communication through the "zPCI way" the CLP interface 
provides instructions to retrieve information from the adapters.


There are different groups of functions having the same functionality.

clp_list gives us a list of zPCI functions
clp_query_pci_function returns information specific to a function
clp_query_group returns information on a function group


2) Why do we need it in the guest

We need to provide the guest with information on the adapters and zPCI 
functions returned by the clp_query instruction so that the guest's 
driver gets the right information on how the path to the zPCI function 
has been set up in the host.



When a guest issues the CLP instructions we intercept the clp 

[PATCH 1/2] iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock

2019-05-21 Thread Lu Baolu
From: Dave Jiang 

Lockdep debug reported a lock inversion related to the iommu code,
caused by dmar_insert_one_dev_info() grabbing the iommu->lock and
the device_domain_lock out of order versus the code path in
iommu_flush_dev_iotlb(). Expanding the scope of the iommu->lock and
reversing the order of lock acquisition fixes the issue.

[   76.238180] dsa_bus wq0.0: dsa wq wq0.0 disabled
[   76.248706]
[   76.250486] ========================================================
[   76.257113] WARNING: possible irq lock inversion dependency detected
[   76.263736] 5.1.0-rc5+ #162 Not tainted
[   76.267854] --------------------------------------------------------
[   76.274485] systemd-journal/521 just changed the state of lock:
[   76.280685] 55b330f5 (device_domain_lock){..-.}, at: iommu_flush_dev_iotlb.part.63+0x29/0x90
[   76.290099] but this lock took another, SOFTIRQ-unsafe lock in the past:
[   76.297093]  (&(&iommu->lock)->rlock){+.+.}
[   76.297094]
[   76.297094]
[   76.297094] and interrupts could create inverse lock ordering between them.
[   76.297094]
[   76.314257]
[   76.314257] other info that might help us debug this:
[   76.321448]  Possible interrupt unsafe locking scenario:
[   76.321448]
[   76.328907]        CPU0                    CPU1
[   76.333777]        ----                    ----
[   76.338642]   lock(&(&iommu->lock)->rlock);
[   76.343165]                               local_irq_disable();
[   76.349422]                               lock(device_domain_lock);
[   76.356116]                               lock(&(&iommu->lock)->rlock);
[   76.363154]   <Interrupt>
[   76.366134]     lock(device_domain_lock);
[   76.370548]
[   76.370548]  *** DEADLOCK ***
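
To state the rule the patch below enforces, a minimal sketch of the
corrected nesting (the function name is hypothetical; the lock order
matches the diff):

static void corrected_lock_order(struct intel_iommu *iommu)
{
	unsigned long flags;

	/* Take the softirq-unsafe iommu->lock first, then the lock that is
	 * also taken from softirq context, so the inverse chain shown in
	 * the report above can no longer form. */
	spin_lock(&iommu->lock);
	spin_lock_irqsave(&device_domain_lock, flags);

	/* ... update the device/domain linkage ... */

	spin_unlock_irqrestore(&device_domain_lock, flags);
	spin_unlock(&iommu->lock);
}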

Fixes: 745f2586e78e ("iommu/vt-d: Simplify function get_domain_for_dev()")
Signed-off-by: Dave Jiang 
Reviewed-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a209199f3af6..91f4912c09c6 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2512,6 +2512,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
}
}
 
+   spin_lock(&iommu->lock);
spin_lock_irqsave(&device_domain_lock, flags);
if (dev)
found = find_domain(dev);
@@ -2527,17 +2528,16 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 
if (found) {
spin_unlock_irqrestore(&device_domain_lock, flags);
+   spin_unlock(&iommu->lock);
free_devinfo_mem(info);
/* Caller must free the original domain */
return found;
}
 
-   spin_lock(&iommu->lock);
ret = domain_attach_iommu(domain, iommu);
-   spin_unlock(&iommu->lock);
-
if (ret) {
spin_unlock_irqrestore(&device_domain_lock, flags);
+   spin_unlock(&iommu->lock);
free_devinfo_mem(info);
return NULL;
}
@@ -2547,6 +2547,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (dev)
dev->archdata.iommu = info;
spin_unlock_irqrestore(&device_domain_lock, flags);
+   spin_unlock(&iommu->lock);
 
/* PASID table is mandatory for a PCI device in scalable mode. */
if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
-- 
2.17.1



[PATCH 1/1] iommu: Use right function to get group for device

2019-05-21 Thread Lu Baolu
The iommu_group_get_for_dev() will allocate a group for a
device if it isn't in any group. This isn't the use case
in iommu_request_dm_for_dev(). Let's use iommu_group_get()
instead.

Fixes: d290f1e70d85a ("iommu: Introduce iommu_request_dm_for_dev()")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee6623f9b2..3fa025f849e9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1915,9 +1915,9 @@ int iommu_request_dm_for_dev(struct device *dev)
int ret;
 
/* Device must already be in a group before calling this function */
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
+   group = iommu_group_get(dev);
+   if (!group)
+   return -EINVAL;
 
mutex_lock(&group->mutex);
 
-- 
2.17.1