[PATCH v3] iommu/vt-d: consider real PCI device when checking if mapping is needed

2020-02-17 Thread Daniel Drake
From: Jon Derrick 

The PCI devices handled by intel-iommu may have a DMA requester on
another bus, such as VMD subdevices needing to use the VMD endpoint.

The real DMA device is now used for the DMA mapping, but one case was
missed earlier: if the VMD device (and hence its subdevices too) is
under IOMMU_DOMAIN_IDENTITY, mappings do not work.

Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by
creating an iommu DMA mapping, and fall back on dma_direct_map_page()
for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY
case is broken when intel_map_page() handles a subdevice.

We observe that at iommu attach time, dmar_insert_one_dev_info() for
the subdevices will never set dev->archdata.iommu. This is because
that function uses find_domain() to check if there is already an IOMMU
for the device, and find_domain() then defers to the real DMA device
which does have one. Thus dmar_insert_one_dev_info() returns without
assigning dev->archdata.iommu.
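
For reference, a trimmed sketch of the lookup that causes this (the same
code is visible in context in the attach_deferred() series below):

struct dmar_domain *find_domain(struct device *dev)
{
	struct device_domain_info *info;

	/* Subdevice lookups get redirected to the real DMA device... */
	if (dev_is_pci(dev))
		dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;

	/* ...so the subdevice's own archdata.iommu is never consulted */
	info = dev->archdata.iommu;
	if (info)
		return info->domain;

	return NULL;
}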

Then, later:

1. intel_map_page() checks if an IOMMU mapping is needed by calling
   iommu_need_mapping() on the subdevice. identity_mapping() returns
   false because dev->archdata.iommu is NULL, so this function
   returns false indicating that mapping is needed.
2. __intel_map_single() is called to create the mapping.
3. __intel_map_single() calls find_domain(). This function now returns
   the IDENTITY domain corresponding to the real DMA device.
4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
   A failure is hit and the entire operation is aborted, because this
   codepath is not intended to handle IDENTITY mappings:
   if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
           return NULL;

Fix this by using the real DMA device when checking if a mapping is
needed, while also considering the subdevice DMA mask.
The IDENTITY case will then directly fall back on dma_direct_map_page().

Reported-by: Daniel Drake 
Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Signed-off-by: Jon Derrick 
Signed-off-by: Daniel Drake 
---
 drivers/iommu/intel-iommu.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc37672bf89..7ffd252bf835 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3582,12 +3582,16 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev)
 /* Check if the dev needs to go through non-identity map and unmap process.*/
 static bool iommu_need_mapping(struct device *dev)
 {
+   struct device *dma_dev = dev;
int ret;
 
if (iommu_dummy(dev))
return false;
 
-   ret = identity_mapping(dev);
+   if (dev_is_pci(dev))
+   dma_dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
+
+   ret = identity_mapping(dma_dev);
if (ret) {
u64 dma_mask = *dev->dma_mask;
 
@@ -3601,19 +3605,19 @@ static bool iommu_need_mapping(struct device *dev)
 * 32 bit DMA is removed from si_domain and fall back to
 * non-identity mapping.
 */
-   dmar_remove_one_dev_info(dev);
-   ret = iommu_request_dma_domain_for_dev(dev);
+   dmar_remove_one_dev_info(dma_dev);
+   ret = iommu_request_dma_domain_for_dev(dma_dev);
if (ret) {
struct iommu_domain *domain;
struct dmar_domain *dmar_domain;
 
-   domain = iommu_get_domain_for_dev(dev);
+   domain = iommu_get_domain_for_dev(dma_dev);
if (domain) {
dmar_domain = to_dmar_domain(domain);
dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
}
-   dmar_remove_one_dev_info(dev);
-   get_private_domain_for_dev(dev);
+   dmar_remove_one_dev_info(dma_dev);
+   get_private_domain_for_dev(dma_dev);
}
 
dev_info(dev, "32bit DMA uses non-identity mapping\n");
-- 
2.20.1



RE: [RFC v3 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)

2020-02-17 Thread Liu, Yi L
> From: Liu, Yi L 
> Sent: Friday, January 31, 2020 8:41 PM
> To: Alex Williamson 
> Subject: RE: [RFC v3 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)
> > > +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> > > +unsigned int pasid)
> > > +{
> > > + struct vfio_mm *vmm = iommu->vmm;
> > > + int ret = 0;
> > > +
> > > + mutex_lock(&iommu->lock);
> > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> >
> > But we could have been IOMMU backed when the pasid was allocated, did we
> > just leak something?  In fact, I didn't spot anything in this series that
> > handles a container with pasids allocated losing iommu backing.
> > I'd think we want to release all pasids when that happens since permission
> > for the user to hold pasids goes along with having an iommu backed device.
> 
> oh, yes. If a container loses its iommu backing, then it needs to reclaim
> the allocated PASIDs, right? I'll add it. :-)

Hi Alex,

I went through the flow again. Maybe the current series has already
covered it. There is a vfio_mm object which is used to track allocated
PASIDs; its life cycle spans type1 driver open and release. If I
understand it correctly, type1 driver release happens when there are no
more iommu-backed groups in a container.

static void __vfio_group_unset_container(struct vfio_group *group)
{
	[...]

	/* Detaching the last group deprivileges a container, remove iommu */
	if (driver && list_empty(&container->group_list)) {
		driver->ops->release(container->iommu_data);
		module_put(driver->ops->owner);
		container->iommu_driver = NULL;
		container->iommu_data = NULL;
	}
	[...]
}

Regards,
Yi Liu




Re: [PATCH 3/5] iommu/vt-d: Do deferred attachment in iommu_need_mapping()

2020-02-17 Thread Lu Baolu

Hi Joerg,

Thanks for doing this.

On 2020/2/18 3:38, Joerg Roedel wrote:

From: Joerg Roedel 

The attachment of deferred devices needs to happen before the check
whether the device is identity mapped or not. Otherwise the check will
return wrong results, causing warnings and boot failures in kdump
kernels, like

WARNING: CPU: 0 PID: 318 at ../drivers/iommu/intel-iommu.c:592 domain_get_iommu+0x61/0x70

[...]

 Call Trace:
  __intel_map_single+0x55/0x190
  intel_alloc_coherent+0xac/0x110
  dmam_alloc_attrs+0x50/0xa0
  ahci_port_start+0xfb/0x1f0 [libahci]
  ata_host_start.part.39+0x104/0x1e0 [libata]

With the earlier check the kdump boot succeeds and a crashdump is written.

Signed-off-by: Joerg Roedel 
---
  drivers/iommu/intel-iommu.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 42cdcce1602e..32f43695a22b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2541,9 +2541,6 @@ static void do_deferred_attach(struct device *dev)
  
  static struct dmar_domain *deferred_attach_domain(struct device *dev)

  {
-	if (unlikely(attach_deferred(dev)))
-		do_deferred_attach(dev);
-

This should also be moved to the call site of deferred_attach_domain()
in bounce_map_single().

bounce_map_single() assumes that devices always use a DMA domain, so it
doesn't call iommu_need_mapping(). We could call do_deferred_attach()
there manually.
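
A minimal sketch of that suggestion, assuming the helper names from this
series:

static dma_addr_t
bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
		  enum dma_data_direction dir, unsigned long attrs,
		  u64 dma_mask)
{
	struct dmar_domain *domain;

	/*
	 * The bounce path never goes through iommu_need_mapping(), so
	 * attach deferred devices by hand before the domain lookup.
	 */
	if (unlikely(attach_deferred(dev)))
		do_deferred_attach(dev);

	domain = find_domain(dev);
	if (WARN_ON(dir == DMA_NONE || !domain))
		return DMA_MAPPING_ERROR;

	/* ... */
}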

Best regards,
baolu


return find_domain(dev);
  }
  
@@ -3595,6 +3592,9 @@ static bool iommu_need_mapping(struct device *dev)

if (iommu_dummy(dev))
return false;
  
+	if (unlikely(attach_deferred(dev)))
+		do_deferred_attach(dev);
+
ret = identity_mapping(dev);
if (ret) {
u64 dma_mask = *dev->dma_mask;




Re: [PATCH 5/5] iommu/vt-d: Simplify check in identity_mapping()

2020-02-17 Thread Jerry Snitselaar

On Mon Feb 17 20, Joerg Roedel wrote:

From: Joerg Roedel 

The function only has one call-site and there it is never called with
dummy or deferred devices. Simplify the check in the function to
account for that.

Signed-off-by: Joerg Roedel 


Reviewed-by: Jerry Snitselaar 



Re: [PATCH 3/5] iommu/vt-d: Do deferred attachment in iommu_need_mapping()

2020-02-17 Thread Jerry Snitselaar

On Mon Feb 17 20, Joerg Roedel wrote:

From: Joerg Roedel 

The attachment of deferred devices needs to happen before the check
whether the device is identity mapped or not. Otherwise the check will
return wrong results, causing warnings and boot failures in kdump
kernels, like

WARNING: CPU: 0 PID: 318 at ../drivers/iommu/intel-iommu.c:592 domain_get_iommu+0x61/0x70

[...]

 Call Trace:
  __intel_map_single+0x55/0x190
  intel_alloc_coherent+0xac/0x110
  dmam_alloc_attrs+0x50/0xa0
  ahci_port_start+0xfb/0x1f0 [libahci]
  ata_host_start.part.39+0x104/0x1e0 [libata]

With the earlier check the kdump boot succeeds and a crashdump is written.

Signed-off-by: Joerg Roedel 


Reviewed-by: Jerry Snitselaar 



Re: [PATCH 4/5] iommu/vt-d: Remove deferred_attach_domain()

2020-02-17 Thread Jerry Snitselaar

On Mon Feb 17 20, Joerg Roedel wrote:

From: Joerg Roedel 

The function is now only a wrapper around find_domain(). Remove the
function and call find_domain() directly at the call-sites.

Signed-off-by: Joerg Roedel 


Reviewed-by: Jerry Snitselaar 



Re: [PATCH 2/5] iommu/vt-d: Move deferred device attachment into helper function

2020-02-17 Thread Jerry Snitselaar

On Mon Feb 17 20, Joerg Roedel wrote:

From: Joerg Roedel 

Move the code that does the deferred device attachment into a separate
helper function.

Signed-off-by: Joerg Roedel 


Reviewed-by: Jerry Snitselaar 



Re: [PATCH 1/5] iommu/vt-d: Add attach_deferred() helper

2020-02-17 Thread Jerry Snitselaar

On Mon Feb 17 20, Joerg Roedel wrote:

From: Joerg Roedel 

Implement a helper function to check whether a device's attach process
is deferred.

Signed-off-by: Joerg Roedel 


Reviewed-by: Jerry Snitselaar 



[PATCH 4/5] iommu/vt-d: Remove deferred_attach_domain()

2020-02-17 Thread Joerg Roedel
From: Joerg Roedel 

The function is now only a wrapper around find_domain(). Remove the
function and call find_domain() directly at the call-sites.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 32f43695a22b..16e47ca505eb 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2539,11 +2539,6 @@ static void do_deferred_attach(struct device *dev)
intel_iommu_attach_device(domain, dev);
 }
 
-static struct dmar_domain *deferred_attach_domain(struct device *dev)
-{
-   return find_domain(dev);
-}
-
 static inline struct device_domain_info *
 dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
 {
@@ -3643,7 +3638,7 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 
BUG_ON(dir == DMA_NONE);
 
-   domain = deferred_attach_domain(dev);
+   domain = find_domain(dev);
if (!domain)
return DMA_MAPPING_ERROR;
 
@@ -3863,7 +3858,7 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nelems
if (!iommu_need_mapping(dev))
return dma_direct_map_sg(dev, sglist, nelems, dir, attrs);
 
-   domain = deferred_attach_domain(dev);
+   domain = find_domain(dev);
if (!domain)
return 0;
 
@@ -3958,7 +3953,7 @@ bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
int prot = 0;
int ret;
 
-   domain = deferred_attach_domain(dev);
+   domain = find_domain(dev);
if (WARN_ON(dir == DMA_NONE || !domain))
return DMA_MAPPING_ERROR;
 
-- 
2.17.1



[PATCH 3/5] iommu/vt-d: Do deferred attachment in iommu_need_mapping()

2020-02-17 Thread Joerg Roedel
From: Joerg Roedel 

The attachment of deferred devices needs to happen before the check
whether the device is identity mapped or not. Otherwise the check will
return wrong results, causing warnings and boot failures in kdump
kernels, like

WARNING: CPU: 0 PID: 318 at ../drivers/iommu/intel-iommu.c:592 domain_get_iommu+0x61/0x70

[...]

 Call Trace:
  __intel_map_single+0x55/0x190
  intel_alloc_coherent+0xac/0x110
  dmam_alloc_attrs+0x50/0xa0
  ahci_port_start+0xfb/0x1f0 [libahci]
  ata_host_start.part.39+0x104/0x1e0 [libata]

With the earlier check the kdump boot succeeds and a crashdump is written.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 42cdcce1602e..32f43695a22b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2541,9 +2541,6 @@ static void do_deferred_attach(struct device *dev)
 
 static struct dmar_domain *deferred_attach_domain(struct device *dev)
 {
-   if (unlikely(attach_deferred(dev)))
-   do_deferred_attach(dev);
-
return find_domain(dev);
 }
 
@@ -3595,6 +3592,9 @@ static bool iommu_need_mapping(struct device *dev)
if (iommu_dummy(dev))
return false;
 
+   if (unlikely(attach_deferred(dev)))
+   do_deferred_attach(dev);
+
ret = identity_mapping(dev);
if (ret) {
u64 dma_mask = *dev->dma_mask;
-- 
2.17.1



[PATCH 2/5] iommu/vt-d: Move deferred device attachment into helper function

2020-02-17 Thread Joerg Roedel
From: Joerg Roedel 

Move the code that does the deferred device attachment into a separate
helper function.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 80f2332a5466..42cdcce1602e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2529,16 +2529,20 @@ struct dmar_domain *find_domain(struct device *dev)
return NULL;
 }
 
-static struct dmar_domain *deferred_attach_domain(struct device *dev)
+static void do_deferred_attach(struct device *dev)
 {
-   if (unlikely(attach_deferred(dev))) {
-   struct iommu_domain *domain;
+   struct iommu_domain *domain;
 
-   dev->archdata.iommu = NULL;
-   domain = iommu_get_domain_for_dev(dev);
-   if (domain)
-   intel_iommu_attach_device(domain, dev);
-   }
+   dev->archdata.iommu = NULL;
+   domain = iommu_get_domain_for_dev(dev);
+   if (domain)
+   intel_iommu_attach_device(domain, dev);
+}
+
+static struct dmar_domain *deferred_attach_domain(struct device *dev)
+{
+   if (unlikely(attach_deferred(dev)))
+   do_deferred_attach(dev);
 
return find_domain(dev);
 }
-- 
2.17.1



[PATCH 0/5] iommu/vt-d: Fix kdump boot with VT-d enabled

2020-02-17 Thread Joerg Roedel
Hi,

booting into a crashdump kernel with Intel IOMMU enabled and
configured into passthrough mode does not succeed with the current
kernel. The reason is that the check for identity mappings happens
before the check for deferred device attachments. That leads to wrong
results being returned from iommu_need_mapping() and subsequently to a
wrong domain type being used in __intel_map_single(). A stripped oops
is in the commit-message of patch 3.

The patch-set fixes the issue and does a few code cleanups along the
way. I have not yet researched the stable and fixes tags, but once the
patches are deemed fine I will add the tags before applying them.

Please review.

Thanks,

Joerg

Joerg Roedel (5):
  iommu/vt-d: Add attach_deferred() helper
  iommu/vt-d: Move deferred device attachment into helper function
  iommu/vt-d: Do deferred attachment in iommu_need_mapping()
  iommu/vt-d: Remove deferred_attach_domain()
  iommu/vt-d: Simplify check in identity_mapping()

 drivers/iommu/intel-iommu.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

-- 
2.17.1



[PATCH 5/5] iommu/vt-d: Simplify check in identity_mapping()

2020-02-17 Thread Joerg Roedel
From: Joerg Roedel 

The function only has one call-site and there it is never called with
dummy or deferred devices. Simplify the check in the function to
account for that.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 16e47ca505eb..0b5a0fadbc0c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2916,7 +2916,7 @@ static int identity_mapping(struct device *dev)
struct device_domain_info *info;
 
info = dev->archdata.iommu;
-	if (info && info != DUMMY_DEVICE_DOMAIN_INFO && info != DEFER_DEVICE_DOMAIN_INFO)
+   if (info)
return (info->domain == si_domain);
 
return 0;
-- 
2.17.1



[PATCH 1/5] iommu/vt-d: Add attach_deferred() helper

2020-02-17 Thread Joerg Roedel
From: Joerg Roedel 

Implement a helper function to check whether a device's attach process
is deferred.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc37672bf89..80f2332a5466 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -762,6 +762,11 @@ static int iommu_dummy(struct device *dev)
return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
 }
 
+static bool attach_deferred(struct device *dev)
+{
+   return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
+}
+
 /**
  * is_downstream_to_pci_bridge - test if a device belongs to the PCI
  *  sub-hierarchy of a candidate PCI-PCI bridge
@@ -2510,8 +2515,7 @@ struct dmar_domain *find_domain(struct device *dev)
 {
struct device_domain_info *info;
 
-	if (unlikely(dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO ||
-		     dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO))
+   if (unlikely(attach_deferred(dev) || iommu_dummy(dev)))
return NULL;
 
if (dev_is_pci(dev))
@@ -2527,7 +2531,7 @@ struct dmar_domain *find_domain(struct device *dev)
 
 static struct dmar_domain *deferred_attach_domain(struct device *dev)
 {
-   if (unlikely(dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO)) {
+   if (unlikely(attach_deferred(dev))) {
struct iommu_domain *domain;
 
dev->archdata.iommu = NULL;
@@ -6133,7 +6137,7 @@ intel_iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
 static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
   struct device *dev)
 {
-   return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
+   return attach_deferred(dev);
 }
 
 static int
-- 
2.17.1



Re: [PATCH v2] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support

2020-02-17 Thread Robin Murphy

On 13/02/2020 9:49 pm, Rob Herring wrote:

On Thu, Jan 30, 2020 at 11:34 AM Robin Murphy  wrote:


On 30/01/2020 3:06 pm, Auger Eric wrote:

Hi Rob,
On 1/17/20 10:16 PM, Rob Herring wrote:

Arm SMMUv3.2 adds support for TLB range invalidate operations.
Support for range invalidate is determined by the RIL bit in the IDR3
register.

The range invalidate is in units of the leaf page size and operates on
1-32 chunks of a power-of-2 multiple of pages. First, we determine from
the size what power-of-2 multiple we can use. Then we calculate how many
chunks (1-31) of that power-of-2 size to use for the range on each
iteration. On each iteration, we move up in size by at least 5 bits.
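
A worked example (my numbers, assuming a 4KB leaf page size): for a
1023-page range, the first iteration picks scale = __ffs(1023) = 0 and
num = 31, invalidating 31 pages; the second picks scale = 5 and num = 31,
invalidating the remaining 31 * 2^5 = 992 pages, i.e. two commands in
total.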

Cc: Eric Auger 
Cc: Jean-Philippe Brucker 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Joerg Roedel 
Signed-off-by: Rob Herring 




@@ -2003,7 +2024,7 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
 {
 	u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
-	unsigned long start = iova, end = iova + size;
+	unsigned long start = iova, end = iova + size, num_pages = 0, tg = 0;
 	int i = 0;
 	struct arm_smmu_cmdq_ent cmd = {
 		.tlbi = {
@@ -2022,12 +2043,50 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
 	}
 
+	if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
+		/* Get the leaf page size */
+		tg = __ffs(smmu_domain->domain.pgsize_bitmap);
+
+		/* Convert page size of 12,14,16 (log2) to 1,2,3 */
+		cmd.tlbi.tg = ((tg - ilog2(SZ_4K)) / 2) + 1;


Given the comment, I think "(tg - 10) / 2" would suffice ;)
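
(Quick check, my arithmetic: tg = 12, 14, 16 gives (tg - 10) / 2 = 1, 2,
3, matching the comment above.)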


Well, duh...




+
+		/* Determine what level the granule is at */
+		cmd.tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));


Is it possible to rephrase that with logs and shifts to avoid the division?


Well, with a loop is the only other way I came up with:

bpl = tg - 3;
ttl = 3;
mask = BIT(bpl) - 1;
while ((granule & (mask << ((4 - ttl) * bpl + 3))) == 0)
 ttl--;

Doesn't seem like much improvement to me given we have h/w divide...


Sure, it doesn't take too many extra instructions to start matching 
typical IDIV latency, so there's no point being silly if there really 
isn't a clean alternative.


Some quick scribbling suggests "4 - ilog2(granule) / 10" might actually 
be close enough, but perhaps that's a bit too cheeky.
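
(My quick arithmetic: for the 4K-granule block sizes, ilog2(granule) =
12, 21, 30 gives ttl = 3, 2, 1 under that formula, and the valid 16K and
64K granule block sizes check out the same way.)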





+
+		num_pages = size / (1UL << tg);


Similarly, in my experience GCC has always seemed too cautious to elide
non-constant division even in a seemingly-obvious case like this, so
explicit "size >> tg" might be preferable.


My experience has been gcc is smart enough. I checked and there's only
one divide and it is the prior one. But I'll change it anyways.


Now that I think about it, the case that frustrated me may have had the 
shift and divide in separate statements - that's probably a lot harder 
to analyse than a single expression. Either way, the simple right shift 
is easier to read IMO.



+	}
+
  while (iova < end) {
  if (i == CMDQ_BATCH_ENTRIES) {
  arm_smmu_cmdq_issue_cmdlist(smmu, cmds, i, false);
  i = 0;
  }

+		if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
+			/*
+			 * On each iteration of the loop, the range is 5 bits
+			 * worth of the aligned size remaining.
+			 * The range in pages is:
+			 *
+			 * range = (num_pages & (0x1f << __ffs(num_pages)))
+			 */
+			unsigned long scale, num;
+
+			/* Determine the power of 2 multiple number of pages */
+			scale = __ffs(num_pages);
+			cmd.tlbi.scale = scale;
+
+			/* Determine how many chunks of 2^scale size we have */
+			num = (num_pages >> scale) & CMDQ_TLBI_RANGE_NUM_MAX;
+			cmd.tlbi.num = num - 1;
+
+			/* range is num * 2^scale * pgsize */
+			granule = num << (scale + tg);
+
+			/* Clear out the lower order bits for the next iteration */
+			num_pages -= num << scale;

Regarding the 2 options given in
https://lore.kernel.org/linux-arm-kernel/CAL_JsqKABoE+0crGwyZdNogNgEoG=moopf6deqgh6s73c0u...@mail.gmail.com/raw,

I understand you implemented 2), but I still do not understand why you
preferred it over 1).

In your case of 1023*4k pages this will invalidate 31*2^5*4K +
31*2^0*4K pages, whereas you could achieve that with 10 invalidations
with the 1st algo. I did not get in which case this one is more
efficient. Can you please give details?


I guess essentially we want to solve for two variables to get a range as
close to size as possible. There might be a more elegant numerical
method, but for the n

Re: [PATCH v2] iommu/arm-smmu-v3: Batch ATC invalidation commands

2020-02-17 Thread Jean-Philippe Brucker
On Thu, Feb 13, 2020 at 02:56:00PM -0600, Rob Herring wrote:
> Similar to commit 2af2e72b18b4 ("iommu/arm-smmu-v3: Defer TLB
> invalidation until ->iotlb_sync()"), build up a list of ATC invalidation
> commands and submit them all at once to the command queue instead of
> one-by-one.
> 
> As there is only one caller of arm_smmu_atc_inv_master() left, we can
> simplify it and avoid passing in struct arm_smmu_cmdq_ent.
> 
> Cc: Jean-Philippe Brucker 
> Cc: Will Deacon 
> Cc: Robin Murphy 
> Cc: Joerg Roedel 
> Signed-off-by: Rob Herring 

Reviewed-by: Jean-Philippe Brucker 


Since I'm adding a third user of cmdq batching [1], I had a go at
factoring them. I can send the attached patch with my next version, if it
looks OK.

Thanks,
Jean

[1] 
https://lore.kernel.org/linux-iommu/20200213101435.229932-4-jean-phili...@linaro.org/

From b304f322e6293be4ec8b5a01e2ef67e8fa34143c Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker 
Date: Mon, 17 Feb 2020 17:42:54 +0100
Subject: [PATCH] iommu/arm-smmu-v3: Factor command queue batching

Factor the code for command queue batching, which is now repeated three
times for TLB, ATC and CFG invalidations.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 66 ++++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 34 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 45da5c251b65..04c3077589be 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -548,6 +548,11 @@ struct arm_smmu_cmdq {
	atomic_t		lock;
 };
 
+struct arm_smmu_cmdq_batch {
+	u64				cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
+	int				num;
+};
+
 struct arm_smmu_evtq {
struct arm_smmu_queue   q;
u32 max_stalls;
@@ -1482,15 +1487,33 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
return arm_smmu_cmdq_issue_cmdlist(smmu, NULL, 0, true);
 }
 
+static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
+   struct arm_smmu_cmdq_batch *cmds,
+   struct arm_smmu_cmdq_ent *cmd)
+{
+   if (cmds->num == CMDQ_BATCH_ENTRIES) {
+   arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
+   cmds->num = 0;
+   }
+   arm_smmu_cmdq_build_cmd(&cmds->cmds[cmds->num * CMDQ_ENT_DWORDS], cmd);
+   cmds->num++;
+}
+
+static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq_batch *cmds)
+{
+   return arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
+}
+
+
 /* Context descriptor manipulation functions */
 static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
 int ssid, bool leaf)
 {
size_t i;
-   int cmdn = 0;
unsigned long flags;
struct arm_smmu_master *master;
-   u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
+   struct arm_smmu_cmdq_batch cmds = {};
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_cmdq_ent cmd = {
.opcode = CMDQ_OP_CFGI_CD,
@@ -1503,19 +1526,13 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
spin_lock_irqsave(&smmu_domain->devices_lock, flags);
list_for_each_entry(master, &smmu_domain->devices, domain_head) {
for (i = 0; i < master->num_sids; i++) {
-			if (cmdn == CMDQ_BATCH_ENTRIES) {
-				arm_smmu_cmdq_issue_cmdlist(smmu, cmds, cmdn, false);
-				cmdn = 0;
-			}
-
cmd.cfgi.sid = master->sids[i];
-			arm_smmu_cmdq_build_cmd(&cmds[cmdn * CMDQ_ENT_DWORDS], &cmd);
-			cmdn++;
+   arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
}
}
spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
-   arm_smmu_cmdq_issue_cmdlist(smmu, cmds, cmdn, true);
+   arm_smmu_cmdq_batch_submit(smmu, &cmds);
 }
 
 static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
@@ -2160,11 +2177,11 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
   int ssid, unsigned long iova, size_t size)
 {
-   int i, cmdn = 0;
+   int i;
unsigned long flags;
struct arm_smmu_cmdq_ent cmd;
struct arm_smmu_master *master;
-   u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
+   struct arm_smmu_cmdq_batch cmds = {};
 
if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
return 0;
@@ -2194,20 +2211,13 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
  

Re: [RFC PATCH] iommu/iova: Support limiting IOVA alignment

2020-02-17 Thread Robin Murphy

On 14/02/2020 8:30 pm, Liam Mark wrote:


When the IOVA framework applies IOVA alignment it aligns all
IOVAs to the smallest PAGE_SIZE order which is greater than or
equal to the requested IOVA size.

We support use cases that require large buffers (> 64 MB in
size) to be allocated and mapped in their stage 1 page tables.
However, with this alignment scheme we find ourselves running
out of IOVA space for 32 bit devices, so we are proposing this
config, in a similar vein to CONFIG_CMA_ALIGNMENT for CMA
allocations.
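
To make the fragmentation concrete (illustrative numbers, assuming 4 KiB
pages): __alloc_and_insert_iova_range() computes
align_mask <<= fls_long(size - 1), so a 65 MiB allocation gets aligned
to a 128 MiB boundary. A 4 GiB IOVA space can therefore hold at most 32
such buffers, roughly 2 GiB of actual payload, before allocation fails.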


As per [1], I'd really like to better understand the allocation patterns 
that lead to such a sparsely-occupied address space to begin with, given 
that the rbtree allocator is supposed to try to maintain locality as far 
as possible, and the rcaches should further improve on that. Are you 
also frequently cycling intermediate-sized buffers which are smaller 
than 64MB but still too big to be cached?  Are there a lot of 
non-power-of-two allocations?



Add CONFIG_IOMMU_LIMIT_IOVA_ALIGNMENT to limit the alignment of
IOVAs to some desired PAGE_SIZE order, specified by
CONFIG_IOMMU_IOVA_ALIGNMENT. This helps reduce the impact of
fragmentation caused by the current IOVA alignment scheme, and
gives better IOVA space utilization.


Even if the general change did prove reasonable, this IOVA allocator is 
not owned by the DMA API, so entirely removing the option of strict 
size-alignment feels a bit uncomfortable. Personally I'd replace the 
bool argument with an actual alignment value to at least hand the 
authority out to individual callers.


Furthermore, even in DMA API terms, is anyone really ever going to 
bother tuning that config? Since iommu-dma is supposed to be a 
transparent layer, arguably it shouldn't behave unnecessarily 
differently from CMA, so simply piggy-backing off CONFIG_CMA_ALIGNMENT 
would seem logical.


Robin.

[1] 
https://lore.kernel.org/linux-iommu/1581721602-17010-1-git-send-email-isa...@codeaurora.org/



Signed-off-by: Liam Mark 
---
  drivers/iommu/Kconfig | 31 +++++++++++++++++++++++++++++++
  drivers/iommu/iova.c  | 20 +++++++++++++++++++-
  2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d2fade984999..9684a153cc72 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -3,6 +3,37 @@
  config IOMMU_IOVA
tristate
  
+if IOMMU_IOVA

+
+config IOMMU_LIMIT_IOVA_ALIGNMENT
+   bool "Limit IOVA alignment"
+   help
+ When the IOVA framework applies IOVA alignment it aligns all
+ IOVAs to the smallest PAGE_SIZE order which is greater than or
+ equal to the requested IOVA size. This works fine for sizes up
+ to several MiB, but for larger sizes it results in address
+ space wastage and fragmentation. For example drivers with a 4
+ GiB IOVA space might run out of IOVA space when allocating
+ buffers great than 64 MiB.
+
+ Enable this option to impose a limit on the alignment of IOVAs.
+
+ If unsure, say N.
+
+config IOMMU_IOVA_ALIGNMENT
+   int "Maximum PAGE_SIZE order of alignment for IOVAs"
+   depends on IOMMU_LIMIT_IOVA_ALIGNMENT
+   range 4 9
+   default 9
+   help
+ With this parameter you can specify the maximum PAGE_SIZE order for
+ IOVAs. Larger IOVAs will be aligned only to this specified order.
+ The order is expressed as a power of two multiplied by PAGE_SIZE.
+
+ If unsure, leave the default value "9".
+
+endif
+
  # The IOASID library may also be used by non-IOMMU_API users
  config IOASID
tristate
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a9536eca6..259884c8dbd1 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -177,6 +177,24 @@ int init_iova_flush_queue(struct iova_domain *iovad,
rb_insert_color(&iova->node, root);
  }
  
+#ifdef CONFIG_IOMMU_LIMIT_IOVA_ALIGNMENT

+static unsigned long limit_align_shift(struct iova_domain *iovad,
+  unsigned long shift)
+{
+   unsigned long max_align_shift;
+
+   max_align_shift = CONFIG_IOMMU_IOVA_ALIGNMENT + PAGE_SHIFT
+   - iova_shift(iovad);
+   return min_t(unsigned long, max_align_shift, shift);
+}
+#else
+static unsigned long limit_align_shift(struct iova_domain *iovad,
+  unsigned long shift)
+{
+   return shift;
+}
+#endif
+
  static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long size, unsigned long limit_pfn,
struct iova *new, bool size_aligned)
@@ -188,7 +206,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long align_mask = ~0UL;
  
 	if (size_aligned)
-		align_mask <<= fls_long(size - 1);
+		align_mask <<= limit_align_shift(iovad, fls_long(size - 1));
  
  	/* Walk the tree backwards */

spin_

Re: [RFC PATCH] iommu/iova: Add a best-fit algorithm

2020-02-17 Thread Robin Murphy

On 14/02/2020 11:06 pm, Isaac J. Manjarres wrote:

From: Liam Mark 

Using the best-fit algorithm, instead of the first-fit
algorithm, may reduce fragmentation when allocating
IOVAs.
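
As a toy illustration (my example, not from the patch): with free gaps
of 16 and 64 pages, a first-fit search may satisfy an 8-page request
from the 64-page gap and split it, while best-fit takes it from the
16-page gap and preserves the large gap for a later large allocation.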


What kind of pathological allocation patterns make that a serious 
problem? Is there any scope for simply changing the order of things in 
the callers? Do these drivers also run under other DMA API backends 
(e.g. 32-bit Arm)?


More generally, if a driver knows enough to want to second-guess a 
generic DMA API allocator, that's a reasonable argument that it should 
perhaps be properly IOMMU-aware and managing its own address space 
anyway. Perhaps this effort might be better spent finishing off the DMA 
ops bypass stuff to make that approach more robust and welcoming.


Robin.


Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/dma-iommu.c | 17 +++++++++++++++++
  drivers/iommu/iova.c  | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
  include/linux/dma-iommu.h |  7 +++++++
  include/linux/iova.h  | 1 +
  4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5..af08770 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -364,9 +364,26 @@ static int iommu_dma_deferred_attach(struct device *dev,
if (unlikely(ops->is_attach_deferred &&
ops->is_attach_deferred(domain, dev)))
return iommu_attach_device(domain, dev);
+   return 0;
+}
+
+/*
+ * Should be called prior to using dma-apis.
+ */
+int iommu_dma_enable_best_fit_algo(struct device *dev)
+{
+   struct iommu_domain *domain;
+   struct iova_domain *iovad;
+
+   domain = iommu_get_domain_for_dev(dev);
+   if (!domain || !domain->iova_cookie)
+   return -EINVAL;
  
+	iovad = &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;

+   iovad->best_fit = true;
return 0;
  }
+EXPORT_SYMBOL(iommu_dma_enable_best_fit_algo);
  
  /**

   * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a953..716b05f 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -50,6 +50,7 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
+   iovad->best_fit = false;
init_iova_rcaches(iovad);
  }
  EXPORT_SYMBOL_GPL(init_iova_domain);
@@ -227,6 +228,69 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
return -ENOMEM;
  }
  
+static int __alloc_and_insert_iova_best_fit(struct iova_domain *iovad,
+					    unsigned long size, unsigned long limit_pfn,
+					    struct iova *new, bool size_aligned)
+{
+   struct rb_node *curr, *prev;
+   struct iova *curr_iova, *prev_iova;
+   unsigned long flags;
+   unsigned long align_mask = ~0UL;
+   struct rb_node *candidate_rb_parent;
+   unsigned long new_pfn, candidate_pfn = ~0UL;
+   unsigned long gap, candidate_gap = ~0UL;
+
+   if (size_aligned)
+   align_mask <<= limit_align(iovad, fls_long(size - 1));
+
+   /* Walk the tree backwards */
+   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+   curr = &iovad->anchor.node;
+   prev = rb_prev(curr);
+   for (; prev; curr = prev, prev = rb_prev(curr)) {
+   curr_iova = rb_entry(curr, struct iova, node);
+   prev_iova = rb_entry(prev, struct iova, node);
+
+   limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
+   new_pfn = (limit_pfn - size) & align_mask;
+   gap = curr_iova->pfn_lo - prev_iova->pfn_hi - 1;
+   if ((limit_pfn >= size) && (new_pfn > prev_iova->pfn_hi)
+   && (gap < candidate_gap)) {
+   candidate_gap = gap;
+   candidate_pfn = new_pfn;
+   candidate_rb_parent = curr;
+   if (gap == size)
+   goto insert;
+   }
+   }
+
+   curr_iova = rb_entry(curr, struct iova, node);
+   limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
+   new_pfn = (limit_pfn - size) & align_mask;
+   gap = curr_iova->pfn_lo - iovad->start_pfn;
+   if (limit_pfn >= size && new_pfn >= iovad->start_pfn &&
+   gap < candidate_gap) {
+   candidate_gap = gap;
+   candidate_pfn = new_pfn;
+   candidate_rb_parent = curr;
+   }
+
+insert:
+   if (candidate_pfn == ~0UL) {
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   return -ENOMEM;
+   }
+
+   /* pfn_lo will point to size aligned address if size_aligned is set */
+   new->pfn_lo = candidate_pfn;
+  

Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range

2020-02-17 Thread Robin Murphy

On 17/02/2020 8:01 am, Christoph Hellwig wrote:

On Fri, Feb 14, 2020 at 02:58:16PM -0800, Isaac J. Manjarres wrote:

From: Liam Mark 

Some devices have a memory map which contains gaps or holes.
In order for the device to have as much IOVA space as possible,
allow its driver to inform the DMA-IOMMU layer that it should
not allocate addresses from these holes.


Layering violation.  dma-iommu is the translation layer between the
DMA API and the IOMMU API.  And calls into it from drivers performing
DMA mappings need to go through the DMA API (and be documented there).


+1

More than that, though, we already have "holes in the address space" 
support for the sake of PCI host bridge windows - assuming this is the 
same kind of thing (i.e. the holes are between memory regions and other 
resources in PA space, so are only relevant once address translation 
comes into the picture), then this is IOMMU API level stuff, so even a 
DMA API level interface would be inappropriate.
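
For reference, a sketch of that existing mechanism (hole_start and
hole_end are hypothetical resource boundaries, iovad the device's
iova_domain):

	/* Carve the hole out of the IOVA space so the allocator skips it */
	reserve_iova(iovad, iova_pfn(iovad, hole_start),
		     iova_pfn(iovad, hole_end - 1));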


Robin.


Re: [PATCH 02/11] PCI: Add ats_supported host bridge flag

2020-02-17 Thread Jean-Philippe Brucker
On Sat, Feb 15, 2020 at 03:10:47PM -0600, Bjorn Helgaas wrote:
> On Thu, Feb 13, 2020 at 05:50:40PM +0100, Jean-Philippe Brucker wrote:
> > Each vendor has their own way of describing whether a host bridge
> > supports ATS.  The Intel and AMD ACPI tables selectively enable or
> > disable ATS per device or sub-tree, while Arm has a single bit for each
> > host bridge.  For those that need it, add an ats_supported bit to the
> > host bridge structure.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/pci/probe.c | 7 +++
> >  include/linux/pci.h | 1 +
> >  2 files changed, 8 insertions(+)
> > 
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 512cb4312ddd..75c0a25af44e 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -598,6 +598,13 @@ static void pci_init_host_bridge(struct 
> > pci_host_bridge *bridge)
> > bridge->native_shpc_hotplug = 1;
> > bridge->native_pme = 1;
> > bridge->native_ltr = 1;
> > +
> > +	/*
> > +	 * Some systems may disable ATS at the host bridge (ACPI IORT,
> > +	 * device-tree), others filter it with a smaller granularity (ACPI
> > +	 * DMAR and IVRS).
> > +	 */
> > +   bridge->ats_supported = 1;
> 
> The cover letter says it's important to enable ATS only if the host
> bridge supports it.  From the other patches, it looks like we learn if
> the host bridge supports ATS from either a DT "ats-supported" property
> or an ACPI IORT table.  If that's the case, shouldn't the default here
> be "ATS is *not* supported"?

The ACPI IVRS table (AMD) doesn't have a property for the host bridge; it
can only deselect ATS for a sub-range of devices. Similarly, the DMAR
table (Intel) declares that ATS is supported either by the whole PCIe
domain or for sub-ranges of devices. I selected ats_supported at the
bridge by default since IVRS needs it and DMAR has its own fine-grained
ATS support configuration.

I'm still not sure this is the right approach, given that the
ats_supported bridge property doesn't exactly correspond to a firmware
property on all platforms. Maybe the device-tree implementation should
follow the IORT one, where each device carries a fwspec property stating
"root-complex supports ATS". But that isn't nice either, so I tried a
cleaner implementation (as discussed with Robin back on the
ATS-with-SMMUv3 series [1]).

Thanks,
Jean

[1] 
https://lore.kernel.org/linux-iommu/c10c7adb-c7f6-f8c6-05cc-f4f143427...@arm.com/


Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1

2020-02-17 Thread Jerry Snitselaar

On Mon Feb 17 20, Robin Murphy wrote:

On 16/02/2020 10:11 pm, Jerry Snitselaar wrote:

On Fri Feb 14 20, Robin Murphy wrote:

Hi Jerry,

On 2020-02-14 8:13 pm, Jerry Snitselaar wrote:

Hi Will,

On a gigabyte system with Cavium CN8xx, when doing a fio test against
an nvme drive we are seeing the following:

[  637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80136000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80118000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80100036, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8013e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7


Those "IOVAs" don't look much like IOVAs from the DMA allocator - 
if they were physical addresses, would they correspond to an 
expected region of the physical memory map?


I would suspect that this is most likely misbehaviour in the NVMe 
driver (issuing a write to a non-DMA-mapped address), and the SMMU 
is just doing its job in blocking and reporting it.


I also reproduced this with 5.5-rc7, and will check 5.6-rc1 later today.
I couldn't narrow it down further than 5.4-rc1. I don't know the SMMU
code well; any thoughts on where to start digging into this?


fio test that is being run is:

#fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite 
-ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting 
-name=mytest -numjobs=32


Just to clarify, do other tests work OK on the same device?

Thanks,
Robin.



I was able to get back on the system today. I think I know what the
problem is:

[    0.036189] iommu: Gigabyte R120-T34-00 detected, force iommu passthrough mode
[    6.324282] iommu: Default domain type: Translated

So the new default domain code in 5.4 overrides the iommu quirk code
setting default passthrough. I'm testing a quick patch that tracks
whether the default domain was set by the quirk code, and leaves it
alone if it was. So far it seems to be working.


Ah, OK. Could you point me at that quirk code? I can't seem to track 
it down in mainline, and seeing this much leaves me dubious that it's 
even correct - matching a particular board implies that it's a 
firmware issue (as far as I'm aware the SMMUs in CN88xx SoCs are 
usable in general), but if the firmware description is wrong to the 
point that DMA ops translation doesn't work, then no other translation 
(e.g. VFIO) is likely to work either. In that case it's simply not 
safe to enable the SMMU at all, and fudging the default domain type 
merely hides one symptom of the problem.


Robin.



Ugh. It is a RHEL-only patch, but for some reason it is applied to the
ark kernel builds as well. Sorry for the noise.


Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Robin Murphy

On 17/02/2020 1:31 pm, Michael S. Tsirkin wrote:

On Mon, Feb 17, 2020 at 01:22:44PM +, Robin Murphy wrote:

On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote:

On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:

On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:

On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:

On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:

With the built-in topology description in place, x86 platforms can now
use the virtio-iommu.

Signed-off-by: Jean-Philippe Brucker 
---
drivers/iommu/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 068d4e0e3541..adcbda44d473 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -508,8 +508,9 @@ config HYPERV_IOMMU
config VIRTIO_IOMMU
bool "Virtio IOMMU driver"
depends on VIRTIO=y
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA


Can that have an "if X86" for clarity? AIUI it's not necessary for
virtio-iommu itself (and really shouldn't be), but is merely to satisfy the
x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
right?

Robin.


In fact does not this work on any platform now?


There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
has been converted recently [1] but VT-d still implements its own DMA ops
(conversion patches are on the list [2]). On Arm the arch Kconfig selects
IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
complete. Until then I can add an "if X86" here for clarity.

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
[2] https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/


What about others? E.g. PPC?


That was the point I was getting at - while iommu-dma should build just fine
for the likes of PPC, s390, 32-bit Arm, etc., they have no architecture code
to correctly wire up iommu_dma_ops to devices. Thus there's currently no
point pulling it in and pretending it's anything more than a waste of space
for architectures other than arm64 and x86. It's merely a historical
artefact of the x86 DMA API implementation that when the IOMMU drivers were
split out to form drivers/iommu they took some of their relevant arch code
with them.

Robin.



Rather than white-listing architectures, how about making the
architectures in question set some kind of symbol, and depend on it?


Umm, that's basically what we have already? Architectures that use 
iommu_dma_ops select IOMMU_DMA.


The only issue is the oddity of x86 treating IOMMU drivers as part of 
its arch code, which has never come up against a cross-architecture 
driver until now. Hence the options of either maintaining that paradigm 
and having the 'x86 arch code' aspect of this driver "select IOMMU_DMA 
if x86" such that it works out equivalent to AMD_IOMMU, or a more 
involved cleanup to move that responsibility out of 
drivers/iommu/Kconfig entirely and have arch/x86/Kconfig do something 
like "select IOMMU_DMA if IOMMU_API", as Jean suggested up-thread.
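
In Kconfig terms, the first option would look something like this (a
sketch of the variant under discussion, not a final patch):

config VIRTIO_IOMMU
	bool "Virtio IOMMU driver"
	depends on VIRTIO=y
	depends on (ARM64 || X86)
	select IOMMU_API
	select IOMMU_DMA if X86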


In the specific context of IOMMU_DMA we're not talking about any kind of 
white-list, merely a one-off special case for one particular architecture.


Robin.


Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Michael S. Tsirkin
On Mon, Feb 17, 2020 at 01:22:44PM +, Robin Murphy wrote:
> On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote:
> > On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:
> > > On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:
> > > > On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> > > > > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > > > > > With the built-in topology description in place, x86 platforms can 
> > > > > > now
> > > > > > use the virtio-iommu.
> > > > > > 
> > > > > > Signed-off-by: Jean-Philippe Brucker 
> > > > > > ---
> > > > > >drivers/iommu/Kconfig | 3 ++-
> > > > > >1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > > > > index 068d4e0e3541..adcbda44d473 100644
> > > > > > --- a/drivers/iommu/Kconfig
> > > > > > +++ b/drivers/iommu/Kconfig
> > > > > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> > > > > >config VIRTIO_IOMMU
> > > > > > bool "Virtio IOMMU driver"
> > > > > > depends on VIRTIO=y
> > > > > > -   depends on ARM64
> > > > > > +   depends on (ARM64 || X86)
> > > > > > select IOMMU_API
> > > > > > +   select IOMMU_DMA
> > > > > 
> > > > > Can that have an "if X86" for clarity? AIUI it's not necessary for
> > > > > virtio-iommu itself (and really shouldn't be), but is merely to 
> > > > > satisfy the
> > > > > x86 arch code's expectation that IOMMU drivers bring their own DMA 
> > > > > ops,
> > > > > right?
> > > > > 
> > > > > Robin.
> > > > 
> > > > In fact does not this work on any platform now?
> > > 
> > > There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
> > > has been converted recently [1] but VT-d still implements its own DMA ops
> > > (conversion patches are on the list [2]). On Arm the arch Kconfig selects
> > > IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
> > > complete. Until then I can add an "if X86" here for clarity.
> > > 
> > > Thanks,
> > > Jean
> > > 
> > > [1] 
> > > https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
> > > [2] 
> > > https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/
> > 
> > What about others? E.g. PPC?
> 
> That was the point I was getting at - while iommu-dma should build just fine
> for the likes of PPC, s390, 32-bit Arm, etc., they have no architecture code
> to correctly wire up iommu_dma_ops to devices. Thus there's currently no
> point pulling it in and pretending it's anything more than a waste of space
> for architectures other than arm64 and x86. It's merely a historical
> artefact of the x86 DMA API implementation that when the IOMMU drivers were
> split out to form drivers/iommu they took some of their relevant arch code
> with them.
> 
> Robin.


Rather than white-listing architectures, how about making the
architectures in question set some kind of symbol, and depend on it?

-- 
MST



Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Robin Murphy

On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote:

On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:

On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:

On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:

On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:

With the built-in topology description in place, x86 platforms can now
use the virtio-iommu.

Signed-off-by: Jean-Philippe Brucker 
---
   drivers/iommu/Kconfig | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 068d4e0e3541..adcbda44d473 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -508,8 +508,9 @@ config HYPERV_IOMMU
   config VIRTIO_IOMMU
bool "Virtio IOMMU driver"
depends on VIRTIO=y
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA


Can that have an "if X86" for clarity? AIUI it's not necessary for
virtio-iommu itself (and really shouldn't be), but is merely to satisfy the
x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
right?

Robin.


In fact does not this work on any platform now?


There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
has been converted recently [1] but VT-d still implements its own DMA ops
(conversion patches are on the list [2]). On Arm the arch Kconfig selects
IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
complete. Until then I can add an "if X86" here for clarity.

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
[2] https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/


What about others? E.g. PPC?


That was the point I was getting at - while iommu-dma should build just 
fine for the likes of PPC, s390, 32-bit Arm, etc., they have no 
architecture code to correctly wire up iommu_dma_ops to devices. Thus 
there's currently no point pulling it in and pretending it's anything 
more than a waste of space for architectures other than arm64 and x86. 
It's merely a historical artefact of the x86 DMA API implementation that 
when the IOMMU drivers were split out to form drivers/iommu they took 
some of their relevant arch code with them.


Robin.


Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1

2020-02-17 Thread Robin Murphy

On 16/02/2020 10:11 pm, Jerry Snitselaar wrote:

On Fri Feb 14 20, Robin Murphy wrote:

Hi Jerry,

On 2020-02-14 8:13 pm, Jerry Snitselaar wrote:

Hi Will,

On a gigabyte system with Cavium CN8xx, when doing a fio test against
an nvme drive we are seeing the following:

[  637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80136000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80118000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80100036, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8013e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[  637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7


Those "IOVAs" don't look much like IOVAs from the DMA allocator - if 
they were physical addresses, would they correspond to an expected 
region of the physical memory map?


I would suspect that this is most likely misbehaviour in the NVMe 
driver (issuing a write to a non-DMA-mapped address), and the SMMU is 
just doing its job in blocking and reporting it.


I also reproduced this with 5.5-rc7, and will check 5.6-rc1 later today.
I couldn't narrow it down further than 5.4-rc1. I don't know the SMMU
code well; any thoughts on where to start digging into this?


fio test that is being run is:

#fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite 
-ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting 
-name=mytest -numjobs=32


Just to clarify, do other tests work OK on the same device?

Thanks,
Robin.



I was able to get back on the system today. I think I know what the 
problem is:


[    0.036189] iommu: Gigabyte R120-T34-00 detected, force iommu passthrough mode

[    6.324282] iommu: Default domain type: Translated

So the new default domain code in 5.4 overrides the iommu quirk code
that sets default passthrough. I'm testing a quick patch that tracks
whether the default domain was set by the quirk code, and leaves it
alone if it was. So far it seems to be working.
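
The idea is roughly the following (an illustrative sketch, not the
actual patch; the names are made up):

    static bool def_domain_from_quirk;

    /* called from the board quirk instead of just overwriting the type */
    static void iommu_quirk_set_passthrough(void)
    {
        iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
        def_domain_from_quirk = true;
    }

    /* ...and in the 5.4 default-domain setup path: */
    if (def_domain_from_quirk)
        return;     /* leave the quirk's choice alone */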


Ah, OK. Could you point me at that quirk code? I can't seem to track it 
down in mainline, and seeing this much leaves me dubious that it's even 
correct - matching a particular board implies that it's a firmware issue 
(as far as I'm aware the SMMUs in CN88xx SoCs are usable in general), 
but if the firmware description is wrong to the point that DMA ops 
translation doesn't work, then no other translation (e.g. VFIO) is 
likely to work either. In that case it's simply not safe to enable the 
SMMU at all, and fudging the default domain type merely hides one 
symptom of the problem.


Robin.

Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Michael S. Tsirkin
On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:
> On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:
> > On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> > > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > > > With the built-in topology description in place, x86 platforms can now
> > > > use the virtio-iommu.
> > > > 
> > > > Signed-off-by: Jean-Philippe Brucker 
> > > > ---
> > > >   drivers/iommu/Kconfig | 3 ++-
> > > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > > index 068d4e0e3541..adcbda44d473 100644
> > > > --- a/drivers/iommu/Kconfig
> > > > +++ b/drivers/iommu/Kconfig
> > > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> > > >   config VIRTIO_IOMMU
> > > > bool "Virtio IOMMU driver"
> > > > depends on VIRTIO=y
> > > > -   depends on ARM64
> > > > +   depends on (ARM64 || X86)
> > > > select IOMMU_API
> > > > +   select IOMMU_DMA
> > > 
> > > Can that have an "if X86" for clarity? AIUI it's not necessary for
> > > virtio-iommu itself (and really shouldn't be), but is merely to satisfy 
> > > the
> > > x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
> > > right?
> > > 
> > > Robin.
> > 
> > In fact does not this work on any platform now?
> 
> There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
> has been converted recently [1] but VT-d still implements its own DMA ops
> (conversion patches are on the list [2]). On Arm the arch Kconfig selects
> IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
> complete. Until then I can add an "if X86" here for clarity.
> 
> Thanks,
> Jean
> 
> [1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
> [2] 
> https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/

What about others? E.g. PPC?

-- 
MST



Re: [PATCH 03/11] PCI: OF: Check whether the host bridge supports ATS

2020-02-17 Thread Jean-Philippe Brucker
On Thu, Feb 13, 2020 at 12:26:46PM -0600, Rob Herring wrote:
> On Thu, Feb 13, 2020 at 10:52 AM Jean-Philippe Brucker
>  wrote:
> >
> > Copy the ats-supported flag into the pci_host_bridge structure.
> >
> > Signed-off-by: Jean-Philippe Brucker 
> > ---
> >  drivers/pci/controller/pci-host-common.c |  1 +
> >  drivers/pci/of.c                          |  9 +++++++++
> >  include/linux/of_pci.h                    |  3 +++
> >  3 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
> > index 250a3fc80ec6..a6ac927be291 100644
> > --- a/drivers/pci/controller/pci-host-common.c
> > +++ b/drivers/pci/controller/pci-host-common.c
> > @@ -92,6 +92,7 @@ int pci_host_common_probe(struct platform_device *pdev,
> > return ret;
> > }
> >
> > +   of_pci_host_check_ats(bridge);
> > platform_set_drvdata(pdev, bridge->bus);
> > return 0;
> >  }
> > diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> > index 81ceeaa6f1d5..4b8a877f1e9f 100644
> > --- a/drivers/pci/of.c
> > +++ b/drivers/pci/of.c
> > @@ -576,6 +576,15 @@ int pci_parse_request_of_pci_ranges(struct device *dev,
> >  }
> >  EXPORT_SYMBOL_GPL(pci_parse_request_of_pci_ranges);
> >
> > +void of_pci_host_check_ats(struct pci_host_bridge *bridge)
> > +{
> > +   struct device_node *np = bridge->bus->dev.of_node;
> > +
> > +   if (!np)
> > +   return;
> > +
> > +   bridge->ats_supported = of_property_read_bool(np, "ats-supported");
> > +}
> 
> Not really any point in a common function if we expect this to be used
> only for ECAM hosts, which it seems to be, based on the binding.

I'll move this to pci-host-common.c
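
Presumably ending up as a static helper along these lines (untested
sketch):

    static void pci_host_common_check_ats(struct pci_host_bridge *bridge)
    {
        struct device_node *np = bridge->bus->dev.of_node;

        if (np)
            bridge->ats_supported =
                of_property_read_bool(np, "ats-supported");
    }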

Thanks,
Jean


Re: arm64 iommu groups issue

2020-02-17 Thread John Garry


Right, and even worse is that it relies on the port driver even 
existing at all.


All this iommu group assignment should be taken outside device driver 
probe paths.


However we could still consider device links for sync'ing the SMMU and 
each device probing.


Yes, we should get that for DT now thanks to the of_devlink stuff, but 
cooking up some equivalent for IORT might be worthwhile.
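
(For DT, of_devlink effectively boils down to the driver core creating
consumer/supplier links before the endpoint probes; an IORT equivalent
would presumably create the same kind of link. Sketch only - 'smmu_dev'
is the assumed result of an IORT lookup of the SMMU's struct device:)

    struct device_link *link;

    /* Defer the endpoint's probe until its SMMU has probed */
    link = device_link_add(dev, smmu_dev, DL_FLAG_AUTOPROBE_CONSUMER);
    if (!link)
        dev_warn(dev, "failed to create link to SMMU\n");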


It doesn't solve this problem, but at least we could remove the 
iommu_ops check in iort_iommu_xlate().


We would need to carve out a path from pci_device_add() or even 
device_add() to solve all cases.





Another thought that crosses my mind is that when pci_device_group()
walks up to the point of ACS isolation and doesn't find an existing
group, it can still infer that everything it walked past *should* be put
in the same group it's then eventually going to return. Unfortunately I
can't see an obvious way for it to act on that knowledge, though, since
recursive iommu_probe_device() is unlikely to end well.
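
(The walk in question looks conceptually like this - a simplified sketch
of pci_device_group()'s upstream traversal, not the verbatim code:)

    for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
        if (!bus->self)
            continue;
        /* isolated from here upwards: stop walking */
        if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
            break;
        /* everything walked past belongs with this group */
        group = iommu_group_get(&bus->self->dev);
        if (group)
            return group;
    }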




[...]

And this looks to be the reason why the current
iommu_bus_init()->bus_for_each_device(..., add_iommu_group) also fails.


Of course, just adding a 'correct' add_device replay without the 
of_xlate process doesn't help at all. No wonder this looked suspiciously 
simpler than where the first idea left off...


(on reflection, the core of this idea seems to be recycling the existing 
iommu_bus_init walk rather than building up a separate "waiting list", 
while forgetting that that wasn't the difficult part of the original 
idea anyway)


We could still use a bus walk to add the group per iommu, but we would 
need an additional check to ensure the device is associated with the IOMMU.
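
Rough sketch of such a check, assuming the fwnode recorded at of_xlate()
time is a good enough identity test, and with 'smmu_fwnode' standing in
for the registering instance's fwnode handle:

    static int add_group_for_instance(struct device *dev, void *data)
    {
        struct fwnode_handle *iommu_fwnode = data;
        struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);

        /* Skip devices not behind this instance (or not yet xlated) */
        if (!fwspec || fwspec->iommu_fwnode != iommu_fwnode)
            return 0;

        return iommu_probe_device(dev);
    }

    /* replayed when each IOMMU instance registers: */
    bus_for_each_dev(&pci_bus_type, NULL, smmu_fwnode,
                     add_group_for_instance);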




On the current code mentioned, the principle seems wrong to me - we call
bus_for_each_device(..., add_iommu_group) when the first SMMU in the
system probes, but that attempts add_iommu_group() for all devices on
the bus, even though the SMMU for a given device may not yet have
probed.


Yes, iommu_bus_init() is one of the places still holding a 
deeply-ingrained assumption that the ops go live for all IOMMU instances 
at once, which is what warranted the further replay in 
of_iommu_configure() originally. Moving that out of 
of_platform_device_create() to support probe deferral is where the 
trouble really started.


I'm not too familiar with the history here, but could this be reverted 
now with the introduction of of_devlink stuff?


Cheers,
John


Re: [PATCH 2/2] iommu/mediatek: add support for MT8167

2020-02-17 Thread Yong Wu
Hi Fabien,

Thanks very much for your patch.

On Mon, 2020-02-17 at 09:15 +0800, CK Hu wrote:
> +Yong.Wu.
> 
> On Fri, 2020-01-03 at 17:26 +0100, Fabien Parent wrote:
> > Add support for the IOMMU on MT8167
> > 
> > Signed-off-by: Fabien Parent 
> > ---
> >  drivers/iommu/mtk_iommu.c | 11 ++-
> >  drivers/iommu/mtk_iommu.h |  1 +
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 6fc1f5ecf91e..5fc6178a82dc 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -569,7 +569,8 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
> > F_INT_PRETETCH_TRANSATION_FIFO_FAULT;
> > writel_relaxed(regval, data->base + REG_MMU_INT_MAIN_CONTROL);
> >  
> > -   if (data->plat_data->m4u_plat == M4U_MT8173)
> > +   if (data->plat_data->m4u_plat == M4U_MT8173 ||
> > +   data->plat_data->m4u_plat == M4U_MT8167)

I didn't know mt8167 was going upstream. My original assumption was that
only mt8173 uses this setting and that later SoCs wouldn't, so I used
"m4u_plat" directly here.

If we also need to support mt8167, then CK's suggestion is reasonable:
we could add a new variable like "legacy_ivrp_paddr", named after the
register, in a separate patch, and then support mt8167 in a follow-up
patch. Something like the sketch below.
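
(Untested, for illustration only; field placement and naming would
follow the existing plat_data style:)

    /* new capability flag in struct mtk_iommu_plat_data */
    bool    legacy_ivrp_paddr;

    /* mtk_iommu_hw_init() then keys off the capability, not the SoC */
    if (data->plat_data->legacy_ivrp_paddr)
        regval = (data->protect_base >> 1) | (data->enable_4GB << 31);
    else
        regval = lower_32_bits(data->protect_base) |
                 upper_32_bits(data->protect_base);

    /* ...with both mt8173 and mt8167 setting it */
    static const struct mtk_iommu_plat_data mt8167_data = {
        .m4u_plat = M4U_MT8167,
        .has_4gb_mode = true,
        .reset_axi = true,
        .legacy_ivrp_paddr = true,
        .larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. */
    };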

> > regval = (data->protect_base >> 1) | (data->enable_4GB << 31);
> > else
> > regval = lower_32_bits(data->protect_base) |
> > @@ -782,6 +783,13 @@ static const struct mtk_iommu_plat_data mt2712_data = {
> > .larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
> >  };
> >  
> > +static const struct mtk_iommu_plat_data mt8167_data = {
> > +   .m4u_plat = M4U_MT8167,
> > +   .has_4gb_mode = true,
> > +   .reset_axi = true,
> > +   .larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. */
> > +};
> > +
> >  static const struct mtk_iommu_plat_data mt8173_data = {
> > .m4u_plat = M4U_MT8173,
> > .has_4gb_mode = true,
> > @@ -798,6 +806,7 @@ static const struct mtk_iommu_plat_data mt8183_data = {
> >  
> >  static const struct of_device_id mtk_iommu_of_ids[] = {
> > { .compatible = "mediatek,mt2712-m4u", .data = &mt2712_data},
> > +   { .compatible = "mediatek,mt8167-m4u", .data = &mt8167_data},
> > { .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data},
> > { .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data},
> > {}
> > diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> > index ea949a324e33..cb8fd5970cd4 100644
> > --- a/drivers/iommu/mtk_iommu.h
> > +++ b/drivers/iommu/mtk_iommu.h
> > @@ -30,6 +30,7 @@ struct mtk_iommu_suspend_reg {
> >  enum mtk_iommu_plat {
> > M4U_MT2701,
> > M4U_MT2712,
> > +   M4U_MT8167,
> > M4U_MT8173,
> > M4U_MT8183,
> >  };
> 
> 



Re: [PATCH 2/3] PCI: Add DMA configuration for virtual platforms

2020-02-17 Thread Jean-Philippe Brucker
On Fri, Feb 14, 2020 at 05:03:16PM +, Robin Murphy wrote:
> On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > Hardware platforms usually describe the IOMMU topology using either
> > device-tree pointers or vendor-specific ACPI tables.  For virtual
> > platforms that don't provide a device-tree, the virtio-iommu device
> > contains a description of the endpoints it manages.  That information
> > allows us to probe endpoints after the IOMMU is probed (possibly as late
> > as userspace modprobe), provided it is discovered early enough.
> > 
> > Add a hook to pci_dma_configure(), which returns -EPROBE_DEFER if the
> > endpoint is managed by a vIOMMU that will be loaded later, or 0 in any
> > other case to avoid disturbing the normal DMA configuration methods.
> > When CONFIG_VIRTIO_IOMMU_TOPOLOGY isn't selected, the call to
> > virt_dma_configure() is compiled out.
> > 
> > As long as the information is consistent, platforms can provide both a
> > device-tree and a built-in topology, and the IOMMU infrastructure is
> > able to deal with multiple DMA configuration methods.
> 
> Urgh, it's already been established[1] that having IOMMU setup tied to DMA
> configuration at driver probe time is not just conceptually wrong but
> actually broken, so the concept here worries me a bit. In a world where
> of_iommu_configure() and friends are being called much earlier around
> iommu_probe_device() time, how badly will this fall apart?

If present the DT configuration should take precedence over this built-in
method, so the earlier it is called the better. virt_dma_configure()
currently gives up if the device already has iommu_ops (well, still calls
setup_dma_ops() which is safe enough, but I think I'll change that to have
virt_iommu_setup() return NULL if iommu_ops are present).

I don't have the full picture of the changes you intend for
{of,acpi}_iommu_configure(), do you think checking the validity of
dev->iommu_fwspec will remain sufficient to have both methods coexist?
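
(In other words, the built-in method would bail out along these lines;
sketch only, with topology_lookup() standing in for the hypothetical
built-in description parsing:)

    static const struct iommu_ops *virt_iommu_setup(struct device *dev)
    {
        /* An iommu_fwspec means of_iommu_configure() or the ACPI
         * equivalent configured the device first */
        if (dev_iommu_fwspec_get(dev))
            return NULL;

        return topology_lookup(dev);
    }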

Thanks,
Jean


Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Jean-Philippe Brucker
On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:
> On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > > With the built-in topology description in place, x86 platforms can now
> > > use the virtio-iommu.
> > > 
> > > Signed-off-by: Jean-Philippe Brucker 
> > > ---
> > >   drivers/iommu/Kconfig | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > index 068d4e0e3541..adcbda44d473 100644
> > > --- a/drivers/iommu/Kconfig
> > > +++ b/drivers/iommu/Kconfig
> > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> > >   config VIRTIO_IOMMU
> > >   bool "Virtio IOMMU driver"
> > >   depends on VIRTIO=y
> > > - depends on ARM64
> > > + depends on (ARM64 || X86)
> > >   select IOMMU_API
> > > + select IOMMU_DMA
> > 
> > Can that have an "if X86" for clarity? AIUI it's not necessary for
> > virtio-iommu itself (and really shouldn't be), but is merely to satisfy the
> > x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
> > right?
> > 
> > Robin.
> 
> In fact does not this work on any platform now?

There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
has been converted recently [1] but VT-d still implements its own DMA ops
(conversion patches are on the list [2]). On Arm the arch Kconfig selects
IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
complete. Until then I can add an "if X86" here for clarity.

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
[2] https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/


Re: [RFC PATCH] iommu/iova: Add a best-fit algorithm

2020-02-17 Thread Christoph Hellwig
On Fri, Feb 14, 2020 at 03:06:42PM -0800, Isaac J. Manjarres wrote:
> From: Liam Mark 
> 
> Using the best-fit algorithm, instead of the first-fit
> algorithm, may reduce fragmentation when allocating
> IOVAs.

As we like to say in standards groups: "may" also implies "may not".
Please provide numbers showing that this helps, and preferably an
explanation of how it doesn't hurt as well.

> + * Should be called prior to using dma-apis.
> + */
> +int iommu_dma_enable_best_fit_algo(struct device *dev)
> +{
> + struct iommu_domain *domain;
> + struct iova_domain *iovad;
> +
> + domain = iommu_get_domain_for_dev(dev);
> + if (!domain || !domain->iova_cookie)
> + return -EINVAL;
>  
> + iovad = &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
> + iovad->best_fit = true;
>   return 0;
>  }
> +EXPORT_SYMBOL(iommu_dma_enable_best_fit_algo);

Who is going to call this?  Patches adding exports always need a user
that goes along with the export.  Also drivers have no business calling
directly into dma-iommu.


Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range

2020-02-17 Thread Christoph Hellwig
On Fri, Feb 14, 2020 at 02:58:16PM -0800, Isaac J. Manjarres wrote:
> From: Liam Mark 
> 
> Some devices have a memory map which contains gaps or holes.
> In order for the device to have as much IOVA space as possible,
> allow its driver to inform the DMA-IOMMU layer that it should
> not allocate addresses from these holes.

Layering violation.  dma-iommu is the translation layer between the
DMA API and the IOMMU API.  And calls into it from drivers performing
DMA mappings need to go through the DMA API (and be documented there).
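
(For comparison, the in-tree way for an IOMMU driver - as opposed to an
endpoint driver - to carve a hole out of the IOVA space today is the
reserved-region callback; sketch with a made-up hole:)

    static void my_iommu_get_resv_regions(struct device *dev,
                                          struct list_head *head)
    {
        struct iommu_resv_region *region;

        /* hypothetical hole at 0x80000000 - 0x8fffffff */
        region = iommu_alloc_resv_region(0x80000000, SZ_256M, 0,
                                         IOMMU_RESV_RESERVED);
        if (region)
            list_add_tail(&region->list, head);
    }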