RE: [PATCH v4 03/12] iommu/vt-d: Move page table helpers into header

2018-11-06 Thread Liu, Yi L
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
> Subject: [PATCH v4 03/12] iommu/vt-d: Move page table helpers into header
> 
> So that they could also be used in other source files.

Just a small refinement. :)
"This patch moves the page table helpers to a header file, so that other
source files can use them."

Thanks,
Yi Liu



RE: [PATCH v4 10/12] iommu/vt-d: Add first level page table interface

2018-11-06 Thread Liu, Yi L
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
> Subject: [PATCH v4 10/12] iommu/vt-d: Add first level page table interface
> 
> This adds an interface to setup the PASID entries for first

Suggested rewording: "This patch adds an interface to setup the PASID entries for first ..." :)

> level page table translation.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Sanjay Kumar 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Ashok Raj 
> ---
>  drivers/iommu/intel-pasid.c | 81 +
>  drivers/iommu/intel-pasid.h | 11 +
>  include/linux/intel-iommu.h |  1 +
>  3 files changed, 93 insertions(+)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 69530317c323..d8ca1e6a8e5e 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -10,6 +10,7 @@
>  #define pr_fmt(fmt)  "DMAR: " fmt
> 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -388,6 +389,26 @@ static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
>   pasid_set_bits(&pe->val[1], 1 << 23, value);
>  }
> 
> +/*
> + * Setup the First Level Page table Pointer field (Bit 140~191)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_flptr(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[2], VTD_PAGE_MASK, value);
> +}
> +
> +/*
> + * Setup the First Level Paging Mode field (Bit 130~131) of a
> + * scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_flpm(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
> +}
> +
>  static void
>  pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
>   u16 did, int pasid)
> @@ -458,6 +479,66 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>   devtlb_invalidation_with_pasid(iommu, dev, pasid);
>  }
> 
> +/*
> + * Set up the scalable mode pasid table entry for first only
> + * translation type.
> + */
> +int intel_pasid_setup_first_level(struct intel_iommu *iommu,
> +   struct device *dev, pgd_t *pgd,
> +   int pasid, int flags)
> +{
> + u16 did = FLPT_DEFAULT_DID;
> + struct pasid_entry *pte;

Aha, same comment as on the previous patch: it may be better to use
pt_entry or pasid_entry.
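
As a side note, the bit packing in pasid_set_flptr()/pasid_set_flpm() is
easy to sanity-check in userspace. A minimal model is below; it assumes
pasid_set_bits() is a plain read-modify-write under the given mask, which
matches its use in this patch.

#include <stdint.h>
#include <stdio.h>

#define GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))
#define VTD_PAGE_SHIFT 12
#define VTD_PAGE_MASK (~((1ULL << VTD_PAGE_SHIFT) - 1))

/* assumed semantics of the kernel helper */
static void pasid_set_bits(uint64_t *ptr, uint64_t mask, uint64_t bits)
{
	*ptr = (*ptr & ~mask) | (bits & mask);
}

int main(void)
{
	uint64_t val2 = 0;	/* models pe->val[2], i.e. bits 128..191 */

	/* FLPTR: bits 140..191 of the entry == bits 12..63 of val[2] */
	pasid_set_bits(&val2, VTD_PAGE_MASK, 0x123456789000ULL);
	/* FLPM: bits 130..131 of the entry == bits 2..3 of val[2] */
	pasid_set_bits(&val2, GENMASK_ULL(3, 2), 2ULL << 2);

	printf("val[2] = 0x%016llx\n", (unsigned long long)val2);
	return 0;
}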

Regards,
Yi Liu


RE: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping

2018-11-06 Thread Liu, Yi L
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
> Subject: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping
> 
> So that the pasid related info, such as the pasid table and the
> maximum of pasid could be used during setting up scalable mode
> context.

A small refinement, hope it helps. :)
"This patch passes the pasid related info (e.g. the pasid table and the
maximum pasid) to context mapping, so that the pasid related fields
can be set up accordingly in the scalable mode context entry."

Regards,
Yi Liu


RE: [PATCH v4 06/12] iommu/vt-d: Add second level page table interface

2018-11-06 Thread Liu, Yi L
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
> 
> This adds the interfaces to setup or tear down the structures
> for second level page table translations. This includes types
> of second level only translation and pass through.

A small refinement to the description. :)
"This patch adds interfaces to set up or tear down second level
translation at PASID granularity. Translation types include second
level only and pass-through."

> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Signed-off-by: Sanjay Kumar 

[...]

> +
> +void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> +  struct device *dev, int pasid)
> +{
> + struct pasid_entry *pte;

pte is confusing as it resembles the pte in the paging structures; maybe
use pt_entry or just pasid_entry instead. This comment applies to the
other "pte"s in this patch.

Thanks,
Yi Liu


RE: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes

2018-11-06 Thread Liu, Yi L
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM
> Subject: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes
> 
> Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid
> entry for first-level or pass-through translation should be
> programmed with a domain id different from those used for
> second-level or nested translation. It is recommended that
> software could use a same domain id for all first-only and
> pass-through translations.
> 
> This reserves a domain id for first-level and pass-through
> translations.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Cc: Sanjay Kumar 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-iommu.c | 10 ++
>  drivers/iommu/intel-pasid.h |  6 ++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 9331240c70b8..2f7455ee4e7a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1618,6 +1618,16 @@ static int iommu_init_domains(struct intel_iommu *iommu)
>*/
>   set_bit(0, iommu->domain_ids);
> 
> + /*
> +  * Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid
> +  * entry for first-level or pass-through translation modes should
> +  * be programmed with a domain id different from those used for
> +  * second-level or nested translation. We reserve a domain id for
> +  * this purpose.
> +  */
> + if (sm_supported(iommu))
> + set_bit(FLPT_DEFAULT_DID, iommu->domain_ids);

"FLPT_DEFAULT_DID" looks very likely for first level translation. How about
"PT_FL_DEFAULT_DID"?

>   return 0;
>  }
> 
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 12f480c2bb8b..03c1612d173c 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -17,6 +17,12 @@
>  #define PDE_PFN_MASK PAGE_MASK
>  #define PASID_PDE_SHIFT  6
> 
> +/*
> + * Domain ID reserved for pasid entries programmed for first-level
> + * only and pass-through transfer modes.
> + */
> +#define FLPT_DEFAULT_DID 1

It would be helpful to elaborate in the patch description why DID 1 is
selected (presumably because DID 0 is already reserved by the
set_bit(0, ...) above, leaving 1 as the first free value).

Regards,
Yi Liu



RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-11-06 Thread Liu, Yi L
Hi Baolu,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, November 5, 2018 1:32 PM

[...]

> ---
>  drivers/iommu/dmar.c| 83 +++--
>  drivers/iommu/intel-svm.c   | 76 --
>  drivers/iommu/intel_irq_remapping.c |  6 ++-
>  include/linux/intel-iommu.h |  9 +++-
>  4 files changed, 115 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index d9c748b6f9e4..ec10427b98ac 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
>   int head, tail;
>   struct q_inval *qi = iommu->qi;
>   int wait_index = (index + 1) % QI_LENGTH;
> + int shift = qi_shift(iommu);
> 
>   if (qi->desc_status[wait_index] == QI_ABORT)
>   return -EAGAIN;
> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
>*/
>   if (fault & DMA_FSTS_IQE) {
>   head = readl(iommu->reg + DMAR_IQH_REG);
> - if ((head >> DMAR_IQ_SHIFT) == index) {
> + if ((head >> shift) == index) {
> + struct qi_desc *desc = qi->desc + head;
> +
>   pr_err("VT-d detected invalid descriptor: "
>   "low=%llx, high=%llx\n",
> - (unsigned long long)qi->desc[index].low,
> - (unsigned long long)qi->desc[index].high);
> - memcpy(&qi->desc[index], &qi->desc[wait_index],
> - sizeof(struct qi_desc));
> + (unsigned long long)desc->qw0,
> + (unsigned long long)desc->qw1);

Still missing qw2 and qw3. Maybe make the print differ based on whether
smts is configured.
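
Maybe something along these lines (just a sketch; smts_enabled() below is
a made-up stand-in for whatever predicate the series ends up using for
256-bit descriptors):

	if (smts_enabled(iommu))
		pr_err("VT-d invalid descriptor: qw0=%llx qw1=%llx qw2=%llx qw3=%llx\n",
		       desc->qw0, desc->qw1, desc->qw2, desc->qw3);
	else
		pr_err("VT-d invalid descriptor: qw0=%llx qw1=%llx\n",
		       desc->qw0, desc->qw1);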

> + memcpy(desc, qi->desc + (wait_index << shift),

Would "memcpy(desc, (unsigned long long) (qi->desc +  (wait_index << shift)," be
more safe?

> +1 << shift);
>   writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG);
>   return -EINVAL;
>   }
> @@ -1191,10 +1194,10 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
>*/
>   if (fault & DMA_FSTS_ITE) {
>   head = readl(iommu->reg + DMAR_IQH_REG);
> - head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH;
> + head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>   head |= 1;
>   tail = readl(iommu->reg + DMAR_IQT_REG);
> - tail = ((tail >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH;
> + tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
> 
>   writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
> 
> @@ -1222,15 +1225,14 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  {
>   int rc;
>   struct q_inval *qi = iommu->qi;
> - struct qi_desc *hw, wait_desc;
> + int offset, shift, length;
> + struct qi_desc wait_desc;
>   int wait_index, index;
>   unsigned long flags;
> 
>   if (!qi)
>   return 0;
> 
> - hw = qi->desc;
> -
>  restart:
>   rc = 0;
> 
> @@ -1243,16 +1245,21 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
> 
>   index = qi->free_head;
>   wait_index = (index + 1) % QI_LENGTH;
> + shift = qi_shift(iommu);
> + length = 1 << shift;
> 
>   qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
> 
> - hw[index] = *desc;
> -
> - wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
> + offset = index << shift;
> + memcpy(qi->desc + offset, desc, length);
> + wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
>   QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
> - wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
> + wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
> + wait_desc.qw2 = 0;
> + wait_desc.qw3 = 0;
> 
> - hw[wait_index] = wait_desc;
> + offset = wait_index << shift;
> + memcpy(qi->desc + offset, &wait_desc, length);

Same question as the one above.
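
For reference, a standalone model of the indexing (assuming qi->desc is a
void * after this patch; gcc treats void * arithmetic as byte-sized, so
index << shift is already a byte offset):

	#include <string.h>

	struct qi_desc { unsigned long long qw0, qw1, qw2, qw3; };

	/* shift is 4 for 128-bit descriptors and 5 for 256-bit ones */
	static void copy_desc(void *queue, int dst, int src, int shift)
	{
		memcpy((char *)queue + ((size_t)dst << shift),
		       (char *)queue + ((size_t)src << shift),
		       (size_t)1 << shift);
	}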

Thanks,
Yi Liu



Re: [RFC PATCH 2/6] drivers core: Add I/O ASID allocator

2018-11-06 Thread Lu Baolu

Hi,

On 10/20/18 2:11 AM, Jean-Philippe Brucker wrote:

+static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid)
+{
+   return -ESRCH;


return NULL;

Best regards,
Lu Baolu


+}
+#endif /* CONFIG_IOASID */
+#endif /* __LINUX_IOASID_H */
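
i.e. with that fixed, the !CONFIG_IOASID stub would read (sketch):

static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid)
{
	/* pointer-returning stub: NULL, not an errno value */
	return NULL;
}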



Re: [RFC] iommu/vt-d: Group and domain relationship

2018-11-06 Thread Lu Baolu

Hi,

On 11/6/18 6:40 PM, James Sewart wrote:

Hey Lu,

Would you be able to go into more detail about the issues with
allowing IOMMU_DOMAIN_DMA to be allocated via domain_alloc?


This door is closed because the intel iommu driver does everything for
IOMMU_DOMAIN_DMA itself: allocating the domain and setting up the
context entries for it.

Why do we want to open this door? Probably because we want the generic
iommu layer to handle these things (it's called the default domain).

So we can't just open the door without cleaning those things up, right?

I haven't spent time on the details, so I cc'ed Jacob for corrections.

Best regards,
Lu Baolu




Cheers,
James.
On Fri, Nov 2, 2018 at 2:43 AM Lu Baolu  wrote:


Hi,

On 10/30/18 10:18 PM, James Sewart via iommu wrote:

Hey,

I’ve been investigating the relationship between iommu groups and domains
on our systems and have a few question. Why does the intel iommu code not
allow allocating IOMMU_DOMAIN_DMA? Returning NULL when given this domain
type has the side effect that the default_domain for an iommu group is not
set, which, when using for e.g. dma_map_ops.map_page, means a domain is
allocated per device.


Intel vt-d driver doesn't implement the default domain and allocates
domains only on demand. There are lots of things to do before we allow
the iommu API to allocate domains other than IOMMU_DOMAIN_UNMANAGED.

Best regards,
Lu Baolu



This seems to be the opposite behaviour to the AMD iommu code which
supports allocating an IOMMU_DOMAIN_DMA and will only look to the iommu
group if a domain is not attached to the device rather than allocating a
new one. On AMD every device in an iommu group will share the same domain.

Appended is what I think a patch to implement domain_alloc for
IOMMU_DOMAIN_DMA and also IOMMU_DOMAIN_IDENTITY would look like. Testing
shows each device in a group will share a domain by default, it also
allows allocating a new dma domain that can be successfully attached to a
group with iommu_attach_group.

Looking for comment on why the behaviour is how it is currently and if
there are any issues with the solution I’ve been testing.

Cheers,
James.


diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bff2abd6..3a58389f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5170,10 +5170,15 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
   struct dmar_domain *dmar_domain;
   struct iommu_domain *domain;

- if (type != IOMMU_DOMAIN_UNMANAGED)
+ if (type == IOMMU_DOMAIN_UNMANAGED)
+ dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
+ else if(type == IOMMU_DOMAIN_DMA)
+ dmar_domain = alloc_domain(0);
+ else if(type == IOMMU_DOMAIN_IDENTITY)
+ dmar_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
+ else
   return NULL;

- dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
   if (!dmar_domain) {
   pr_err("Can't allocate dmar_domain\n");
   return NULL;
@@ -5186,9 +5191,12 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
   domain_update_iommu_cap(dmar_domain);

   domain = &dmar_domain->domain;
- domain->geometry.aperture_start = 0;
- domain->geometry.aperture_end   = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
- domain->geometry.force_aperture = true;
+
+ if (type == IOMMU_DOMAIN_UNMANAGED) {
+ domain->geometry.aperture_start = 0;
+ domain->geometry.aperture_end   = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
+ domain->geometry.force_aperture = true;
+ }

   return domain;
   }

Re: [PATCH v4 6/8] vfio/mdev: Add iommu place holders in mdev_device

2018-11-06 Thread Lu Baolu

Hi Alex,

On 11/7/18 7:53 AM, Alex Williamson wrote:

On Mon,  5 Nov 2018 15:34:06 +0800
Lu Baolu  wrote:


A parent device might create different types of mediated
devices. For example, a mediated device could be created
by the parent device with full isolation and protection
provided by the IOMMU. One usage case could be found on
Intel platforms where a mediated device is an assignable
subset of a PCI, the DMA requests on behalf of it are all
tagged with a PASID. Since IOMMU supports PASID-granular
translations (scalable mode in vt-d 3.0), this mediated
device could be individually protected and isolated by an
IOMMU.

This patch adds two new members in struct mdev_device:
* iommu_device
   - This, if set, indicates that the mediated device could
 be fully isolated and protected by IOMMU via attaching
 an iommu domain to this device. If empty, it indicates
 using vendor defined isolation.

* iommu_domain
   - This is a place holder for an iommu domain. A domain
could be stored here for later use once it has been
 attached to the iommu_device of this mdev.

Below helpers are added to set and get above iommu device
and iommu domain pointers.

* mdev_set/get_iommu_device(dev, iommu_device)
   - Set or get the iommu device which represents this mdev
 in IOMMU's device scope. Drivers don't need to set the
iommu device if they use vendor defined isolation.

* mdev_set/get_iommu_domain(domain)
  - An iommu domain which has been attached to the iommu
 device in order to protect and isolate the mediated
 device will be kept in the mdev data structure and
 could be retrieved later.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Suggested-by: Kevin Tian 
Suggested-by: Alex Williamson 
Signed-off-by: Lu Baolu 
---
  drivers/vfio/mdev/mdev_core.c| 36 
  drivers/vfio/mdev/mdev_private.h |  2 ++
  include/linux/mdev.h | 23 
  3 files changed, 61 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0212f0ee8aea..5119809225c5 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -390,6 +390,42 @@ int mdev_device_remove(struct device *dev, bool force_remove)
return 0;
  }
  
+int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)

+{
+   struct mdev_device *mdev = to_mdev_device(dev);
+
+   mdev->iommu_device = iommu_device;
+
+   return 0;
+}
+EXPORT_SYMBOL(mdev_set_iommu_device);
+
+struct device *mdev_get_iommu_device(struct device *dev)
+{
+   struct mdev_device *mdev = to_mdev_device(dev);
+
+   return mdev->iommu_device;
+}
+EXPORT_SYMBOL(mdev_get_iommu_device);
+
+int mdev_set_iommu_domain(struct device *dev, void *domain)
+{
+   struct mdev_device *mdev = to_mdev_device(dev);
+
+   mdev->iommu_domain = domain;
+
+   return 0;
+}
+EXPORT_SYMBOL(mdev_set_iommu_domain);
+
+void *mdev_get_iommu_domain(struct device *dev)
+{
+   struct mdev_device *mdev = to_mdev_device(dev);
+
+   return mdev->iommu_domain;
+}
+EXPORT_SYMBOL(mdev_get_iommu_domain);
+
  static int __init mdev_init(void)
  {
return mdev_bus_register();
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index b5819b7d7ef7..c01518068e84 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -34,6 +34,8 @@ struct mdev_device {
struct list_head next;
struct kobject *type_kobj;
bool active;
+   struct device *iommu_device;
+   void *iommu_domain;
  };
  
  #define to_mdev_device(dev)	container_of(dev, struct mdev_device, dev)

diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index b6e048e1045f..c46777d3e568 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -14,6 +14,29 @@
  #define MDEV_H
  
  struct mdev_device;

+struct iommu_domain;
+
+/*
+ * Called by the parent device driver to set the PCI device which represents


s/PCI //

There is no requirement or expectation that the device is PCI.



Fair enough.


+ * this mdev in iommu protection scope. By default, the iommu device is NULL,
+ * that indicates using vendor defined isolation.
+ *
+ * @dev: the mediated device that iommu will isolate.
+ * @iommu_device: a pci device which represents the iommu for @dev.
+ *
+ * Return 0 for success, otherwise negative error value.
+ */
+int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
+
+struct device *mdev_get_iommu_device(struct device *dev);
+
+/*
+ * Called by vfio iommu modules to save the iommu domain after a domain being
+ * attached to the mediated device.
+ */
+int mdev_set_iommu_domain(struct device *dev, void *domain);
+
+void *mdev_get_iommu_domain(struct device *dev);


I can't say I really understand the purpose of this, the cover letter
indicates this is a placeholder, should we add it separately when we
have a requirement for it?


Oh, I am 

Re: [PATCH 06/10] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs

2018-11-06 Thread John Stultz
On Mon, Oct 8, 2018 at 1:04 AM Christoph Hellwig  wrote:
>
> No need to duplicate the code - map_sg is equivalent to map_page
> for each page in the scatterlist.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/swiotlb.c | 34 --
>  1 file changed, 12 insertions(+), 22 deletions(-)

Hey all,

So, I've found this patch seems to break userspace booting on the
HiKey960 board. Initially I thought this was an issue with the mali
drivers, and have worked w/ the mali team to try to find a solution,
but I've since found that booting just the upstream kernel (with no
graphics support) will see userland hang/block unless this patch is
reverted.

When I see the hangs, it seems like the filesystems are stuck or
something, as kernel messages still show up and sometimes I can get to
a shell, but commands that I run in that shell (like ls) just hang. I
don't see any other error messages.

Reverting this patch then gets it work. In order to cleanly revert the
patch, I have to revert the following set:
  "arm64: use the generic swiotlb_dma_ops"
  "swiotlb: add support for non-coherent DMA"
  "swiotlb: don't dip into swiotlb pool for coherent allocations"
  "swiotlb: refactor swiotlb_map_page"
  "swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs"

But at that point if I just re-apply "swiotlb: use swiotlb_map_page in
swiotlb_map_sg_attrs", I reproduce the hangs.

Any suggestions for how to further debug what might be going wrong
would be appreciated!

thanks
-john


Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist

2018-11-06 Thread Nicolin Chen
Hi Robin,

On Tue, Nov 06, 2018 at 06:27:39PM +, Robin Murphy wrote:
> > I re-ran the test to get some accuracy within the function and got:
> > 1) pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
> > // reduced from 422 usec to 56 usec == 366 usec less
> > 2) if (!(prot & IOMMU_CACHE)) {...} //flush routine
> > // reduced from 439 usec to 236 usec == 203 usec less
> > Note: new memset takes about 164 usec, resulting in 400 usec diff
> >for the entire iommu_dma_alloc() function call.
> > 
> > It looks like this might be more than the diff between clear_page
> > and memset, and might be related to mapping and cache. Any idea?
> 
> Hmm, I guess it might not be so much clear_page() itself as all the gubbins
> involved in getting there from prep_new_page(). I could perhaps make some
> vague guesses about how the A57 cores might get tickled by the different
> code patterns, but the Denver cores are well beyond my ability to reason
> about. Out of even further curiosity, how does the quick hack below compare?

I tried out that change. And the results are as follows:
a. Routine (1) reduced from 422 usec to 55 usec
b. Routine (2) increased from 441 usec to 833 usec
c. Overall, it seems to remain the same: 900+ usec

> > > > @@ -581,6 +593,12 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
> > > > if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
> > > > goto out_free_iova;
> > > > +   if (gfp_zero) {
> > > > +   /* Now zero all the pages in the scatterlist */
> > > > +   for_each_sg(sgt.sgl, s, sgt.orig_nents, i)
> > > > +   memset(sg_virt(s), 0, s->length);
> > > 
> > > What if the pages came from highmem? I know that doesn't happen on arm64
> > > today, but the point of this code *is* to be generic, and other users will
> > > arrive eventually.
> > 
Hmm, so it probably should use sg_miter_start/stop() too? Looking at
the flush routine working in PAGE_SIZE units per iteration, would it
be possible to map and memset contiguous pages together? Actually the
flush routine might also be optimized if we can map contiguous pages.
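
Something like the following, perhaps (untested sketch, reusing the sgt
populated above; SG_MITER_TO_SG because we are writing to the pages):

	struct sg_mapping_iter miter;

	sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_TO_SG);
	while (sg_miter_next(&miter))
		memset(miter.addr, 0, miter.length);
	sg_miter_stop(&miter);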
> 
> I suppose the ideal point at which to do it would be after the remapping
> when we have the entire buffer contiguous in vmalloc space and can make best
> use of prefetchers etc. - DMA_ATTR_NO_KERNEL_MAPPING is a bit of a spanner
> in the works, but we could probably accommodate a special case for that. As
> Christoph points out, this isn't really the place to be looking for
> performance anyway (unless it's pathologically bad as per the

I understand the point. So the change would probably be more plausible
if it showed up in some practical benchmark. I might need to re-run
some tests with heavier use cases.

> DMA_ATTR_ALLOC_SINGLE_PAGES fun), but if we're looking at pulling the
> remapping out of the arch code, maybe we could aim to rework the zeroing
> completely as part of that.

That'd be nice. I believe it'd be good to have.

Thanks
Nicolin


Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-06 Thread John Stultz
On Tue, Nov 6, 2018 at 3:54 AM, Robin Murphy  wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
>
> Make sure we only touch it if the DT actually said so.
>
> Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> Reported-by: Aaro Koskinen 
> Reported-by: Jean-Philippe Brucker 
> Signed-off-by: Robin Murphy 
> ---
>
> Sorry about that... I guess I only have test setups that either have
> dma-ranges or where a 32-bit bus mask goes unnoticed :(
>
> The Octeon and SMMU issues sound like they're purely down to this, and
> it's probably related to at least one of John's Hikey woes.

Yep! This does seem to resolve the mali bifrost dma address warn-ons I
was seeing, and makes the board seem to function more consistently, so
that's great!

Tested-by: John Stultz 

Though I still find I have to revert "swiotlb: use swiotlb_map_page in
swiotlb_map_sg_attrs" to boot to UI successfully with AOSP.
Still not sure what's going on there (it sort of soft hangs where some
userland runs ok, but other bits seem to jam up; even console commands
sometimes hang - almost seems like io stalls).

Anyway, thanks so much again for this one!
-john


Re: [PATCH v4 6/8] vfio/mdev: Add iommu place holders in mdev_device

2018-11-06 Thread Alex Williamson
On Mon,  5 Nov 2018 15:34:06 +0800
Lu Baolu  wrote:

> A parent device might create different types of mediated
> devices. For example, a mediated device could be created
> by the parent device with full isolation and protection
> provided by the IOMMU. One usage case could be found on
> Intel platforms where a mediated device is an assignable
> subset of a PCI, the DMA requests on behalf of it are all
> tagged with a PASID. Since IOMMU supports PASID-granular
> translations (scalable mode in vt-d 3.0), this mediated
> device could be individually protected and isolated by an
> IOMMU.
> 
> This patch adds two new members in struct mdev_device:
> * iommu_device
>   - This, if set, indicates that the mediated device could
> be fully isolated and protected by IOMMU via attaching
> an iommu domain to this device. If empty, it indicates
> using vendor defined isolation.
> 
> * iommu_domain
>   - This is a place holder for an iommu domain. A domain
> could be stored here for later use once it has been
> attached to the iommu_device of this mdev.
> 
> Below helpers are added to set and get above iommu device
> and iommu domain pointers.
> 
> * mdev_set/get_iommu_device(dev, iommu_device)
>   - Set or get the iommu device which represents this mdev
> in IOMMU's device scope. Drivers don't need to set the
> iommu device if they use vendor defined isolation.
> 
> * mdev_set/get_iommu_domain(domain)
>   - An iommu domain which has been attached to the iommu
> device in order to protect and isolate the mediated
> device will be kept in the mdev data structure and
> could be retrieved later.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Suggested-by: Kevin Tian 
> Suggested-by: Alex Williamson 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/vfio/mdev/mdev_core.c| 36 
>  drivers/vfio/mdev/mdev_private.h |  2 ++
>  include/linux/mdev.h | 23 
>  3 files changed, 61 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 0212f0ee8aea..5119809225c5 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -390,6 +390,42 @@ int mdev_device_remove(struct device *dev, bool force_remove)
>   return 0;
>  }
>  
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + mdev->iommu_device = iommu_device;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(mdev_set_iommu_device);
> +
> +struct device *mdev_get_iommu_device(struct device *dev)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + return mdev->iommu_device;
> +}
> +EXPORT_SYMBOL(mdev_get_iommu_device);
> +
> +int mdev_set_iommu_domain(struct device *dev, void *domain)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + mdev->iommu_domain = domain;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(mdev_set_iommu_domain);
> +
> +void *mdev_get_iommu_domain(struct device *dev)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + return mdev->iommu_domain;
> +}
> +EXPORT_SYMBOL(mdev_get_iommu_domain);
> +
>  static int __init mdev_init(void)
>  {
>   return mdev_bus_register();
> diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
> index b5819b7d7ef7..c01518068e84 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -34,6 +34,8 @@ struct mdev_device {
>   struct list_head next;
>   struct kobject *type_kobj;
>   bool active;
> + struct device *iommu_device;
> + void *iommu_domain;
>  };
>  
>  #define to_mdev_device(dev)  container_of(dev, struct mdev_device, dev)
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index b6e048e1045f..c46777d3e568 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -14,6 +14,29 @@
>  #define MDEV_H
>  
>  struct mdev_device;
> +struct iommu_domain;
> +
> +/*
> + * Called by the parent device driver to set the PCI device which represents

s/PCI //

There is no requirement or expectation that the device is PCI.

> + * this mdev in iommu protection scope. By default, the iommu device is NULL,
> + * that indicates using vendor defined isolation.
> + *
> + * @dev: the mediated device that iommu will isolate.
> + * @iommu_device: a pci device which represents the iommu for @dev.
> + *
> + * Return 0 for success, otherwise negative error value.
> + */
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
> +
> +struct device *mdev_get_iommu_device(struct device *dev);
> +
> +/*
> + * Called by vfio iommu modules to save the iommu domain after a domain being
> + * attached to the mediated device.
> + */
> +int mdev_set_iommu_domain(struct device *dev, void *domain);
> +
> +void *mdev_get_iommu_domain(struct device *dev);

I can't say I really understand 

Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist

2018-11-06 Thread Nicolin Chen
Hi Christoph,

On Sun, Nov 04, 2018 at 07:50:01AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 01, 2018 at 02:35:00PM -0700, Nicolin Chen wrote:
> > The __GFP_ZERO will be passed down to the generic page allocation
> > routine which zeros everything page by page. This is safe to be a
> > generic way but not efficient for iommu allocation that organizes
> > contiguous pages using scatterlist.
> > 
> > So this changes drops __GFP_ZERO from the flag, and adds a manual
> > memset after page/sg allocations, using the length of scatterlist.
> > 
> > My test result of a 2.5MB size allocation shows iommu_dma_alloc()
> > takes 46% less time, reduced from averagely 925 usec to 500 usec.
> 
> And in what case does dma_alloc_* performance even matter?

Honestly, this was amplified by running a local iommu benchmark
test. In practice dma_alloc/free() should not be that stressful,
but we cannot say the performance doesn't matter at all, right?
Though many device drivers pre-allocate memory for DMA usage, it
could matter when a driver dynamically allocates and releases.

And actually I have a related question for you: I saw that
dma_direct_alloc() cancels the __GFP_ZERO flag and does a manual
memset() after allocation. Might that possibly be related to a
performance concern? Though I don't see any performance-related
comment for that part of the code; the memset() seems to have been
there from the beginning.
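
For reference, the pattern I mean looks roughly like this (paraphrased
from kernel/dma/direct.c, not the exact code):

	gfp &= ~__GFP_ZERO;
	page = alloc_pages_node(dev_to_node(dev), gfp, get_order(size));
	...
	ret = page_address(page);
	memset(ret, 0, size);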

Thanks
Nicolin


[PATCH v2] iommu/vt-d: respect max guest address width in agaw

2018-11-06 Thread Jacob Pan
Supported Guest Address Width (SGAW) only indicates
what the iommu's capabilities are wrt page table levels
for second level page tables. The IOMMU should pick the
right level depending on the Maximum Guest Address Width
(MGAW).

For pass-through translation type, address width must be
programmed with the largest AGAW supported by the HW.

Reported-by: Ramos Falcon, Ernesto R 
Signed-off-by: Ashok Raj 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 140d6ab..f16db3b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -719,7 +719,11 @@ int iommu_calculate_max_sagaw(struct intel_iommu *iommu)
  */
 int iommu_calculate_agaw(struct intel_iommu *iommu)
 {
-   return __iommu_calculate_agaw(iommu, DEFAULT_DOMAIN_ADDRESS_WIDTH);
+   unsigned long mgaw;
+
+   /* Respect Max Guest Address Width */
+   mgaw = min(cap_mgaw(iommu->cap), DEFAULT_DOMAIN_ADDRESS_WIDTH);
+   return __iommu_calculate_agaw(iommu, mgaw);
 }
 
 /* This functionin only returns single iommu in a domain */
-- 
2.7.4



Re: IOMMU breakage on arm64

2018-11-06 Thread Robin Murphy

Hi Geert,

On 2018-11-06 7:44 pm, Geert Uytterhoeven wrote:

Hi Christoph et al,

On Tue, Oct 23, 2018 at 1:40 AM Linux Kernel Mailing List
 wrote:

Commit: b4ebe6063204da58e48600b810a97c29ae9e5d12
Parent: 7d21ee4c719f00896767ce19c4c01a56374c2ced
Refname:refs/heads/master
Web:
https://git.kernel.org/torvalds/c/b4ebe6063204da58e48600b810a97c29ae9e5d12
Author: Christoph Hellwig 
AuthorDate: Thu Sep 20 14:04:08 2018 +0200
Committer:  Christoph Hellwig 
CommitDate: Mon Oct 1 07:28:03 2018 -0700

 dma-direct: implement complete bus_dma_mask handling

 Instead of rejecting devices with a too small bus_dma_mask we can handle
 by taking the bus dma_mask into account for allocations and bounce
 buffering decisions.

 Signed-off-by: Christoph Hellwig 


I  have bisected the following crash to the above commit:


I think that commit mostly just changes the presentation of my 
underlying cockup - see here for what should fix it: 
https://patchwork.kernel.org/patch/10670177/


I have a feeling we've ironed out crash-on-early-domain-free bugs in the 
SMMU drivers already - arm-smmu certainly has an early return in 
arm_smmu_destroy_domain_context() which should behave exactly like your 
diff below, while I think arm-smmu-v3 gets away with it by virtue of 
smmu_domain->cfg being unset, but I'll double-check that when I'm fresh 
tomorrow (Jean-Philippe reported SMMUv3 hitting the DMA thing to me 
internally, but didn't mention any crash).
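
i.e. presumably a guard of this shape in the ipmmu driver too (a sketch
only, not Geert's actual diff):

static void ipmmu_domain_free(struct iommu_domain *io_domain)
{
	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);

	if (!domain->mmu) {
		/* the domain was never successfully initialised/attached */
		kfree(domain);
		return;
	}
	...
}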


Thanks for the report,
Robin.


 ipmmu-vmsa e67b.mmu: Cannot accommodate DMA translation for
IOMMU page tables
 sata_rcar ee30.sata: Unable to initialize IPMMU context
 iommu: Failed to add device ee30.sata to group 0: -22
 Unable to handle kernel NULL pointer dereference at virtual
address 0038
 Mem abort info:
   ESR = 0x9604
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x0004
   CM = 0, WnR = 0
 [0038] user address but active_mm is swapper
 Internal error: Oops: 9604 [#1] PREEMPT SMP
 CPU: 2 PID: 51 Comm: kworker/2:1 Not tainted
4.20.0-rc1-arm64-renesas-dirty #74
 Hardware name: Renesas Salvator-X 2nd version board based on
r8a7795 ES2.0+ (DT)
 Workqueue: events deferred_probe_work_func
 pstate: 6005 (nZCv daif -PAN -UAO)
 pc : ipmmu_domain_free+0x1c/0xa0
 lr : ipmmu_domain_free+0x14/0xa0
 sp : 09c9b990
 x29: 09c9b990 x28: 
 x27: 08dff000 x26: 
 x25: 08dd9000 x24: 0014
 x23: 8006fffdbb20 x22: 08dff000
 x21: 8006f898e680 x20: 8006f8fa6c00
 x19: 8006f8fa6c08 x18: 0037
 x17: 0020 x16: 08a8f780
 x15: 8006fb096f10 x14: 8006f93731c8
 x13:  x12: 8006fb096f10
 x11: 8006f9372fe8 x10: 08dff708
 x9 : 8006fb096f50 x8 : 08dff708
 x7 : 089f1858 x6 : 
 x5 :  x4 : 8006f89a3000
 x3 : 8006f7131000 x2 : 8006fb2b5700
 x1 :  x0 : 0028
 Process kworker/2:1 (pid: 51, stack limit = 0x(ptrval))
 Call trace:
  ipmmu_domain_free+0x1c/0xa0
  iommu_group_release+0x48/0x68
  kobject_put+0x74/0xe8
  kobject_del.part.0+0x3c/0x50
  kobject_put+0x60/0xe8
  iommu_group_get_for_dev+0xa8/0x1f0
  ipmmu_add_device+0x1c/0x40
  of_iommu_configure+0x118/0x190
  of_dma_configure+0xcc/0x1f0
  platform_dma_configure+0x18/0x28
  really_probe+0x94/0x2a8
  driver_probe_device+0x58/0x100
  __device_attach_driver+0x90/0xd0
  bus_for_each_drv+0x64/0xc8
  __device_attach+0xd8/0x130
  device_initial_probe+0x10/0x18
  bus_probe_device+0x98/0xa0
  deferred_probe_work_func+0x74/0xb0
  process_one_work+0x294/0x6f0
  worker_thread+0x238/0x460
  kthread+0x120/0x128
  ret_from_fork+0x10/0x1c
 Code: aa0003f3 97ffeb1a d1002274 f85f8261 (f9401c20)
 ---[ end trace 4c46c7fd7cd07245 ]---

Reproducing on v4.20-rc1 requires CONFIG_IPMMU_VMSA=y, and whitelisting
SATA for IOMMU use (https://patchwork.kernel.org/patch/10544059/).
For older versions you also have to backport commit 3a0832d093693ede ("arm64:
dts: renesas: salvator-xs: enable SATA").

Actually there are two issues at hand:

1) drivers/iommu/io-pgtable-arm.c:__arm_lpae_alloc_pages() allocates
memory (above 4 GiB) and maps it for DMA use, but it is rejected due
to:

dma_map_single(dev, pages, size, DMA_TO_DEVICE) != virt_to_phys(pages)


--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -27,7 +27,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 if (!dev->dma_mask)
 return false;

-   return addr + 

IOMMU breakage on arm64 (was: Re: dma-direct: implement complete bus_dma_mask handling)

2018-11-06 Thread Geert Uytterhoeven
Hi Christoph et al,

On Tue, Oct 23, 2018 at 1:40 AM Linux Kernel Mailing List
 wrote:
> Commit: b4ebe6063204da58e48600b810a97c29ae9e5d12
> Parent: 7d21ee4c719f00896767ce19c4c01a56374c2ced
> Refname:refs/heads/master
> Web:
> https://git.kernel.org/torvalds/c/b4ebe6063204da58e48600b810a97c29ae9e5d12
> Author: Christoph Hellwig 
> AuthorDate: Thu Sep 20 14:04:08 2018 +0200
> Committer:  Christoph Hellwig 
> CommitDate: Mon Oct 1 07:28:03 2018 -0700
>
> dma-direct: implement complete bus_dma_mask handling
>
> Instead of rejecting devices with a too small bus_dma_mask we can handle
> by taking the bus dma_mask into account for allocations and bounce
> buffering decisions.
>
> Signed-off-by: Christoph Hellwig 

I  have bisected the following crash to the above commit:

ipmmu-vmsa e67b.mmu: Cannot accommodate DMA translation for
IOMMU page tables
sata_rcar ee30.sata: Unable to initialize IPMMU context
iommu: Failed to add device ee30.sata to group 0: -22
Unable to handle kernel NULL pointer dereference at virtual
address 0038
Mem abort info:
  ESR = 0x9604
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x0004
  CM = 0, WnR = 0
[0038] user address but active_mm is swapper
Internal error: Oops: 9604 [#1] PREEMPT SMP
CPU: 2 PID: 51 Comm: kworker/2:1 Not tainted
4.20.0-rc1-arm64-renesas-dirty #74
Hardware name: Renesas Salvator-X 2nd version board based on
r8a7795 ES2.0+ (DT)
Workqueue: events deferred_probe_work_func
pstate: 6005 (nZCv daif -PAN -UAO)
pc : ipmmu_domain_free+0x1c/0xa0
lr : ipmmu_domain_free+0x14/0xa0
sp : 09c9b990
x29: 09c9b990 x28: 
x27: 08dff000 x26: 
x25: 08dd9000 x24: 0014
x23: 8006fffdbb20 x22: 08dff000
x21: 8006f898e680 x20: 8006f8fa6c00
x19: 8006f8fa6c08 x18: 0037
x17: 0020 x16: 08a8f780
x15: 8006fb096f10 x14: 8006f93731c8
x13:  x12: 8006fb096f10
x11: 8006f9372fe8 x10: 08dff708
x9 : 8006fb096f50 x8 : 08dff708
x7 : 089f1858 x6 : 
x5 :  x4 : 8006f89a3000
x3 : 8006f7131000 x2 : 8006fb2b5700
x1 :  x0 : 0028
Process kworker/2:1 (pid: 51, stack limit = 0x(ptrval))
Call trace:
 ipmmu_domain_free+0x1c/0xa0
 iommu_group_release+0x48/0x68
 kobject_put+0x74/0xe8
 kobject_del.part.0+0x3c/0x50
 kobject_put+0x60/0xe8
 iommu_group_get_for_dev+0xa8/0x1f0
 ipmmu_add_device+0x1c/0x40
 of_iommu_configure+0x118/0x190
 of_dma_configure+0xcc/0x1f0
 platform_dma_configure+0x18/0x28
 really_probe+0x94/0x2a8
 driver_probe_device+0x58/0x100
 __device_attach_driver+0x90/0xd0
 bus_for_each_drv+0x64/0xc8
 __device_attach+0xd8/0x130
 device_initial_probe+0x10/0x18
 bus_probe_device+0x98/0xa0
 deferred_probe_work_func+0x74/0xb0
 process_one_work+0x294/0x6f0
 worker_thread+0x238/0x460
 kthread+0x120/0x128
 ret_from_fork+0x10/0x1c
Code: aa0003f3 97ffeb1a d1002274 f85f8261 (f9401c20)
---[ end trace 4c46c7fd7cd07245 ]---

Reproducing on v4.20-rc1 requires CONFIG_IPMMU_VMSA=y, and whitelisting
SATA for IOMMU use (https://patchwork.kernel.org/patch/10544059/).
For older versions you also have to backport commit 3a0832d093693ede ("arm64:
dts: renesas: salvator-xs: enable SATA").

Actually there are two issues at hand:

1) drivers/iommu/io-pgtable-arm.c:__arm_lpae_alloc_pages() allocates
   memory (above 4 GiB) and maps it for DMA use, but it is rejected due
   to:

   dma_map_single(dev, pages, size, DMA_TO_DEVICE) != virt_to_phys(pages)

> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -27,7 +27,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
> if (!dev->dma_mask)
> return false;
>
> -   return addr + size - 1 <= *dev->dma_mask;
> +   return addr + size - 1 <=
> +   min_not_zero(*dev->dma_mask, dev->bus_dma_mask);

   *dev->dma_mask = 0xffffffffff (40-bit)
   dev->bus_dma_mask = 0xffffffff (32-bit)

   Hence before, we had (in __arm_lpae_alloc_pages()):

   arm-lpae io-pgtable: pages = 8006f88f3000
   arm-lpae io-pgtable: dma = 0x0007388f3000
   arm-lpae io-pgtable: virt_to_phys(pages) = 0x0007388f3000

   After this change, we have:

   arm-lpae io-pgtable: pages = 8006f882b000
   arm-lpae io-pgtable: dma = 0x74009000
   arm-lpae io-pgtable: virt_to_phys(pages) = 0x00073882b000

   And SATA runs without using the IOMMU.
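
   Plugging in the numbers (illustrative, assuming a 4 KiB table):

   addr = virt_to_phys(pages) = 0x00073882b000, size = 0x1000

   old check: addr + size - 1 <= 0xffffffffff (40-bit)     -> capable
   new check: addr + size - 1 <= min_not_zero(0xffffffffff,
                                              0xffffffff)  -> not capable

   so the mapping gets bounced below 4 GiB (dma = 0x74009000) and
   dma != virt_to_phys(pages), which __arm_lpae_alloc_pages() rejects.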

2) The Renesas IPMMU driver doesn't handle the above 

Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-06 Thread Aaro Koskinen
Hi,

On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
> 
> Make sure we only touch it if the DT actually said so.
> 
> Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> Reported-by: Aaro Koskinen 
> Reported-by: Jean-Philippe Brucker 
> Signed-off-by: Robin Murphy 

Tested-by: Aaro Koskinen 

This fixes the MMC driver DMA mask issue on OCTEON.

Thanks,

A.

> ---
> 
> Sorry about that... I guess I only have test setups that either have
> dma-ranges or where a 32-bit bus mask goes unnoticed :(
> 
> The Octeon and SMMU issues sound like they're purely down to this, and
> it's probably related to at least one of John's Hikey woes.
> 
> Robin.
> 
>  drivers/of/device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index 0f27fad9fe94..757ae867674f 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
>* set by the driver.
>*/
>   mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
> - dev->bus_dma_mask = mask;
> + if (!ret)
> + dev->bus_dma_mask = mask;
>   dev->coherent_dma_mask &= mask;
>   *dev->dma_mask &= mask;
>  
> -- 
> 2.19.1.dirty
> 


Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist

2018-11-06 Thread Robin Murphy

On 02/11/2018 23:36, Nicolin Chen wrote:

On Fri, Nov 02, 2018 at 04:54:07PM +, Robin Murphy wrote:

On 01/11/2018 21:35, Nicolin Chen wrote:

The __GFP_ZERO will be passed down to the generic page allocation
routine which zeros everything page by page. This is safe to be a
generic way but not efficient for iommu allocation that organizes
contiguous pages using scatterlist.

So this changes drops __GFP_ZERO from the flag, and adds a manual
memset after page/sg allocations, using the length of scatterlist.

My test result of a 2.5MB size allocation shows iommu_dma_alloc()
takes 46% less time, reduced from averagely 925 usec to 500 usec.


Assuming this is for arm64, I'm somewhat surprised that memset() could be
that much faster than clear_page(), since they should effectively amount to
the same thing (a DC ZVA loop). What hardware is this on? Profiling to try


I am running with tegra186-p2771-.dtb so it's arm64 yes.


and see exactly where the extra time goes would be interesting too.


I re-ran the test to get some accuracy within the function and got:
1) pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
// reduced from 422 usec to 56 usec == 366 usec less
2) if (!(prot & IOMMU_CACHE)) {...} //flush routine
// reduced from 439 usec to 236 usec == 203 usec less
Note: new memset takes about 164 usec, resulting in 400 usec diff
   for the entire iommu_dma_alloc() function call.

It looks like this might be more than the diff between clear_page
and memset, and might be related to mapping and cache. Any idea?


Hmm, I guess it might not be so much clear_page() itself as all the 
gubbins involved in getting there from prep_new_page(). I could perhaps 
make some vague guesses about how the A57 cores might get tickled by the 
different code patterns, but the Denver cores are well beyond my ability 
to reason about. Out of even further curiosity, how does the quick hack 
below compare?



@@ -568,6 +571,15 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
if (attrs & DMA_ATTR_ALLOC_SINGLE_PAGES)
alloc_sizes = min_size;
+   /*
+* The generic zeroing in a length of one page size is slow,
+* so do it manually in a length of scatterlist size instead
+*/
+   if (gfp & __GFP_ZERO) {
+   gfp &= ~__GFP_ZERO;
+   gfp_zero = true;
+   }


Or just mask it out in __iommu_dma_alloc_pages()?


Yea, the change here would be neater then.


@@ -581,6 +593,12 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
goto out_free_iova;
+   if (gfp_zero) {
+   /* Now zero all the pages in the scatterlist */
+   for_each_sg(sgt.sgl, s, sgt.orig_nents, i)
+   memset(sg_virt(s), 0, s->length);


What if the pages came from highmem? I know that doesn't happen on arm64
today, but the point of this code *is* to be generic, and other users will
arrive eventually.


Hmm, so it probably should use sg_miter_start/stop() too? Looking
at the flush routine doing in PAGE_SIZE for each iteration, would
be possible to map and memset contiguous pages together? Actually
the flush routine might be also optimized if we can map contiguous
pages.


I suppose the ideal point at which to do it would be after the remapping 
when we have the entire buffer contiguous in vmalloc space and can make 
best use of prefetchers etc. - DMA_ATTR_NO_KERNEL_MAPPING is a bit of a 
spanner in the works, but we could probably accommodate a special case 
for that. As Christoph points out, this isn't really the place to be 
looking for performance anyway (unless it's pathologically bad as per 
the DMA_ATTR_ALLOC_SINGLE_PAGES fun), but if we're looking at pulling 
the remapping out of the arch code, maybe we could aim to rework the 
zeroing completely as part of that.


Robin.

->8-
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b04753b204..7d28db3bf4bf 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -569,7 +569,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,

alloc_sizes = min_size;

count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
+	pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp & ~__GFP_ZERO);

if (!pages)
return NULL;

@@ -581,15 +581,18 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,

if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
goto out_free_iova;

-   if (!(prot & IOMMU_CACHE)) {
+   {
struct sg_mapping_iter miter;
/*
 * The CPU-centric flushing implied by SG_MITER_TO_SG isn't
 * 

Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3

2018-11-06 Thread j...@8bytes.org
On Mon, Oct 22, 2018 at 12:50:56PM +0100, Robin Murphy wrote:
> To me, that sounds like a very good argument for having separate "attach as
> primary domain" and "attach as aux domain" APIs.

I agree with that, overloading iommu_attach_device() to support
aux-domains is not going to be a maintainable solution. I'd like this
to be a two-level approach, where the aux-domains are sub-domains of a
primary domain (naming is still to-be-improved):


struct iommu_domain *domain;

domain = iommu_alloc_aux_domain();

iommu_attach_device(pdev, domain); /* Fails if device has not
  support for this
  domain-type */

/* Bind a process address space to the aux-domain */
sva_handle = iommu_sva_bind_mm(domain, current, ...);
pasid = iommu_sva_get_pasid(sva_handle);

mdev_handle = iommu_mdev_alloc_domain(domain);
iommu_mdev_map(mdev_handle, ...);
iommu_mdev_unmap(mdev_handle, ...);

/* Do more work */

iommu_sva_unbind_mm(sva_handle);

iommu_mdev_free_domain(mdev_handle);

iommu_detach_device(domain, pdev);


Regards,

Joerg


Re: [PATCH 1/1] iommu/vtd: Fix NULL pointer dereference in prq_event_thread()

2018-11-06 Thread Joerg Roedel
On Mon, Nov 05, 2018 at 10:18:58AM +0800, Lu Baolu wrote:
> When handling page request without pasid event, go to "no_pasid"
> branch instead of "bad_req". Otherwise, a NULL pointer deference
> will happen there.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Sohil Mehta 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied for v4.20, thanks.


Re: [PATCH v2] iommu/ipmmu-vmsa: Hook up r8a77990 DT matching code

2018-11-06 Thread Joerg Roedel
On Wed, Oct 17, 2018 at 11:13:22AM +0200, Simon Horman wrote:
> From: Hai Nguyen Pham 
> 
> Support the R-Car E3 (r8a77990) IPMMU.
> 
> Signed-off-by: Hai Nguyen Pham 
> Signed-off-by: Kazuya Mizuguchi 
> [simon: rebased; dropped no longer required IOMMU_OF_DECLARE hunk]
> Signed-off-by: Simon Horman 

Applied, thanks.



Re: [PATCH v2] iommu: Do physical merging in iommu_map_sg()

2018-11-06 Thread Joerg Roedel
On Thu, Oct 11, 2018 at 04:56:42PM +0100, Robin Murphy wrote:
> The original motivation for iommu_map_sg() was to give IOMMU drivers the
> chance to map an IOVA-contiguous scatterlist as efficiently as they
> could. It turns out that there isn't really much driver-specific
> business involved there, so now that the default implementation is
> mandatory let's just improve that - the main thing we're after is to use
> larger pages wherever possible, and as long as domain->pgsize_bitmap
> reflects reality, iommu_map() can already do that in a generic way. All
> we need to do is detect physically-contiguous segments and batch them
> into a single map operation, since whatever we do here is transparent to
> our caller and not bound by any segment-length restrictions on the list
> itself.
> 
> Speaking of efficiency, there's really very little point in duplicating
> the checks that iommu_map() is going to do anyway, so those get cleared
> up in the process.
> 
> Signed-off-by: Robin Murphy 

Looks correct, applied, thanks.
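
For reference, the physical merging described above amounts to roughly
this shape (a sketch of the approach, not the exact patch):

	struct scatterlist *s;
	unsigned int i;
	size_t mapped = 0, len = 0;
	phys_addr_t start = 0;
	int ret;

	for_each_sg(sg, s, nents, i) {
		phys_addr_t phys = sg_phys(s);

		/* a physical discontiguity ends the batched run */
		if (len && phys != start + len) {
			ret = iommu_map(domain, iova + mapped, start, len, prot);
			if (ret)
				goto out_err;
			mapped += len;
			len = 0;
		}
		if (!len)
			start = phys;
		len += s->length;
	}
	/* map the final run; iommu_map() already picks the largest page
	 * sizes allowed by domain->pgsize_bitmap for each chunk */
	if (len) {
		ret = iommu_map(domain, iova + mapped, start, len, prot);
		if (ret)
			goto out_err;
		mapped += len;
	}
	return mapped;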


Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist

2018-11-06 Thread Robin Murphy

On 05/11/2018 14:58, Christoph Hellwig wrote:

On Fri, Nov 02, 2018 at 04:36:13PM -0700, Nicolin Chen wrote:

What if the pages came from highmem? I know that doesn't happen on arm64
today, but the point of this code *is* to be generic, and other users will
arrive eventually.


Hmm, so it probably should use sg_miter_start/stop() too? Looking
at the flush routine doing in PAGE_SIZE for each iteration, would
be possible to map and memset contiguous pages together? Actually
the flush routine might be also optimized if we can map contiguous
pages.


FYI, I have patches I plan to submit soon that gets rid of the
struct scatterlist use in this code to simplify it:


...and I have some significant objections to that simplification, to
which I plan to respond ;)


(namely that it defeats the whole higher-order page allocation business,
which will have varying degrees of performance impact on certain cases)


Robin.


[PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-06 Thread Robin Murphy
of_dma_configure() was *supposed* to be following the same logic as
acpi_dma_configure() and only setting bus_dma_mask if some range was
specified by the firmware. However, it seems that subtlety got lost in
the process of fitting it into the differently-shaped control flow, and
as a result the force_dma==true case ends up always setting the bus mask
to the 32-bit default, which is not what anyone wants.

Make sure we only touch it if the DT actually said so.

Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
Reported-by: Aaro Koskinen 
Reported-by: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 
---

Sorry about that... I guess I only have test setups that either have
dma-ranges or where a 32-bit bus mask goes unnoticed :(

The Octeon and SMMU issues sound like they're purely down to this, and
it's probably related to at least one of John's Hikey woes.

Robin.

 drivers/of/device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 0f27fad9fe94..757ae867674f 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
 * set by the driver.
 */
mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
-   dev->bus_dma_mask = mask;
+   if (!ret)
+   dev->bus_dma_mask = mask;
dev->coherent_dma_mask &= mask;
*dev->dma_mask &= mask;
 
-- 
2.19.1.dirty



Re: [RFC] iommu/vt-d: Group and domain relationship

2018-11-06 Thread James Sewart via iommu
Hey Lu,

Would you be able to go into more detail about the issues with
allowing IOMMU_DOMAIN_DMA to be allocated via domain_alloc?


Cheers,
James.
On Fri, Nov 2, 2018 at 2:43 AM Lu Baolu  wrote:
>
> Hi,
>
> On 10/30/18 10:18 PM, James Sewart via iommu wrote:
> > Hey,
> >
> > I’ve been investigating the relationship between iommu groups and domains
> > on our systems and have a few question. Why does the intel iommu code not
> > allow allocating IOMMU_DOMAIN_DMA? Returning NULL when given this domain
> > type has the side effect that the default_domain for an iommu group is not
> > set, which, when using for e.g. dma_map_ops.map_page, means a domain is
> > allocated per device.
>
> Intel vt-d driver doesn't implement the default domain and allocates
> domains only on demand. There are lots of things to do before we allow
> the iommu API to allocate domains other than IOMMU_DOMAIN_UNMANAGED.
>
> Best regards,
> Lu Baolu
>
> >
> > This seems to be the opposite behaviour to the AMD iommu code which
> > supports allocating an IOMMU_DOMAIN_DMA and will only look to the iommu
> > group if a domain is not attached to the device rather than allocating a
> > new one. On AMD every device in an iommu group will share the same domain.
> >
> > Appended is what I think a patch to implement domain_alloc for
> > IOMMU_DOMAIN_DMA and also IOMMU_DOMAIN_IDENTITY would look like. Testing
> > shows each device in a group will share a domain by default, it also
> > allows allocating a new dma domain that can be successfully attached to a
> > group with iommu_attach_group.
> >
> > Looking for comment on why the behaviour is how it is currently and if
> > there are any issues with the solution I’ve been testing.
> >
> > Cheers,
> > James.
> >
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index bff2abd6..3a58389f 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5170,10 +5170,15 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
> >   struct dmar_domain *dmar_domain;
> >   struct iommu_domain *domain;
> >
> > - if (type != IOMMU_DOMAIN_UNMANAGED)
> > + if (type == IOMMU_DOMAIN_UNMANAGED)
> > + dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
> > + else if(type == IOMMU_DOMAIN_DMA)
> > + dmar_domain = alloc_domain(0);
> > + else if(type == IOMMU_DOMAIN_IDENTITY)
> > + dmar_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
> > + else
> >   return NULL;
> >
> > - dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
> >   if (!dmar_domain) {
> >   pr_err("Can't allocate dmar_domain\n");
> >   return NULL;
> > @@ -5186,9 +5191,12 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
> >   domain_update_iommu_cap(dmar_domain);
> >
> >   domain = &dmar_domain->domain;
> > - domain->geometry.aperture_start = 0;
> > - domain->geometry.aperture_end   = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
> > - domain->geometry.force_aperture = true;
> > +
> > + if (type == IOMMU_DOMAIN_UNMANAGED) {
> > + domain->geometry.aperture_start = 0;
> > + domain->geometry.aperture_end   = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
> > + domain->geometry.force_aperture = true;
> > + }
> >
> >   return domain;
> >   }