Re: [PATCH v6 5/8] iommu: Add bounce page APIs
On Fri, Aug 16, 2019 at 10:45:13AM +0800, Lu Baolu wrote:
> Okay. I understand that adding these APIs in iommu.c is not a good idea.
> And, I also don't think merging the bounce buffer implementation into
> iommu_map() is feasible since iommu_map() is not DMA API centric.
>
> The bounce buffer implementation will eventually be part of DMA APIs
> defined in dma-iommu.c, but currently those APIs are not ready for x86
> use yet. So I will put them in the iommu/vt-d driver for the time being
> and will move them to dma-iommu.c later.

I think they are more or less ready actually, we just need more people
reviewing the conversions. Tom just reposted the AMD one, which will
need a few more reviews, and he has an older patchset for intel-iommu
as well that could use some more eyes.
Re: [PATCH v6 5/8] iommu: Add bounce page APIs
Hi Joerg,

On 8/15/19 11:48 PM, Joerg Roedel wrote:
> On Thu, Aug 15, 2019 at 02:15:32PM +0800, Lu Baolu wrote:
>> iommu_map/unmap() APIs don't have parameters for DMA direction and
>> attributes. These parameters are elementary for DMA APIs. Say, after
>> map, if the DMA direction is TO_DEVICE and a bounce buffer is used,
>> we must sync the data from the original DMA buffer to the bounce
>> buffer; in the opposite direction, if the DMA is FROM_DEVICE, before
>> unmap, we need to sync the data from the bounce buffer onto the
>> original buffer.
>
> The DMA direction from the DMA-API maps to the protections in
> iommu_map():
>
> 	DMA_FROM_DEVICE:	IOMMU_WRITE
> 	DMA_TO_DEVICE:		IOMMU_READ
> 	DMA_BIDIRECTIONAL:	IOMMU_READ | IOMMU_WRITE
>
> And for the sync, the DMA-API also has separate functions for either
> direction. So I don't see why these extra functions are needed in
> the IOMMU-API.

Okay. I understand that adding these APIs in iommu.c is not a good idea.
And, I also don't think merging the bounce buffer implementation into
iommu_map() is feasible since iommu_map() is not DMA API centric.

The bounce buffer implementation will eventually be part of DMA APIs
defined in dma-iommu.c, but currently those APIs are not ready for x86
use yet. So I will put them in the iommu/vt-d driver for the time being
and will move them to dma-iommu.c later.

Does this work for you?

Best regards,
Lu Baolu
Re: [PATCH v5 16/19] iommu/vt-d: Misc macro clean up for SVM
On Fri, 16 Aug 2019 00:17:44 +0300, Andy Shevchenko wrote:
> On Thu, Aug 15, 2019 at 11:52 PM Jacob Pan wrote:
> >
> > Use the combined macro for_each_svm_dev() to simplify SVM device
> > iteration and error checking.
> >
> > Suggested-by: Andy Shevchenko
> > Signed-off-by: Jacob Pan
> > Reviewed-by: Eric Auger
> > ---
> >  drivers/iommu/intel-svm.c | 85 +++++++++++++++++++-----------------
> >  1 file changed, 41 insertions(+), 44 deletions(-)
> >
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 5a688a5..ea6f2e2 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -218,6 +218,10 @@ static const struct mmu_notifier_ops intel_mmuops = {
> >  static DEFINE_MUTEX(pasid_mutex);
> >  static LIST_HEAD(global_svm_list);
> >
> > +#define for_each_svm_dev(svm, dev) \
> > +	list_for_each_entry(sdev, &svm->devs, list) \
> > +	if (dev == sdev->dev) \
>
> This should be
> 	if (dev != sdev->dev) {} else
> and no trailing \ is needed.
>
> The rationale of the above form is to avoid
> 	for_each_foo() {
> 	} else {
> 		...WTF?!..
> 	}

I understand, but until we have the else {} case we don't have anything
to avoid. The current code only has simple positive logic.
> > +
> >  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > 		      struct svm_dev_ops *ops)
> >  {
> >  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > @@ -263,15 +267,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
> >  			goto out;
> >  		}
> >
> > -		list_for_each_entry(sdev, &svm->devs, list) {
> > -			if (dev == sdev->dev) {
> > -				if (sdev->ops != ops) {
> > -					ret = -EBUSY;
> > -					goto out;
> > -				}
> > -				sdev->users++;
> > -				goto success;
> > +		for_each_svm_dev(svm, dev) {
> > +			if (sdev->ops != ops) {
> > +				ret = -EBUSY;
> > +				goto out;
> >  			}
> > +			sdev->users++;
> > +			goto success;
> >  		}
> >
> >  		break;
> > @@ -408,48 +410,43 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
> >  		goto out;
> >
> >  	svm = ioasid_find(NULL, pasid, NULL);
> > -	if (IS_ERR(svm)) {
> > +	if (IS_ERR_OR_NULL(svm)) {
> >  		ret = PTR_ERR(svm);
> >  		goto out;
> >  	}
> >
> > -	if (!svm)
> > -		goto out;
> > +	for_each_svm_dev(svm, dev) {
> > +		ret = 0;
> > +		sdev->users--;
> > +		if (!sdev->users) {
> > +			list_del_rcu(&sdev->list);
> > +			/* Flush the PASID cache and IOTLB for this device.
> > +			 * Note that we do depend on the hardware *not* using
> > +			 * the PASID any more. Just as we depend on other
> > +			 * devices never using PASIDs that they have no right
> > +			 * to use. We have a *shared* PASID table, because it's
> > +			 * large and has to be physically contiguous. So it's
> > +			 * hard to be as defensive as we might like. */
> > +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> > +			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
> > +			kfree_rcu(sdev, rcu);
> > +
> > +			if (list_empty(&svm->devs)) {
> > +				ioasid_free(svm->pasid);
> > +				if (svm->mm)
> > +					mmu_notifier_unregister(&svm->notifier, svm->mm);
> >
> > -	list_for_each_entry(sdev, &svm->devs, list) {
> > -		if (dev == sdev->dev) {
> > -			ret = 0;
> > -			sdev->users--;
> > -			if (!sdev->users) {
> > -				list_del_rcu(&sdev->list);
> > -				/* Flush the PASID cache and IOTLB for this device.
> > -				 * Note that we do depend on the hardware *not* using
> > -				 * the PASID any more. Just as we depend on other
> > -				 * devices never using PASIDs that they have no right
> > -				 * to use. We have a *shared* PASID table, because it's
> > -				 * large and has to be physically
Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> callers of pci_enable_pri() and pci_enable_pasid()]
>
> On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppusw...@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan
>>
>> When IOMMU tries to enable Page Request Interface (PRI) for a VF
>> device in iommu_enable_dev_iotlb(), it always fails because PRI
>> support for PCIe VF devices is currently broken. The current
>> implementation expects the given PCIe device (PF & VF) to implement
>> the PRI capability before enabling PRI support. But this assumption
>> is incorrect. As per PCIe spec r4.0, sec 9.3.7.11, all VFs associated
>> with a PF can only use the PRI of the PF and not implement it. Hence
>> we need to create an exception for handling PRI support for PCIe VF
>> devices.
>>
>> Also, since PRI is a shared resource between PF/VF, the following
>> rules should apply.
>>
>> 1. Use proper locking before accessing/modifying PF resources in VF
>>    PRI enable/disable calls.
>> 2. Use reference count logic to track the usage of PRI resources.
>> 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
>
> Wait, why do we need this at all? I agree the spec says VFs may not
> implement PRI or PASID capabilities and that VFs use the PRI and PASID
> of the PF.
>
> But why do we need to support pci_enable_pri() and pci_enable_pasid()
> for VFs? There's nothing interesting we can *do* in the VF, and
> passing it off to the PF adds all this locking mess. For VFs, can we
> just make them do nothing or return -EINVAL? What functionality would
> we be missing if we did that?

Currently PRI/PASID capabilities are not enabled by default. The IOMMU
can enable PRI/PASID for the VF first (and not enable it for the PF).
In this case, doing nothing for the VF device will break the
functionality.

Also, PRI/PASID config options like "PRI Outstanding Page Request
Allocation", "PASID Execute Permission" or "PASID Privileged Mode" are
currently configured per device feature.
And hence there is a chance for VF/PF to use different values for these
options.

> (Obviously returning -EINVAL would require tweaks in the callers to
> either avoid the call for VFs or handle the -EINVAL gracefully.)
>
> > Cc: Ashok Raj
> > Cc: Keith Busch
> > Suggested-by: Ashok Raj
> > Signed-off-by: Kuppuswamy Sathyanarayanan
> >
> > ---
> >  drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
> >  include/linux/pci.h |   2 +
> >  2 files changed, 112 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > index 1f4be27a071d..079dc544 100644
> > --- a/drivers/pci/ats.c
> > +++ b/drivers/pci/ats.c
> > @@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
> >  	if (pdev->is_virtfn)
> >  		return;
> >
> > +	mutex_init(&pdev->pri_lock);
> > +
> >  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> >  	if (!pos)
> >  		return;
> > @@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> >  {
> >  	u16 control, status;
> >  	u32 max_requests;
> > +	int ret = 0;
> > +	struct pci_dev *pf = pci_physfn(pdev);
> >
> > -	if (WARN_ON(pdev->pri_enabled))
> > -		return -EBUSY;
> > +	mutex_lock(&pf->pri_lock);
> >
> > -	if (!pdev->pri_cap)
> > -		return -EINVAL;
> > +	if (WARN_ON(pdev->pri_enabled)) {
> > +		ret = -EBUSY;
> > +		goto pri_unlock;
> > +	}
> >
> > -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> > -	if (!(status & PCI_PRI_STATUS_STOPPED))
> > -		return -EBUSY;
> > +	if (!pf->pri_cap) {
> > +		ret = -EINVAL;
> > +		goto pri_unlock;
> > +	}
> > +
> > +	if (pdev->is_virtfn && pf->pri_enabled)
> > +		goto update_status;
> > +
> > +	/*
> > +	 * Before updating PRI registers, make sure there are no
> > +	 * outstanding PRI requests.
> > +	 */
> > +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
> > +	if (!(status & PCI_PRI_STATUS_STOPPED)) {
> > +		ret = -EBUSY;
> > +		goto pri_unlock;
> > +	}
> >
> > -	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
> > -			      &max_requests);
> > +	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
> >  	reqs = min(max_requests, reqs);
> > -	pdev->pri_reqs_alloc = reqs;
> > -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> > +	pf->pri_reqs_alloc = reqs;
> > +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> >
> >  	control = PCI_PRI_CTRL_ENABLE;
> > -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> > +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
> >
> > -	pdev->pri_enabled = 1;
> > +	/*
> > +	 * If PRI is not already enabled in the PF, increment the PF
> > +	 * pri_ref_cnt to track the usage of the PRI interface.
> > +	 */
> > +	if (pdev->is_virtfn && !pf->pri_enabled) {
> > +		atomic_inc(&pf->pri_ref_cnt);
> > +		pf->pri_enabled = 1;
> > +	}
> >
> > -	return 0;
> > +update_status:
Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
[+cc Joerg, David, iommu list: because IOMMU drivers are the only
callers of pci_enable_pri() and pci_enable_pasid()]

On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppusw...@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan
>
> When IOMMU tries to enable Page Request Interface (PRI) for a VF device
> in iommu_enable_dev_iotlb(), it always fails because PRI support for
> PCIe VF devices is currently broken. The current implementation expects
> the given PCIe device (PF & VF) to implement the PRI capability before
> enabling PRI support. But this assumption is incorrect. As per PCIe
> spec r4.0, sec 9.3.7.11, all VFs associated with a PF can only use the
> PRI of the PF and not implement it. Hence we need to create an
> exception for handling PRI support for PCIe VF devices.
>
> Also, since PRI is a shared resource between PF/VF, the following rules
> should apply.
>
> 1. Use proper locking before accessing/modifying PF resources in VF
>    PRI enable/disable calls.
> 2. Use reference count logic to track the usage of PRI resources.
> 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.

Wait, why do we need this at all? I agree the spec says VFs may not
implement PRI or PASID capabilities and that VFs use the PRI and PASID
of the PF.

But why do we need to support pci_enable_pri() and pci_enable_pasid()
for VFs? There's nothing interesting we can *do* in the VF, and passing
it off to the PF adds all this locking mess. For VFs, can we just make
them do nothing or return -EINVAL? What functionality would we be
missing if we did that?

(Obviously returning -EINVAL would require tweaks in the callers to
either avoid the call for VFs or handle the -EINVAL gracefully.)
> Cc: Ashok Raj
> Cc: Keith Busch
> Suggested-by: Ashok Raj
> Signed-off-by: Kuppuswamy Sathyanarayanan
>
> ---
>  drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
>  include/linux/pci.h |   2 +
>  2 files changed, 112 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 1f4be27a071d..079dc544 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
>  	if (pdev->is_virtfn)
>  		return;
>
> +	mutex_init(&pdev->pri_lock);
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>  	if (!pos)
>  		return;
> @@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  {
>  	u16 control, status;
>  	u32 max_requests;
> +	int ret = 0;
> +	struct pci_dev *pf = pci_physfn(pdev);
>
> -	if (WARN_ON(pdev->pri_enabled))
> -		return -EBUSY;
> +	mutex_lock(&pf->pri_lock);
>
> -	if (!pdev->pri_cap)
> -		return -EINVAL;
> +	if (WARN_ON(pdev->pri_enabled)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> -	if (!(status & PCI_PRI_STATUS_STOPPED))
> -		return -EBUSY;
> +	if (!pf->pri_cap) {
> +		ret = -EINVAL;
> +		goto pri_unlock;
> +	}
> +
> +	if (pdev->is_virtfn && pf->pri_enabled)
> +		goto update_status;
> +
> +	/*
> +	 * Before updating PRI registers, make sure there are no
> +	 * outstanding PRI requests.
> +	 */
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
> +	if (!(status & PCI_PRI_STATUS_STOPPED)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>
> -	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
> -			      &max_requests);
> +	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
>  	reqs = min(max_requests, reqs);
> -	pdev->pri_reqs_alloc = reqs;
> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> +	pf->pri_reqs_alloc = reqs;
> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>
>  	control = PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>
> -	pdev->pri_enabled = 1;
> +	/*
> +	 * If PRI is not already enabled in the PF, increment the PF
> +	 * pri_ref_cnt to track the usage of the PRI interface.
> +	 */
> +	if (pdev->is_virtfn && !pf->pri_enabled) {
> +		atomic_inc(&pf->pri_ref_cnt);
> +		pf->pri_enabled = 1;
> +	}
>
> -	return 0;
> +update_status:
> +	atomic_inc(&pf->pri_ref_cnt);
> +	pdev->pri_enabled = 1;
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
> +	return ret;
> }
> EXPORT_SYMBOL_GPL(pci_enable_pri);
>
> @@ -256,18 +286,30 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
>  void pci_disable_pri(struct pci_dev *pdev)
>  {
>  	u16 control;
> +	struct pci_dev *pf = pci_physfn(pdev);
>
> -	if (WARN_ON(!pdev->pri_enabled))
> -
Re: [PATCH v5 16/19] iommu/vt-d: Misc macro clean up for SVM
On Thu, Aug 15, 2019 at 11:52 PM Jacob Pan wrote:
>
> Use the combined macro for_each_svm_dev() to simplify SVM device
> iteration and error checking.
>
> Suggested-by: Andy Shevchenko
> Signed-off-by: Jacob Pan
> Reviewed-by: Eric Auger
> ---
>  drivers/iommu/intel-svm.c | 85 +++++++++++++++++++-----------------
>  1 file changed, 41 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 5a688a5..ea6f2e2 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -218,6 +218,10 @@ static const struct mmu_notifier_ops intel_mmuops = {
>  static DEFINE_MUTEX(pasid_mutex);
>  static LIST_HEAD(global_svm_list);
>
> +#define for_each_svm_dev(svm, dev) \
> +	list_for_each_entry(sdev, &svm->devs, list) \
> +	if (dev == sdev->dev) \

This should be
	if (dev != sdev->dev) {} else
and no trailing \ is needed.

The rationale of the above form is to avoid
	for_each_foo() {
	} else {
		...WTF?!..
	}

> +
>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> 		      struct svm_dev_ops *ops)
>  {
>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> @@ -263,15 +267,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>  			goto out;
>  		}
>
> -		list_for_each_entry(sdev, &svm->devs, list) {
> -			if (dev == sdev->dev) {
> -				if (sdev->ops != ops) {
> -					ret = -EBUSY;
> -					goto out;
> -				}
> -				sdev->users++;
> -				goto success;
> +		for_each_svm_dev(svm, dev) {
> +			if (sdev->ops != ops) {
> +				ret = -EBUSY;
> +				goto out;
>  			}
> +			sdev->users++;
> +			goto success;
>  		}
>
>  		break;
> @@ -408,48 +410,43 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>  		goto out;
>
>  	svm = ioasid_find(NULL, pasid, NULL);
> -	if (IS_ERR(svm)) {
> +	if (IS_ERR_OR_NULL(svm)) {
>  		ret = PTR_ERR(svm);
>  		goto out;
>  	}
>
> -	if (!svm)
> -		goto out;
> +	for_each_svm_dev(svm, dev) {
> +		ret = 0;
> +		sdev->users--;
> +		if (!sdev->users) {
> +			list_del_rcu(&sdev->list);
> +			/* Flush the PASID cache and IOTLB for this device.
> +			 * Note that we do depend on the hardware *not* using
> +			 * the PASID any more. Just as we depend on other
> +			 * devices never using PASIDs that they have no right
> +			 * to use. We have a *shared* PASID table, because it's
> +			 * large and has to be physically contiguous. So it's
> +			 * hard to be as defensive as we might like. */
> +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
> +			kfree_rcu(sdev, rcu);
> +
> +			if (list_empty(&svm->devs)) {
> +				ioasid_free(svm->pasid);
> +				if (svm->mm)
> +					mmu_notifier_unregister(&svm->notifier, svm->mm);
>
> -	list_for_each_entry(sdev, &svm->devs, list) {
> -		if (dev == sdev->dev) {
> -			ret = 0;
> -			sdev->users--;
> -			if (!sdev->users) {
> -				list_del_rcu(&sdev->list);
> -				/* Flush the PASID cache and IOTLB for this device.
> -				 * Note that we do depend on the hardware *not* using
> -				 * the PASID any more. Just as we depend on other
> -				 * devices never using PASIDs that they have no right
> -				 * to use. We have a *shared* PASID table, because it's
> -				 * large and has to be physically contiguous. So it's
> -				 * hard to be as defensive as we might like. */
> -				intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> -				intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
> -				kfree_rcu(sdev, rcu);
> -
> -
[PATCH v5 07/19] iommu: Add I/O ASID allocator
From: Jean-Philippe Brucker

Some devices might support multiple DMA address spaces, in particular
those that have the PCI PASID feature. PASID (Process Address Space ID)
allows sharing process address spaces with devices (SVA), partitioning
a device into VM-assignable entities (VFIO mdev), or simply providing
multiple DMA address spaces to kernel drivers. Add a global PASID
allocator usable by different drivers at the same time. Name it I/O
ASID to avoid confusion with ASIDs allocated by arch code, which are
usually a separate ID space.

The IOASID space is global. Each device can have its own PASID space,
but by convention the IOMMU ended up having a global PASID space, so
that with SVA, each mm_struct is associated with a single PASID.

The allocator is primarily used by the IOMMU subsystem, but on rare
occasions drivers would like to allocate PASIDs for devices that
aren't managed by an IOMMU, using the same ID space as the IOMMU.

Signed-off-by: Jean-Philippe Brucker
Signed-off-by: Jacob Pan
---
 drivers/iommu/Kconfig  |   4 ++
 drivers/iommu/Makefile |   1 +
 drivers/iommu/ioasid.c | 151 +++++++++++++++++++++++++++++
 include/linux/ioasid.h |  47 ++++++++++
 4 files changed, 203 insertions(+)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e15cdcd..0ade8a0 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -3,6 +3,10 @@
 config IOMMU_IOVA
 	tristate

+# The IOASID library may also be used by non-IOMMU_API users
+config IOASID
+	tristate
+
 # IOMMU_API always gets selected by whoever wants it.
 config IOMMU_API
 	bool

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f13f36a..011429e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -7,6 +7,7 @@
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU) += of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
new file mode 100644
index 000..6fbea76
--- /dev/null
+++ b/drivers/iommu/ioasid.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * I/O Address Space ID allocator. There is one global IOASID space, split into
+ * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
+ * free IOASIDs with ioasid_alloc and ioasid_free.
+ */
+#include <linux/ioasid.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/xarray.h>
+
+struct ioasid_data {
+	ioasid_t id;
+	struct ioasid_set *set;
+	void *private;
+	struct rcu_head rcu;
+};
+
+static DEFINE_XARRAY_ALLOC(ioasid_xa);
+
+/**
+ * ioasid_set_data - Set private data for an allocated ioasid
+ * @ioasid: the ID to set data
+ * @data: the private data
+ *
+ * For an IOASID that is already allocated, private data can be set
+ * via this API. Future lookup can be done via ioasid_find.
+ */
+int ioasid_set_data(ioasid_t ioasid, void *data)
+{
+	struct ioasid_data *ioasid_data;
+	int ret = 0;
+
+	xa_lock(&ioasid_xa);
+	ioasid_data = xa_load(&ioasid_xa, ioasid);
+	if (ioasid_data)
+		rcu_assign_pointer(ioasid_data->private, data);
+	else
+		ret = -ENOENT;
+	xa_unlock(&ioasid_xa);
+
+	/*
+	 * Wait for readers to stop accessing the old private data, so the
+	 * caller can free it.
+	 */
+	if (!ret)
+		synchronize_rcu();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_set_data);
+
+/**
+ * ioasid_alloc - Allocate an IOASID
+ * @set: the IOASID set
+ * @min: the minimum ID (inclusive)
+ * @max: the maximum ID (inclusive)
+ * @private: data private to the caller
+ *
+ * Allocate an ID between @min and @max. The @private pointer is stored
+ * internally and can be retrieved with ioasid_find().
+ *
+ * Return: the allocated ID on success, or %INVALID_IOASID on failure.
+ */
+ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
+		      void *private)
+{
+	ioasid_t id;
+	struct ioasid_data *data;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return INVALID_IOASID;
+
+	data->set = set;
+	data->private = private;
+
+	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
+		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
+		goto exit_free;
+	}
+	data->id = id;
+
+	return id;
+exit_free:
+	kfree(data);
+	return INVALID_IOASID;
+}
+EXPORT_SYMBOL_GPL(ioasid_alloc);
+
+/**
+ * ioasid_free - Free an IOASID
+ * @ioasid: the ID to remove
+ */
+void ioasid_free(ioasid_t ioasid)
+{
+	struct ioasid_data *ioasid_data;
+
+	ioasid_data = xa_erase(&ioasid_xa,
[PATCH v5 09/19] iommu: Introduce guest PASID bind function
Guest shared virtual address (SVA) may require the host to shadow guest
PASID tables. A guest PASID can also be allocated from the host via
enlightened interfaces. In this case, the guest needs to bind the guest
mm, i.e. cr3 in guest physical address, to the actual PASID table in
the host IOMMU. Nesting will be turned on such that guest virtual
addresses can go through a two-level translation:
- 1st level translates GVA to GPA
- 2nd level translates GPA to HPA

This patch introduces APIs to bind guest PASID data to the assigned
device entry in the physical IOMMU. See the diagram below for usage
explanation.

     .-------------.  .---------------------------.
     |   vIOMMU    |  | Guest process mm, FL only |
     |             |  '---------------------------'
     .----------------/
     | PASID Entry |--- PASID cache flush -
     '-------------'                       |
     |             |                       V
     |             |                      GP
     '-------------'
Guest
------| Shadow |--------- GP->HP* ---------
      v        v          |
Host                      v
     .-------------.  .---------------------.
     |   pIOMMU    |  | Bind FL for GVA-GPA |
     |             |  '---------------------'
     .----------------/  |
     | PASID Entry |     V (Nested xlate)
     '----------------\.-------------------.
     |             |   | Set SL to GPA-HPA |
     |             |   '-------------------'
     '-------------'

Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
 - GP = Guest PASID
 - HP = Host PASID
 * Conversion needed if the non-identity GP-HP mapping option is chosen.
Signed-off-by: Jacob Pan
Signed-off-by: Liu Yi L
---
 drivers/iommu/iommu.c      |  20 ++++++++
 include/linux/iommu.h      |  22 ++++++++
 include/uapi/linux/iommu.h |  58 ++++++++++++++++++
 3 files changed, 100 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6228d5d..c19ea1f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1729,6 +1729,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);

+int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+			struct device *dev, struct iommu_gpasid_bind_data *data)
+{
+	if (unlikely(!domain->ops->sva_bind_gpasid))
+		return -ENODEV;
+
+	return domain->ops->sva_bind_gpasid(domain, dev, data);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
+
+int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			 ioasid_t pasid)
+{
+	if (unlikely(!domain->ops->sva_unbind_gpasid))
+		return -ENODEV;
+
+	return domain->ops->sva_unbind_gpasid(dev, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 28f1a8c..91370e7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include <linux/ioasid.h>
 #include

 #define IOMMU_READ	(1 << 0)
@@ -232,6 +233,8 @@ struct iommu_sva_ops {
  * @detach_pasid_table: detach the pasid table
  * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @sva_bind_gpasid: bind guest pasid and mm
+ * @sva_unbind_gpasid: unbind guest pasid and mm
  */
 struct iommu_ops {
 	bool (*capable)(enum iommu_cap);
@@ -299,6 +302,10 @@ struct iommu_ops {
 				  struct iommu_page_response *msg);
 	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
 				struct iommu_cache_invalidate_info *inv_info);
+	int (*sva_bind_gpasid)(struct iommu_domain *domain,
+			struct device *dev, struct iommu_gpasid_bind_data *data);
+
+	int (*sva_unbind_gpasid)(struct device *dev, int pasid);

 	unsigned long pgsize_bitmap;
 };
@@ -413,6 +420,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern int iommu_cache_invalidate(struct iommu_domain *domain,
 				  struct device *dev,
 				  struct iommu_cache_invalidate_info *inv_info);
+extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+		struct device *dev, struct iommu_gpasid_bind_data *data);
+extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
+		struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -972,6 +983,17 @@
[PATCH v5 08/19] iommu/ioasid: Add custom allocators
Custom IOASID allocators can be registered at runtime and take
precedence over the default XArray allocator. They have these
attributes:
- provides platform-specific alloc()/free() functions with private data.
- allocation result lookups are not provided by the allocator; lookup
  requests must be done by the IOASID framework via its own XArray.
- allocators can be unregistered at runtime, falling back either to the
  next custom allocator or to the default allocator.
- custom allocators can share the same set of alloc()/free() helpers,
  in which case they also share the same IOASID space, thus the same
  XArray.
- switching between allocators requires all outstanding IOASIDs to be
  freed unless the two allocators share the same alloc()/free() helpers.

Signed-off-by: Jean-Philippe Brucker
Signed-off-by: Jacob Pan
Link: https://lkml.org/lkml/2019/4/26/462
---
 drivers/iommu/ioasid.c | 302 +++++++++++++++++++++++++++++++++++++++--
 include/linux/ioasid.h |  28 ++++
 2 files changed, 320 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 6fbea76..b2d9ea5 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -17,7 +17,254 @@ struct ioasid_data {
 	struct rcu_head rcu;
 };

-static DEFINE_XARRAY_ALLOC(ioasid_xa);
+/*
+ * struct ioasid_allocator_data - Internal data structure to hold information
+ * about an allocator. There are two types of allocators:
+ *
+ * - Default allocator always has its own XArray to track the IOASIDs allocated.
+ * - Custom allocators may share allocation helpers with different private data.
+ *   Custom allocators that share the same helper functions also share the same
+ *   XArray.
+ * Rules:
+ * 1. Default allocator is always available, not dynamically registered. This is
+ *    to prevent race conditions with early boot code that wants to register
+ *    custom allocators or allocate IOASIDs.
+ * 2. Custom allocators take precedence over the default allocator.
+ * 3. When all custom allocators sharing the same helper functions are
+ *    unregistered (e.g. due to hotplug), all outstanding IOASIDs must be
+ *    freed.
+ * 4. When switching between custom allocators sharing the same helper
+ *    functions, outstanding IOASIDs are preserved.
+ * 5. When switching between custom allocator and default allocator, all IOASIDs
+ *    must be freed to ensure unadulterated space for the new allocator.
+ *
+ * @ops:   allocator helper functions and its data
+ * @list:  registered custom allocators
+ * @slist: allocators that share the same ops but different data
+ * @flags: attributes of the allocator
+ * @xa:    xarray holding the IOASID space
+ * @users: number of allocators sharing the same ops and XArray
+ */
+struct ioasid_allocator_data {
+	struct ioasid_allocator_ops *ops;
+	struct list_head list;
+	struct list_head slist;
+#define IOASID_ALLOCATOR_CUSTOM BIT(0) /* Needs framework to track results */
+	unsigned long flags;
+	struct xarray xa;
+	refcount_t users;
+};
+
+static DEFINE_MUTEX(ioasid_allocator_lock);
+static LIST_HEAD(allocators_list);
+
+static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque);
+static void default_free(ioasid_t ioasid, void *opaque);
+
+static struct ioasid_allocator_ops default_ops = {
+	.alloc = default_alloc,
+	.free = default_free,
+};
+
+static struct ioasid_allocator_data default_allocator = {
+	.ops = &default_ops,
+	.flags = 0,
+	.xa = XARRAY_INIT(ioasid_xa, XA_FLAGS_ALLOC),
+};
+
+static struct ioasid_allocator_data *active_allocator = &default_allocator;
+
+static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque)
+{
+	ioasid_t id;
+
+	if (xa_alloc(&default_allocator.xa, &id, opaque, XA_LIMIT(min, max), GFP_KERNEL)) {
+		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
+		return INVALID_IOASID;
+	}
+
+	return id;
+}
+
+static void default_free(ioasid_t ioasid, void *opaque)
+{
+	struct ioasid_data *ioasid_data;
+
+	ioasid_data = xa_erase(&default_allocator.xa, ioasid);
+	kfree_rcu(ioasid_data, rcu);
+}
+
+/* Allocate and initialize a new custom allocator with its helper functions */
+static struct ioasid_allocator_data *ioasid_alloc_allocator(struct ioasid_allocator_ops *ops)
+{
+	struct ioasid_allocator_data *ia_data;
+
+	ia_data = kzalloc(sizeof(*ia_data), GFP_KERNEL);
+	if (!ia_data)
+		return NULL;
+
+	xa_init_flags(&ia_data->xa, XA_FLAGS_ALLOC);
+	INIT_LIST_HEAD(&ia_data->slist);
+	ia_data->flags |= IOASID_ALLOCATOR_CUSTOM;
+	ia_data->ops = ops;
+
+	/* For tracking custom allocators that share the same ops */
+	list_add_tail(&ia_data->list, &ia_data->slist);
+	refcount_set(&ia_data->users, 1);
+
+	return ia_data;
+}
+
+static bool use_same_ops(struct ioasid_allocator_ops *a, struct ioasid_allocator_ops *b)
+{
+	return (a->free == b->free) &&
[PATCH v5 11/19] iommu/vt-d: Add custom allocator for IOASID
When VT-d driver runs in the guest, PASID allocation must be performed via virtual command interface. This patch registers a custom IOASID allocator which takes precedence over the default XArray based allocator. The resulting IOASID allocation will always come from the host. This ensures that PASID namespace is system- wide. Signed-off-by: Lu Baolu Signed-off-by: Liu, Yi L Signed-off-by: Jacob Pan --- drivers/iommu/Kconfig | 1 + drivers/iommu/intel-iommu.c | 67 + include/linux/intel-iommu.h | 2 ++ 3 files changed, 70 insertions(+) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 0ade8a0..d5ca821 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -210,6 +210,7 @@ config INTEL_IOMMU_SVM bool "Support for Shared Virtual Memory with Intel IOMMU" depends on INTEL_IOMMU && X86 select PCI_PASID + select IOASID select MMU_NOTIFIER help Shared Virtual Memory (SVM) provides a facility for devices diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index bdaed2d..b15ec58 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1693,6 +1693,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu) if (ecap_prs(iommu->ecap)) intel_svm_finish_prq(iommu); } + ioasid_unregister_allocator(>pasid_allocator); + #endif } @@ -4619,6 +4621,46 @@ static int __init probe_acpi_namespace_devices(void) return 0; } +#ifdef CONFIG_INTEL_IOMMU_SVM +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data) +{ + struct intel_iommu *iommu = data; + ioasid_t ioasid; + + /* +* VT-d virtual command interface always uses the full 20 bit +* PASID range. Host can partition guest PASID range based on +* policies but it is out of guest's control. 
+*/ + if (min < PASID_MIN || max > PASID_MAX) + return INVALID_IOASID; + + if (vcmd_alloc_pasid(iommu, &ioasid)) + return INVALID_IOASID; + + return ioasid; +} + +static void intel_ioasid_free(ioasid_t ioasid, void *data) +{ + struct iommu_pasid_alloc_info *svm; + struct intel_iommu *iommu = data; + + if (!iommu) + return; + /* +* Sanity check of the ioasid owner is done at the upper layer, e.g. VFIO. +* We can only free the PASID when all the devices are unbound. +*/ + svm = ioasid_find(NULL, ioasid, NULL); + if (!svm) { + pr_warn("Freeing unbound IOASID %d\n", ioasid); + return; + } + vcmd_free_pasid(iommu, ioasid); +} +#endif + int __init intel_iommu_init(void) { int ret = -ENODEV; @@ -4722,6 +4764,31 @@ int __init intel_iommu_init(void) "%s", iommu->name); iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops); iommu_device_register(&iommu->iommu); +#ifdef CONFIG_INTEL_IOMMU_SVM + if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) { + /* +* Register a custom ASID allocator if we are running +* in a guest; the purpose is to have a system-wide PASID +* namespace among all PASID users. +* There can be multiple vIOMMUs in each guest but only +* one allocator is active. All vIOMMU allocators will +* eventually be calling the same host allocator. +*/ + iommu->pasid_allocator.alloc = intel_ioasid_alloc; + iommu->pasid_allocator.free = intel_ioasid_free; + iommu->pasid_allocator.pdata = (void *)iommu; + ret = ioasid_register_allocator(&iommu->pasid_allocator); + if (ret) { + pr_warn("Custom PASID allocator registration failed\n"); + /* +* Disable scalable mode on this IOMMU if there +* is no custom allocator. Mixing an SM-capable vIOMMU +* and a non-SM vIOMMU is not supported. 
+*/ + intel_iommu_sm = 0; + } + } +#endif } bus_set_iommu(&pci_bus_type, &intel_iommu_ops); diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 37fb0c9..80318c5 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -19,6 +19,7 @@ #include #include #include +#include <linux/ioasid.h> #include #include @@ -543,6 +544,7 @@ struct intel_iommu { #ifdef CONFIG_INTEL_IOMMU_SVM struct page_req_dsc *prq; unsigned char prq_name[16]; /* Name for PRQ interrupt */ + struct ioasid_allocator_ops pasid_allocator;
[PATCH v5 14/19] iommu/vt-d: Avoid duplicated code for PASID setup
After each setup for PASID entry, related translation caches must be flushed. We can combine duplicated code into one function which is less error prone. Signed-off-by: Jacob Pan --- drivers/iommu/intel-pasid.c | 48 + 1 file changed, 18 insertions(+), 30 deletions(-) diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index c0d1f28..9c5affc 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -512,6 +512,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu, devtlb_invalidation_with_pasid(iommu, dev, pasid); } +static void pasid_flush_caches(struct intel_iommu *iommu, + struct pasid_entry *pte, + int pasid, u16 did) +{ + if (!ecap_coherent(iommu->ecap)) + clflush_cache_range(pte, sizeof(*pte)); + + if (cap_caching_mode(iommu->cap)) { + pasid_cache_invalidation_with_pasid(iommu, did, pasid); + iotlb_invalidation_with_pasid(iommu, did, pasid); + } else { + iommu_flush_write_buffer(iommu); + } +} + /* * Set up the scalable mode pasid table entry for first only * translation type. 
@@ -557,16 +572,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu, /* Setup Present and PASID Granular Transfer Type: */ pasid_set_translation_type(pte, 1); pasid_set_present(pte); - - if (!ecap_coherent(iommu->ecap)) - clflush_cache_range(pte, sizeof(*pte)); - - if (cap_caching_mode(iommu->cap)) { - pasid_cache_invalidation_with_pasid(iommu, did, pasid); - iotlb_invalidation_with_pasid(iommu, did, pasid); - } else { - iommu_flush_write_buffer(iommu); - } + pasid_flush_caches(iommu, pte, pasid, did); return 0; } @@ -630,16 +636,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu, */ pasid_set_sre(pte); pasid_set_present(pte); - - if (!ecap_coherent(iommu->ecap)) - clflush_cache_range(pte, sizeof(*pte)); - - if (cap_caching_mode(iommu->cap)) { - pasid_cache_invalidation_with_pasid(iommu, did, pasid); - iotlb_invalidation_with_pasid(iommu, did, pasid); - } else { - iommu_flush_write_buffer(iommu); - } + pasid_flush_caches(iommu, pte, pasid, did); return 0; } @@ -673,16 +670,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu, */ pasid_set_sre(pte); pasid_set_present(pte); - - if (!ecap_coherent(iommu->ecap)) - clflush_cache_range(pte, sizeof(*pte)); - - if (cap_caching_mode(iommu->cap)) { - pasid_cache_invalidation_with_pasid(iommu, did, pasid); - iotlb_invalidation_with_pasid(iommu, did, pasid); - } else { - iommu_flush_write_buffer(iommu); - } + pasid_flush_caches(iommu, pte, pasid, did); return 0; } -- 2.7.4
[PATCH v5 12/19] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
Make use of generic IOASID code to manage PASID allocation, free, and lookup. Replace Intel specific code. Signed-off-by: Jacob Pan --- drivers/iommu/intel-iommu.c | 12 ++-- drivers/iommu/intel-pasid.c | 36 drivers/iommu/intel-svm.c | 37 + 3 files changed, 27 insertions(+), 58 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index b15ec58..96defc3 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4989,7 +4989,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain, domain->auxd_refcnt--; if (!domain->auxd_refcnt && domain->default_pasid > 0) - intel_pasid_free_id(domain->default_pasid); + ioasid_free(domain->default_pasid); } static int aux_domain_add_dev(struct dmar_domain *domain, @@ -5007,10 +5007,10 @@ static int aux_domain_add_dev(struct dmar_domain *domain, if (domain->default_pasid <= 0) { int pasid; - pasid = intel_pasid_alloc_id(domain, PASID_MIN, -pci_max_pasids(to_pci_dev(dev)), -GFP_KERNEL); - if (pasid <= 0) { + /* No private data needed for the default pasid */ + pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1, + NULL); + if (pasid == INVALID_IOASID) { pr_err("Can't allocate default pasid\n"); return -ENODEV; } @@ -5046,7 +5046,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain, spin_unlock(&iommu->lock); spin_unlock_irqrestore(&device_domain_lock, flags); if (!domain->auxd_refcnt && domain->default_pasid > 0) - intel_pasid_free_id(domain->default_pasid); + ioasid_free(domain->default_pasid); return ret; } diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index 76bcbb2..c0d1f28 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -26,42 +26,6 @@ */ static DEFINE_SPINLOCK(pasid_lock); u32 intel_pasid_max_id = PASID_MAX; -static DEFINE_IDR(pasid_idr); - -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp) -{ - int ret, min, max; - - min = max_t(int, start, PASID_MIN); - max = min_t(int, end, 
intel_pasid_max_id); - - WARN_ON(in_interrupt()); - idr_preload(gfp); - spin_lock(&pasid_lock); - ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC); - spin_unlock(&pasid_lock); - idr_preload_end(); - - return ret; -} - -void intel_pasid_free_id(int pasid) -{ - spin_lock(&pasid_lock); - idr_remove(&pasid_idr, pasid); - spin_unlock(&pasid_lock); -} - -void *intel_pasid_lookup_id(int pasid) -{ - void *p; - - spin_lock(&pasid_lock); - p = idr_find(&pasid_idr, pasid); - spin_unlock(&pasid_lock); - - return p; -} static int check_vcmd_pasid(struct intel_iommu *iommu) { diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index 780de0c..5a688a5 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -17,6 +17,7 @@ #include #include #include +#include <linux/ioasid.h> #include #include "intel-pasid.h" @@ -324,16 +325,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ if (pasid_max > intel_pasid_max_id) pasid_max = intel_pasid_max_id; - /* Do not use PASID 0 in caching mode (virtualised IOMMU) */ - ret = intel_pasid_alloc_id(svm, - !!cap_caching_mode(iommu->cap), - pasid_max - 1, GFP_KERNEL); - if (ret < 0) { + /* Do not use PASID 0, reserved for RID to PASID */ + svm->pasid = ioasid_alloc(NULL, PASID_MIN, + pasid_max - 1, svm); + if (svm->pasid == INVALID_IOASID) { kfree(svm); kfree(sdev); + ret = -ENOSPC; goto out; } - svm->pasid = ret; svm->notifier.ops = &intel_mmuops; svm->mm = mm; svm->flags = flags; @@ -343,7 +343,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ if (mm) { ret = mmu_notifier_register(&svm->notifier, mm); if (ret) { - intel_pasid_free_id(svm->pasid); + ioasid_free(svm->pasid); kfree(svm); kfree(sdev); goto out; @@ -359,7 +359,7 @@ int intel_svm_unbind_mm(struct device *dev, int
[PATCH v5 16/19] iommu/vt-d: Misc macro clean up for SVM
Use the combined macro for_each_svm_dev() to simplify SVM device iteration and error checking. Suggested-by: Andy Shevchenko Signed-off-by: Jacob Pan Reviewed-by: Eric Auger --- drivers/iommu/intel-svm.c | 85 +++ 1 file changed, 41 insertions(+), 44 deletions(-) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index 5a688a5..ea6f2e2 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -218,6 +218,10 @@ static const struct mmu_notifier_ops intel_mmuops = { static DEFINE_MUTEX(pasid_mutex); static LIST_HEAD(global_svm_list); +#define for_each_svm_dev(svm, dev) \ + list_for_each_entry(sdev, &svm->devs, list) \ + if (dev == sdev->dev) \ + int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops) { struct intel_iommu *iommu = intel_svm_device_to_iommu(dev); @@ -263,15 +267,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ goto out; } - list_for_each_entry(sdev, &svm->devs, list) { - if (dev == sdev->dev) { - if (sdev->ops != ops) { - ret = -EBUSY; - goto out; - } - sdev->users++; - goto success; + for_each_svm_dev(svm, dev) { + if (sdev->ops != ops) { + ret = -EBUSY; + goto out; } + sdev->users++; + goto success; } break; @@ -408,48 +410,43 @@ int intel_svm_unbind_mm(struct device *dev, int pasid) goto out; svm = ioasid_find(NULL, pasid, NULL); - if (IS_ERR(svm)) { + if (IS_ERR_OR_NULL(svm)) { ret = PTR_ERR(svm); goto out; } - if (!svm) - goto out; + for_each_svm_dev(svm, dev) { + ret = 0; + sdev->users--; + if (!sdev->users) { + list_del_rcu(&sdev->list); + /* Flush the PASID cache and IOTLB for this device. +* Note that we do depend on the hardware *not* using +* the PASID any more. Just as we depend on other +* devices never using PASIDs that they have no right +* to use. We have a *shared* PASID table, because it's +* large and has to be physically contiguous. So it's +* hard to be as defensive as we might like. 
*/ + intel_pasid_tear_down_entry(iommu, dev, svm->pasid); + intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm); + kfree_rcu(sdev, rcu); + + if (list_empty(&svm->devs)) { + ioasid_free(svm->pasid); + if (svm->mm) + mmu_notifier_unregister(&svm->notifier, svm->mm); - list_for_each_entry(sdev, &svm->devs, list) { - if (dev == sdev->dev) { - ret = 0; - sdev->users--; - if (!sdev->users) { - list_del_rcu(&sdev->list); - /* Flush the PASID cache and IOTLB for this device. -* Note that we do depend on the hardware *not* using -* the PASID any more. Just as we depend on other -* devices never using PASIDs that they have no right -* to use. We have a *shared* PASID table, because it's -* large and has to be physically contiguous. So it's -* hard to be as defensive as we might like. */ - intel_pasid_tear_down_entry(iommu, dev, svm->pasid); - intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm); - kfree_rcu(sdev, rcu); - - if (list_empty(&svm->devs)) { - ioasid_free(svm->pasid); - if (svm->mm) - mmu_notifier_unregister(&svm->notifier, svm->mm); - - list_del(&svm->list); - - /* We mandate that no page faults may be outstanding -
[PATCH v5 03/19] trace/iommu: Add sva trace events
From: Jean-Philippe Brucker For development only, trace I/O page faults and responses. Signed-off-by: Jacob Pan [JPB: removed the invalidate trace event, that will be added later] Signed-off-by: Jean-Philippe Brucker Signed-off-by: Jacob Pan --- include/trace/events/iommu.h | 84 1 file changed, 84 insertions(+) diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h index 72b4582..767b92c 100644 --- a/include/trace/events/iommu.h +++ b/include/trace/events/iommu.h @@ -12,6 +12,8 @@ #define _TRACE_IOMMU_H #include +#include +#include struct device; @@ -161,6 +163,88 @@ DEFINE_EVENT(iommu_error, io_page_fault, TP_ARGS(dev, iova, flags) ); + +TRACE_EVENT(dev_fault, + + TP_PROTO(struct device *dev, struct iommu_fault *evt), + + TP_ARGS(dev, evt), + + TP_STRUCT__entry( + __string(device, dev_name(dev)) + __field(int, type) + __field(int, reason) + __field(u64, addr) + __field(u64, fetch_addr) + __field(u32, pasid) + __field(u32, grpid) + __field(u32, flags) + __field(u32, prot) + ), + + TP_fast_assign( + __assign_str(device, dev_name(dev)); + __entry->type = evt->type; + if (evt->type == IOMMU_FAULT_DMA_UNRECOV) { + __entry->reason = evt->event.reason; + __entry->flags = evt->event.flags; + __entry->pasid = evt->event.pasid; + __entry->grpid = 0; + __entry->prot = evt->event.perm; + __entry->addr = evt->event.addr; + __entry->fetch_addr = evt->event.fetch_addr; + } else { + __entry->reason = 0; + __entry->flags = evt->prm.flags; + __entry->pasid = evt->prm.pasid; + __entry->grpid = evt->prm.grpid; + __entry->prot = evt->prm.perm; + __entry->addr = evt->prm.addr; + __entry->fetch_addr = 0; + } + ), + + TP_printk("IOMMU:%s type=%d reason=%d addr=0x%016llx fetch=0x%016llx pasid=%d group=%d flags=%x prot=%d", + __get_str(device), + __entry->type, + __entry->reason, + __entry->addr, + __entry->fetch_addr, + __entry->pasid, + __entry->grpid, + __entry->flags, + __entry->prot + ) +); + +TRACE_EVENT(dev_page_response, + + TP_PROTO(struct device *dev, struct 
iommu_fault_page_response *msg), + + TP_ARGS(dev, msg), + + TP_STRUCT__entry( + __string(device, dev_name(dev)) + __field(int, code) + __field(u32, pasid) + __field(u32, grpid) + ), + + TP_fast_assign( + __assign_str(device, dev_name(dev)); + __entry->code = msg->code; + __entry->pasid = msg->pasid; + __entry->grpid = msg->grpid; + ), + + TP_printk("IOMMU:%s code=%d pasid=%d group=%d", + __get_str(device), + __entry->code, + __entry->pasid, + __entry->grpid + ) +); + #endif /* _TRACE_IOMMU_H */ /* This part must be outside protection */ -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 02/19] iommu: handle page response timeout
When I/O page faults are reported outside the IOMMU subsystem, the page request handler may fail for various reasons, e.g. a guest received page requests but did not get a chance to run for a long time. Such unresponsiveness could hold up limited resources on the pending device. There can be hardware or credit-based software solutions, as suggested in PCI ATS chapter 4. To provide a basic safety net, this patch introduces a per-device deferrable timer which monitors the longest pending page fault that requires a response. Proper action, such as sending a failure response code, could be taken when the timer expires, but is not included in this patch. We need to consider the life cycle of the page group ID to prevent confusion with group IDs reused by a device. For now, a warning message provides a clue to such failures. Signed-off-by: Jacob Pan Signed-off-by: Ashok Raj --- drivers/iommu/iommu.c | 55 +++ include/linux/iommu.h | 4 2 files changed, 59 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 5b26499..8f2c7d5 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -909,6 +909,39 @@ int iommu_group_unregister_notifier(struct iommu_group *group, } EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); +static void iommu_dev_fault_timer_fn(struct timer_list *t) +{ + struct iommu_fault_param *fparam = from_timer(fparam, t, timer); + struct iommu_fault_event *evt; + struct iommu_fault_page_request *prm; + + u64 now; + + now = get_jiffies_64(); + + /* The goal is to ensure the driver or guest page fault handler (via vfio) +* sends page responses on time. Otherwise, limited queue resources +* may be occupied by some unresponsive guests or drivers. +* When the per-device pending fault list is not empty, we periodically check +* if any anticipated page response time has expired. +* +* TODO: +* We could do the following if response time expires: +* 1. send page response code FAILURE to all pending PRQ +* 2. inform device driver or vfio +* 3. 
drain in-flight page requests and responses for this device +* 4. clear pending fault list such that driver can unregister fault +* handler (otherwise blocked when pending faults are present). +*/ + list_for_each_entry(evt, &fparam->faults, list) { + prm = &evt->fault.prm; + if (time_after64(now, evt->expire)) + pr_err("Page response time expired! pasid %d gid %d exp %llu now %llu\n", + prm->pasid, prm->grpid, evt->expire, now); + } + mod_timer(t, now + prq_timeout); +} + /** * iommu_register_device_fault_handler() - Register a device fault handler * @dev: the device @@ -956,6 +989,9 @@ int iommu_register_device_fault_handler(struct device *dev, mutex_init(&param->fault_param->lock); INIT_LIST_HEAD(&param->fault_param->faults); + if (prq_timeout) + timer_setup(&param->fault_param->timer, iommu_dev_fault_timer_fn, + TIMER_DEFERRABLE); done_unlock: mutex_unlock(&param->lock); @@ -1017,7 +1053,9 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt) struct iommu_param *param = dev->iommu_param; struct iommu_fault_event *evt_pending = NULL; struct iommu_fault_param *fparam; + struct timer_list *tmr; int ret = 0; + u64 exp; if (!param || !evt) return -EINVAL; @@ -1038,7 +1076,17 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt) ret = -ENOMEM; goto done_unlock; } + /* Keep track of response expiration time */ + exp = get_jiffies_64() + prq_timeout; + evt_pending->expire = exp; mutex_lock(&fparam->lock); + if (list_empty(&fparam->faults)) { + /* First pending event, start timer */ + tmr = &dev->iommu_param->fault_param->timer; + WARN_ON(timer_pending(tmr)); + mod_timer(tmr, exp); + } + list_add_tail(&evt_pending->list, &fparam->faults); mutex_unlock(&fparam->lock); } @@ -1103,6 +1151,13 @@ int iommu_page_response(struct device *dev, break; } + /* stop response timer if no more pending request */ + if (list_empty(&param->fault_param->faults) && + timer_pending(&param->fault_param->timer)) { + pr_debug("no pending PRQ, stop timer\n"); + del_timer(&param->fault_param->timer); + } + done_unlock: 
mutex_unlock(&param->fault_param->lock); return ret; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index fdc355c..39d371b 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -317,10
[PATCH v5 19/19] iommu/vt-d: Add svm/sva invalidate function
When Shared Virtual Address (SVA) is enabled for a guest OS via vIOMMU, we need to provide invalidation support at the IOMMU API and driver level. This patch adds an Intel VT-d specific function to implement the iommu passdown invalidate API for shared virtual address. The use case is to support caching structure invalidation for assigned SVM-capable devices. The emulated IOMMU exposes queued invalidation capability and passes down all descriptors from the guest to the physical IOMMU. The assumption is that the guest-to-host device ID mapping is resolved prior to calling the IOMMU driver. Based on the device handle, the host IOMMU driver can replace certain fields before submitting to the invalidation queue. Signed-off-by: Jacob Pan Signed-off-by: Ashok Raj Signed-off-by: Liu, Yi L --- drivers/iommu/intel-iommu.c | 170 1 file changed, 170 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index dcac964..b7ca33a 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5169,6 +5169,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain, aux_domain_remove_dev(to_dmar_domain(domain), dev); } +/* + * 2D array for converting and sanitizing IOMMU generic TLB granularity to + * VT-d granularity. Invalidation is typically included in the unmap operation + * as a result of DMA or VFIO unmap. However, an assigned device may own its + * first level page tables without being shadowed by QEMU; in this case there + * is no pass down unmap to the host IOMMU as a result of unmap + * in the guest. Only invalidations are trapped and passed down. + * In all cases, only first level TLB invalidation (request with PASID) can be + * passed down, therefore we do not include IOTLB granularity for requests + * without PASID (second level). 
+ * + * For an example, to find the VT-d granularity encoding for IOTLB + * type and page selective granularity within PASID: + * X: indexed by iommu cache type + * Y: indexed by enum iommu_inv_granularity + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR] + * + * Granu_map array indicates validity of the table. 1: valid, 0: invalid + * + */ +const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = { + /* PASID based IOTLB, support PASID selective and page selective */ + {0, 1, 1}, + /* PASID based dev TLBs, only support all PASIDs or single PASID */ + {1, 1, 0}, + /* PASID cache */ + {1, 1, 0} +}; + +const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = { + /* PASID based IOTLB */ + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID}, + /* PASID based dev TLBs */ + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0}, + /* PASID cache */ + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0}, +}; + +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu) +{ + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR || + !inv_type_granu_map[type][granu]) + return -EINVAL; + + *vtd_granu = inv_type_granu_table[type][granu]; + + return 0; +} + +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules) +{ + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT; + + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc. +* IOMMU cache invalidate API passes granu_size in bytes, and number of +* granu size in contiguous memory. 
+*/ + return order_base_2(nr_pages); +} + +#ifdef CONFIG_INTEL_IOMMU_SVM +static int intel_iommu_sva_invalidate(struct iommu_domain *domain, + struct device *dev, struct iommu_cache_invalidate_info *inv_info) +{ + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct device_domain_info *info; + struct intel_iommu *iommu; + unsigned long flags; + int cache_type; + u8 bus, devfn; + u16 did, sid; + int ret = 0; + u64 size; + + if (!inv_info || !dmar_domain || + inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1) + return -EINVAL; + + if (!dev || !dev_is_pci(dev)) + return -ENODEV; + + iommu = device_to_iommu(dev, &bus, &devfn); + if (!iommu) + return -ENODEV; + + spin_lock_irqsave(&device_domain_lock, flags); + spin_lock(&iommu->lock); + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn); + if (!info) { + ret = -EINVAL; + goto out_unlock; + } + did = dmar_domain->iommu_did[iommu->seq_id]; + sid = PCI_DEVID(bus, devfn); + size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules); + + for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) { + u64 granu = 0; + u64 pasid = 0;
[PATCH v5 18/19] iommu/vt-d: Support flushing more translation cache types
When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable IOTLB invalidation may be passed down from outside IOMMU subsystems. This patch adds invalidation functions that can be used for additional translation cache types. Signed-off-by: Jacob Pan --- drivers/iommu/dmar.c | 46 + drivers/iommu/intel-pasid.c | 3 ++- include/linux/intel-iommu.h | 21 + 3 files changed, 65 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 5d0754e..1da4c68 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1345,6 +1345,21 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu); } +/* PASID-based IOTLB Invalidate */ +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid, + unsigned int size_order, u64 granu, int ih) +{ + struct qi_desc desc = {.qw2 = 0, .qw3 = 0}; + + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) | + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE; + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) | + QI_EIOTLB_AM(size_order); + desc.qw2 = 0; + desc.qw3 = 0; + qi_submit_sync(&desc, iommu); +} + void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, u16 qdep, u64 addr, unsigned mask) { @@ -1368,6 +1383,37 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu); } +/* PASID-based device IOTLB Invalidate */ +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, + u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu) +{ + struct qi_desc desc; + unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1); + + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) | + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE | + QI_DEV_IOTLB_PFSID(pfsid); + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu); + + desc.qw1 |= addr & ~mask; + /* If S bit is 0, we only flush a single page. If S bit is set, +* the least significant zero bit indicates the invalidation address +* range. 
VT-d spec 6.5.2.6. +* e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB. +*/ + if (size_order) + desc.qw1 |= QI_DEV_EIOTLB_SIZE; + + qi_submit_sync(&desc, iommu); +} + +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid) +{ + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0}; + + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE; + qi_submit_sync(&desc, iommu); +} /* * Disable Queued Invalidation interface. */ diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index fd2c82f..ff7e877 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -518,7 +518,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu, { struct qi_desc desc; - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid); + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) | + QI_PC_PASID(pasid) | QI_PC_TYPE; desc.qw1 = 0; desc.qw2 = 0; desc.qw3 = 0; diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index d673b39..682eafa1 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -327,7 +327,7 @@ enum { #define QI_IOTLB_GRAN(gran)(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr)(((u64)addr) & VTD_PAGE_MASK) #define QI_IOTLB_IH(ih)(((u64)ih) << 6) -#define QI_IOTLB_AM(am)(((u8)am)) +#define QI_IOTLB_AM(am)(((u8)am) & 0x3f) #define QI_CC_FM(fm) (((u64)fm) << 48) #define QI_CC_SID(sid) (((u64)sid) << 32) @@ -345,17 +345,22 @@ enum { #define QI_PC_DID(did) (((u64)did) << 16) #define QI_PC_GRAN(gran) (((u64)gran) << 4) -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0)) -#define QI_PC_PASID_SEL(QI_PC_TYPE | QI_PC_GRAN(1)) +/* PASID cache invalidation granu */ +#define QI_PC_ALL_PASIDS 0 +#define QI_PC_PASID_SEL 1 #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK) #define QI_EIOTLB_GL(gl) (((u64)gl) << 7) #define QI_EIOTLB_IH(ih) (((u64)ih) << 6) -#define QI_EIOTLB_AM(am) (((u64)am)) +#define 
QI_EIOTLB_AM(am) (((u64)am) & 0x3f) #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32) #define QI_EIOTLB_DID(did) (((u64)did) << 16) #define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4) +/* QI Dev-IOTLB inv granu */ +#define QI_DEV_IOTLB_GRAN_ALL 1 +#define QI_DEV_IOTLB_GRAN_PASID_SEL0 + #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK) #define QI_DEV_EIOTLB_SIZE
[PATCH v5 17/19] iommu/vt-d: Add bind guest PASID support
When supporting guest SVA with emulated IOMMU, the guest PASID table is shadowed in VMM. Updates to guest vIOMMU PASID table will result in PASID cache flush which will be passed down to the host as bind guest PASID calls. For the SL page tables, it will be harvested from device's default domain (request w/o PASID), or aux domain in case of mediated device. .-. .---. | vIOMMU| | Guest process CR3, FL only| | | '---' ./ | PASID Entry |--- PASID cache flush - '-' | | | V | |CR3 in GPA '-' Guest --| Shadow |--| vv v Host .-. .--. | pIOMMU| | Bind FL for GVA-GPA | | | '--' ./ | | PASID Entry | V (Nested xlate) '\.--. | | |SL for GPA-HPA, default domain| | | '--' '-' Where: - FL = First level/stage one page tables - SL = Second level/stage two page tables Signed-off-by: Jacob Pan Signed-off-by: Liu, Yi L --- drivers/iommu/intel-iommu.c | 4 + drivers/iommu/intel-svm.c | 184 include/linux/intel-iommu.h | 8 +- include/linux/intel-svm.h | 17 4 files changed, 212 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index d2cc355..dcac964 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5691,6 +5691,10 @@ const struct iommu_ops intel_iommu_ops = { .dev_disable_feat = intel_iommu_dev_disable_feat, .is_attach_deferred = intel_iommu_is_attach_deferred, .pgsize_bitmap = INTEL_IOMMU_PGSIZES, +#ifdef CONFIG_INTEL_IOMMU_SVM + .sva_bind_gpasid= intel_svm_bind_gpasid, + .sva_unbind_gpasid = intel_svm_unbind_gpasid, +#endif }; static void quirk_iommu_g4x_gfx(struct pci_dev *dev) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index ea6f2e2..c6edef2 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -222,6 +222,190 @@ static LIST_HEAD(global_svm_list); list_for_each_entry(sdev, >devs, list) \ if (dev == sdev->dev) \ +int intel_svm_bind_gpasid(struct iommu_domain *domain, + struct device *dev, + struct iommu_gpasid_bind_data *data) +{ + struct intel_iommu *iommu = 
intel_svm_device_to_iommu(dev); + struct dmar_domain *ddomain; + struct intel_svm_dev *sdev; + struct intel_svm *svm; + int ret = 0; + + if (WARN_ON(!iommu) || !data) + return -EINVAL; + + if (data->version != IOMMU_GPASID_BIND_VERSION_1 || + data->format != IOMMU_PASID_FORMAT_INTEL_VTD) + return -EINVAL; + + if (dev_is_pci(dev)) { + /* VT-d supports devices with full 20 bit PASIDs only */ + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX) + return -EINVAL; + } + + /* +* We only check the host PASID range; we have no knowledge to check +* the guest PASID range, nor do we use the guest PASID. +*/ + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX) + return -EINVAL; + + ddomain = to_dmar_domain(domain); + /* REVISIT: +* Sanity check address width and paging mode support +* width matching in two dimensions: +* 1. paging mode CPU <= IOMMU +* 2. address width Guest <= Host. +*/ + mutex_lock(&pasid_mutex); + svm = ioasid_find(NULL, data->hpasid, NULL); + if (IS_ERR(svm)) { + ret = PTR_ERR(svm); + goto out; + } + if (svm) { + /* +* If we found an svm for the PASID, there must be at +* least one device bound; otherwise the svm should be freed. +*/ + BUG_ON(list_empty(&svm->devs)); + + for_each_svm_dev(svm, dev) { + /* In case multiple sub-devices of the same pdev are assigned, we should +* allow multiple bind calls with the same PASID and pdev. +*/ + sdev->users++; + goto out; + } + } else { + /* We come here when the PASID has never been bound to a device. */ + svm = kzalloc(sizeof(*svm), GFP_KERNEL); + if (!svm) { + ret = -ENOMEM; + goto out; + } + /* REVISIT: upper
[PATCH v5 01/19] iommu: Add a timeout parameter for PRQ response
When an I/O page request is processed outside the IOMMU subsystem, the response can be delayed or lost. Add a tunable setup parameter such that the user can choose the timeout for the IOMMU to track pending page requests. This timeout mechanism is a basic safety net which can be implemented in conjunction with credit-based or device-level page response exception handling. Signed-off-by: Jacob Pan --- Documentation/admin-guide/kernel-parameters.txt | 8 ++ drivers/iommu/iommu.c | 33 + 2 files changed, 41 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 47d981a..7da5a83 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1817,6 +1817,14 @@ 1 - Bypass the IOMMU for DMA. unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH. + iommu.prq_timeout= + Timeout in seconds to wait for page response + of a pending page request. + Format: + Default: 10 + 0 - no timeout tracking + 1 to 100 - allowed range + io7=[HW] IO7 for Marvel based alpha systems See comment before marvel_specify_io7 in arch/alpha/kernel/core_marvel.c. diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 0c674d8..5b26499 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -33,6 +33,19 @@ static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA; #endif static bool iommu_dma_strict __read_mostly = true; +/* + * Timeout to wait for page response of a pending page request. This is + * intended as a basic safety net in case a pending page request is not + * responded to for an exceptionally long time. A device may also implement + * its own protection mechanism against this exception. + * Units are in jiffies with a range between 1 - 100 seconds equivalent. + * Defaults to 10 seconds. + * Setting 0 means no timeout tracking. 
+ */ +#define IOMMU_PAGE_RESPONSE_MAX_TIMEOUT (HZ * 100) +#define IOMMU_PAGE_RESPONSE_DEF_TIMEOUT (HZ * 10) +static unsigned long prq_timeout = IOMMU_PAGE_RESPONSE_DEF_TIMEOUT; + struct iommu_group { struct kobject kobj; struct kobject *devices_kobj; @@ -176,6 +189,26 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); +static int __init iommu_set_prq_timeout(char *str) +{ + int ret; + unsigned long timeout; + + if (!str) + return -EINVAL; + + ret = kstrtoul(str, 10, &timeout); + if (ret) + return ret; + timeout = timeout * HZ; + if (timeout > IOMMU_PAGE_RESPONSE_MAX_TIMEOUT) + return -EINVAL; + prq_timeout = timeout; + + return 0; +} +early_param("iommu.prq_timeout", iommu_set_prq_timeout); + static ssize_t iommu_group_attr_show(struct kobject *kobj, struct attribute *__attr, char *buf) { -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 10/19] iommu/vt-d: Enlightened PASID allocation
From: Lu Baolu If the Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the IOMMU driver should rely on the emulation software to allocate and free PASIDs. The Intel VT-d spec revision 3.0 defines a register set to support this. This includes a capability register, a virtual command register and a virtual response register. Refer to sections 10.4.42, 10.4.43, 10.4.44 for more information. This patch adds the enlightened PASID allocation/free interfaces via the virtual command register. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Signed-off-by: Liu Yi L Signed-off-by: Lu Baolu Signed-off-by: Jacob Pan Reviewed-by: Eric Auger --- drivers/iommu/intel-pasid.c | 83 + drivers/iommu/intel-pasid.h | 13 ++- include/linux/intel-iommu.h | 2 ++ 3 files changed, 97 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index 040a445..76bcbb2 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -63,6 +63,89 @@ void *intel_pasid_lookup_id(int pasid) return p; } +static int check_vcmd_pasid(struct intel_iommu *iommu) +{ + u64 cap; + + if (!ecap_vcs(iommu->ecap)) { + pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n", + iommu->name); + return -ENODEV; + } + + cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG); + if (!(cap & DMA_VCS_PAS)) { + pr_warn("IOMMU: %s: Emulation software doesn't support PASID allocation\n", + iommu->name); + return -ENODEV; + } + + return 0; +} + +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid) +{ + u64 res; + u8 status_code; + unsigned long flags; + int ret = 0; + + ret = check_vcmd_pasid(iommu); + if (ret) + return ret; + + raw_spin_lock_irqsave(&iommu->register_lock, flags); + dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC); + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq, + !(res & VCMD_VRSP_IP), res); + raw_spin_unlock_irqrestore(&iommu->register_lock, flags); + + status_code = VCMD_VRSP_SC(res); + switch (status_code) { + case VCMD_VRSP_SC_SUCCESS: + 
*pasid = VCMD_VRSP_RESULT(res); + break; + case VCMD_VRSP_SC_NO_PASID_AVAIL: + pr_info("IOMMU: %s: No PASID available\n", iommu->name); + ret = -ENOMEM; + break; + default: + ret = -ENODEV; + pr_warn("IOMMU: %s: Unexpected error code %d\n", + iommu->name, status_code); + } + + return ret; +} + +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid) +{ + u64 res; + u8 status_code; + unsigned long flags; + + if (check_vcmd_pasid(iommu)) + return; + + raw_spin_lock_irqsave(&iommu->register_lock, flags); + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE); + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq, + !(res & VCMD_VRSP_IP), res); + raw_spin_unlock_irqrestore(&iommu->register_lock, flags); + + status_code = VCMD_VRSP_SC(res); + switch (status_code) { + case VCMD_VRSP_SC_SUCCESS: + break; + case VCMD_VRSP_SC_INVALID_PASID: + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name); + break; + default: + pr_warn("IOMMU: %s: Unexpected error code %d\n", + iommu->name, status_code); + } +} + /* * Per device pasid table management: */ diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h index fc8cd8f..e413e88 100644 --- a/drivers/iommu/intel-pasid.h +++ b/drivers/iommu/intel-pasid.h @@ -23,6 +23,16 @@ #define is_pasid_enabled(entry) (((entry)->lo >> 3) & 0x1) #define get_pasid_dir_size(entry) (1 << ((((entry)->lo >> 9) & 0x7) + 7)) +/* Virtual command interface for enlightened pasid management. */ +#define VCMD_CMD_ALLOC 0x1 +#define VCMD_CMD_FREE 0x2 +#define VCMD_VRSP_IP 0x1 +#define VCMD_VRSP_SC(e) (((e) >> 1) & 0x3) +#define VCMD_VRSP_SC_SUCCESS 0 +#define VCMD_VRSP_SC_NO_PASID_AVAIL 1 +#define VCMD_VRSP_SC_INVALID_PASID 1 +#define VCMD_VRSP_RESULT(e) (((e) >> 8) & 0xf) + /* * Domain ID reserved for pasid entries programmed for first-level * only and pass-through transfer modes. 
@@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu, struct device *dev, int pasid); void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev, int pasid); - +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
[PATCH v5 06/19] iommu: Introduce cache_invalidate API
From: Yi L Liu In any virtualization use case, when the first translation stage is "owned" by the guest OS, the host IOMMU driver has no knowledge of caching structure updates unless the guest invalidation activities are trapped by the virtualizer and passed down to the host. Since the invalidation data are obtained from user space and will be written into physical IOMMU, we must allow security check at various layers. Therefore, generic invalidation data format are proposed here, model specific IOMMU drivers need to convert them into their own format. Signed-off-by: Liu, Yi L Signed-off-by: Jacob Pan Signed-off-by: Ashok Raj Signed-off-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/iommu.c | 10 + include/linux/iommu.h | 14 ++ include/uapi/linux/iommu.h | 110 + 3 files changed, 134 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 155ebef..6228d5d 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1719,6 +1719,16 @@ void iommu_detach_pasid_table(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_detach_pasid_table); +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev, + struct iommu_cache_invalidate_info *inv_info) +{ + if (unlikely(!domain->ops->cache_invalidate)) + return -ENODEV; + + return domain->ops->cache_invalidate(domain, dev, inv_info); +} +EXPORT_SYMBOL_GPL(iommu_cache_invalidate); + static void __iommu_detach_device(struct iommu_domain *domain, struct device *dev) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 8c64065..28f1a8c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -230,6 +230,7 @@ struct iommu_sva_ops { * @page_response: handle page request response * @attach_pasid_table: attach a pasid table * @detach_pasid_table: detach the pasid table + * @cache_invalidate: invalidate translation caches * @pgsize_bitmap: bitmap of all possible supported page sizes */ struct iommu_ops { @@ -296,6 +297,8 @@ struct iommu_ops { 
int (*page_response)(struct device *dev, struct iommu_fault_event *evt, struct iommu_page_response *msg); + int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev, + struct iommu_cache_invalidate_info *inv_info); unsigned long pgsize_bitmap; }; @@ -407,6 +410,9 @@ extern void iommu_detach_device(struct iommu_domain *domain, extern int iommu_attach_pasid_table(struct iommu_domain *domain, struct iommu_pasid_table_config *cfg); extern void iommu_detach_pasid_table(struct iommu_domain *domain); +extern int iommu_cache_invalidate(struct iommu_domain *domain, + struct device *dev, + struct iommu_cache_invalidate_info *inv_info); extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern struct iommu_domain *iommu_get_dma_domain(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, @@ -959,6 +965,14 @@ int iommu_attach_pasid_table(struct iommu_domain *domain, static inline void iommu_detach_pasid_table(struct iommu_domain *domain) {} +static inline int +iommu_cache_invalidate(struct iommu_domain *domain, + struct device *dev, + struct iommu_cache_invalidate_info *inv_info) +{ + return -ENODEV; +} + #endif /* CONFIG_IOMMU_API */ #ifdef CONFIG_IOMMU_DEBUGFS diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index 0f9d249..919ea02 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -203,4 +203,114 @@ struct iommu_pasid_table_config { }; }; +/* defines the granularity of the invalidation */ +enum iommu_inv_granularity { + IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */ + IOMMU_INV_GRANU_PASID, /* PASID-selective invalidation */ + IOMMU_INV_GRANU_ADDR, /* page-selective invalidation */ + IOMMU_INV_GRANU_NR, /* number of invalidation granularities */ +}; + +/** + * struct iommu_inv_addr_info - Address Selective Invalidation Structure + * + * @flags: indicates the granularity of the address-selective invalidation + * - If the PASID bit is set, the 
@pasid field is populated and the invalidation + * relates to cache entries tagged with this PASID and matching the address + * range. + * - If ARCHID bit is set, @archid is populated and the invalidation relates + * to cache entries tagged with this architecture specific ID and matching + * the address range. + * - Both PASID and ARCHID can be set as they may tag different caches. + * - If neither PASID or ARCHID is set, global addr invalidation applies. + * - The LEAF flag indicates whether
[PATCH v5 15/19] iommu/vt-d: Add nested translation helper function
Nested translation mode is supported in VT-d 3.0 Spec, Ch. 3.8. With the PASID granular translation type set to 0x11b, the translation result from the first level (FL) is also subject to a second level (SL) page table translation. This mode is used for SVA virtualization, where FL performs guest virtual to guest physical translation and SL performs guest physical to host physical translation. Signed-off-by: Jacob Pan Signed-off-by: Liu, Yi L --- drivers/iommu/intel-pasid.c | 207 drivers/iommu/intel-pasid.h | 12 +++ 2 files changed, 219 insertions(+) diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index 9c5affc..fd2c82f 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -442,6 +442,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value) pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2); } +/* + * Setup the Extended Memory Type (EMT) field (Bits 91-93) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_emt(struct pasid_entry *pe, u64 value) +{ + pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27); +} + +/* + * Setup the Page Attribute Table (PAT) field (Bits 96-127) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_pat(struct pasid_entry *pe, u64 value) +{ + pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 32); +} + +/* + * Setup the Cache Disable (CD) field (Bit 89) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_cd(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[1], 1 << 25, 1); +} + +/* + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_emte(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[1], 1 << 26, 1); +} + +/* + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135) + * of a scalable mode PASID entry. 
+ */ +static inline void +pasid_set_eafe(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[2], 1 << 7, 1); +} + +/* + * Setup the Page-level Cache Disable (PCD) field (Bit 95) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_pcd(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[1], 1 << 31, 1); +} + +/* + * Setup the Page-level Write-Through (PWT) field (Bit 94) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_pwt(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[1], 1 << 30, 1); +} + static void pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu, u16 did, int pasid) @@ -674,3 +744,140 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu, return 0; } + +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu, + struct pasid_entry *pte, + struct iommu_gpasid_bind_data_vtd *pasid_data) +{ + /* +* Not all guest PASID table entry fields are passed down during bind, +* here we only set up the ones that are dependent on guest settings. +* Execution related bits such as NXE, SMEP are not meaningful to IOMMU, +* therefore not set. Other fields, such as snoop related, are set based +* on host needs regardless of guest settings. +*/ + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) { + if (!ecap_srs(iommu->ecap)) { + pr_err("No supervisor request support on %s\n", + iommu->name); + return -EINVAL; + } + pasid_set_sre(pte); + } + + if ((pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) && ecap_eafs(iommu->ecap)) + pasid_set_eafe(pte); + + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) { + pasid_set_emte(pte); + pasid_set_emt(pte, pasid_data->emt); + } + + /* +* Memory type is only applicable to devices inside processor coherent +* domain. PCIe devices are not included. We can skip the rest of the +* flags if IOMMU does not support MTS. 
+*/ + if (!ecap_mts(iommu->ecap)) { + pr_info("%s does not support memory type bind guest PASID\n", + iommu->name); + return 0; + } + + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD) + pasid_set_pcd(pte); + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT) + pasid_set_pwt(pte); + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD) + pasid_set_cd(pte); + pasid_set_pat(pte, pasid_data->pat); + + return 0; + +} + +/** + * intel_pasid_setup_nested() - Set up PASID entry for nested translation + * which is used for vSVA. The first level page tables are used for + * GVA-GPA translation in the guest, second level page tables are used + * for GPA to HPA translation. + * + * @iommu:
[PATCH v5 13/19] iommu/vt-d: Move domain helper to header
Move domain helper to header to be used by SVA code. Signed-off-by: Jacob Pan Reviewed-by: Eric Auger --- drivers/iommu/intel-iommu.c | 6 -- include/linux/intel-iommu.h | 6 ++ 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 96defc3..d2cc355 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -418,12 +418,6 @@ static void init_translation_status(struct intel_iommu *iommu) iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED; } -/* Convert generic 'struct iommu_domain to private struct dmar_domain */ -static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom) -{ - return container_of(dom, struct dmar_domain, domain); -} - static int __init intel_iommu_setup(char *str) { if (!str) diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 80318c5..e1865f1 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -591,6 +591,12 @@ static inline void __iommu_flush_cache( clflush_cache_range(addr, size); } +/* Convert generic struct iommu_domain to private struct dmar_domain */ +static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct dmar_domain, domain); +} + /* * 0: readable * 1: writable -- 2.7.4
[PATCH v5 00/19] Shared virtual address IOMMU and VT-d support
Shared virtual address (SVA), a.k.a. shared virtual memory (SVM), on Intel platforms allows address space sharing between device DMA and applications. SVA can reduce programming complexity and enhance security. This series is intended to enable SVA virtualization, i.e. shared guest application address space and physical device DMA address. Only the IOMMU portion of the changes is included in this series. Additional support is needed in VFIO and QEMU (will be submitted separately) to complete this functionality. To make incremental changes and reduce the size of each patchset, this series does not include support for page request services. In the VT-d implementation, the PASID table is per device and maintained in the host. The guest PASID table is shadowed in the VMM where the virtual IOMMU is emulated.

      .-------------.  .---------------------------.
      |   vIOMMU    |  | Guest process CR3, FL only|
      |             |  '---------------------------'
      .----------------/
      | PASID Entry |--- PASID cache flush -
      '-------------'                       |
      |             |                       V
      |             |                  CR3 in GPA
      '-------------'
 Guest ------| Shadow |-----------------------|-----------
             v        v                       v
 Host  .-------------.  .----------------------.
       |   pIOMMU    |  | Bind FL for GVA-GPA  |
       |             |  '----------------------'
       .----------------/  |
       | PASID Entry |     V (Nested xlate)
       '----------------\.------------------------------.
       |             |   |SL for GPA-HPA, default domain|
       |             |   '------------------------------'
       '-------------'

Where: - FL = First level/stage one page tables - SL = Second level/stage two page tables This work is based on collaboration with other developers on the IOMMU mailing list. Notably, [1] Common APIs git://linux-arm.org/linux-jpb.git sva/api [2] [RFC PATCH 2/6] drivers core: Add I/O ASID allocator by Jean-Philippe Brucker https://www.spinics.net/lists/iommu/msg30639.html [3] [RFC PATCH 0/5] iommu: APIs for paravirtual PASID allocation by Lu Baolu https://lkml.org/lkml/2018/11/12/1921 [4] [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual Address (SVA) https://lwn.net/Articles/754331/ There are roughly three parts: 1. Generic PASID allocator [1] with extension to support custom allocators 2. IOMMU cache invalidation passdown from guest to host 3. Guest PASID bind for nested translation All generic IOMMU APIs are reused from [1] with minor tweaks. 
With this patchset, guest SVA without page request works on VT-d. PRS patches will come next as we try to avoid large patchset that is hard to review. The patches for basic SVA support (w/o PRS) starts: [PATCH v5 05/19] iommu: Introduce attach/detach_pasid_table API It is worth noting that unlike sMMU nested stage setup, where PASID table is owned by the guest, VT-d PASID table is owned by the host, individual PASIDs are bound instead of the PASID table. This series is based on the new VT-d 3.0 Specification (https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf). This is different than the older series in [4] which was based on the older specification that does not have scalable mode. ChangeLog: - V5 Rebased on v5.3-rc4 which has some of the IOMMU fault APIs merged. Addressed v4 review comments from Eric Auger, Baolu Lu, and Jonathan Cameron. Specific changes are as follows: - Refined custom IOASID allocator to support multiple vIOMMU, hotplug cases. - Extracted vendor data from IOMMU guest PASID bind data, for VT-d will support all necessary guest PASID entry fields for PASID bind. - Support non-identity host-guest PASID mapping - Exception handling in various cases - V4 - Redesigned IOASID allocator such that it can support custom allocators with shared helper functions. Use separate XArray to store IOASIDs per allocator. Took advice from Eric Auger to have default allocator use the generic allocator structure. Combined into one patch in that the default allocator is just "another" allocator now. Can be built as a module in case of driver use without IOMMU. - Extended bind guest PASID data to support SMMU and non-identity guest to host PASID mapping https://lkml.org/lkml/2019/5/21/802 - Rebased on Jean's sva/api common tree, new patches starts with [PATCH v4 10/22] - V3 - Addressed thorough review comments from Eric Auger (Thank you!) 
- Moved IOASID allocator from driver core to IOMMU code per suggestion by Christoph Hellwig (https://lkml.org/lkml/2019/4/26/462) - Rebased on top of Jean's SVA API branch and Eric's v7[1] (git://linux-arm.org/linux-jpb.git
[PATCH v5 05/19] iommu: Introduce attach/detach_pasid_table API
In virtualization use case, when a guest is assigned a PCI host device, protected by a virtual IOMMU on the guest, the physical IOMMU must be programmed to be consistent with the guest mappings. If the physical IOMMU supports two translation stages it makes sense to program guest mappings onto the first stage/level (ARM/Intel terminology) while the host owns the stage/level 2. In that case, it is mandated to trap on guest configuration settings and pass those to the physical iommu driver. This patch adds a new API to the iommu subsystem that allows to set/unset the pasid table information. A generic iommu_pasid_table_config struct is introduced in a new iommu.h uapi header. This is going to be used by the VFIO user API. Signed-off-by: Jean-Philippe Brucker Signed-off-by: Liu, Yi L Signed-off-by: Ashok Raj Signed-off-by: Jacob Pan Signed-off-by: Eric Auger Reviewed-by: Jean-Philippe Brucker --- drivers/iommu/iommu.c | 19 + include/linux/iommu.h | 18 include/uapi/linux/iommu.h | 51 ++ 3 files changed, 88 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index feada31..155ebef 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1700,6 +1700,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_attach_device); +int iommu_attach_pasid_table(struct iommu_domain *domain, +struct iommu_pasid_table_config *cfg) +{ + if (unlikely(!domain->ops->attach_pasid_table)) + return -ENODEV; + + return domain->ops->attach_pasid_table(domain, cfg); +} +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table); + +void iommu_detach_pasid_table(struct iommu_domain *domain) +{ + if (unlikely(!domain->ops->detach_pasid_table)) + return; + + domain->ops->detach_pasid_table(domain); +} +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table); + static void __iommu_detach_device(struct iommu_domain *domain, struct device *dev) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 39d371b..8c64065 100644 --- 
a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -228,6 +228,8 @@ struct iommu_sva_ops { * @sva_unbind: Unbind process address space from device * @sva_get_pasid: Get PASID associated to a SVA handle * @page_response: handle page request response + * @attach_pasid_table: attach a pasid table + * @detach_pasid_table: detach the pasid table * @pgsize_bitmap: bitmap of all possible supported page sizes */ struct iommu_ops { @@ -287,6 +289,9 @@ struct iommu_ops { void *drvdata); void (*sva_unbind)(struct iommu_sva *handle); int (*sva_get_pasid)(struct iommu_sva *handle); + int (*attach_pasid_table)(struct iommu_domain *domain, + struct iommu_pasid_table_config *cfg); + void (*detach_pasid_table)(struct iommu_domain *domain); int (*page_response)(struct device *dev, struct iommu_fault_event *evt, @@ -399,6 +404,9 @@ extern int iommu_attach_device(struct iommu_domain *domain, struct device *dev); extern void iommu_detach_device(struct iommu_domain *domain, struct device *dev); +extern int iommu_attach_pasid_table(struct iommu_domain *domain, + struct iommu_pasid_table_config *cfg); +extern void iommu_detach_pasid_table(struct iommu_domain *domain); extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern struct iommu_domain *iommu_get_dma_domain(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, @@ -941,6 +949,16 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle) return IOMMU_PASID_INVALID; } +static inline +int iommu_attach_pasid_table(struct iommu_domain *domain, +struct iommu_pasid_table_config *cfg) +{ + return -ENODEV; +} + +static inline +void iommu_detach_pasid_table(struct iommu_domain *domain) {} + #endif /* CONFIG_IOMMU_API */ #ifdef CONFIG_IOMMU_DEBUGFS diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index fc00c5d..0f9d249 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -152,4 +152,55 @@ struct iommu_page_response { __u32 code; 
}; +/** + * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related + * information + * @version: API version of this structure + * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table + * or 2-level table) + * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0 + * and no PASID is passed along with the incoming transaction) + * @padding: reserved for future use (should be zero) + * + * The PASID table is
[PATCH v5 04/19] iommu: Use device fault trace event
From: Jean-Philippe Brucker For performance and debugging purposes, these trace events help analyzing device faults that interact with IOMMU subsystem. E.g. IOMMU::00:0a.0 type=2 reason=0 addr=0x007ff000 pasid=1 group=1 last=0 prot=1 Signed-off-by: Jacob Pan [JPB: removed invalidate event, that will be added later] Signed-off-by: Jean-Philippe Brucker Signed-off-by: Jacob Pan --- drivers/iommu/iommu.c| 2 ++ include/trace/events/iommu.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 8f2c7d5..feada31 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1098,6 +1098,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt) mutex_unlock(>lock); kfree(evt_pending); } + trace_dev_fault(dev, >fault); done_unlock: mutex_unlock(>lock); return ret; @@ -1146,6 +1147,7 @@ int iommu_page_response(struct device *dev, msg->flags = pasid_valid ? IOMMU_PAGE_RESP_PASID_VALID : 0; ret = domain->ops->page_response(dev, evt, msg); + trace_dev_page_response(dev, msg); list_del(>list); kfree(evt); break; diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h index 767b92c..7a7801b 100644 --- a/include/trace/events/iommu.h +++ b/include/trace/events/iommu.h @@ -219,7 +219,7 @@ TRACE_EVENT(dev_fault, TRACE_EVENT(dev_page_response, - TP_PROTO(struct device *dev, struct iommu_fault_page_response *msg), + TP_PROTO(struct device *dev, struct iommu_page_response *msg), TP_ARGS(dev, msg), -- 2.7.4
Re: [PATCH v3 hmm 08/11] drm/radeon: use mmu_notifier_get/put for struct radeon_mn
On Thu, Aug 15, 2019 at 10:28:21AM +0200, Christian König wrote: > Am 07.08.19 um 01:15 schrieb Jason Gunthorpe: > > From: Jason Gunthorpe > > > > radeon is using a device global hash table to track what mmu_notifiers > > have been registered on struct mm. This is better served with the new > > get/put scheme instead. > > > > radeon has a bug where it was not blocking notifier release() until all > > the BO's had been invalidated. This could result in a use after free of > > pages the BOs. This is tied into a second bug where radeon left the > > notifiers running endlessly even once the interval tree became > > empty. This could result in a use after free with module unload. > > > > Both are fixed by changing the lifetime model, the BOs exist in the > > interval tree with their natural lifetimes independent of the mm_struct > > lifetime using the get/put scheme. The release runs synchronously and just > > does invalidate_start across the entire interval tree to create the > > required DMA fence. > > > > Additions to the interval tree after release are already impossible as > > only current->mm is used during the add. > > > > Signed-off-by: Jason Gunthorpe > > Acked-by: Christian König Thanks! > But I'm wondering if we shouldn't completely drop radeon userptr support. > It's just to buggy, I would not object :) Jason
[PATCH v2 11/17] iommu/arm-smmu: Abstract GR0 accesses
Clean up the remaining accesses to GR0 registers, so that everything is now neatly abstracted. This folds up the Non-Secure alias quirk as the first step towards moving it out of the way entirely. Although GR0 does technically contain some 64-bit registers (sGFAR and the weird SMMUv2 HYPC and MONC stuff), they're not ones we have any need to access. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 106 +-- 1 file changed, 58 insertions(+), 48 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e72554f334ee..e9fd9117109e 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -69,19 +69,6 @@ /* Maximum number of context banks per SMMU */ #define ARM_SMMU_MAX_CBS 128 -/* SMMU global address space */ -#define ARM_SMMU_GR0(smmu) ((smmu)->base) - -/* - * SMMU global address space with conditional offset to access secure - * aliases of non-secure registers (e.g. nsCR0: 0x400, nsGFSR: 0x448, - * nsGFSYNR0: 0x450) - */ -#define ARM_SMMU_GR0_NS(smmu) \ - ((smmu)->base + \ - ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) \ - ? 
0x400 : 0)) - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -246,6 +233,21 @@ struct arm_smmu_domain { struct iommu_domain domain; }; +static int arm_smmu_gr0_ns(int offset) +{ + switch(offset) { + case ARM_SMMU_GR0_sCR0: + case ARM_SMMU_GR0_sACR: + case ARM_SMMU_GR0_sGFSR: + case ARM_SMMU_GR0_sGFSYNR0: + case ARM_SMMU_GR0_sGFSYNR1: + case ARM_SMMU_GR0_sGFSYNR2: + return offset + 0x400; + default: + return offset; + } +} + static void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) { return smmu->base + (n << smmu->pgshift); @@ -253,12 +255,18 @@ static void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) static u32 arm_smmu_readl(struct arm_smmu_device *smmu, int page, int offset) { + if ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) && page == 0) + offset = arm_smmu_gr0_ns(offset); + return readl_relaxed(arm_smmu_page(smmu, page) + offset); } static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, u32 val) { + if ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) && page == 0) + offset = arm_smmu_gr0_ns(offset); + writel_relaxed(val, arm_smmu_page(smmu, page) + offset); } @@ -273,9 +281,15 @@ static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int offset, writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); } +#define ARM_SMMU_GR0 0 #define ARM_SMMU_GR1 1 #define ARM_SMMU_CB(s, n) ((s)->numpage + (n)) +#define arm_smmu_gr0_read(s, o)\ + arm_smmu_readl((s), ARM_SMMU_GR0, (o)) +#define arm_smmu_gr0_write(s, o, v)\ + arm_smmu_writel((s), ARM_SMMU_GR0, (o), (v)) + #define arm_smmu_gr1_read(s, o)\ arm_smmu_readl((s), ARM_SMMU_GR1, (o)) #define arm_smmu_gr1_write(s, o, v)\ @@ -470,7 +484,7 @@ static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu) unsigned long flags; spin_lock_irqsave(>global_sync_lock, flags); - __arm_smmu_tlb_sync(smmu, 0, ARM_SMMU_GR0_sTLBGSYNC, + __arm_smmu_tlb_sync(smmu, ARM_SMMU_GR0, ARM_SMMU_GR0_sTLBGSYNC, ARM_SMMU_GR0_sTLBGSTATUS); 
spin_unlock_irqrestore(&smmu->global_sync_lock, flags); } @@ -511,10 +525,10 @@ static void arm_smmu_tlb_inv_context_s2(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; struct arm_smmu_device *smmu = smmu_domain->smmu; - void __iomem *base = ARM_SMMU_GR0(smmu); - /* NOTE: see above */ - writel(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID); + /* See above */ + wmb(); + arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid); arm_smmu_tlb_sync_global(smmu); } @@ -579,12 +593,12 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size, size_t granule, bool leaf, void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; - void __iomem *base = ARM_SMMU_GR0(smmu_domain->smmu); + struct arm_smmu_device *smmu = smmu_domain->smmu; - if (smmu_domain->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) + if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) wmb(); - writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID); + arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid); } static const struct iommu_gather_ops
[PATCH v2 17/17] iommu/arm-smmu: Add context init implementation hook
Allocating and initialising a context for a domain is another point where certain implementations are known to want special behaviour. Currently the other half of the Cavium workaround comes into play here, so let's finish the job to get the whole thing right out of the way. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 42 ++--- drivers/iommu/arm-smmu.c | 51 +++ drivers/iommu/arm-smmu.h | 42 +++-- 3 files changed, 87 insertions(+), 48 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 4dc8b1c4befb..e22e9004f449 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -48,25 +48,60 @@ const struct arm_smmu_impl calxeda_impl = { }; +struct cavium_smmu { + struct arm_smmu_device smmu; + u32 id_base; +}; + static int cavium_cfg_probe(struct arm_smmu_device *smmu) { static atomic_t context_count = ATOMIC_INIT(0); + struct cavium_smmu *cs = container_of(smmu, struct cavium_smmu, smmu); /* * Cavium CN88xx erratum #27704. * Ensure ASID and VMID allocation is unique across all SMMUs in * the system. 
*/ - smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks, - &context_count); + cs->id_base = atomic_fetch_add(smmu->num_context_banks, &context_count); dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n"); return 0; } +int cavium_init_context(struct arm_smmu_domain *smmu_domain) +{ + struct cavium_smmu *cs = container_of(smmu_domain->smmu, + struct cavium_smmu, smmu); + + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) + smmu_domain->cfg.vmid += cs->id_base; + else + smmu_domain->cfg.asid += cs->id_base; + + return 0; +} + const struct arm_smmu_impl cavium_impl = { .cfg_probe = cavium_cfg_probe, + .init_context = cavium_init_context, }; +struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu) +{ + struct cavium_smmu *cs; + + cs = devm_kzalloc(smmu->dev, sizeof(*cs), GFP_KERNEL); + if (!cs) + return ERR_PTR(-ENOMEM); + + cs->smmu = *smmu; + cs->smmu.impl = &cavium_impl; + + devm_kfree(smmu->dev, smmu); + + return &cs->smmu; +} + #define ARM_MMU500_ACTLR_CPRE (1 << 1) @@ -126,8 +161,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) smmu->impl = &arm_mmu500_impl; break; case CAVIUM_SMMUV2: - smmu->impl = &cavium_impl; - break; + return cavium_smmu_impl_init(smmu); default: break; } diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index fc98992d120d..b8628e2ab579 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -27,7 +27,6 @@ #include #include #include -#include #include #include #include @@ -111,44 +110,6 @@ struct arm_smmu_master_cfg { #define for_each_cfg_sme(fw, i, idx) \ for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i) -enum arm_smmu_context_fmt { - ARM_SMMU_CTX_FMT_NONE, - ARM_SMMU_CTX_FMT_AARCH64, - ARM_SMMU_CTX_FMT_AARCH32_L, - ARM_SMMU_CTX_FMT_AARCH32_S, -}; - -struct arm_smmu_cfg { - u8 cbndx; - u8 irptndx; - union { - u16 asid; - u16 vmid; - }; - enum arm_smmu_cbar_type cbar; - enum arm_smmu_context_fmt fmt; -}; -#define INVALID_IRPTNDX 0xff - -enum 
arm_smmu_domain_stage { - ARM_SMMU_DOMAIN_S1 = 0, - ARM_SMMU_DOMAIN_S2, - ARM_SMMU_DOMAIN_NESTED, - ARM_SMMU_DOMAIN_BYPASS, -}; - -struct arm_smmu_domain { - struct arm_smmu_device *smmu; - struct io_pgtable_ops *pgtbl_ops; - const struct iommu_gather_ops *tlb_ops; - struct arm_smmu_cfg cfg; - enum arm_smmu_domain_stage stage; - boolnon_strict; - struct mutexinit_mutex; /* Protects smmu pointer */ - spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ - struct iommu_domain domain; -}; - static bool using_legacy_binding, using_generic_binding; static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) @@ -749,9 +710,16 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, } if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) - cfg->vmid = cfg->cbndx + 1 + smmu->cavium_id_base; + cfg->vmid = cfg->cbndx + 1; else -
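For anyone unfamiliar with the pattern, the devm_kzalloc()/container_of() dance in cavium_smmu_impl_init() above is the standard C way of "subclassing" a structure. A rough userspace sketch of the idea, with a simplified stand-in for the kernel's container_of() and heavily trimmed structures:

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct arm_smmu_device {		/* heavily trimmed stand-in */
	int num_context_banks;
};

struct cavium_smmu {			/* "subclass" embedding the base by value */
	struct arm_smmu_device smmu;
	unsigned int id_base;
};

/* An impl hook sees only the base pointer, yet can recover its state */
unsigned int cavium_id_base(struct arm_smmu_device *smmu)
{
	struct cavium_smmu *cs = container_of(smmu, struct cavium_smmu, smmu);

	return cs->id_base;
}
```

Generic code only ever passes the embedded base pointer around; the Cavium hooks recover their private state from it, so no Cavium-specific field has to live in the shared structure.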
[PATCH v2 14/17] iommu/arm-smmu: Move Secure access quirk to implementation
Move detection of the Secure access quirk to its new home, trimming it down in the process - time has proven that boolean DT flags are neither ideal nor necessarily sufficient, so it's highly unlikely we'll ever add more, let alone enough to justify the frankly overengineered parsing machinery. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 44 drivers/iommu/arm-smmu.c | 97 --- drivers/iommu/arm-smmu.h | 72 +- 3 files changed, 114 insertions(+), 99 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index efeb6d78da17..0657c85580cb 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -4,10 +4,54 @@ #define pr_fmt(fmt) "arm-smmu: " fmt +#include + #include "arm-smmu.h" +static int arm_smmu_gr0_ns(int offset) +{ + switch(offset) { + case ARM_SMMU_GR0_sCR0: + case ARM_SMMU_GR0_sACR: + case ARM_SMMU_GR0_sGFSR: + case ARM_SMMU_GR0_sGFSYNR0: + case ARM_SMMU_GR0_sGFSYNR1: + case ARM_SMMU_GR0_sGFSYNR2: + return offset + 0x400; + default: + return offset; + } +} + +static u32 arm_smmu_read_ns(struct arm_smmu_device *smmu, int page, + int offset) +{ + if (page == ARM_SMMU_GR0) + offset = arm_smmu_gr0_ns(offset); + return readl_relaxed(arm_smmu_page(smmu, page) + offset); +} + +static void arm_smmu_write_ns(struct arm_smmu_device *smmu, int page, + int offset, u32 val) +{ + if (page == ARM_SMMU_GR0) + offset = arm_smmu_gr0_ns(offset); + writel_relaxed(val, arm_smmu_page(smmu, page) + offset); +} + +/* Since we don't care for sGFAR, we can do without 64-bit accessors */ +const struct arm_smmu_impl calxeda_impl = { + .read_reg = arm_smmu_read_ns, + .write_reg = arm_smmu_write_ns, +}; + + struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { + if (of_property_read_bool(smmu->dev->of_node, + "calxeda,smmu-secure-config-access")) + smmu->impl = &calxeda_impl; + return smmu; } diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 1e8153182830..432d781f05f3 --- 
a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -155,91 +155,10 @@ struct arm_smmu_domain { struct iommu_domain domain; }; -static int arm_smmu_gr0_ns(int offset) -{ - switch(offset) { - case ARM_SMMU_GR0_sCR0: - case ARM_SMMU_GR0_sACR: - case ARM_SMMU_GR0_sGFSR: - case ARM_SMMU_GR0_sGFSYNR0: - case ARM_SMMU_GR0_sGFSYNR1: - case ARM_SMMU_GR0_sGFSYNR2: - return offset + 0x400; - default: - return offset; - } -} - -static void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) -{ - return smmu->base + (n << smmu->pgshift); -} - -static u32 arm_smmu_readl(struct arm_smmu_device *smmu, int page, int offset) -{ - if ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) && page == 0) - offset = arm_smmu_gr0_ns(offset); - - return readl_relaxed(arm_smmu_page(smmu, page) + offset); -} - -static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, - u32 val) -{ - if ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) && page == 0) - offset = arm_smmu_gr0_ns(offset); - - writel_relaxed(val, arm_smmu_page(smmu, page) + offset); -} - -static u64 arm_smmu_readq(struct arm_smmu_device *smmu, int page, int offset) -{ - return readq_relaxed(arm_smmu_page(smmu, page) + offset); -} - -static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int offset, - u64 val) -{ - writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); -} - -#define ARM_SMMU_GR0 0 -#define ARM_SMMU_GR1 1 -#define ARM_SMMU_CB(s, n) ((s)->numpage + (n)) - -#define arm_smmu_gr0_read(s, o)\ - arm_smmu_readl((s), ARM_SMMU_GR0, (o)) -#define arm_smmu_gr0_write(s, o, v)\ - arm_smmu_writel((s), ARM_SMMU_GR0, (o), (v)) - -#define arm_smmu_gr1_read(s, o)\ - arm_smmu_readl((s), ARM_SMMU_GR1, (o)) -#define arm_smmu_gr1_write(s, o, v)\ - arm_smmu_writel((s), ARM_SMMU_GR1, (o), (v)) - -#define arm_smmu_cb_read(s, n, o) \ - arm_smmu_readl((s), ARM_SMMU_CB((s), (n)), (o)) -#define arm_smmu_cb_write(s, n, o, v) \ - arm_smmu_writel((s), ARM_SMMU_CB((s), (n)), (o), (v)) -#define 
arm_smmu_cb_readq(s, n, o) \ - arm_smmu_readq((s), ARM_SMMU_CB((s), (n)), (o)) -#define arm_smmu_cb_writeq(s, n, o, v) \ - arm_smmu_writeq((s), ARM_SMMU_CB((s), (n)), (o), (v)) - -struct arm_smmu_option_prop { - u32 opt; - const char *prop; -}; - static atomic_t
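The quirk itself boils down to a pure offset transformation: the handful of banked global registers gain a non-secure alias 0x400 bytes above the secure one. A standalone sketch of that mapping (offset values as in the SMMUv2 layout, reproduced here for illustration):

```c
/* GR0 offsets as in the SMMUv2 register layout (shown for illustration) */
#define ARM_SMMU_GR0_sCR0	0x0
#define ARM_SMMU_GR0_sACR	0x10
#define ARM_SMMU_GR0_sGFSR	0x48
#define ARM_SMMU_GR0_ID0	0x20	/* not banked: no secure alias */

/* Banked global registers have their non-secure alias 0x400 bytes up */
int arm_smmu_gr0_ns(int offset)
{
	switch (offset) {
	case ARM_SMMU_GR0_sCR0:
	case ARM_SMMU_GR0_sACR:
	case ARM_SMMU_GR0_sGFSR:
		return offset + 0x400;
	default:
		return offset;
	}
}
```

Because the remap is confined to the impl's read_reg/write_reg hooks, the architectural code never needs to know whether it is talking to the secure or non-secure view.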
[PATCH v2 15/17] iommu/arm-smmu: Add configuration implementation hook
Probing the ID registers and setting up the SMMU configuration is an area where overrides and workarounds may well be needed. Indeed, the Cavium workaround detection lives there at the moment, so let's break that out. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 34 ++ drivers/iommu/arm-smmu.c | 17 +++-- drivers/iommu/arm-smmu.h | 1 + 3 files changed, 38 insertions(+), 14 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 0657c85580cb..696417908793 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -47,8 +47,42 @@ const struct arm_smmu_impl calxeda_impl = { }; +static int cavium_cfg_probe(struct arm_smmu_device *smmu) +{ + static atomic_t context_count = ATOMIC_INIT(0); + /* +* Cavium CN88xx erratum #27704. +* Ensure ASID and VMID allocation is unique across all SMMUs in +* the system. +*/ + smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks, + &context_count); + dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n"); + + return 0; +} + +const struct arm_smmu_impl cavium_impl = { + .cfg_probe = cavium_cfg_probe, +}; + + struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { + /* +* We will inevitably have to combine model-specific implementation +* quirks with platform-specific integration quirks, but everything +* we currently support happens to work out as straightforward +* mutually-exclusive assignments. 
+*/ + switch (smmu->model) { + case CAVIUM_SMMUV2: + smmu->impl = &cavium_impl; + break; + default: + break; + } + if (of_property_read_bool(smmu->dev->of_node, "calxeda,smmu-secure-config-access")) smmu->impl = &calxeda_impl; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 432d781f05f3..362b6b5a28ee 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -155,8 +155,6 @@ struct arm_smmu_domain { struct iommu_domain domain; }; -static atomic_t cavium_smmu_context_count = ATOMIC_INIT(0); - static bool using_legacy_binding, using_generic_binding; static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) @@ -1804,18 +1802,6 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) } dev_notice(smmu->dev, "\t%u context banks (%u stage-2 only)\n", smmu->num_context_banks, smmu->num_s2_context_banks); - /* -* Cavium CN88xx erratum #27704. -* Ensure ASID and VMID allocation is unique across all SMMUs in -* the system. -*/ - if (smmu->model == CAVIUM_SMMUV2) { - smmu->cavium_id_base = - atomic_add_return(smmu->num_context_banks, - &cavium_smmu_context_count); - smmu->cavium_id_base -= smmu->num_context_banks; - dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n"); - } smmu->cbs = devm_kcalloc(smmu->dev, smmu->num_context_banks, sizeof(*smmu->cbs), GFP_KERNEL); if (!smmu->cbs) @@ -1884,6 +1870,9 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) dev_notice(smmu->dev, "\tStage-2: %lu-bit IPA -> %lu-bit PA\n", smmu->ipa_size, smmu->pa_size); + if (smmu->impl && smmu->impl->cfg_probe) + return smmu->impl->cfg_probe(smmu); + return 0; } diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index d4fd29d70705..f4e90f33fce2 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -287,6 +287,7 @@ struct arm_smmu_impl { u64 (*read_reg64)(struct arm_smmu_device *smmu, int page, int offset); void (*write_reg64)(struct arm_smmu_device *smmu, int page, int offset, u64 val); + int 
(*cfg_probe)(struct arm_smmu_device *smmu); }; static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) -- 2.21.0.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
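The calling convention this patch establishes is worth spelling out: every member of arm_smmu_impl is optional, and the architectural code NULL-checks before dispatching. A minimal self-contained sketch of that pattern, using hypothetical trimmed-down structures:

```c
#include <stddef.h>

struct arm_smmu_device;

/* Per-implementation overrides; every member is optional and may be NULL */
struct arm_smmu_impl {
	int (*cfg_probe)(struct arm_smmu_device *smmu);
};

struct arm_smmu_device {		/* trimmed-down stand-in */
	const struct arm_smmu_impl *impl;
	int cavium_id_base;
};

static int cavium_cfg_probe(struct arm_smmu_device *smmu)
{
	smmu->cavium_id_base = 42;	/* stand-in for the real bookkeeping */
	return 0;
}

const struct arm_smmu_impl cavium_impl = {
	.cfg_probe = cavium_cfg_probe,
};

/* Architectural code: run the hook only when an impl provides one */
int device_cfg_probe(struct arm_smmu_device *smmu)
{
	if (smmu->impl && smmu->impl->cfg_probe)
		return smmu->impl->cfg_probe(smmu);
	return 0;
}
```

A generic SMMU with no impl (or an impl without cfg_probe) takes the default path; quirky hardware pays for its quirk only when the hook is populated.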
[PATCH v2 16/17] iommu/arm-smmu: Add reset implementation hook
Reset is an activity rife with implementation-defined poking. Add a corresponding hook, and use it to encapsulate the existing MMU-500 details. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 49 +++ drivers/iommu/arm-smmu.c | 39 +++- drivers/iommu/arm-smmu.h | 1 + 3 files changed, 54 insertions(+), 35 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 696417908793..4dc8b1c4befb 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -4,6 +4,7 @@ #define pr_fmt(fmt) "arm-smmu: " fmt +#include #include #include "arm-smmu.h" @@ -67,6 +68,51 @@ const struct arm_smmu_impl cavium_impl = { }; +#define ARM_MMU500_ACTLR_CPRE (1 << 1) + +#define ARM_MMU500_ACR_CACHE_LOCK (1 << 26) +#define ARM_MMU500_ACR_S2CRB_TLBEN (1 << 10) +#define ARM_MMU500_ACR_SMTNMB_TLBEN(1 << 8) + +static int arm_mmu500_reset(struct arm_smmu_device *smmu) +{ + u32 reg, major; + int i; + /* +* On MMU-500 r2p0 onwards we need to clear ACR.CACHE_LOCK before +* writes to the context bank ACTLRs will stick. And we just hope that +* Secure has also cleared SACR.CACHE_LOCK for this to take effect... +*/ + reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_ID7); + major = FIELD_GET(ID7_MAJOR, reg); + reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sACR); + if (major >= 2) + reg &= ~ARM_MMU500_ACR_CACHE_LOCK; + /* +* Allow unmatched Stream IDs to allocate bypass +* TLB entries for reduced latency. +*/ + reg |= ARM_MMU500_ACR_SMTNMB_TLBEN | ARM_MMU500_ACR_S2CRB_TLBEN; + arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sACR, reg); + + /* +* Disable MMU-500's not-particularly-beneficial next-page +* prefetcher for the sake of errata #841119 and #826419. 
+*/ + for (i = 0; i < smmu->num_context_banks; ++i) { + reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR); + reg &= ~ARM_MMU500_ACTLR_CPRE; + arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_ACTLR, reg); + } + + return 0; +} + +const struct arm_smmu_impl arm_mmu500_impl = { + .reset = arm_mmu500_reset, +}; + + struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { /* @@ -76,6 +122,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) * mutually-exclusive assignments. */ switch (smmu->model) { + case ARM_MMU500: + smmu->impl = &arm_mmu500_impl; + break; case CAVIUM_SMMUV2: smmu->impl = &cavium_impl; break; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 362b6b5a28ee..fc98992d120d 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -54,12 +54,6 @@ */ #define QCOM_DUMMY_VAL -1 -#define ARM_MMU500_ACTLR_CPRE (1 << 1) - -#define ARM_MMU500_ACR_CACHE_LOCK (1 << 26) -#define ARM_MMU500_ACR_S2CRB_TLBEN (1 << 10) -#define ARM_MMU500_ACR_SMTNMB_TLBEN (1 << 8) - #define TLB_LOOP_TIMEOUT 100 /* 1s! */ #define TLB_SPIN_COUNT 10 @@ -1574,7 +1568,7 @@ static struct iommu_ops arm_smmu_ops = { static void arm_smmu_device_reset(struct arm_smmu_device *smmu) { int i; - u32 reg, major; + u32 reg; /* clear global FSR */ reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSR); @@ -1587,38 +1581,10 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu) for (i = 0; i < smmu->num_mapping_groups; ++i) arm_smmu_write_sme(smmu, i); - if (smmu->model == ARM_MMU500) { - /* -* Before clearing ARM_MMU500_ACTLR_CPRE, need to -* clear CACHE_LOCK bit of ACR first. And, CACHE_LOCK -* bit is only present in MMU-500r2 onwards. -*/ - reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_ID7); - major = FIELD_GET(ID7_MAJOR, reg); - reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sACR); - if (major >= 2) - reg &= ~ARM_MMU500_ACR_CACHE_LOCK; - /* -* Allow unmatched Stream IDs to allocate bypass -* TLB entries for reduced latency. 
-*/ - reg |= ARM_MMU500_ACR_SMTNMB_TLBEN | ARM_MMU500_ACR_S2CRB_TLBEN; - arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sACR, reg); - } - /* Make sure all context banks are disabled and clear CB_FSR */ for (i = 0; i < smmu->num_context_banks; ++i) { arm_smmu_write_context_bank(smmu, i); arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_FSR, FSR_FAULT); - /* -* Disable MMU-500's not-particularly-beneficial next-page -
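Separated from the MMIO plumbing, the sACR fixup in arm_mmu500_reset() is a pure bit manipulation, which makes the version dependency easy to see. A sketch, assuming the bit positions from the patch:

```c
#include <stdint.h>

#define ARM_MMU500_ACR_CACHE_LOCK	(1u << 26)
#define ARM_MMU500_ACR_S2CRB_TLBEN	(1u << 10)
#define ARM_MMU500_ACR_SMTNMB_TLBEN	(1u << 8)

/* Pure version of the sACR fixup performed by arm_mmu500_reset() */
uint32_t mmu500_fixup_acr(uint32_t acr, unsigned int major)
{
	/*
	 * CACHE_LOCK only exists from r2p0 onwards, and must be cleared
	 * there before writes to the context bank ACTLRs will stick.
	 */
	if (major >= 2)
		acr &= ~ARM_MMU500_ACR_CACHE_LOCK;
	/* Let unmatched Stream IDs allocate bypass TLB entries */
	acr |= ARM_MMU500_ACR_SMTNMB_TLBEN | ARM_MMU500_ACR_S2CRB_TLBEN;
	return acr;
}
```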
[PATCH v2 12/17] iommu/arm-smmu: Rename arm-smmu-regs.h
We're about to start using it for more than just register definitions, so generalise the name. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 2 +- drivers/iommu/{arm-smmu-regs.h => arm-smmu.h} | 6 +++--- drivers/iommu/qcom_iommu.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) rename drivers/iommu/{arm-smmu-regs.h => arm-smmu.h} (98%) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e9fd9117109e..f3b8301a3059 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -46,7 +46,7 @@ #include #include -#include "arm-smmu-regs.h" +#include "arm-smmu.h" /* * Apparently, some Qualcomm arm64 platforms which appear to expose their SMMU diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu.h similarity index 98% rename from drivers/iommu/arm-smmu-regs.h rename to drivers/iommu/arm-smmu.h index a8e288192285..ccc3097a4247 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu.h @@ -7,8 +7,8 @@ * Author: Will Deacon */ -#ifndef _ARM_SMMU_REGS_H -#define _ARM_SMMU_REGS_H +#ifndef _ARM_SMMU_H +#define _ARM_SMMU_H #include @@ -194,4 +194,4 @@ enum arm_smmu_cbar_type { #define ARM_SMMU_CB_ATSR 0x8f0 #define ATSR_ACTIVE BIT(0) -#endif /* _ARM_SMMU_REGS_H */ +#endif /* _ARM_SMMU_H */ diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index 60a125dd7300..a2062d13584f 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -33,7 +33,7 @@ #include #include -#include "arm-smmu-regs.h" +#include "arm-smmu.h" #define SMMU_INTR_SEL_NS 0x2000 -- 2.21.0.dirty
[PATCH v2 07/17] iommu/arm-smmu: Split arm_smmu_tlb_inv_range_nosync()
Since we now use separate iommu_gather_ops for stage 1 and stage 2 contexts, we may as well divide up the monolithic callback into its respective stage 1 and stage 2 parts. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 66 ++-- 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 19126230c780..5b12e96d7878 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -490,46 +490,54 @@ static void arm_smmu_tlb_inv_context_s2(void *cookie) arm_smmu_tlb_sync_global(smmu); } -static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, - size_t granule, bool leaf, void *cookie) +static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size, + size_t granule, bool leaf, void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; + struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS; - void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx); + void __iomem *reg = ARM_SMMU_CB(smmu, cfg->cbndx); - if (smmu_domain->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) + if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) wmb(); - if (stage1) { - reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA; + reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA; - if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { - iova = (iova >> 12) << 12; - iova |= cfg->asid; - do { - writel_relaxed(iova, reg); - iova += granule; - } while (size -= granule); - } else { - iova >>= 12; - iova |= (u64)cfg->asid << 48; - do { - writeq_relaxed(iova, reg); - iova += granule >> 12; - } while (size -= granule); - } - } else { - reg += leaf ? 
ARM_SMMU_CB_S2_TLBIIPAS2L : - ARM_SMMU_CB_S2_TLBIIPAS2; - iova >>= 12; + if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { + iova = (iova >> 12) << 12; + iova |= cfg->asid; do { - smmu_write_atomic_lq(iova, reg); + writel_relaxed(iova, reg); + iova += granule; + } while (size -= granule); + } else { + iova >>= 12; + iova |= (u64)cfg->asid << 48; + do { + writeq_relaxed(iova, reg); iova += granule >> 12; } while (size -= granule); } } +static void arm_smmu_tlb_inv_range_s2(unsigned long iova, size_t size, + size_t granule, bool leaf, void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + struct arm_smmu_device *smmu = smmu_domain->smmu; + void __iomem *reg = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx); + + if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) + wmb(); + + reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L : ARM_SMMU_CB_S2_TLBIIPAS2; + iova >>= 12; + do { + smmu_write_atomic_lq(iova, reg); + iova += granule >> 12; + } while (size -= granule); +} + /* * On MMU-401 at least, the cost of firing off multiple TLBIVMIDs appears * almost negligible, but the benefit of getting the first one in as far ahead @@ -550,13 +558,13 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size, static const struct iommu_gather_ops arm_smmu_s1_tlb_ops = { .tlb_flush_all = arm_smmu_tlb_inv_context_s1, - .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, + .tlb_add_flush = arm_smmu_tlb_inv_range_s1, .tlb_sync = arm_smmu_tlb_sync_context, }; static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v2 = { .tlb_flush_all = arm_smmu_tlb_inv_context_s2, - .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, + .tlb_add_flush = arm_smmu_tlb_inv_range_s2, .tlb_sync = arm_smmu_tlb_sync_context, }; -- 2.21.0.dirty
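The reason the monolithic callback wanted splitting is visible in the two stage-1 address encodings: AArch32 formats keep a page-aligned VA with the ASID in the low bits, while AArch64 formats use the VA page number with the ASID in bits [63:48]. As standalone helpers (a sketch of the encodings as written in the patch):

```c
#include <stdint.h>

/* AArch32 formats: page-aligned VA, ASID in the low bits, 32-bit write */
uint32_t tlbiva_aarch32(uint32_t iova, uint16_t asid)
{
	return ((iova >> 12) << 12) | asid;
}

/* AArch64 formats: VA page number, ASID in bits [63:48], 64-bit write */
uint64_t tlbiva_aarch64(uint64_t iova, uint16_t asid)
{
	return (iova >> 12) | ((uint64_t)asid << 48);
}
```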
[PATCH v2 13/17] iommu/arm-smmu: Add implementation infrastructure
Add some nascent infrastructure for handling implementation-specific details outside the flow of the architectural code. This will allow us to keep mutually-incompatible vendor-specific hooks in their own files where the respective interested parties can maintain them with minimal chance of conflicts. As somewhat of a template, we'll start with a general place to collect the relatively trivial existing quirks. Signed-off-by: Robin Murphy --- MAINTAINERS | 3 +- drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 13 + drivers/iommu/arm-smmu.c | 82 ++-- drivers/iommu/arm-smmu.h | 89 +++ 5 files changed, 108 insertions(+), 81 deletions(-) create mode 100644 drivers/iommu/arm-smmu-impl.c diff --git a/MAINTAINERS b/MAINTAINERS index 6426db5198f0..35ff49ac303b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1350,8 +1350,7 @@ M:Will Deacon R: Robin Murphy L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) S: Maintained -F: drivers/iommu/arm-smmu.c -F: drivers/iommu/arm-smmu-v3.c +F: drivers/iommu/arm-smmu* F: drivers/iommu/io-pgtable-arm.c F: drivers/iommu/io-pgtable-arm-v7s.c diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index f13f36ae1af6..a2729aadd300 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o -obj-$(CONFIG_ARM_SMMU) += arm-smmu.o +obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c new file mode 100644 index ..efeb6d78da17 --- /dev/null +++ b/drivers/iommu/arm-smmu-impl.c @@ -0,0 +1,13 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Miscellaneous Arm SMMU implementation and integration 
quirks +// Copyright (C) 2019 Arm Limited + +#define pr_fmt(fmt) "arm-smmu: " fmt + +#include "arm-smmu.h" + + +struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) +{ + return smmu; +} diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index f3b8301a3059..1e8153182830 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -19,7 +19,6 @@ #include #include -#include #include #include #include @@ -29,7 +28,6 @@ #include #include #include -#include #include #include #include @@ -41,7 +39,6 @@ #include #include #include -#include #include #include @@ -66,9 +63,6 @@ #define TLB_LOOP_TIMEOUT 100 /* 1s! */ #define TLB_SPIN_COUNT 10 -/* Maximum number of context banks per SMMU */ -#define ARM_SMMU_MAX_CBS 128 - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -86,19 +80,6 @@ module_param(disable_bypass, bool, S_IRUGO); MODULE_PARM_DESC(disable_bypass, "Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU."); -enum arm_smmu_arch_version { - ARM_SMMU_V1, - ARM_SMMU_V1_64K, - ARM_SMMU_V2, -}; - -enum arm_smmu_implementation { - GENERIC_SMMU, - ARM_MMU500, - CAVIUM_SMMUV2, - QCOM_SMMUV2, -}; - struct arm_smmu_s2cr { struct iommu_group *group; int count; @@ -136,65 +117,6 @@ struct arm_smmu_master_cfg { #define for_each_cfg_sme(fw, i, idx) \ for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i) -struct arm_smmu_device { - struct device *dev; - - void __iomem*base; - unsigned intnumpage; - unsigned intpgshift; - -#define ARM_SMMU_FEAT_COHERENT_WALK(1 << 0) -#define ARM_SMMU_FEAT_STREAM_MATCH (1 << 1) -#define ARM_SMMU_FEAT_TRANS_S1 (1 << 2) -#define ARM_SMMU_FEAT_TRANS_S2 (1 << 3) -#define ARM_SMMU_FEAT_TRANS_NESTED (1 << 4) -#define ARM_SMMU_FEAT_TRANS_OPS(1 << 5) -#define ARM_SMMU_FEAT_VMID16 (1 << 6) -#define ARM_SMMU_FEAT_FMT_AARCH64_4K (1 << 7) -#define 
ARM_SMMU_FEAT_FMT_AARCH64_16K (1 << 8) -#define ARM_SMMU_FEAT_FMT_AARCH64_64K (1 << 9) -#define ARM_SMMU_FEAT_FMT_AARCH32_L(1 << 10) -#define ARM_SMMU_FEAT_FMT_AARCH32_S(1 << 11) -#define ARM_SMMU_FEAT_EXIDS(1 << 12) - u32 features; - -#define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0) - u32 options; -
[PATCH v2 10/17] iommu/arm-smmu: Abstract context bank accesses
Context bank accesses are fiddly enough to deserve a number of extra helpers to keep the callsites looking sane, even though there are only one or two of each. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 138 +-- 1 file changed, 73 insertions(+), 65 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d612dda2889f..e72554f334ee 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -82,9 +82,6 @@ ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) \ ? 0x400 : 0)) -/* Translation context bank */ -#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->numpage + (n)) << (smmu)->pgshift)) - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -265,13 +262,34 @@ static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, writel_relaxed(val, arm_smmu_page(smmu, page) + offset); } +static u64 arm_smmu_readq(struct arm_smmu_device *smmu, int page, int offset) +{ + return readq_relaxed(arm_smmu_page(smmu, page) + offset); +} + +static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int offset, + u64 val) +{ + writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); +} + #define ARM_SMMU_GR1 1 +#define ARM_SMMU_CB(s, n) ((s)->numpage + (n)) #define arm_smmu_gr1_read(s, o)\ arm_smmu_readl((s), ARM_SMMU_GR1, (o)) #define arm_smmu_gr1_write(s, o, v)\ arm_smmu_writel((s), ARM_SMMU_GR1, (o), (v)) +#define arm_smmu_cb_read(s, n, o) \ + arm_smmu_readl((s), ARM_SMMU_CB((s), (n)), (o)) +#define arm_smmu_cb_write(s, n, o, v) \ + arm_smmu_writel((s), ARM_SMMU_CB((s), (n)), (o), (v)) +#define arm_smmu_cb_readq(s, n, o) \ + arm_smmu_readq((s), ARM_SMMU_CB((s), (n)), (o)) +#define arm_smmu_cb_writeq(s, n, o, v) \ + arm_smmu_writeq((s), ARM_SMMU_CB((s), (n)), (o), (v)) + struct arm_smmu_option_prop { u32 opt; const char *prop; @@ -427,15 +445,17 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx) } /* Wait for any pending TLB invalidations to complete */ -static void 
__arm_smmu_tlb_sync(struct arm_smmu_device *smmu, - void __iomem *sync, void __iomem *status) +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, + int sync, int status) { unsigned int spin_cnt, delay; + u32 reg; - writel_relaxed(QCOM_DUMMY_VAL, sync); + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { - if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE)) + reg = arm_smmu_readl(smmu, page, status); + if (!(reg & sTLBGSTATUS_GSACTIVE)) return; cpu_relax(); } @@ -447,12 +467,11 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu) { - void __iomem *base = ARM_SMMU_GR0(smmu); unsigned long flags; spin_lock_irqsave(&smmu->global_sync_lock, flags); - __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC, - base + ARM_SMMU_GR0_sTLBGSTATUS); + __arm_smmu_tlb_sync(smmu, 0, ARM_SMMU_GR0_sTLBGSYNC, + ARM_SMMU_GR0_sTLBGSTATUS); spin_unlock_irqrestore(&smmu->global_sync_lock, flags); } @@ -460,12 +479,11 @@ static void arm_smmu_tlb_sync_context(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; struct arm_smmu_device *smmu = smmu_domain->smmu; - void __iomem *base = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx); unsigned long flags; spin_lock_irqsave(&smmu_domain->cb_lock, flags); - __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_CB_TLBSYNC, - base + ARM_SMMU_CB_TLBSTATUS); + __arm_smmu_tlb_sync(smmu, ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx), + ARM_SMMU_CB_TLBSYNC, ARM_SMMU_CB_TLBSTATUS); spin_unlock_irqrestore(&smmu_domain->cb_lock, flags); } @@ -479,14 +497,13 @@ static void arm_smmu_tlb_sync_vmid(void *cookie) static void arm_smmu_tlb_inv_context_s1(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; - struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - void __iomem *base = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx); - /* -* NOTE: this is not a relaxed write; it needs to 
guarantee that PTEs -* cleared by the current CPU are visible to the SMMU before the TLBI. +* The TLBI write may be relaxed, so ensure that PTEs cleared by the +* current
[PATCH v2 09/17] iommu/arm-smmu: Abstract GR1 accesses
Introduce some register access abstractions which we will later use to encapsulate various quirks. GR1 is the easiest page to start with. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 34 +++--- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 24b4de1a4185..d612dda2889f 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -71,7 +71,6 @@ /* SMMU global address space */ #define ARM_SMMU_GR0(smmu) ((smmu)->base) -#define ARM_SMMU_GR1(smmu) ((smmu)->base + (1 << (smmu)->pgshift)) /* * SMMU global address space with conditional offset to access secure @@ -250,6 +249,29 @@ struct arm_smmu_domain { struct iommu_domain domain; }; +static void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) +{ + return smmu->base + (n << smmu->pgshift); +} + +static u32 arm_smmu_readl(struct arm_smmu_device *smmu, int page, int offset) +{ + return readl_relaxed(arm_smmu_page(smmu, page) + offset); +} + +static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, + u32 val) +{ + writel_relaxed(val, arm_smmu_page(smmu, page) + offset); +} + +#define ARM_SMMU_GR1 1 + +#define arm_smmu_gr1_read(s, o)\ + arm_smmu_readl((s), ARM_SMMU_GR1, (o)) +#define arm_smmu_gr1_write(s, o, v)\ + arm_smmu_writel((s), ARM_SMMU_GR1, (o), (v)) + struct arm_smmu_option_prop { u32 opt; const char *prop; @@ -574,7 +596,6 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_cfg *cfg = &smmu_domain->cfg; struct arm_smmu_device *smmu = smmu_domain->smmu; - void __iomem *gr1_base = ARM_SMMU_GR1(smmu); void __iomem *cb_base; cb_base = ARM_SMMU_CB(smmu, cfg->cbndx); @@ -585,7 +606,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); - cbfrsynra = readl_relaxed(gr1_base + 
ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx)); + cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx)); dev_err_ratelimited(smmu->dev, "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", @@ -676,7 +697,7 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) bool stage1; struct arm_smmu_cb *cb = &smmu->cbs[idx]; struct arm_smmu_cfg *cfg = cb->cfg; - void __iomem *cb_base, *gr1_base; + void __iomem *cb_base; cb_base = ARM_SMMU_CB(smmu, idx); @@ -686,7 +707,6 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) return; } - gr1_base = ARM_SMMU_GR1(smmu); stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS; /* CBA2R */ @@ -699,7 +719,7 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) if (smmu->features & ARM_SMMU_FEAT_VMID16) reg |= FIELD_PREP(CBA2R_VMID16, cfg->vmid); - writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBA2R(idx)); + arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBA2R(idx), reg); } /* CBAR */ @@ -718,7 +738,7 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) /* 8-bit VMIDs live in CBAR */ reg |= FIELD_PREP(CBAR_VMID, cfg->vmid); } - writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(idx)); + arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBAR(idx), reg); /* * TCR -- 2.21.0.dirty
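The abstraction rests on the observation that the SMMU register file is a sequence of equally-sized pages: page n starts n << pgshift bytes from the base, with GR0 at page 0 and GR1 at page 1. A userspace sketch of the addressing (uintptr_t standing in for the driver's __iomem pointer):

```c
#include <stdint.h>

#define ARM_SMMU_GR0	0
#define ARM_SMMU_GR1	1

struct arm_smmu_device {		/* trimmed-down stand-in */
	uintptr_t base;		/* stand-in for the ioremapped __iomem base */
	unsigned int pgshift;	/* 12 for 4K register pages, 16 for 64K */
};

/* Page n of the register file starts n << pgshift bytes from the base */
uintptr_t arm_smmu_page(const struct arm_smmu_device *smmu, int n)
{
	return smmu->base + ((uintptr_t)n << smmu->pgshift);
}
```

With (page, offset) pairs instead of raw pointers, an implementation can intercept every access in one place, which is exactly what the later Calxeda secure-alias hook relies on.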
[PATCH v2 08/17] iommu/arm-smmu: Get rid of weird "atomic" write
The smmu_write_atomic_lq oddity made some sense when the context format was effectively tied to CONFIG_64BIT, but these days it's simpler to just pick an explicit access size based on the format for the one-and-a-half times we actually care. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 23 +++ 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 5b12e96d7878..24b4de1a4185 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -83,17 +83,6 @@ ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) \ ? 0x400 : 0)) -/* - * Some 64-bit registers only make sense to write atomically, but in such - * cases all the data relevant to AArch32 formats lies within the lower word, - * therefore this actually makes more sense than it might first appear. - */ -#ifdef CONFIG_64BIT -#define smmu_write_atomic_lq writeq_relaxed -#else -#define smmu_write_atomic_lq writel_relaxed -#endif - /* Translation context bank */ #define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->numpage + (n)) << (smmu)->pgshift)) @@ -533,7 +522,10 @@ static void arm_smmu_tlb_inv_range_s2(unsigned long iova, size_t size, reg += leaf ? 
ARM_SMMU_CB_S2_TLBIIPAS2L : ARM_SMMU_CB_S2_TLBIIPAS2; iova >>= 12; do { - smmu_write_atomic_lq(iova, reg); + if (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64) + writeq_relaxed(iova, reg); + else + writel_relaxed(iova, reg); iova += granule >> 12; } while (size -= granule); } @@ -1371,11 +1363,10 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain, cb_base = ARM_SMMU_CB(smmu, cfg->cbndx); spin_lock_irqsave(_domain->cb_lock, flags); - /* ATS1 registers can only be written atomically */ va = iova & ~0xfffUL; - if (smmu->version == ARM_SMMU_V2) - smmu_write_atomic_lq(va, cb_base + ARM_SMMU_CB_ATS1PR); - else /* Register is only 32-bit in v1 */ + if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) + writeq_relaxed(va, cb_base + ARM_SMMU_CB_ATS1PR); + else writel_relaxed(va, cb_base + ARM_SMMU_CB_ATS1PR); if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp, -- 2.21.0.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
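The replacement of `smmu_write_atomic_lq` with an explicit per-format access size can be illustrated in userspace, with a byte buffer standing in for the register so the write width is observable (names are illustrative):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

enum ctx_fmt { FMT_AARCH32_LPAE, FMT_AARCH64 };

/* Stand-in register backing store so the access width can be checked. */
static uint8_t reg[8];

static void writel_relaxed(uint32_t v, uint8_t *addr) { memcpy(addr, &v, 4); }
static void writeq_relaxed(uint64_t v, uint8_t *addr) { memcpy(addr, &v, 8); }

/*
 * Instead of a CONFIG_64BIT-dependent macro, pick the access size from
 * the context format, as the patch does: AArch64 contexts get a 64-bit
 * write, AArch32 contexts only ever care about the low word anyway.
 */
static void tlbi_write(enum ctx_fmt fmt, uint64_t iova)
{
    if (fmt == FMT_AARCH64)
        writeq_relaxed(iova, reg);
    else
        writel_relaxed((uint32_t)iova, reg);
}
```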
[PATCH v2 06/17] iommu/arm-smmu: Rework cb_base handling
To keep register-access quirks manageable, we want to structure things to avoid needing too many individual overrides. It seems fairly clean to have a single interface which handles both global and context registers in terms of the architectural pages, so the first preparatory step is to rework cb_base into a page number rather than an absolute address. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index a877de006d02..19126230c780 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -95,7 +95,7 @@ #endif /* Translation context bank */ -#define ARM_SMMU_CB(smmu, n) ((smmu)->cb_base + ((n) << (smmu)->pgshift)) +#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->numpage + (n)) << (smmu)->pgshift)) #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -168,8 +168,8 @@ struct arm_smmu_device { struct device *dev; void __iomem*base; - void __iomem*cb_base; - unsigned long pgshift; + unsigned intnumpage; + unsigned intpgshift; #define ARM_SMMU_FEAT_COHERENT_WALK(1 << 0) #define ARM_SMMU_FEAT_STREAM_MATCH (1 << 1) @@ -1815,7 +1815,7 @@ static int arm_smmu_id_size_to_bits(int size) static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) { - unsigned long size; + unsigned int size; void __iomem *gr0_base = ARM_SMMU_GR0(smmu); u32 id; bool cttw_reg, cttw_fw = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK; @@ -1899,7 +1899,7 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) return -ENOMEM; dev_notice(smmu->dev, - "\tstream matching with %lu register groups", size); + "\tstream matching with %u register groups", size); } /* s2cr->type == 0 means translation, so initialise explicitly */ smmu->s2crs = devm_kmalloc_array(smmu->dev, size, sizeof(*smmu->s2crs), @@ -1925,11 +1925,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) /* Check for size mismatch of SMMU 
address space from mapped region */ size = 1 << (FIELD_GET(ID1_NUMPAGENDXB, id) + 1); - size <<= smmu->pgshift; - if (smmu->cb_base != gr0_base + size) + if (smmu->numpage != 2 * size << smmu->pgshift) dev_warn(smmu->dev, - "SMMU address space size (0x%lx) differs from mapped region size (0x%tx)!\n", - size * 2, (smmu->cb_base - gr0_base) * 2); + "SMMU address space size (0x%x) differs from mapped region size (0x%x)!\n", + 2 * size << smmu->pgshift, smmu->numpage); + /* Now properly encode NUMPAGE to subsequently derive SMMU_CB_BASE */ + smmu->numpage = size; smmu->num_s2_context_banks = FIELD_GET(ID1_NUMS2CB, id); smmu->num_context_banks = FIELD_GET(ID1_NUMCB, id); @@ -2200,7 +2201,11 @@ static int arm_smmu_device_probe(struct platform_device *pdev) smmu->base = devm_ioremap_resource(dev, res); if (IS_ERR(smmu->base)) return PTR_ERR(smmu->base); - smmu->cb_base = smmu->base + resource_size(res) / 2; + /* +* The resource size should effectively match the value of SMMU_TOP; +* stash that temporarily until we know PAGESIZE to validate it with. +*/ + smmu->numpage = resource_size(res); num_irqs = 0; while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) { -- 2.21.0.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
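The cb_base-as-page-number arithmetic from this patch can be sketched as follows; the numeric values are illustrative (4KB pages, 64 pages of global space), not taken from real hardware:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/*
 * After the rework, the context bank space starts "numpage" architectural
 * pages above the SMMU base, so context bank n lives at page numpage + n.
 */
struct smmu_layout {
    unsigned int numpage;   /* pages of global space below SMMU_CB_BASE */
    unsigned int pgshift;
};

static size_t cb_offset(const struct smmu_layout *s, int n)
{
    return (size_t)(s->numpage + n) << s->pgshift;
}

/*
 * The probe-time sanity check: the whole address space is twice the
 * global space (global + context banks), and should match the size of
 * the mapped resource.
 */
static int size_mismatch(const struct smmu_layout *s, size_t resource_size)
{
    return resource_size != ((size_t)2 * s->numpage << s->pgshift);
}
```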
[PATCH v2 04/17] iommu/arm-smmu: Convert GR1 registers to bitfields
As for GR0, use the bitfield helpers to make GR1 usage a little cleaner, and use it as an opportunity to audit and tidy the definitions. This tweaks the handling of CBAR types to match what we did for S2CR a while back, and fixes a couple of names which didn't quite match the latest architecture spec (IHI0062D.c). Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-regs.h | 33 ++--- drivers/iommu/arm-smmu.c | 18 +- 2 files changed, 23 insertions(+), 28 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index 351ab09c7d4f..8522330ee624 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -108,30 +108,25 @@ enum arm_smmu_s2cr_type { /* Context bank attribute registers */ #define ARM_SMMU_GR1_CBAR(n) (0x0 + ((n) << 2)) -#define CBAR_VMID_SHIFT0 -#define CBAR_VMID_MASK 0xff -#define CBAR_S1_BPSHCFG_SHIFT 8 -#define CBAR_S1_BPSHCFG_MASK 3 -#define CBAR_S1_BPSHCFG_NSH3 -#define CBAR_S1_MEMATTR_SHIFT 12 -#define CBAR_S1_MEMATTR_MASK 0xf +#define CBAR_IRPTNDX GENMASK(31, 24) +#define CBAR_TYPE GENMASK(17, 16) +enum arm_smmu_cbar_type { + CBAR_TYPE_S2_TRANS, + CBAR_TYPE_S1_TRANS_S2_BYPASS, + CBAR_TYPE_S1_TRANS_S2_FAULT, + CBAR_TYPE_S1_TRANS_S2_TRANS, +}; +#define CBAR_S1_MEMATTRGENMASK(15, 12) #define CBAR_S1_MEMATTR_WB 0xf -#define CBAR_TYPE_SHIFT16 -#define CBAR_TYPE_MASK 0x3 -#define CBAR_TYPE_S2_TRANS (0 << CBAR_TYPE_SHIFT) -#define CBAR_TYPE_S1_TRANS_S2_BYPASS (1 << CBAR_TYPE_SHIFT) -#define CBAR_TYPE_S1_TRANS_S2_FAULT(2 << CBAR_TYPE_SHIFT) -#define CBAR_TYPE_S1_TRANS_S2_TRANS(3 << CBAR_TYPE_SHIFT) -#define CBAR_IRPTNDX_SHIFT 24 -#define CBAR_IRPTNDX_MASK 0xff +#define CBAR_S1_BPSHCFGGENMASK(9, 8) +#define CBAR_S1_BPSHCFG_NSH3 +#define CBAR_VMID GENMASK(7, 0) #define ARM_SMMU_GR1_CBFRSYNRA(n) (0x400 + ((n) << 2)) #define ARM_SMMU_GR1_CBA2R(n) (0x800 + ((n) << 2)) -#define CBA2R_RW64_32BIT (0 << 0) -#define CBA2R_RW64_64BIT (1 << 0) -#define CBA2R_VMID_SHIFT 16 -#define CBA2R_VMID_MASK0x +#define 
CBA2R_VMID16 GENMASK(31, 16) +#define CBA2R_VA64 BIT(0) #define ARM_SMMU_CB_SCTLR 0x0 #define ARM_SMMU_CB_ACTLR 0x4 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 105015798c06..293a95b0d682 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -237,7 +237,7 @@ struct arm_smmu_cfg { u16 asid; u16 vmid; }; - u32 cbar; + enum arm_smmu_cbar_type cbar; enum arm_smmu_context_fmt fmt; }; #define INVALID_IRPTNDX0xff @@ -692,31 +692,31 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) /* CBA2R */ if (smmu->version > ARM_SMMU_V1) { if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) - reg = CBA2R_RW64_64BIT; + reg = CBA2R_VA64; else - reg = CBA2R_RW64_32BIT; + reg = 0; /* 16-bit VMIDs live in CBA2R */ if (smmu->features & ARM_SMMU_FEAT_VMID16) - reg |= cfg->vmid << CBA2R_VMID_SHIFT; + reg |= FIELD_PREP(CBA2R_VMID16, cfg->vmid); writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBA2R(idx)); } /* CBAR */ - reg = cfg->cbar; + reg = FIELD_PREP(CBAR_TYPE, cfg->cbar); if (smmu->version < ARM_SMMU_V2) - reg |= cfg->irptndx << CBAR_IRPTNDX_SHIFT; + reg |= FIELD_PREP(CBAR_IRPTNDX, cfg->irptndx); /* * Use the weakest shareability/memory types, so they are * overridden by the ttbcr/pte. */ if (stage1) { - reg |= (CBAR_S1_BPSHCFG_NSH << CBAR_S1_BPSHCFG_SHIFT) | - (CBAR_S1_MEMATTR_WB << CBAR_S1_MEMATTR_SHIFT); + reg |= FIELD_PREP(CBAR_S1_BPSHCFG, CBAR_S1_BPSHCFG_NSH) | + FIELD_PREP(CBAR_S1_MEMATTR, CBAR_S1_MEMATTR_WB); } else if (!(smmu->features & ARM_SMMU_FEAT_VMID16)) { /* 8-bit VMIDs live in CBAR */ - reg |= cfg->vmid << CBAR_VMID_SHIFT; + reg |= FIELD_PREP(CBAR_VMID, cfg->vmid); } writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(idx)); -- 2.21.0.dirty ___ iommu mailing list
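To illustrate what the bitfield conversion buys, here is a userspace sketch of assembling a CBAR value with `FIELD_PREP()`-style helpers. The field positions match the definitions in the patch; the helper macros are simplified stand-ins for the kernel's `<linux/bitfield.h>` versions:

```c
#include <stdint.h>
#include <assert.h>

/* Simplified stand-ins for the kernel's GENMASK()/FIELD_PREP(). */
#define GENMASK(h, l)         (((~0u) >> (31 - (h))) & ((~0u) << (l)))
#define FIELD_PREP(mask, val) (((uint32_t)(val) * ((mask) & -(mask))) & (mask))

/* Field layout from the patch. */
#define CBAR_IRPTNDX GENMASK(31, 24)
#define CBAR_TYPE    GENMASK(17, 16)
#define CBAR_VMID    GENMASK(7, 0)
#define CBAR_TYPE_S1_TRANS_S2_BYPASS 1

/* Assemble CBAR much as arm_smmu_write_context_bank() now does. */
static uint32_t make_cbar(unsigned int type, unsigned int irptndx,
                          unsigned int vmid)
{
    return FIELD_PREP(CBAR_TYPE, type) |
           FIELD_PREP(CBAR_IRPTNDX, irptndx) |
           FIELD_PREP(CBAR_VMID, vmid);
}
```

Compared with the old `(val << SHIFT)` pairs, the mask alone now encodes both position and width, so a misplaced field is a compile-time visible mismatch rather than a silent truncation.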
[PATCH v2 02/17] iommu/qcom: Mask TLBI addresses correctly
As with arm-smmu from whence this code was borrowed, the IOVAs passed in here happen to be at least page-aligned anyway, but still; oh dear. Signed-off-by: Robin Murphy --- drivers/iommu/qcom_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index 34d0b9783b3e..bed948c3058a 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -155,7 +155,7 @@ static void qcom_iommu_tlb_inv_range_nosync(unsigned long iova, size_t size, struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]); size_t s = size; - iova &= ~12UL; + iova = (iova >> 12) << 12; iova |= ctx->asid; do { iommu_writel(ctx, reg, iova); -- 2.21.0.dirty
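The TLBIVA operand built here packs the context's ASID into the low bits of a page-aligned VA, which is why the address must genuinely be masked to a 4KB boundary first. A minimal sketch of the fixed construction:

```c
#include <stdint.h>
#include <assert.h>

/*
 * Build a TLBIVA operand: drop the 12 page-offset bits, then OR in the
 * ASID, as the fixed qcom_iommu_tlb_inv_range_nosync() does.
 */
static uint64_t tlbiva_operand(uint64_t iova, uint8_t asid)
{
    iova = (iova >> 12) << 12;  /* page-align: clear bits [11:0] */
    return iova | asid;
}
```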
[PATCH v2 05/17] iommu/arm-smmu: Convert context bank registers to bitfields
Finish the final part of the job, once again updating some names to match the current spec. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-regs.h | 86 ++- drivers/iommu/arm-smmu.c | 16 +++ drivers/iommu/qcom_iommu.c| 13 +++--- 3 files changed, 59 insertions(+), 56 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index 8522330ee624..a8e288192285 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -129,19 +129,59 @@ enum arm_smmu_cbar_type { #define CBA2R_VA64 BIT(0) #define ARM_SMMU_CB_SCTLR 0x0 +#define SCTLR_S1_ASIDPNE BIT(12) +#define SCTLR_CFCFGBIT(7) +#define SCTLR_CFIE BIT(6) +#define SCTLR_CFRE BIT(5) +#define SCTLR_EBIT(4) +#define SCTLR_AFE BIT(2) +#define SCTLR_TRE BIT(1) +#define SCTLR_MBIT(0) + #define ARM_SMMU_CB_ACTLR 0x4 + #define ARM_SMMU_CB_RESUME 0x8 -#define ARM_SMMU_CB_TTBCR2 0x10 +#define RESUME_TERMINATE BIT(0) + +#define ARM_SMMU_CB_TCR2 0x10 +#define TCR2_SEP GENMASK(17, 15) +#define TCR2_SEP_UPSTREAM 0x7 +#define TCR2_ASBIT(4) + #define ARM_SMMU_CB_TTBR0 0x20 #define ARM_SMMU_CB_TTBR1 0x28 -#define ARM_SMMU_CB_TTBCR 0x30 +#define TTBRn_ASID GENMASK_ULL(63, 48) + +#define ARM_SMMU_CB_TCR0x30 #define ARM_SMMU_CB_CONTEXTIDR 0x34 #define ARM_SMMU_CB_S1_MAIR0 0x38 #define ARM_SMMU_CB_S1_MAIR1 0x3c + #define ARM_SMMU_CB_PAR0x50 +#define CB_PAR_F BIT(0) + #define ARM_SMMU_CB_FSR0x58 +#define FSR_MULTI BIT(31) +#define FSR_SS BIT(30) +#define FSR_UUTBIT(8) +#define FSR_ASFBIT(7) +#define FSR_TLBLKF BIT(6) +#define FSR_TLBMCF BIT(5) +#define FSR_EF BIT(4) +#define FSR_PF BIT(3) +#define FSR_AFFBIT(2) +#define FSR_TF BIT(1) + +#define FSR_IGN(FSR_AFF | FSR_ASF | \ +FSR_TLBMCF | FSR_TLBLKF) +#define FSR_FAULT (FSR_MULTI | FSR_SS | FSR_UUT | \ +FSR_EF | FSR_PF | FSR_TF | FSR_IGN) + #define ARM_SMMU_CB_FAR0x60 + #define ARM_SMMU_CB_FSYNR0 0x68 +#define FSYNR0_WNR BIT(4) + #define ARM_SMMU_CB_S1_TLBIVA 0x600 #define ARM_SMMU_CB_S1_TLBIASID0x610 #define ARM_SMMU_CB_S1_TLBIVAL 
0x620 @@ -150,46 +190,8 @@ enum arm_smmu_cbar_type { #define ARM_SMMU_CB_TLBSYNC0x7f0 #define ARM_SMMU_CB_TLBSTATUS 0x7f4 #define ARM_SMMU_CB_ATS1PR 0x800 + #define ARM_SMMU_CB_ATSR 0x8f0 - -#define SCTLR_S1_ASIDPNE (1 << 12) -#define SCTLR_CFCFG(1 << 7) -#define SCTLR_CFIE (1 << 6) -#define SCTLR_CFRE (1 << 5) -#define SCTLR_E(1 << 4) -#define SCTLR_AFE (1 << 2) -#define SCTLR_TRE (1 << 1) -#define SCTLR_M(1 << 0) - -#define CB_PAR_F (1 << 0) - -#define ATSR_ACTIVE(1 << 0) - -#define RESUME_RETRY (0 << 0) -#define RESUME_TERMINATE (1 << 0) - -#define TTBCR2_SEP_SHIFT 15 -#define TTBCR2_SEP_UPSTREAM(0x7 << TTBCR2_SEP_SHIFT) -#define TTBCR2_AS (1 << 4) - -#define TTBRn_ASID_SHIFT 48 - -#define FSR_MULTI (1 << 31) -#define FSR_SS (1 << 30) -#define FSR_UUT(1 << 8) -#define FSR_ASF(1 << 7) -#define FSR_TLBLKF (1 << 6) -#define FSR_TLBMCF (1 << 5) -#define FSR_EF (1 << 4) -#define FSR_PF (1 << 3) -#define FSR_AFF(1 << 2) -#define FSR_TF (1 << 1) - -#define FSR_IGN(FSR_AFF | FSR_ASF | \ -FSR_TLBMCF | FSR_TLBLKF) -#define FSR_FAULT (FSR_MULTI | FSR_SS | FSR_UUT | \ -FSR_EF | FSR_PF | FSR_TF | FSR_IGN) - -#define FSYNR0_WNR (1 << 4) +#define
[PATCH v2 03/17] iommu/arm-smmu: Convert GR0 registers to bitfields
FIELD_PREP remains a terrible name, but the overall simplification will make further work on this stuff that much more manageable. This also serves as an audit of the header, wherein we can impose a consistent grouping and ordering of the offset and field definitions Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-regs.h | 126 -- drivers/iommu/arm-smmu.c | 51 +++--- 2 files changed, 84 insertions(+), 93 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index 1c278f7ae888..351ab09c7d4f 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -10,111 +10,101 @@ #ifndef _ARM_SMMU_REGS_H #define _ARM_SMMU_REGS_H +#include + /* Configuration registers */ #define ARM_SMMU_GR0_sCR0 0x0 -#define sCR0_CLIENTPD (1 << 0) -#define sCR0_GFRE (1 << 1) -#define sCR0_GFIE (1 << 2) -#define sCR0_EXIDENABLE(1 << 3) -#define sCR0_GCFGFRE (1 << 4) -#define sCR0_GCFGFIE (1 << 5) -#define sCR0_USFCFG(1 << 10) -#define sCR0_VMIDPNE (1 << 11) -#define sCR0_PTM (1 << 12) -#define sCR0_FB(1 << 13) -#define sCR0_VMID16EN (1 << 31) -#define sCR0_BSU_SHIFT 14 -#define sCR0_BSU_MASK 0x3 +#define sCR0_VMID16EN BIT(31) +#define sCR0_BSU GENMASK(15, 14) +#define sCR0_FBBIT(13) +#define sCR0_PTM BIT(12) +#define sCR0_VMIDPNE BIT(11) +#define sCR0_USFCFGBIT(10) +#define sCR0_GCFGFIE BIT(5) +#define sCR0_GCFGFRE BIT(4) +#define sCR0_EXIDENABLEBIT(3) +#define sCR0_GFIE BIT(2) +#define sCR0_GFRE BIT(1) +#define sCR0_CLIENTPD BIT(0) /* Auxiliary Configuration register */ #define ARM_SMMU_GR0_sACR 0x10 /* Identification registers */ #define ARM_SMMU_GR0_ID0 0x20 +#define ID0_S1TS BIT(30) +#define ID0_S2TS BIT(29) +#define ID0_NTSBIT(28) +#define ID0_SMSBIT(27) +#define ID0_ATOSNS BIT(26) +#define ID0_PTFS_NO_AARCH32BIT(25) +#define ID0_PTFS_NO_AARCH32S BIT(24) +#define ID0_NUMIRPTGENMASK(23, 16) +#define ID0_CTTW BIT(14) +#define ID0_NUMSIDBGENMASK(12, 9) +#define ID0_EXIDS BIT(8) +#define ID0_NUMSMRGGENMASK(7, 0) + #define 
ARM_SMMU_GR0_ID1 0x24 +#define ID1_PAGESIZE BIT(31) +#define ID1_NUMPAGENDXBGENMASK(30, 28) +#define ID1_NUMS2CBGENMASK(23, 16) +#define ID1_NUMCB GENMASK(7, 0) + #define ARM_SMMU_GR0_ID2 0x28 +#define ID2_VMID16 BIT(15) +#define ID2_PTFS_64K BIT(14) +#define ID2_PTFS_16K BIT(13) +#define ID2_PTFS_4KBIT(12) +#define ID2_UBSGENMASK(11, 8) +#define ID2_OASGENMASK(7, 4) +#define ID2_IASGENMASK(3, 0) + #define ARM_SMMU_GR0_ID3 0x2c #define ARM_SMMU_GR0_ID4 0x30 #define ARM_SMMU_GR0_ID5 0x34 #define ARM_SMMU_GR0_ID6 0x38 + #define ARM_SMMU_GR0_ID7 0x3c +#define ID7_MAJOR GENMASK(7, 4) +#define ID7_MINOR GENMASK(3, 0) + #define ARM_SMMU_GR0_sGFSR 0x48 #define ARM_SMMU_GR0_sGFSYNR0 0x50 #define ARM_SMMU_GR0_sGFSYNR1 0x54 #define ARM_SMMU_GR0_sGFSYNR2 0x58 -#define ID0_S1TS (1 << 30) -#define ID0_S2TS (1 << 29) -#define ID0_NTS(1 << 28) -#define ID0_SMS(1 << 27) -#define ID0_ATOSNS (1 << 26) -#define ID0_PTFS_NO_AARCH32(1 << 25) -#define ID0_PTFS_NO_AARCH32S (1 << 24) -#define ID0_CTTW (1 << 14) -#define ID0_NUMIRPT_SHIFT 16 -#define ID0_NUMIRPT_MASK 0xff -#define ID0_NUMSIDB_SHIFT 9 -#define ID0_NUMSIDB_MASK 0xf -#define ID0_EXIDS (1 << 8) -#define ID0_NUMSMRG_SHIFT 0 -#define ID0_NUMSMRG_MASK 0xff - -#define ID1_PAGESIZE (1 << 31) -#define ID1_NUMPAGENDXB_SHIFT 28 -#define ID1_NUMPAGENDXB_MASK 7 -#define ID1_NUMS2CB_SHIFT 16 -#define ID1_NUMS2CB_MASK
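The decode side of the same conversion can be sketched in userspace with simplified stand-ins for `GENMASK()`/`FIELD_GET()`, using the ID1 field layout from the patch (the helper macros below are illustrative, not the kernel's exact definitions):

```c
#include <stdint.h>
#include <assert.h>

/* Simplified stand-ins for the kernel's GENMASK()/FIELD_GET(). */
#define GENMASK(h, l)        (((~0u) >> (31 - (h))) & ((~0u) << (l)))
#define FIELD_GET(mask, reg) (((reg) & (mask)) / ((mask) & -(mask)))

/* ID1 field layout from the patch. */
#define ID1_PAGESIZE    (1u << 31)
#define ID1_NUMPAGENDXB GENMASK(30, 28)
#define ID1_NUMCB       GENMASK(7, 0)

/* PAGESIZE selects 64KB vs 4KB architectural pages. */
static unsigned int id1_pgshift(uint32_t id)
{
    return (id & ID1_PAGESIZE) ? 16 : 12;
}

static unsigned int id1_numcb(uint32_t id)
{
    return FIELD_GET(ID1_NUMCB, id);
}
```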
[PATCH v2 01/17] iommu/arm-smmu: Mask TLBI address correctly
The less said about "~12UL" the better. Oh dear. We get away with it due to calling constraints that mean IOVAs are implicitly at least page-aligned to begin with, but still; oh dear. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 64977c131ee6..d60ee292ecee 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -504,7 +504,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA; if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { - iova &= ~12UL; + iova = (iova >> 12) << 12; iova |= cfg->asid; do { writel_relaxed(iova, reg); -- 2.21.0.dirty
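For anyone squinting at why "~12UL" was wrong: it clears only bits 2 and 3 (~12 is ...11110011), so almost all of the page offset survives, whereas the fix clears the full 12 offset bits. A two-line demonstration:

```c
#include <stdint.h>
#include <assert.h>

/* The old, buggy "alignment": clears only bits 2 and 3. */
static uint64_t buggy_align(uint64_t iova) { return iova & ~12UL; }

/* The fix: drop and restore the low 12 bits, leaving a 4KB-aligned VA. */
static uint64_t fixed_align(uint64_t iova) { return (iova >> 12) << 12; }
```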
[PATCH v2 00/17] Arm SMMU refactoring
Hi all, v1 for context: https://patchwork.kernel.org/cover/11087347/ Here's a quick v2 attempting to address all the minor comments; I've tweaked a whole bunch of names, added some verbosity in macros and comments for clarity, and rejigged arm_smmu_impl_init() for a bit more structure. The (new) patches #1 and #2 are up front as conceptual fixes, although they're not actually critical - it turns out to be more of an embarrassment than a real problem in practice. For ease of reference, the overall diff against v1 is attached below. Robin. Robin Murphy (17): iommu/arm-smmu: Mask TLBI address correctly iommu/qcom: Mask TLBI addresses correctly iommu/arm-smmu: Convert GR0 registers to bitfields iommu/arm-smmu: Convert GR1 registers to bitfields iommu/arm-smmu: Convert context bank registers to bitfields iommu/arm-smmu: Rework cb_base handling iommu/arm-smmu: Split arm_smmu_tlb_inv_range_nosync() iommu/arm-smmu: Get rid of weird "atomic" write iommu/arm-smmu: Abstract GR1 accesses iommu/arm-smmu: Abstract context bank accesses iommu/arm-smmu: Abstract GR0 accesses iommu/arm-smmu: Rename arm-smmu-regs.h iommu/arm-smmu: Add implementation infrastructure iommu/arm-smmu: Move Secure access quirk to implementation iommu/arm-smmu: Add configuration implementation hook iommu/arm-smmu: Add reset implementation hook iommu/arm-smmu: Add context init implementation hook MAINTAINERS | 3 +- drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 174 +++ drivers/iommu/arm-smmu-regs.h | 210 - drivers/iommu/arm-smmu.c | 573 +++--- drivers/iommu/arm-smmu.h | 394 +++ drivers/iommu/qcom_iommu.c| 17 +- 7 files changed, 764 insertions(+), 609 deletions(-) create mode 100644 drivers/iommu/arm-smmu-impl.c delete mode 100644 drivers/iommu/arm-smmu-regs.h create mode 100644 drivers/iommu/arm-smmu.h ->8- diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 3c731e087854..e22e9004f449 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ 
-28,7 +28,7 @@ static int arm_smmu_gr0_ns(int offset) static u32 arm_smmu_read_ns(struct arm_smmu_device *smmu, int page, int offset) { - if (page == 0) + if (page == ARM_SMMU_GR0) offset = arm_smmu_gr0_ns(offset); return readl_relaxed(arm_smmu_page(smmu, page) + offset); } @@ -36,7 +36,7 @@ static u32 arm_smmu_read_ns(struct arm_smmu_device *smmu, int page, static void arm_smmu_write_ns(struct arm_smmu_device *smmu, int page, int offset, u32 val) { - if (page == 0) + if (page == ARM_SMMU_GR0) offset = arm_smmu_gr0_ns(offset); writel_relaxed(val, arm_smmu_page(smmu, page) + offset); } @@ -52,18 +52,17 @@ struct cavium_smmu { struct arm_smmu_device smmu; u32 id_base; }; -#define to_csmmu(s)container_of(s, struct cavium_smmu, smmu) static int cavium_cfg_probe(struct arm_smmu_device *smmu) { static atomic_t context_count = ATOMIC_INIT(0); + struct cavium_smmu *cs = container_of(smmu, struct cavium_smmu, smmu); /* * Cavium CN88xx erratum #27704. * Ensure ASID and VMID allocation is unique across all SMMUs in * the system. 
*/ - to_csmmu(smmu)->id_base = atomic_fetch_add(smmu->num_context_banks, - _count); + cs->id_base = atomic_fetch_add(smmu->num_context_banks, _count); dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n"); return 0; @@ -71,12 +70,13 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu) int cavium_init_context(struct arm_smmu_domain *smmu_domain) { - u32 id_base = to_csmmu(smmu_domain->smmu)->id_base; + struct cavium_smmu *cs = container_of(smmu_domain->smmu, + struct cavium_smmu, smmu); if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) - smmu_domain->cfg.vmid += id_base; + smmu_domain->cfg.vmid += cs->id_base; else - smmu_domain->cfg.asid += id_base; + smmu_domain->cfg.asid += cs->id_base; return 0; } @@ -88,18 +88,18 @@ const struct arm_smmu_impl cavium_impl = { struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu) { - struct cavium_smmu *csmmu; + struct cavium_smmu *cs; - csmmu = devm_kzalloc(smmu->dev, sizeof(*csmmu), GFP_KERNEL); - if (!csmmu) + cs = devm_kzalloc(smmu->dev, sizeof(*cs), GFP_KERNEL); + if (!cs) return ERR_PTR(-ENOMEM); - csmmu->smmu = *smmu; - csmmu->smmu.impl = _impl; + cs->smmu = *smmu; + cs->smmu.impl = _impl; devm_kfree(smmu->dev, smmu); - return >smmu; +
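The subclassing idiom used for `cavium_smmu` — embed the base struct and recover the wrapper with `container_of()` — is worth a standalone sketch, since the whole implementation-hook design leans on it (struct names here mirror the patch but are simplified):

```c
#include <stddef.h>
#include <assert.h>

/* Minimal container_of, as in the kernel. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Base device struct and a Cavium-style wrapper embedding it. */
struct smmu_device {
    int num_context_banks;
};

struct cavium_smmu {
    struct smmu_device smmu;   /* recovered via container_of() */
    unsigned int id_base;      /* implementation-private state */
};

/* Given only the base pointer, recover the implementation-private data. */
static unsigned int cavium_id_base(struct smmu_device *smmu)
{
    struct cavium_smmu *cs = container_of(smmu, struct cavium_smmu, smmu);

    return cs->id_base;
}
```

This is why generic code can keep passing `struct arm_smmu_device *` around while the Cavium hooks see their extra `id_base` without any global state.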
Messages to kexec@ get moderated (was: Crash kernel with 256 MB reserved memory runs into OOM condition)
Dear Dave, On 13.08.19 04:46, Dave Young wrote: > On 08/13/19 at 10:43am, Dave Young wrote: […] > The question is to Paul, also it would be always good to cc kexec mail > list for kexec and kdump issues. kexec@ was CCed in my original mail, but my messages got moderated. It'd be great if you checked that with the list administrators. > Your mail to 'kexec' with the subject > > Crash kernel with 256 MB reserved memory runs into OOM condition > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Message has a suspicious header > > Either the message will get posted to the list, or you will receive > notification of the moderator's decision. If you would like to cancel > this posting, please visit the following URL: > > > http://lists.infradead.org/mailman/confirm/kexec/a23ab6162ef34d099af5dd86c46113def5152bb1 Kind regards, Paul
Re: [PATCH v6 5/8] iommu: Add bounce page APIs
On Thu, Aug 15, 2019 at 02:15:32PM +0800, Lu Baolu wrote: > iommu_map/unmap() APIs haven't parameters for dma direction and > attributions. These parameters are elementary for DMA APIs. Say, > after map, if the dma direction is TO_DEVICE and a bounce buffer is > used, we must sync the data from the original dma buffer to the bounce > buffer; In the opposite direction, if dma is FROM_DEVICE, before unmap, > we need to sync the data from the bounce buffer onto the original > buffer. The DMA direction from DMA-API maps to the protections in iommu_map(): DMA_FROM_DEVICE:IOMMU_WRITE DMA_TO_DEVICE: IOMMU_READ DMA_BIDIRECTIONAL IOMMU_READ | IOMMU_WRITE And for the sync DMA-API also has separate functions for either direction. So I don't see why these extra functions are needed in the IOMMU-API. Regards, Joerg
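The direction-to-protection mapping Joerg describes can be written as a small helper; the enum and flag values below are illustrative stand-ins, not the kernel's actual definitions:

```c
#include <assert.h>

enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

#define IOMMU_READ  (1 << 0)
#define IOMMU_WRITE (1 << 1)

/*
 * DMA_TO_DEVICE means the device reads the buffer; DMA_FROM_DEVICE means
 * the device writes it; bidirectional needs both permissions.
 */
static int dma_dir_to_prot(enum dma_data_direction dir)
{
    switch (dir) {
    case DMA_TO_DEVICE:
        return IOMMU_READ;
    case DMA_FROM_DEVICE:
        return IOMMU_WRITE;
    case DMA_BIDIRECTIONAL:
    default:
        return IOMMU_READ | IOMMU_WRITE;
    }
}
```

Since the direction is already expressible as mapping protections, and the DMA API has per-direction sync calls, no extra direction-aware entry points are needed in the IOMMU API itself — which is the point of the reply.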
Re: [PATCH v3 1/2] iommu/io-pgtable-arm: Add support for ARM_ADRENO_GPU_LPAE io-pgtable format
On Wed, Aug 07, 2019 at 04:21:39PM -0600, Jordan Crouse wrote: > Add a new sub-format ARM_ADRENO_GPU_LPAE to set up TTBR0 and TTBR1 for > use by the Adreno GPU. This will allow The GPU driver to map global > buffers in the TTBR1 and leave the TTBR0 configured but unset and > free to be changed dynamically by the GPU. It would take a bit of code rework and un-static-ifying a few functions but I'm wondering if it would be cleaner to add the Adreno GPU pagetable format in a new file, such as io-pgtable-adreno.c. Jordan > Signed-off-by: Jordan Crouse > --- > > drivers/iommu/io-pgtable-arm.c | 214 > ++--- > drivers/iommu/io-pgtable.c | 1 + > include/linux/io-pgtable.h | 2 + > 3 files changed, 202 insertions(+), 15 deletions(-) > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > index 161a7d5..8eb0dbb 100644 > --- a/drivers/iommu/io-pgtable-arm.c > +++ b/drivers/iommu/io-pgtable-arm.c > @@ -112,13 +112,19 @@ > #define ARM_32_LPAE_TCR_EAE (1 << 31) > #define ARM_64_LPAE_S2_TCR_RES1 (1 << 31) > > +#define ARM_LPAE_TCR_EPD0(1 << 7) > #define ARM_LPAE_TCR_EPD1(1 << 23) > > #define ARM_LPAE_TCR_TG0_4K (0 << 14) > #define ARM_LPAE_TCR_TG0_64K (1 << 14) > #define ARM_LPAE_TCR_TG0_16K (2 << 14) > > +#define ARM_LPAE_TCR_TG1_4K (0 << 30) > +#define ARM_LPAE_TCR_TG1_64K (1 << 30) > +#define ARM_LPAE_TCR_TG1_16K (2 << 30) > + > #define ARM_LPAE_TCR_SH0_SHIFT 12 > +#define ARM_LPAE_TCR_SH1_SHIFT 28 > #define ARM_LPAE_TCR_SH0_MASK0x3 > #define ARM_LPAE_TCR_SH_NS 0 > #define ARM_LPAE_TCR_SH_OS 2 > @@ -126,6 +132,8 @@ > > #define ARM_LPAE_TCR_ORGN0_SHIFT 10 > #define ARM_LPAE_TCR_IRGN0_SHIFT 8 > +#define ARM_LPAE_TCR_ORGN1_SHIFT 26 > +#define ARM_LPAE_TCR_IRGN1_SHIFT 24 > #define ARM_LPAE_TCR_RGN_MASK0x3 > #define ARM_LPAE_TCR_RGN_NC 0 > #define ARM_LPAE_TCR_RGN_WBWA1 > @@ -136,6 +144,7 @@ > #define ARM_LPAE_TCR_SL0_MASK0x3 > > #define ARM_LPAE_TCR_T0SZ_SHIFT 0 > +#define ARM_LPAE_TCR_T1SZ_SHIFT 16 > #define ARM_LPAE_TCR_SZ_MASK 0xf > > #define 
ARM_LPAE_TCR_PS_SHIFT16 > @@ -152,6 +161,14 @@ > #define ARM_LPAE_TCR_PS_48_BIT 0x5ULL > #define ARM_LPAE_TCR_PS_52_BIT 0x6ULL > > +#define ARM_LPAE_TCR_SEP_SHIFT 47 > +#define ARM_LPAE_TCR_SEP_31 (0x0ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_35 (0x1ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_39 (0x2ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_41 (0x3ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_43 (0x4ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_UPSTREAM(0x7ULL << ARM_LPAE_TCR_SEP_SHIFT) > + > #define ARM_LPAE_MAIR_ATTR_SHIFT(n) ((n) << 3) > #define ARM_LPAE_MAIR_ATTR_MASK 0xff > #define ARM_LPAE_MAIR_ATTR_DEVICE0x04 > @@ -426,7 +443,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct > arm_lpae_io_pgtable *data, > arm_lpae_iopte pte; > > if (data->iop.fmt == ARM_64_LPAE_S1 || > - data->iop.fmt == ARM_32_LPAE_S1) { > + data->iop.fmt == ARM_32_LPAE_S1 || > + data->iop.fmt == ARM_ADRENO_GPU_LPAE) { > pte = ARM_LPAE_PTE_nG; > if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ)) > pte |= ARM_LPAE_PTE_AP_RDONLY; > @@ -497,6 +515,21 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, > unsigned long iova, > return ret; > } > > +static int arm_adreno_gpu_lpae_map(struct io_pgtable_ops *ops, > + unsigned long iova, phys_addr_t paddr, size_t size, > + int iommu_prot) > +{ > + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); > + unsigned long mask = 1UL << data->iop.cfg.ias; > + > + /* This configuration expects all iova addresses to be in TTBR1 */ > + if (WARN_ON(iova & mask)) > + return -ERANGE; > + > + /* Mask off the sign extended bits and map as usual */ > + return arm_lpae_map(ops, iova & (mask - 1), paddr, size, iommu_prot); > +} > + > static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int > lvl, > arm_lpae_iopte *ptep) > { > @@ -643,6 +676,22 @@ static size_t __arm_lpae_unmap(struct > arm_lpae_io_pgtable *data, > return __arm_lpae_unmap(data, iova, size, lvl 
+ 1, ptep); > } > > +static size_t arm_adreno_gpu_lpae_unmap(struct io_pgtable_ops *ops, > +unsigned long iova, size_t size) > +{ > + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); > + arm_lpae_iopte *ptep = data->pgd; > + int lvl =
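The TTBR1 sign-extension handling at the heart of this patch can be sketched independently of the page-table walker: TTBR1 VAs have all bits above the input address size set to one, and only the low `ias` bits matter for the walk. The helper names and the `ias` values below are illustrative:

```c
#include <stdint.h>
#include <assert.h>

/* A VA is in the TTBR1 region if bits [63:ias] are all ones. */
static int is_ttbr1_va(uint64_t iova, unsigned int ias)
{
    return (iova >> ias) == (0xffffffffffffffffULL >> ias);
}

/* Mask off the sign-extension bits before walking the TTBR1 tables. */
static uint64_t ttbr1_offset(uint64_t iova, unsigned int ias)
{
    return iova & ((1ULL << ias) - 1);
}
```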
Re: [Freedreno] [PATCH v3 0/2] iommu/arm-smmu: Split pagetable support
On Wed, Aug 07, 2019 at 04:21:38PM -0600, Jordan Crouse wrote: > (Sigh, resend. I freaked out my SMTP server) > > This is part of an ongoing evolution for enabling split pagetable support for > arm-smmu. Previous versions can be found [1]. > > In the discussion for v2 Robin pointed out that this is a very Adreno specific > use case and that is exactly true. Not only do we want to configure and use a > pagetable in the TTBR1 space, we also want to configure the TTBR0 region but > not allocate a pagetable for it or touch it until the GPU hardware does so. As > much as I want it to be a generic concept it really isn't. > > This revision leans into that idea. Most of the same io-pgtable code is there > but now it is wrapped as an Adreno GPU specific format that is selected by the > compatible string in the arm-smmu device. > > Additionally, per Robin's suggestion we are skipping creating a TTBR0 > pagetable > to save on wasted memory. > > This isn't as clean as I would like it to be but I think that this is a better > direction than trying to pretend that the generic format would work. > > I'm tempting fate by posting this and then taking some time off, but I wanted > to try to kick off a conversation or at least get some flames so I can try to > refine this again next week. Please take a look and give some advice on the > direction. Will, Robin - Modulo the impl changes from Robin, do you think that using a dedicated pagetable format is the right approach for supporting split pagetables for the Adreno GPU? If so, then is adding the changes to io-pgtable-arm.c possible for 5.4 and then add the implementation specific code on top of Robin's stack later or do you feel they should come as part of a package deal? 
Jordan > Jordan Crouse (2): > iommu/io-pgtable-arm: Add support for ARM_ADRENO_GPU_LPAE io-pgtable > format > iommu/arm-smmu: Add support for Adreno GPU pagetable formats > > drivers/iommu/arm-smmu.c | 8 +- > drivers/iommu/io-pgtable-arm.c | 214 > ++--- > drivers/iommu/io-pgtable.c | 1 + > include/linux/io-pgtable.h | 2 + > 4 files changed, 209 insertions(+), 16 deletions(-) > > -- > 2.7.4 > > ___ > Freedreno mailing list > freedr...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/freedreno -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 2/2] iommu/arm-smmu-v3: add nr_ats_masters for quickly check
On Thu, Aug 15, 2019 at 01:44:39PM +0800, Zhen Lei wrote: > When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if a > smmu domain does not contain any ats master, the operations of > arm_smmu_atc_inv_to_cmd() and lock protection in arm_smmu_atc_inv_domain() > are always executed. This will impact performance, especially in > multi-core and stress scenarios. For my FIO test scenario, about 8% > performance reduced. > > In fact, we can use a struct member to record how many ats masters that > the smmu contains. And check that without traverse the list and check all > masters one by one in the lock protection. > > Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS") > Signed-off-by: Zhen Lei > --- > drivers/iommu/arm-smmu-v3.c | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c > index 29056d9bb12aa01..154334d3310c9b8 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -631,6 +631,7 @@ struct arm_smmu_domain { > > struct io_pgtable_ops *pgtbl_ops; > boolnon_strict; > + int nr_ats_masters; > > enum arm_smmu_domain_stage stage; > union { > @@ -1531,7 +1532,16 @@ static int arm_smmu_atc_inv_domain(struct > arm_smmu_domain *smmu_domain, > struct arm_smmu_cmdq_ent cmd; > struct arm_smmu_master *master; > > - if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)) > + /* > + * The protectiom of spinlock(_domain->devices_lock) is omitted. > + * Because for a given master, its map/unmap operations should only be > + * happened after it has been attached and before it has been detached. > + * So that, if at least one master need to be atc invalidated, the > + * value of smmu_domain->nr_ats_masters can not be zero. > + * > + * This can alleviate performance loss in multi-core scenarios. 
> + */ I find this reasoning pretty dubious, since I think you're assuming that an endpoint cannot issue speculative ATS translation requests once its ATS capability is enabled. That said, I think it also means we should enable ATS in the STE *before* enabling it in the endpoint -- the current logic looks like it's the wrong way round to me (including in detach()). Anyway, these speculative translations could race with a concurrent unmap() call and end up with the ATC containing translations for unmapped pages, which I think we should try to avoid. Did the RCU approach not work out? You could use an rwlock instead as a temporary bodge if the performance doesn't hurt too much. Alternatively... maybe we could change the attach flow to do something like: enable_ats_in_ste(master); enable_ats_at_pcie_endpoint(master); spin_lock(devices_lock) add_to_device_list(master); nr_ats_masters++; spin_unlock(devices_lock); invalidate_atc(master); in which case, the concurrent unmapper will be doing something like: issue_tlbi(); smp_mb(); if (READ_ONCE(nr_ats_masters)) { ... } and I *think* that means that either the unmapper will see the nr_ats_masters update and perform the invalidation, or they'll miss the update but the attach will invalidate the ATC /after/ the TLBI in the command queue. Also, John's idea of converting this stuff over to my command batching mechanism should help a lot if we can defer this to sync time using the gather structure. Maybe an rwlock would be alright for that. Dunno. Will
Re: [PATCH 15/15] iommu/arm-smmu: Add context init implementation hook
On Thu, Aug 15, 2019 at 01:09:07PM +0100, Robin Murphy wrote: > On 15/08/2019 11:56, Will Deacon wrote: > >On Fri, Aug 09, 2019 at 06:07:52PM +0100, Robin Murphy wrote: > >>Allocating and initialising a context for a domain is another point > >>where certain implementations are known to want special behaviour. > >>Currently the other half of the Cavium workaround comes into play here, > >>so let's finish the job to get the whole thing right out of the way. > >> > >>Signed-off-by: Robin Murphy > >>--- > >> drivers/iommu/arm-smmu-impl.c | 39 +-- > >> drivers/iommu/arm-smmu.c | 51 +++ > >> drivers/iommu/arm-smmu.h | 42 +++-- > >> 3 files changed, 86 insertions(+), 46 deletions(-) > >> > >>diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c > >>index c8904da08354..7a657d47b6ec 100644 > >>--- a/drivers/iommu/arm-smmu-impl.c > >>+++ b/drivers/iommu/arm-smmu-impl.c > >>@@ -48,6 +48,12 @@ const struct arm_smmu_impl calxeda_impl = { > >> }; > >>+struct cavium_smmu { > >>+ struct arm_smmu_device smmu; > >>+ u32 id_base; > >>+}; > >>+#define to_csmmu(s)container_of(s, struct cavium_smmu, smmu) > > > >To be honest with you, I'd just use container_of directly for the two > >callsites that need it. "to_csmmu" isn't a great name when we're also got > >the calxeda thing in here. > > Sure, by this point I was mostly just going for completeness in terms of > sketching out an example for subclassing arm_smmu_device. The Tegra patches > will now serve as a more complete example anyway, so indeed we can live > without it here. > > >> static int cavium_cfg_probe(struct arm_smmu_device *smmu) > >> { > >>static atomic_t context_count = ATOMIC_INIT(0); > >>@@ -56,17 +62,46 @@ static int cavium_cfg_probe(struct arm_smmu_device > >>*smmu) > >> * Ensure ASID and VMID allocation is unique across all SMMUs in > >> * the system. 
> >> */ > >>- smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks, > >>+ to_csmmu(smmu)->id_base = atomic_fetch_add(smmu->num_context_banks, > >> &context_count); > >>dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum > >> 27704\n"); > >>return 0; > >> } > >>+int cavium_init_context(struct arm_smmu_domain *smmu_domain) > >>+{ > >>+ u32 id_base = to_csmmu(smmu_domain->smmu)->id_base; > >>+ > >>+ if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) > >>+ smmu_domain->cfg.vmid += id_base; > >>+ else > >>+ smmu_domain->cfg.asid += id_base; > >>+ > >>+ return 0; > >>+} > >>+ > >> const struct arm_smmu_impl cavium_impl = { > >>.cfg_probe = cavium_cfg_probe, > >>+ .init_context = cavium_init_context, > >> }; > >>+struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu) > >>+{ > >>+ struct cavium_smmu *csmmu; > >>+ > >>+ csmmu = devm_kzalloc(smmu->dev, sizeof(*csmmu), GFP_KERNEL); > >>+ if (!csmmu) > >>+ return ERR_PTR(-ENOMEM); > >>+ > >>+ csmmu->smmu = *smmu; > >>+ csmmu->smmu.impl = &cavium_impl; > >>+ > >>+ devm_kfree(smmu->dev, smmu); > >>+ > >>+ return &csmmu->smmu; > >>+} > >>+ > >> #define ARM_MMU500_ACTLR_CPRE (1 << 1) > >>@@ -121,7 +156,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct > >>arm_smmu_device *smmu) > >>smmu->impl = &calxeda_impl; > >>if (smmu->model == CAVIUM_SMMUV2) > >>- smmu->impl = &cavium_impl; > >>+ return cavium_smmu_impl_init(smmu); > >>if (smmu->model == ARM_MMU500) > >>smmu->impl = &arm_mmu500_impl; > > > >Maybe rework this so we do the calxeda detection first (and return if we > >match), followed by a switch on smmu->model to make it crystal clear that > >we match only one? > > As I see it, "match only one" is really only a short-term thing, though, so > I didn't want to get *too* hung up on it. Ultimately we're going to have > cases where we need to combine e.g. 
MMU-500 implementation quirks with > platform integration quirks - I've been mostly planning on coming back to > think about that (and potentially rework this whole logic) later, but I > guess it wouldn't hurt to plan out a bit more structure from the start. I was going to ask something similar. I'm guessing that the intent is that we'll eventually we'll have a couple of arm-smmu-.c files and we'll need some sort of centralized place to set up the smmu->impl pointer. I had figured that it would be table based or something, but you make a good point about mixing and matching different workarounds. I don't really have a solution, just something I'm pondering while I'm thinking about how to start merging some of the qcom stuff into this. Jordan -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ___ iommu mailing list iommu@lists.linux-foundation.org
Re: next take at setting up a dma mask by default for platform devices
On Thu, 15 Aug 2019, Christoph Hellwig wrote: > On Thu, Aug 15, 2019 at 03:23:18PM +0200, Greg Kroah-Hartman wrote: > > I've taken the first 2 patches for 5.3-final. Given that patch 3 needs > > to be fixed, I'll wait for a respin of these before considering them. > > I have a respun version ready, but I'd really like to hear some > comments from usb developers about the approach before spamming > everyone again.. I didn't see any problems with your approach at first glance; it looked like a good idea. Alan Stern
Re: [PATCH 02/13] iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
On 15/08/2019 14:57, Will Deacon wrote: Hi Robin, On Thu, Aug 15, 2019 at 01:43:11PM +0100, Robin Murphy wrote: On 14/08/2019 18:56, Will Deacon wrote: Commit b6b65ca20bc9 ("iommu/io-pgtable-arm: Add support for non-strict mode") added an unconditional call to io_pgtable_tlb_sync() immediately after the case where we replace a block entry with a table entry during an unmap() call. This is redundant, since the IOMMU API will call iommu_tlb_sync() on this path and the patch in question mentions this: | To save having to reason about it too much, make sure the invalidation | in arm_lpae_split_blk_unmap() just performs its own unconditional sync | to minimise the window in which we're technically violating the break- | before-make requirement on a live mapping. This might work out redundant | with an outer-level sync for strict unmaps, but we'll never be splitting | blocks on a DMA fastpath anyway. However, this sync gets in the way of deferred TLB invalidation for leaf entries and is at best a questionable, unproven hack. Remove it. Hey, that's my questionable, unproven hack! :P I thought you'd like to remain anonymous, but I can credit you if you like? ;) It's not entirely clear to me how this gets in the way though - AFAICS the intent of tlb_flush_leaf exactly matches the desired operation here, so couldn't these just wait to be converted in patch #8? Good point. I think there are two things: 1. Initially, I didn't plan to have tlb_flush_leaf() at all because I didn't think it would be needed. Then I ran into the v7s CONT stuff and ended up needing it after all (I think it's the only user). So that's an oversight. 2. If we do the tlb_flush_leaf() here, then we could potentially put a hole in the ongoing gather structure, but I suppose we could do both a tlb_add_page() *and* a tlb_flush_leaf() to get around that. So yes, I probably could move this back if the sync is necessary but... 
In principle the concern is that if the caller splits a block with iommu_unmap_fast(), there's no guarantee of seeing an iommu_tlb_sync() before returning to the caller, and thus there's the potential to run into a TLB conflict on a subsequent access even if the endpoint was "good" and didn't make any accesses *during* the unmap call. ... this just feels pretty theoretical to me. The fact of the matter is that we're unable to do break before make because we can't reliably tolerate faults. If the hardware actually requires BBM for correctness, then we should probably explore proper solutions (e.g. quirks, avoiding block mappings, handling faults) rather than emitting a random sync and hoping for the best. Did you add the sync just in case, or was it based on a real crash? Nope, just a theoretical best-effort thing, which I'm certainly not going to lose sleep over either way - I just felt compelled to question the rationale which didn't seem to fit. Realistically, this partial-unmap case is not well-defined in IOMMU API terms, and other drivers don't handle it consistently. I think VFIO explicitly rejects partial unmaps, so if we see them at all it's only likely to be from GPU/SVA type users who in principle ought to be able to tolerate transient faults from BBM anyway. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: next take at setting up a dma mask by default for platform devices
On Thu, Aug 15, 2019 at 03:25:31PM +0200, Christoph Hellwig wrote: > On Thu, Aug 15, 2019 at 03:23:18PM +0200, Greg Kroah-Hartman wrote: > > I've taken the first 2 patches for 5.3-final. Given that patch 3 needs > > to be fixed, I'll wait for a respin of these before considering them. > > I have a respun version ready, but I'd really like to hear some > comments from usb developers about the approach before spamming > everyone again.. Spam away, we can take it :)
Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device
On Thu, Aug 15, 2019 at 03:38:12PM +0200, Christoph Hellwig wrote: > On Thu, Aug 15, 2019 at 03:03:25PM +0200, Greg Kroah-Hartman wrote: > > > --- a/include/linux/platform_device.h > > > +++ b/include/linux/platform_device.h > > > @@ -24,6 +24,7 @@ struct platform_device { > > > int id; > > > bool id_auto; > > > struct device dev; > > > + u64 dma_mask; > > > > Why is the dma_mask in 'struct device' which is part of this structure, > > not sufficient here? Shouldn't the "platform" be setting that up > > correctly already in the "archdata" type callback? > > Because the dma_mask in struct device is a pointer that needs to point > to something, and this is the best space we can allocate for 'something'. > m68k and powerpc currently do something roughly equivalent at the moment, > while everyone else just has horrible, horrible hacks. As mentioned in > the changelog the intent of this patch is that we treat platform devices > like any other bus, where the bus allocates the space for the dma_mask. > The long term plan is to eventually kill that weird pointer indirection > that doesn't help anyone, but for that we need to sort out the basics > first. Ah, missed that, sorry. Ok, no objection from me. Might as well respin this series and I can queue it up after 5.3-rc5 is out (which will have your first 2 patches in it.) thanks, greg k-h
Re: [PATCH 02/13] iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
Hi Robin, On Thu, Aug 15, 2019 at 01:43:11PM +0100, Robin Murphy wrote: > On 14/08/2019 18:56, Will Deacon wrote: > > Commit b6b65ca20bc9 ("iommu/io-pgtable-arm: Add support for non-strict > > mode") added an unconditional call to io_pgtable_tlb_sync() immediately > > after the case where we replace a block entry with a table entry during > > an unmap() call. This is redundant, since the IOMMU API will call > > iommu_tlb_sync() on this path and the patch in question mentions this: > > > > | To save having to reason about it too much, make sure the invalidation > > | in arm_lpae_split_blk_unmap() just performs its own unconditional sync > > | to minimise the window in which we're technically violating the break- > > | before-make requirement on a live mapping. This might work out redundant > > | with an outer-level sync for strict unmaps, but we'll never be splitting > > | blocks on a DMA fastpath anyway. > > > > However, this sync gets in the way of deferred TLB invalidation for leaf > > entries and is at best a questionable, unproven hack. Remove it. > > Hey, that's my questionable, unproven hack! :P I thought you'd like to remain anonymous, but I can credit you if you like? ;) > It's not entirely clear to me how this gets in the way though - AFAICS the > intent of tlb_flush_leaf exactly matches the desired operation here, so > couldn't these just wait to be converted in patch #8? Good point. I think there are two things: 1. Initially, I didn't plan to have tlb_flush_leaf() at all because I didn't think it would be needed. Then I ran into the v7s CONT stuff and ended up needing it after all (I think it's the only user). So that's an oversight. 2. If we do the tlb_flush_leaf() here, then we could potentially put a hole in the ongoing gather structure, but I suppose we could do both a tlb_add_page() *and* a tlb_flush_leaf() to get around that. So yes, I probably could move this back if the sync is necessary but... 
> In principle the concern is that if the caller splits a block with > iommu_unmap_fast(), there's no guarantee of seeing an iommu_tlb_sync() > before returning to the caller, and thus there's the potential to run into a > TLB conflict on a subsequent access even if the endpoint was "good" and > didn't make any accesses *during* the unmap call. ... this just feels pretty theoretical to me. The fact of the matter is that we're unable to do break before make because we can't reliably tolerate faults. If the hardware actually requires BBM for correctness, then we should probably explore proper solutions (e.g. quirks, avoiding block mappings, handling faults) rather than emitting a random sync and hoping for the best. Did you add the sync just in case, or was it based on a real crash? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 00/13] Rework IOMMU API to allow for batching of invalidation
On Thu, Aug 15, 2019 at 12:19:58PM +0100, John Garry wrote: > On 14/08/2019 18:56, Will Deacon wrote: > > If you'd like to play with the patches, then I've also pushed them here: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/unmap > > > > but they should behave as a no-op on their own. > > As anticipated, my storage testing scenarios roughly give parity throughput > and CPU loading before and after this series. > > Patches to convert the > > Arm SMMUv3 driver to the new API are here: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq > > I quickly tested this again and now I see a performance lift: > > before (5.3-rc1)after > D05 8x SAS disks 907K IOPS 970K IOPS > D05 1x NVMe 450K IOPS 466K IOPS > D06 1x NVMe 467K IOPS 466K IOPS > > The CPU loading seems to track throughput, so nothing much to say there. > > Note: From 5.2 testing, I was seeing >900K IOPS from that NVMe disk for > !IOMMU. Cheers, John. For interest, how do things look if you pass iommu.strict=0? That might give some indication about how much the invalidation is still hurting us. > BTW, what were your thoughts on changing > arm_smmu_atc_inv_domain()->arm_smmu_atc_inv_master() to batching? It seems > suitable, but looks untouched. Were you waiting for a resolution to the > performance issue which Leizhen reported? In principle, I'm supportive of such a change, but I'm not currently able to test any ATS stuff so somebody else would need to write the patch. Jean-Philippe is on holiday at the moment, but I'd be happy to review something from you if you send it out. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?
On 15/08/2019 14:35, Christoph Hellwig wrote: On Wed, Aug 14, 2019 at 07:49:27PM +0200, Daniel Vetter wrote: On Wed, Aug 14, 2019 at 04:50:33PM +0200, Corentin Labbe wrote: Hello Since lot of release (at least since 4.19), I hit the following error message: DMA-API: cacheline tracking ENOMEM, dma-debug disabled After hitting that, I try to check who is creating so many DMA mapping and see: cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c 6 ahci 257 e1000e 6 ehci-pci 5891 nouveau 24 uhci_hcd Does nouveau having this high number of DMA mapping is normal ? Yeah seems perfectly fine for a gpu. That is a lot and apparently overwhelm the dma-debug tracking. Robin rewrote this code in Linux 4.21 to work a little better, so I'm curious why this might have changes in 4.19, as dma-debug did not change at all there. FWIW, the cacheline tracking entries are a separate thing from the dma-debug entries that I rejigged - judging by those numbers there should still be plenty of free dma-debug entries, but for some reason it has failed to extend the radix tree :/ Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device
On Thu, Aug 15, 2019 at 03:03:25PM +0200, Greg Kroah-Hartman wrote: > > --- a/include/linux/platform_device.h > > +++ b/include/linux/platform_device.h > > @@ -24,6 +24,7 @@ struct platform_device { > > int id; > > bool id_auto; > > struct device dev; > > + u64 dma_mask; > > Why is the dma_mask in 'struct device' which is part of this structure, > not sufficient here? Shouldn't the "platform" be setting that up > correctly already in the "archdata" type callback? Because the dma_mask in struct device is a pointer that needs to point to something, and this is the best space we can allocate for 'something'. m68k and powerpc currently do something roughly equivalent at the moment, while everyone else just has horrible, horrible hacks. As mentioned in the changelog the intent of this patch is that we treat platform devices like any other bus, where the bus allocates the space for the dma_mask. The long term plan is to eventually kill that weird pointer indirection that doesn't help anyone, but for that we need to sort out the basics first.
Re: DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?
On Wed, Aug 14, 2019 at 07:49:27PM +0200, Daniel Vetter wrote: > On Wed, Aug 14, 2019 at 04:50:33PM +0200, Corentin Labbe wrote: > > Hello > > > > Since lot of release (at least since 4.19), I hit the following error > > message: > > DMA-API: cacheline tracking ENOMEM, dma-debug disabled > > > > After hitting that, I try to check who is creating so many DMA mapping and > > see: > > cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c > > 6 ahci > > 257 e1000e > > 6 ehci-pci > >5891 nouveau > > 24 uhci_hcd > > > > Does nouveau having this high number of DMA mapping is normal ? > > Yeah seems perfectly fine for a gpu. That is a lot and apparently overwhelm the dma-debug tracking. Robin rewrote this code in Linux 4.21 to work a little better, so I'm curious why this might have changes in 4.19, as dma-debug did not change at all there. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device
On Wed, Aug 14, 2019 at 04:49:13PM +0100, Robin Murphy wrote: >> because we have to support platform_device structures that are >> statically allocated. > > This would be a good point to also get rid of the long-standing bodge in > platform_device_register_full(). platform_device_register_full looks odd to start with, especially as the coumentation is rather lacking.. >> +static void setup_pdev_archdata(struct platform_device *pdev) > > Bikeshed: painting the generic DMA API properties as "archdata" feels a bit > off-target :/ > >> +{ >> +if (!pdev->dev.coherent_dma_mask) >> +pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32); >> +if (!pdev->dma_mask) >> +pdev->dma_mask = DMA_BIT_MASK(32); >> +if (!pdev->dev.dma_mask) >> +pdev->dev.dma_mask = >dma_mask; >> +arch_setup_pdev_archdata(pdev); > > AFAICS m68k's implementation of that arch hook becomes entirely redundant > after this change, so may as well go. That would just leave powerpc's > actual archdata, which at a glance looks like it could probably be cleaned > up with not *too* much trouble. Actually I think we can just kill both off. At the point archdata is indeed entirely misnamed. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: next take at setting up a dma mask by default for platform devices
On Thu, Aug 15, 2019 at 03:23:18PM +0200, Greg Kroah-Hartman wrote: > I've taken the first 2 patches for 5.3-final. Given that patch 3 needs > to be fixed, I'll wait for a respin of these before considering them. I have a respun version ready, but I'd really like to hear some comments from usb developers about the approach before spamming everyone again..
Re: next take at setting up a dma mask by default for platform devices
On Sun, Aug 11, 2019 at 10:05:14AM +0200, Christoph Hellwig wrote: > Hi all, > > this is another attempt to make sure the dma_mask pointer is always > initialized for platform devices. Not doing so led to lots of > boilerplate code, and makes platform devices different from all our > major busses like PCI where we always set up a dma_mask. In the long > run this should also help to eventually make dma_mask a scalar value > instead of a pointer and remove even more cruft. > > The bigger blocker for this last time was the fact that the usb > subsystem uses the presence or lack of a dma_mask to check if the core > should do dma mapping for the driver, which is highly unusual. So we > fix this first. Note that this has some overlap with the pending > desire to use the proper dma_mmap_coherent helper for mapping usb > buffers. The first two patches from this series should probably > go into 5.3 and then be used as the basis for the decision to use > dma_mmap_coherent. I've taken the first 2 patches for 5.3-final. Given that patch 3 needs to be fixed, I'll wait for a respin of these before considering them. thanks, greg k-h
Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device
On Sun, Aug 11, 2019 at 10:05:20AM +0200, Christoph Hellwig wrote: > We still treat devices without a DMA mask as defaulting to 32-bits for > both mask, but a few releases ago we've started warning about such > cases, as they require special cases to work around this sloppyness. > Add a dma_mask field to struct platform_object so that we can initialize > the dma_mask pointer in struct device and initialize both masks to > 32-bits by default. Architectures can still override this in > arch_setup_pdev_archdata if needed. > > Note that the code looks a little odd with the various conditionals > because we have to support platform_device structures that are > statically allocated. > > Signed-off-by: Christoph Hellwig > --- > drivers/base/platform.c | 15 +-- > include/linux/platform_device.h | 1 + > 2 files changed, 14 insertions(+), 2 deletions(-) > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > index ec974ba9c0c4..b216fcb0a8af 100644 > --- a/drivers/base/platform.c > +++ b/drivers/base/platform.c > @@ -264,6 +264,17 @@ struct platform_object { > char name[]; > }; > > +static void setup_pdev_archdata(struct platform_device *pdev) > +{ > + if (!pdev->dev.coherent_dma_mask) > + pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32); > + if (!pdev->dma_mask) > + pdev->dma_mask = DMA_BIT_MASK(32); > + if (!pdev->dev.dma_mask) > + pdev->dev.dma_mask = >dma_mask; > + arch_setup_pdev_archdata(pdev); > +}; > + > /** > * platform_device_put - destroy a platform device > * @pdev: platform device to free > @@ -310,7 +321,7 @@ struct platform_device *platform_device_alloc(const char > *name, int id) > pa->pdev.id = id; > device_initialize(>pdev.dev); > pa->pdev.dev.release = platform_device_release; > - arch_setup_pdev_archdata(>pdev); > + setup_pdev_archdata(>pdev); > } > > return pa ? 
&pa->pdev : NULL; > @@ -512,7 +523,7 @@ EXPORT_SYMBOL_GPL(platform_device_del); > int platform_device_register(struct platform_device *pdev) > { > device_initialize(&pdev->dev); > - arch_setup_pdev_archdata(pdev); > + setup_pdev_archdata(pdev); > return platform_device_add(pdev); > } > EXPORT_SYMBOL_GPL(platform_device_register); > diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h > index 9bc36b589827..a2abde2aef25 100644 > --- a/include/linux/platform_device.h > +++ b/include/linux/platform_device.h > @@ -24,6 +24,7 @@ struct platform_device { > int id; > bool id_auto; > struct device dev; > + u64 dma_mask; Why is the dma_mask in 'struct device' which is part of this structure, not sufficient here? Shouldn't the "platform" be setting that up correctly already in the "archdata" type callback? confused, greg k-h
Re: [PATCH 02/13] iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
On 14/08/2019 18:56, Will Deacon wrote: Commit b6b65ca20bc9 ("iommu/io-pgtable-arm: Add support for non-strict mode") added an unconditional call to io_pgtable_tlb_sync() immediately after the case where we replace a block entry with a table entry during an unmap() call. This is redundant, since the IOMMU API will call iommu_tlb_sync() on this path and the patch in question mentions this: | To save having to reason about it too much, make sure the invalidation | in arm_lpae_split_blk_unmap() just performs its own unconditional sync | to minimise the window in which we're technically violating the break- | before-make requirement on a live mapping. This might work out redundant | with an outer-level sync for strict unmaps, but we'll never be splitting | blocks on a DMA fastpath anyway. However, this sync gets in the way of deferred TLB invalidation for leaf entries and is at best a questionable, unproven hack. Remove it. Hey, that's my questionable, unproven hack! :P It's not entirely clear to me how this gets in the way though - AFAICS the intent of tlb_flush_leaf exactly matches the desired operation here, so couldn't these just wait to be converted in patch #8? In principle the concern is that if the caller splits a block with iommu_unmap_fast(), there's no guarantee of seeing an iommu_tlb_sync() before returning to the caller, and thus there's the potential to run into a TLB conflict on a subsequent access even if the endpoint was "good" and didn't make any accesses *during* the unmap call. Robin. 
Signed-off-by: Will Deacon --- drivers/iommu/io-pgtable-arm-v7s.c | 1 - drivers/iommu/io-pgtable-arm.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 0fc8dfab2abf..a62733c6a632 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -587,7 +587,6 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data, } io_pgtable_tlb_add_flush(>iop, iova, size, size, true); - io_pgtable_tlb_sync(>iop); return size; } diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 161a7d56264d..0d6633921c1e 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -583,7 +583,6 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, tablep = iopte_deref(pte, data); } else if (unmap_idx >= 0) { io_pgtable_tlb_add_flush(>iop, iova, size, size, true); - io_pgtable_tlb_sync(>iop); return size; } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 0/2] iommu/iova: enhance the rcache optimization
v1 --> v2 1. I did not change the patches but added this cover letter. 2. Added a batch of reviewers based on 9257b4a206fc ("iommu/iova: introduce per-cpu caching to iova allocation") 3. I described the problem I met in patch 2, but I hope the brief description below can help people quickly understand it. Suppose there are six rcache sizes, and each size can hold at most 10000 IOVAs: | 4K | 8K | 16K | 32K | 64K | 128K | | 10000 | 9000 | 8500 | 8600 | 9200 | 7000 | As the map above shows, the whole rcache has buffered too many IOVAs. Now the worst case can happen: suppose we need 20000 4K IOVAs at one time. That means 10000 IOVAs can be allocated from the rcache, but the other 10000 IOVAs have to be allocated from the RB tree via the alloc_iova() function. But the RB tree currently holds at least (9000 + 8500 + 8600 + 9200 + 7000) = 42300 nodes, so the average RB tree traversal will be very slow. For my test scenario, the 4K IOVAs are frequently used, but the others are not. Similarly, when the 20000 4K IOVAs are subsequently freed, the first 10000 can be quickly buffered, but the other 10000 cannot. Zhen Lei (2): iommu/iova: introduce iova_magazine_compact_pfns() iommu/iova: enhance the rcache optimization drivers/iommu/iova.c | 100 +++ include/linux/iova.h | 1 + 2 files changed, 95 insertions(+), 6 deletions(-) -- 1.8.3
[PATCH v2 1/2] iommu/iova: introduce iova_magazine_compact_pfns()
iova_magazine_free_pfns() can only free the whole magazine buffer, add iova_magazine_compact_pfns() to support free part of it. Signed-off-by: Zhen Lei --- drivers/iommu/iova.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 3e1a8a6755723a9..4b7a9efa0ef40af 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -795,18 +795,19 @@ static void iova_magazine_free(struct iova_magazine *mag) kfree(mag); } -static void -iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad) +static void iova_magazine_compact_pfns(struct iova_magazine *mag, + struct iova_domain *iovad, + unsigned long newsize) { unsigned long flags; int i; - if (!mag) + if (!mag || mag->size <= newsize) return; spin_lock_irqsave(>iova_rbtree_lock, flags); - for (i = 0 ; i < mag->size; ++i) { + for (i = newsize; i < mag->size; ++i) { struct iova *iova = private_find_iova(iovad, mag->pfns[i]); BUG_ON(!iova); @@ -815,7 +816,13 @@ static void iova_magazine_free(struct iova_magazine *mag) spin_unlock_irqrestore(>iova_rbtree_lock, flags); - mag->size = 0; + mag->size = newsize; +} + +static void +iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad) +{ + iova_magazine_compact_pfns(mag, iovad, 0); } static bool iova_magazine_full(struct iova_magazine *mag) -- 1.8.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/2] iommu/iova: enhance the rcache optimization
The rcache method caches freed IOVAs to improve the performance of IOVA allocation and release. This is usually fine, but performance may degrade in some special scenarios. For example, IOVA_RANGE_CACHE_MAX_SIZE is currently 6, and each size has MAX_GLOBAL_MAGS=32 shareable depot magazines, plus two magazines per CPU (cpu_rcaches->loaded and cpu_rcaches->prev). In an extreme case, it can cache up to ((num_possible_cpus() * 2 + 32) * 128 * 6) IOVAs, which is very large. The worst case happens when the depot magazines of a certain size (usually 4K) are full: further free_iova_fast() invocations then cause iova_magazine_free_pfns() to be called. As said above, too many IOVAs are buffered, so the RB tree is very large, and both iova_magazine_free_pfns()-->private_find_iova() and the missed allocation path alloc_iova()-->__alloc_and_insert_iova_range() spend too much time. Moreover, the current rcache method has no cleanup operation, so the number of buffered IOVAs can only grow, never shrink. In my FIO stress test scenario, performance dropped by about 35% and could not recover even after re-executing the test cases.

Jobs: 21 (f=21): [2.3% done] [8887M/0K /s] [2170K/0 iops]
Jobs: 21 (f=21): [2.3% done] [8902M/0K /s] [2173K/0 iops]
Jobs: 21 (f=21): [2.3% done] [6010M/0K /s] [1467K/0 iops]
Jobs: 21 (f=21): [2.3% done] [5397M/0K /s] [1318K/0 iops]

So I add statistics to the rcache; when the above case happens, the IOVAs which were not hit are released. 
Jobs: 21 (f=21): [100.0% done] [10324M/0K /s] [2520K/0 iops]
Jobs: 21 (f=21): [100.0% done] [10290M/0K /s] [2512K/0 iops]
Jobs: 21 (f=21): [100.0% done] [10035M/0K /s] [2450K/0 iops]
Jobs: 21 (f=21): [100.0% done] [10214M/0K /s] [2494K/0 iops]

Signed-off-by: Zhen Lei
---
 drivers/iommu/iova.c | 83 +++-
 include/linux/iova.h | 1 +
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 4b7a9efa0ef40af..f3828f4add25375 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -23,6 +23,8 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 				     unsigned long limit_pfn);
 static void init_iova_rcaches(struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
+static void iova_compact_rcache(struct iova_domain *iovad,
+				struct iova_rcache *curr_rcache);
 static void fq_destroy_all_entries(struct iova_domain *iovad);
 static void fq_flush_timeout(struct timer_list *t);
@@ -781,6 +783,8 @@ struct iova_magazine {
 struct iova_cpu_rcache {
 	spinlock_t lock;
+	bool prev_mag_hit;
+	unsigned long nr_hit;
 	struct iova_magazine *loaded;
 	struct iova_magazine *prev;
 };
@@ -934,6 +938,7 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 	if (mag_to_free) {
 		iova_magazine_free_pfns(mag_to_free, iovad);
 		iova_magazine_free(mag_to_free);
+		iova_compact_rcache(iovad, rcache);
 	}
 
 	return can_insert;
@@ -971,18 +976,22 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
 	} else if (!iova_magazine_empty(cpu_rcache->prev)) {
 		swap(cpu_rcache->prev, cpu_rcache->loaded);
 		has_pfn = true;
+		cpu_rcache->prev_mag_hit = true;
 	} else {
 		spin_lock(&rcache->lock);
 		if (rcache->depot_size > 0) {
 			iova_magazine_free(cpu_rcache->loaded);
 			cpu_rcache->loaded = rcache->depot[--rcache->depot_size];
 			has_pfn = true;
+			rcache->depot_mags_hit = true;
 		}
 		spin_unlock(&rcache->lock);
 	}
 
-	if (has_pfn)
+	if (has_pfn) {
+		cpu_rcache->nr_hit++;
 		iova_pfn = iova_magazine_pop(cpu_rcache->loaded, limit_pfn);
+	}
 
 	spin_unlock_irqrestore(&cpu_rcache->lock, flags);
 
@@ -1049,5 +1058,77 @@ void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
 	}
 }
 
+static void iova_compact_percpu_mags(struct iova_domain *iovad,
+				     struct iova_rcache *rcache)
+{
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		unsigned long flags;
+		struct iova_cpu_rcache *cpu_rcache;
+
+		cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
+
+		spin_lock_irqsave(&cpu_rcache->lock, flags);
+		if (!cpu_rcache->prev_mag_hit)
+			iova_magazine_free_pfns(cpu_rcache->prev, iovad);
+
+		if (cpu_rcache->nr_hit < IOVA_MAG_SIZE)
+			iova_magazine_compact_pfns(cpu_rcache->loaded,
+						   iovad,
+						   cpu_rcache->nr_hit);
+
+		cpu_rcache->nr_hit = 0;
+
Re: [PATCH 15/15] iommu/arm-smmu: Add context init implementation hook
On 15/08/2019 11:56, Will Deacon wrote: On Fri, Aug 09, 2019 at 06:07:52PM +0100, Robin Murphy wrote: Allocating and initialising a context for a domain is another point where certain implementations are known to want special behaviour. Currently the other half of the Cavium workaround comes into play here, so let's finish the job to get the whole thing right out of the way. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 39 +-- drivers/iommu/arm-smmu.c | 51 +++ drivers/iommu/arm-smmu.h | 42 +++-- 3 files changed, 86 insertions(+), 46 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c8904da08354..7a657d47b6ec 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -48,6 +48,12 @@ const struct arm_smmu_impl calxeda_impl = { }; +struct cavium_smmu { + struct arm_smmu_device smmu; + u32 id_base; +}; +#define to_csmmu(s)container_of(s, struct cavium_smmu, smmu) To be honest with you, I'd just use container_of directly for the two callsites that need it. "to_csmmu" isn't a great name when we're also got the calxeda thing in here. Sure, by this point I was mostly just going for completeness in terms of sketching out an example for subclassing arm_smmu_device. The Tegra patches will now serve as a more complete example anyway, so indeed we can live without it here. static int cavium_cfg_probe(struct arm_smmu_device *smmu) { static atomic_t context_count = ATOMIC_INIT(0); @@ -56,17 +62,46 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu) * Ensure ASID and VMID allocation is unique across all SMMUs in * the system. 
 	 */
-	smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks,
+	to_csmmu(smmu)->id_base = atomic_fetch_add(smmu->num_context_banks,
 						&context_count);
 	dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n");
 	return 0;
 }
 
+int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+{
+	u32 id_base = to_csmmu(smmu_domain->smmu)->id_base;
+
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2)
+		smmu_domain->cfg.vmid += id_base;
+	else
+		smmu_domain->cfg.asid += id_base;
+
+	return 0;
+}
+
 const struct arm_smmu_impl cavium_impl = {
 	.cfg_probe = cavium_cfg_probe,
+	.init_context = cavium_init_context,
 };
 
+struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+	struct cavium_smmu *csmmu;
+
+	csmmu = devm_kzalloc(smmu->dev, sizeof(*csmmu), GFP_KERNEL);
+	if (!csmmu)
+		return ERR_PTR(-ENOMEM);
+
+	csmmu->smmu = *smmu;
+	csmmu->smmu.impl = &cavium_impl;
+
+	devm_kfree(smmu->dev, smmu);
+
+	return &csmmu->smmu;
+}
+
 #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
 
@@ -121,7 +156,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 		smmu->impl = &calxeda_impl;
 
 	if (smmu->model == CAVIUM_SMMUV2)
-		smmu->impl = &cavium_impl;
+		return cavium_smmu_impl_init(smmu);
 
 	if (smmu->model == ARM_MMU500)
 		smmu->impl = &arm_mmu500_impl;

Maybe rework this so we do the calxeda detection first (and return if we match), followed by a switch on smmu->model to make it crystal clear that we match only one?

As I see it, "match only one" is really only a short-term thing, though, so I didn't want to get *too* hung up on it. Ultimately we're going to have cases where we need to combine e.g. MMU-500 implementation quirks with platform integration quirks - I've been mostly planning on coming back to think about that (and potentially rework this whole logic) later, but I guess it wouldn't hurt to plan out a bit more structure from the start. I'll have a hack on that (and all the other comments) today and hopefully have a v2 by tomorrow.

Robin. 
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
Ok, I think speaking to Robin helped me a bit with this... On Thu, Aug 15, 2019 at 06:18:38PM +0800, Yong Wu wrote: > On Thu, 2019-08-15 at 10:51 +0100, Will Deacon wrote: > > On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > > > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > > > is from 0x4000_ to 0x1_3fff_, but from EMI point of view, it > > > > > is remapped to high address from 0x1__ to 0x1__, the > > > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > > > for all PTEs which means to enable bit32 of physical address. Here is > > > > > the detailed remap relationship in the "4GB mode": > > > > > CPU PA ->HW PA > > > > > 0x4000_ 0x1_4000_ (Add bit32) > > > > > 0x8000_ 0x1_8000_ ... > > > > > 0xc000_ 0x1_c000_ ... > > > > > 0x1__0x1__ (No change) [...] > > > > The way I would like this quirk to work is that the io-pgtable code > > > > basically sets bit 9 in the pte when bit 32 is set in the physical > > > > address, > > > > and sets bit 4 in the pte when bit 33 is set in the physical address. It > > > > would then do the opposite when converting a pte to a physical address. > > > > > > > > That way, your driver can call the page table code directly with the > > > > high > > > > addresses and we don't have to do any manual offsetting or range > > > > checking > > > > in the page table code. > > > > > > In this case, the mt8183 can work successfully while the "4gb > > > mode"(mt8173/mt2712) can not. > > > > > > In the "4gb mode", As the remap relationship above, we should always add > > > bit32 in pte as we did in [2]. and need add bit32 in the > > > "iova_to_phys"(Not always add.). That means the "4gb mode" has a special > > > flow: > > > a. Always add bit32 in paddr_to_iopte. 
> > > b. Add bit32 only when PA < 0x4000 in iopte_to_paddr. > > > > I think this is probably at the heart of my misunderstanding. What is so > > special about PAs (is this HW PA or CPU PA?) below 0x4000? Is this RAM > > or something else? > > SRAM and HW register that IOMMU can not access. Ok, so redrawing your table from above, I think we can say something like: CPU Physical address 0G 1G 2G 3G 4G 5G |---A---|---B---|---C---|---D---|---E---| +--I/O--+Memory-+ IOMMU output physical address = 4G 5G 6G 7G 8G |---E---|---B---|---C---|---D---| +Memory-+ Do you agree? If so, what happens to region 'A' (the I/O region) in the IOMMU output physical address space. Is it accessible? Anyway, I think it's the job of the driver to convert between the two address spaces, so that: - On ->map(), bit 32 of the CPU physical address is set before calling into the iopgtable code - The result from ->iova_to_phys() should be the result from the iopgtable code, but with the top bit cleared for addresses over 5G. This assumes that: 1. We're ok setting bit 9 in the ptes mapping region 'E'. 2. The IOMMU page-table walker uses CPU physical addresses Are those true? Thanks, Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 08/15] iommu/arm-smmu: Abstract context bank accesses
On 15/08/2019 11:56, Will Deacon wrote: On Fri, Aug 09, 2019 at 06:07:45PM +0100, Robin Murphy wrote: Context bank accesses are fiddly enough to deserve a number of extra helpers to keep the callsites looking sane, even though there are only one or two of each. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 137 --- 1 file changed, 72 insertions(+), 65 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 72505647b77d..abdcc3f52e2e 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -82,9 +82,6 @@ ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS)\ ? 0x400 : 0)) -/* Translation context bank */ -#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->cb_base + (n)) << (smmu)->pgshift)) - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH 0x10 @@ -265,9 +262,29 @@ static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, writel_relaxed(val, arm_smmu_page(smmu, page) + offset); } +static u64 arm_smmu_readq(struct arm_smmu_device *smmu, int page, int offset) +{ + return readq_relaxed(arm_smmu_page(smmu, page) + offset); +} + +static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int offset, + u64 val) +{ + writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); +} + #define arm_smmu_read_gr1(s, r) arm_smmu_readl((s), 1, (r)) #define arm_smmu_write_gr1(s, r, v) arm_smmu_writel((s), 1, (r), (v)) +#define arm_smmu_read_cb(s, n, r)\ + arm_smmu_readl((s), (s)->cb_base + (n), (r)) +#define arm_smmu_write_cb(s, n, r, v) \ + arm_smmu_writel((s), (s)->cb_base + (n), (r), (v)) +#define arm_smmu_read_cb_q(s, n, r)\ + arm_smmu_readq((s), (s)->cb_base + (n), (r)) +#define arm_smmu_write_cb_q(s, n, r, v)\ + arm_smmu_writeq((s), (s)->cb_base + (n), (r), (v)) 'r' for 'offset'? (maybe just rename offset => register in the helpers). 
I think this all represents the mangled remains of an underlying notion of 'register offset' ;) struct arm_smmu_option_prop { u32 opt; const char *prop; @@ -423,15 +440,17 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx) } /* Wait for any pending TLB invalidations to complete */ -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, - void __iomem *sync, void __iomem *status) +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, + int sync, int status) { unsigned int spin_cnt, delay; + u32 reg; - writel_relaxed(QCOM_DUMMY_VAL, sync); + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { - if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE)) + reg = arm_smmu_readl(smmu, page, status); + if (!(reg & sTLBGSTATUS_GSACTIVE)) return; cpu_relax(); } @@ -443,12 +462,11 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu) { - void __iomem *base = ARM_SMMU_GR0(smmu); unsigned long flags; spin_lock_irqsave(>global_sync_lock, flags); - __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC, - base + ARM_SMMU_GR0_sTLBGSTATUS); + __arm_smmu_tlb_sync(smmu, 0, ARM_SMMU_GR0_sTLBGSYNC, Can we have a #define for page zero, please? Again, now I recall pondering the exact same thought, so clearly I don't have any grounds to object. I guess it's worth reworking the previous ARM_SMMU_{GR0,GR1,CB()} macros into the page number scheme rather than just killing them off - let me give that a try. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 05/15] iommu/arm-smmu: Split arm_smmu_tlb_inv_range_nosync()
On 15/08/2019 11:56, Will Deacon wrote: On Fri, Aug 09, 2019 at 06:07:42PM +0100, Robin Murphy wrote: Since we now use separate iommu_gather_ops for stage 1 and stage 2 contexts, we may as well divide up the monolithic callback into its respective stage 1 and stage 2 parts. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 66 ++-- 1 file changed, 37 insertions(+), 29 deletions(-) This will conflict with my iommu API batching stuff, but I can sort that out if/when it gets queued by Joerg. - if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { - iova &= ~12UL; - iova |= cfg->asid; - do { - writel_relaxed(iova, reg); - iova += granule; - } while (size -= granule); - } else { - iova >>= 12; - iova |= (u64)cfg->asid << 48; - do { - writeq_relaxed(iova, reg); - iova += granule >> 12; - } while (size -= granule); - } - } else { - reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L : - ARM_SMMU_CB_S2_TLBIIPAS2; - iova >>= 12; + if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { + iova &= ~12UL; Oh baby. You should move code around more often, so I'm forced to take a second look! Oh dear lord... The worst part is that I do now remember seeing this and having a similar moment of disbelief, but apparently I was easily distracted with rebasing and forgot about it too quickly :( Can you cook a fix for this that we can route separately, please? I see it also made its way into qcom_iommu.c... Sure, I'll split it out to the front of the series for the moment. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 00/13] Rework IOMMU API to allow for batching of invalidation
On 14/08/2019 18:56, Will Deacon wrote:

Hi everybody,

These are the core IOMMU changes that I have posted previously as part of my ongoing effort to reduce the lock contention of the SMMUv3 command queue. I thought it would be better to split this out as a separate series, since I think it's ready to go and all the driver conversions mean that it's quite a pain for me to maintain out of tree!

The idea of the patch series is to allow TLB invalidation to be batched up into a new 'struct iommu_iotlb_gather' structure, which tracks the properties of the virtual address range being invalidated so that it can be deferred until the driver's ->iotlb_sync() function is called. This allows for more efficient invalidation on hardware that can submit multiple invalidations in one go.

The previous series was included in:

https://lkml.kernel.org/r/20190711171927.28803-1-w...@kernel.org

The only real change since then is incorporating the newly merged virtio-iommu driver. If you'd like to play with the patches, then I've also pushed them here:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/unmap

but they should behave as a no-op on their own.

Hi Will,

As anticipated, my storage testing scenarios roughly give parity throughput and CPU loading before and after this series.

Patches to convert the Arm SMMUv3 driver to the new API are here:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq

I quickly tested this again and now I see a performance lift:

                    before (5.3-rc1)   after
D05 8x SAS disks    907K IOPS          970K IOPS
D05 1x NVMe         450K IOPS          466K IOPS
D06 1x NVMe         467K IOPS          466K IOPS

The CPU loading seems to track throughput, so nothing much to say there.

Note: From 5.2 testing, I was seeing >900K IOPS from that NVMe disk for !IOMMU.

BTW, what were your thoughts on changing arm_smmu_atc_inv_domain()->arm_smmu_atc_inv_master() to batching? It seems suitable, but looks untouched. 
Were you waiting for a resolution to the performance issue which Leizhen reported? Thanks, John Cheers, Will --->8 Cc: Jean-Philippe Brucker Cc: Robin Murphy Cc: Jayachandran Chandrasekharan Nair Cc: Jan Glauber Cc: Jon Masters Cc: Eric Auger Cc: Zhen Lei Cc: Jonathan Cameron Cc: Vijay Kilary Cc: Joerg Roedel Cc: John Garry Cc: Alex Williamson Cc: Marek Szyprowski Cc: David Woodhouse Will Deacon (13): iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync() iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes iommu: Introduce iommu_iotlb_gather_add_page() iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync() iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf() iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf() iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page() iommu/io-pgtable: Remove unused ->tlb_sync() callback iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap() iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page() drivers/gpu/drm/panfrost/panfrost_mmu.c | 24 +--- drivers/iommu/amd_iommu.c | 11 ++-- drivers/iommu/arm-smmu-v3.c | 52 +++- drivers/iommu/arm-smmu.c| 103 drivers/iommu/dma-iommu.c | 9 ++- drivers/iommu/exynos-iommu.c| 3 +- drivers/iommu/intel-iommu.c | 3 +- drivers/iommu/io-pgtable-arm-v7s.c | 57 +- drivers/iommu/io-pgtable-arm.c | 48 --- drivers/iommu/iommu.c | 24 drivers/iommu/ipmmu-vmsa.c | 28 + drivers/iommu/msm_iommu.c | 42 + drivers/iommu/mtk_iommu.c | 45 +++--- drivers/iommu/mtk_iommu_v1.c| 3 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 44 +++--- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 3 +- drivers/iommu/tegra-gart.c | 12 +++- drivers/iommu/tegra-smmu.c | 2 +- 
drivers/iommu/virtio-iommu.c| 5 +- drivers/vfio/vfio_iommu_type1.c | 27 + include/linux/io-pgtable.h | 57 -- include/linux/iommu.h | 92 +--- 24 files changed, 483 insertions(+), 215 deletions(-) ___ iommu mailing list iommu@lists.linux-foundation.org
Re: [PATCH 04/15] iommu/arm-smmu: Rework cb_base handling
On 14/08/2019 19:05, Will Deacon wrote: On Fri, Aug 09, 2019 at 06:07:41PM +0100, Robin Murphy wrote: To keep register-access quirks manageable, we want to structure things to avoid needing too many individual overrides. It seems fairly clean to have a single interface which handles both global and context registers in terms of the architectural pages, so the first preparatory step is to rework cb_base into a page number rather than an absolute address. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d9a93e5f422f..463bc8d98adb 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -95,7 +95,7 @@ #endif /* Translation context bank */ -#define ARM_SMMU_CB(smmu, n) ((smmu)->cb_base + ((n) << (smmu)->pgshift)) +#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->cb_base + (n)) << (smmu)->pgshift)) #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH 0x10 @@ -168,8 +168,8 @@ struct arm_smmu_device { struct device *dev; void __iomem *base; - void __iomem*cb_base; - unsigned long pgshift; + unsigned intcb_base; I think this is now a misnomer. Would you be able to rename it cb_pfn or something, please? Good point; in the architectural terms (section 8.1 of the spec), SMMU_CB_BASE is strictly a byte offset from SMMU_BASE, and the quantity we now have here is actually NUMPAGE. I've renamed it as such and tweaked the comments to be a bit more useful too. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 0/5] iommu/amd: Convert the AMD iommu driver to the dma-iommu api
Done, I just sent it there. I don't have any AMD hardware to test on while I'm traveling. However the rebase was very straightforward and the code was tested a month ago on the old linux-next. I only have the AMD conversion done. I will work on rebasing the intel one when I get a chance. On Tue, 13 Aug 2019 at 14:07, Christoph Hellwig wrote: > > On Tue, Aug 13, 2019 at 08:09:26PM +0800, Tom Murphy wrote: > > Hi Christoph, > > > > I quit my job and am having a great time traveling South East Asia. > > Enjoy! I just returned from my vacation. > > > I definitely don't want this work to go to waste and I hope to repost it > > later this week but I can't guarantee it. > > > > Let me know if you need this urgently. > > It isn't in any strict sense urgent. I just have various DMA API plans > that I'd rather just implement in dma-direct and dma-iommu rather than > also in two additional commonly used iommu drivers. So on the one had > the sooner the better, on the other hand no real urgency. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH V5 5/5] iommu/amd: Convert AMD iommu driver to the dma-iommu api
Convert the AMD iommu driver to the dma-iommu api. Remove the iova handling and reserve region code from the AMD iommu driver. Signed-off-by: Tom Murphy --- drivers/iommu/Kconfig | 1 + drivers/iommu/amd_iommu.c | 677 -- 2 files changed, 68 insertions(+), 610 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index e15cdcd8cb3c..437428571512 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -138,6 +138,7 @@ config AMD_IOMMU select PCI_PASID select IOMMU_API select IOMMU_IOVA + select IOMMU_DMA depends on X86_64 && PCI && ACPI ---help--- With this option you can enable support for AMD IOMMU hardware in diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 0e53f9bd2be7..eb4801031a99 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -89,8 +90,6 @@ const struct iommu_ops amd_iommu_ops; static ATOMIC_NOTIFIER_HEAD(ppr_notifier); int amd_iommu_max_glx_val = -1; -static const struct dma_map_ops amd_iommu_dma_ops; - /* * general struct to manage commands send to an IOMMU */ @@ -103,21 +102,6 @@ struct kmem_cache *amd_iommu_irq_cache; static void update_domain(struct protection_domain *domain); static int protection_domain_init(struct protection_domain *domain); static void detach_device(struct device *dev); -static void iova_domain_flush_tlb(struct iova_domain *iovad); - -/* - * Data container for a dma_ops specific protection domain - */ -struct dma_ops_domain { - /* generic protection domain information */ - struct protection_domain domain; - - /* IOVA RB-Tree */ - struct iova_domain iovad; -}; - -static struct iova_domain reserved_iova_ranges; -static struct lock_class_key reserved_rbtree_key; / * @@ -188,12 +172,6 @@ static struct protection_domain *to_pdomain(struct iommu_domain *dom) return container_of(dom, struct protection_domain, domain); } -static struct dma_ops_domain* to_dma_ops_domain(struct 
protection_domain *domain) -{ - BUG_ON(domain->flags != PD_DMA_OPS_MASK); - return container_of(domain, struct dma_ops_domain, domain); -} - static struct iommu_dev_data *alloc_dev_data(u16 devid) { struct iommu_dev_data *dev_data; @@ -1267,12 +1245,6 @@ static void domain_flush_pages(struct protection_domain *domain, __domain_flush_pages(domain, address, size, 0); } -/* Flush the whole IO/TLB for a given protection domain */ -static void domain_flush_tlb(struct protection_domain *domain) -{ - __domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 0); -} - /* Flush the whole IO/TLB for a given protection domain - including PDE */ static void domain_flush_tlb_pde(struct protection_domain *domain) { @@ -1674,43 +1646,6 @@ static unsigned long iommu_unmap_page(struct protection_domain *dom, return unmapped; } -/ - * - * The next functions belong to the address allocator for the dma_ops - * interface functions. - * - / - - -static unsigned long dma_ops_alloc_iova(struct device *dev, - struct dma_ops_domain *dma_dom, - unsigned int pages, u64 dma_mask) -{ - unsigned long pfn = 0; - - pages = __roundup_pow_of_two(pages); - - if (dma_mask > DMA_BIT_MASK(32)) - pfn = alloc_iova_fast(_dom->iovad, pages, - IOVA_PFN(DMA_BIT_MASK(32)), false); - - if (!pfn) - pfn = alloc_iova_fast(_dom->iovad, pages, - IOVA_PFN(dma_mask), true); - - return (pfn << PAGE_SHIFT); -} - -static void dma_ops_free_iova(struct dma_ops_domain *dma_dom, - unsigned long address, - unsigned int pages) -{ - pages = __roundup_pow_of_two(pages); - address >>= PAGE_SHIFT; - - free_iova_fast(_dom->iovad, address, pages); -} - / * * The next functions belong to the domain allocation. 
A domain is
@@ -1787,38 +1722,23 @@ static void free_gcr3_table(struct protection_domain *domain)
 	free_page((unsigned long)domain->gcr3_tbl);
 }
 
-static void dma_ops_domain_flush_tlb(struct dma_ops_domain *dom)
-{
-	domain_flush_tlb(&dom->domain);
-	domain_flush_complete(&dom->domain);
-}
-
-static void iova_domain_flush_tlb(struct iova_domain *iovad)
-{
-	struct dma_ops_domain *dom;
-
-	dom = container_of(iovad, struct dma_ops_domain, iovad);
-
-	dma_ops_domain_flush_tlb(dom);
-}
[PATCH V5 3/5] iommu/dma-iommu: Handle deferred devices
Handle devices which defer their attach to the iommu in the dma-iommu api Signed-off-by: Tom Murphy --- drivers/iommu/dma-iommu.c | 27 ++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 2712fbc68b28..906b7fa14d3c 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -22,6 +22,7 @@ #include #include #include +#include struct iommu_dma_msi_page { struct list_headlist; @@ -351,6 +352,21 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, return iova_reserve_iommu_regions(dev, domain); } +static int handle_deferred_device(struct device *dev, + struct iommu_domain *domain) +{ + const struct iommu_ops *ops = domain->ops; + + if (!is_kdump_kernel()) + return 0; + + if (unlikely(ops->is_attach_deferred && + ops->is_attach_deferred(domain, dev))) + return iommu_attach_device(domain, dev); + + return 0; +} + /** * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API *page flags. 
@@ -463,6 +479,9 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, size_t iova_off = iova_offset(iovad, phys); dma_addr_t iova; + if (unlikely(handle_deferred_device(dev, domain))) + return DMA_MAPPING_ERROR; + size = iova_align(iovad, size + iova_off); iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); @@ -581,6 +600,9 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size, *dma_handle = DMA_MAPPING_ERROR; + if (unlikely(handle_deferred_device(dev, domain))) + return NULL; + min_size = alloc_sizes & -alloc_sizes; if (min_size < PAGE_SIZE) { min_size = PAGE_SIZE; @@ -713,7 +735,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, int prot = dma_info_to_prot(dir, coherent, attrs); dma_addr_t dma_handle; - dma_handle =__iommu_dma_map(dev, phys, size, prot); + dma_handle = __iommu_dma_map(dev, phys, size, prot); if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && dma_handle != DMA_MAPPING_ERROR) arch_sync_dma_for_device(dev, phys, size, dir); @@ -823,6 +845,9 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, unsigned long mask = dma_get_seg_boundary(dev); int i; + if (unlikely(handle_deferred_device(dev, domain))) + return 0; + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) iommu_dma_sync_sg_for_device(dev, sg, nents, dir); -- 2.20.1
[PATCH V5 4/5] iommu/dma-iommu: Use the dev->coherent_dma_mask
Use the dev->coherent_dma_mask when allocating in the dma-iommu ops api. Signed-off-by: Tom Murphy --- drivers/iommu/dma-iommu.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 906b7fa14d3c..b9a3ab02434b 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -471,7 +471,7 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr, } static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, - size_t size, int prot) + size_t size, int prot, dma_addr_t dma_mask) { struct iommu_domain *domain = iommu_get_dma_domain(dev); struct iommu_dma_cookie *cookie = domain->iova_cookie; @@ -484,7 +484,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, size = iova_align(iovad, size + iova_off); - iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); + iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev); if (!iova) return DMA_MAPPING_ERROR; @@ -735,7 +735,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, int prot = dma_info_to_prot(dir, coherent, attrs); dma_addr_t dma_handle; - dma_handle = __iommu_dma_map(dev, phys, size, prot); + dma_handle = __iommu_dma_map(dev, phys, size, prot, dma_get_mask(dev)); if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && dma_handle != DMA_MAPPING_ERROR) arch_sync_dma_for_device(dev, phys, size, dir); @@ -938,7 +938,8 @@ static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys, size_t size, enum dma_data_direction dir, unsigned long attrs) { return __iommu_dma_map(dev, phys, size, - dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO); + dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO, + dma_get_mask(dev)); } static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle, @@ -1041,7 +1042,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, if (!cpu_addr) return NULL; - *handle = __iommu_dma_map(dev, 
page_to_phys(page), size, ioprot); + *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot, + dev->coherent_dma_mask); if (*handle == DMA_MAPPING_ERROR) { __iommu_dma_free(dev, size, cpu_addr); return NULL; -- 2.20.1
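The distinction the patch enforces — coherent allocations bounded by dev->coherent_dma_mask, streaming mappings by dev->dma_mask — can be reduced to a small userspace sketch. The struct and function names below are illustrative stand-ins, not kernel code:

```c
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Hypothetical stand-in for the two masks on struct device;
 * this is an illustration, not the kernel's struct device. */
struct toy_device {
    dma_addr_t dma_mask;          /* limit for streaming DMA mappings */
    dma_addr_t coherent_dma_mask; /* limit for coherent allocations */
};

/* The point of the patch: the IOVA allocator must honour the mask
 * that matches the mapping type, so iommu_dma_alloc() must pass
 * coherent_dma_mask while map_page/map_resource pass dma_mask. */
static dma_addr_t iova_limit(const struct toy_device *dev, int coherent)
{
    return coherent ? dev->coherent_dma_mask : dev->dma_mask;
}
```

For a device that can stream 64-bit addresses but only allocate coherent memory below 4 GiB, `iova_limit` returns the 32-bit mask only on the coherent path, which is exactly the behaviour the diff above introduces.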
[PATCH V5 2/5] iommu: Add gfp parameter to iommu_ops::map
Add a gfp_t parameter to the iommu_ops::map function. Remove the needless locking in the AMD iommu driver. The iommu_ops::map function (or the iommu_map function which calls it) was always supposed to be sleepable (according to Joerg's comment in this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so should probably have had a "might_sleep()" since it was written. However, currently the dma-iommu api can call iommu_map in an atomic context, which it shouldn't do. This doesn't cause any problems because any iommu driver which uses the dma-iommu api uses GFP_ATOMIC in its iommu_ops::map function. But doing this wastes the memory allocator's atomic pools. Signed-off-by: Tom Murphy --- drivers/iommu/amd_iommu.c | 3 ++- drivers/iommu/arm-smmu-v3.c| 2 +- drivers/iommu/arm-smmu.c | 2 +- drivers/iommu/dma-iommu.c | 6 ++--- drivers/iommu/exynos-iommu.c | 2 +- drivers/iommu/intel-iommu.c| 2 +- drivers/iommu/iommu.c | 43 +- drivers/iommu/ipmmu-vmsa.c | 2 +- drivers/iommu/msm_iommu.c | 2 +- drivers/iommu/mtk_iommu.c | 2 +- drivers/iommu/mtk_iommu_v1.c | 2 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 2 +- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 2 +- drivers/iommu/tegra-gart.c | 2 +- drivers/iommu/tegra-smmu.c | 2 +- drivers/iommu/virtio-iommu.c | 2 +- include/linux/iommu.h | 21 - 19 files changed, 77 insertions(+), 26 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 1948be7ac8f8..0e53f9bd2be7 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3030,7 +3030,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom, } static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, -phys_addr_t paddr, size_t page_size, int iommu_prot) +phys_addr_t paddr, size_t page_size, int iommu_prot, +gfp_t gfp) { struct protection_domain *domain = to_pdomain(dom); int prot = 0; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 
e7f49fd1a7ba..acc0eae7963f 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1975,7 +1975,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) } static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index aa06498f291d..05f42bdee494 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1284,7 +1284,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) } static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d991d40f797f..2712fbc68b28 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -469,7 +469,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, if (!iova) return DMA_MAPPING_ERROR; - if (iommu_map(domain, iova, phys - iova_off, size, prot)) { + if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) { iommu_dma_free_iova(cookie, iova, size); return DMA_MAPPING_ERROR; } @@ -613,7 +613,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size, arch_dma_prep_coherent(sg_page(sg), sg->length); } - if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, ioprot) + if (iommu_map_sg_atomic(domain, iova, sgt.sgl, sgt.orig_nents, ioprot) < size) goto out_free_sg; @@ -873,7 +873,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, * We'll leave any physical concatenation to the IOMMU driver's * implementation - it 
knows better than we do. */ - if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len) + if (iommu_map_sg_atomic(domain, iova, sg, nents, prot) < iova_len) goto out_free_iova; return __finalise_sg(dev, sg, nents, iova); diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index 1934c16a5abc..b7dd46884692 100644 --- a/drivers/iommu/exynos-iommu.c +++
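The split this patch introduces — one core mapping path that takes a gfp, with a sleepable wrapper and an atomic wrapper — can be sketched in a few lines. This is toy code modelling the pattern, not the kernel implementation:

```c
#include <stddef.h>

typedef unsigned int gfp_t;
#define GFP_KERNEL 0x01u
#define GFP_ATOMIC 0x02u

static gfp_t last_gfp; /* records what the core path was asked to use */

/* Toy core mapper: the real __iommu_map() would walk page tables and
 * pass gfp down to iommu_ops::map for page-table allocations. */
static int toy_iommu_map(unsigned long iova, size_t size, gfp_t gfp)
{
    last_gfp = gfp;
    return 0;
}

/* Sleepable callers (may block waiting for memory). */
static int toy_map(unsigned long iova, size_t size)
{
    /* the kernel version could assert might_sleep() here */
    return toy_iommu_map(iova, size, GFP_KERNEL);
}

/* Atomic-context callers, e.g. the dma-iommu fast path. */
static int toy_map_atomic(unsigned long iova, size_t size)
{
    return toy_iommu_map(iova, size, GFP_ATOMIC);
}
```

The benefit is that only callers that genuinely run in atomic context pay for the atomic pools; everyone else gets GFP_KERNEL allocations.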
[PATCH V5 1/5] iommu/amd: Remove unnecessary locking from AMD iommu driver
We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap. iommu_map doesn't lock while mapping and so no two calls should touch the same iova range. The AMD driver already handles the page table page allocations without locks so we can safely remove the locks. Signed-off-by: Tom Murphy --- drivers/iommu/amd_iommu.c | 10 +- drivers/iommu/amd_iommu_types.h | 1 - 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 008da21a2592..1948be7ac8f8 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2858,7 +2858,6 @@ static void protection_domain_free(struct protection_domain *domain) static int protection_domain_init(struct protection_domain *domain) { spin_lock_init(&domain->lock); - mutex_init(&domain->api_lock); domain->id = domain_id_alloc(); if (!domain->id) return -ENOMEM; @@ -3045,9 +3044,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, if (iommu_prot & IOMMU_WRITE) prot |= IOMMU_PROT_IW; - mutex_lock(&domain->api_lock); ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); - mutex_unlock(&domain->api_lock); domain_flush_np_cache(domain, iova, page_size); @@ -3058,16 +3055,11 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova, size_t page_size) { struct protection_domain *domain = to_pdomain(dom); - size_t unmap_size; if (domain->mode == PAGE_MODE_NONE) return 0; - mutex_lock(&domain->api_lock); - unmap_size = iommu_unmap_page(domain, iova, page_size); - mutex_unlock(&domain->api_lock); - - return unmap_size; + return iommu_unmap_page(domain, iova, page_size); } static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom, diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index 9ac229e92b07..b764e1a73dcf 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -468,7 +468,6 @@ struct protection_domain { struct iommu_domain domain; /* generic domain handle used by iommu core code 
*/ spinlock_t lock; /* mostly used to lock the page table */ - struct mutex api_lock; /* protect page tables in the iommu-api path */ u16 id; /* the domain id written to the device table */ int mode; /* paging mode (0-6 levels) */ u64 *pt_root; /* page table root pointer */ -- 2.20.1
[PATCH V5 0/5] iommu/amd: Convert the AMD iommu driver to the dma-iommu api
Convert the AMD iommu driver to the dma-iommu api. Remove the iova handling and reserve region code from the AMD iommu driver. Change-log: V5: -Rebase on top of linux-next V4: -Rebase on top of linux-next -Split the removing of the unnecessary locking in the amd iommu driver into a separate patch -refactor the "iommu/dma-iommu: Handle deferred devices" patch and address comments v3: -rename dma_limit to dma_mask -exit handle_deferred_device early if (!is_kdump_kernel()) -remove pointless calls to handle_deferred_device v2: -Rebase on top of this series: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops.3 -Add a gfp_t parameter to the iommu_ops::map function. -Made use of the reserve region code inside the dma-iommu api Tom Murphy (5): iommu/amd: Remove unnecessary locking from AMD iommu driver iommu: Add gfp parameter to iommu_ops::map iommu/dma-iommu: Handle deferred devices iommu/dma-iommu: Use the dev->coherent_dma_mask iommu/amd: Convert AMD iommu driver to the dma-iommu api drivers/iommu/Kconfig | 1 + drivers/iommu/amd_iommu.c | 690 drivers/iommu/amd_iommu_types.h | 1 - drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c| 2 +- drivers/iommu/dma-iommu.c | 43 +- drivers/iommu/exynos-iommu.c| 2 +- drivers/iommu/intel-iommu.c | 2 +- drivers/iommu/iommu.c | 43 +- drivers/iommu/ipmmu-vmsa.c | 2 +- drivers/iommu/msm_iommu.c | 2 +- drivers/iommu/mtk_iommu.c | 2 +- drivers/iommu/mtk_iommu_v1.c| 2 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 2 +- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 2 +- drivers/iommu/tegra-gart.c | 2 +- drivers/iommu/tegra-smmu.c | 2 +- drivers/iommu/virtio-iommu.c| 2 +- include/linux/iommu.h | 21 +- 21 files changed, 178 insertions(+), 651 deletions(-) -- 2.20.1
Re: [PATCH 15/15] iommu/arm-smmu: Add context init implementation hook
On Fri, Aug 09, 2019 at 06:07:52PM +0100, Robin Murphy wrote: > Allocating and initialising a context for a domain is another point > where certain implementations are known to want special behaviour. > Currently the other half of the Cavium workaround comes into play here, > so let's finish the job to get the whole thing right out of the way. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu-impl.c | 39 +-- > drivers/iommu/arm-smmu.c | 51 +++ > drivers/iommu/arm-smmu.h | 42 +++-- > 3 files changed, 86 insertions(+), 46 deletions(-) > > diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c > index c8904da08354..7a657d47b6ec 100644 > --- a/drivers/iommu/arm-smmu-impl.c > +++ b/drivers/iommu/arm-smmu-impl.c > @@ -48,6 +48,12 @@ const struct arm_smmu_impl calxeda_impl = { > }; > > > +struct cavium_smmu { > + struct arm_smmu_device smmu; > + u32 id_base; > +}; > +#define to_csmmu(s) container_of(s, struct cavium_smmu, smmu) To be honest with you, I'd just use container_of directly for the two callsites that need it. "to_csmmu" isn't a great name when we've also got the calxeda thing in here. > static int cavium_cfg_probe(struct arm_smmu_device *smmu) > { > static atomic_t context_count = ATOMIC_INIT(0); > @@ -56,17 +62,46 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu) >* Ensure ASID and VMID allocation is unique across all SMMUs in >* the system. 
>*/ > - smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks, > + to_csmmu(smmu)->id_base = atomic_fetch_add(smmu->num_context_banks, > &context_count); > dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum > 27704\n"); > > return 0; > } > > +int cavium_init_context(struct arm_smmu_domain *smmu_domain) > +{ > + u32 id_base = to_csmmu(smmu_domain->smmu)->id_base; > + > + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) > + smmu_domain->cfg.vmid += id_base; > + else > + smmu_domain->cfg.asid += id_base; > + > + return 0; > +} > + > const struct arm_smmu_impl cavium_impl = { > .cfg_probe = cavium_cfg_probe, > + .init_context = cavium_init_context, > }; > > +struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu) > +{ > + struct cavium_smmu *csmmu; > + > + csmmu = devm_kzalloc(smmu->dev, sizeof(*csmmu), GFP_KERNEL); > + if (!csmmu) > + return ERR_PTR(-ENOMEM); > + > + csmmu->smmu = *smmu; > + csmmu->smmu.impl = &cavium_impl; > + > + devm_kfree(smmu->dev, smmu); > + > + return &csmmu->smmu; > +} > + > > #define ARM_MMU500_ACTLR_CPRE (1 << 1) > > @@ -121,7 +156,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct > arm_smmu_device *smmu) > smmu->impl = &calxeda_impl; > > if (smmu->model == CAVIUM_SMMUV2) > - smmu->impl = &cavium_impl; > + return cavium_smmu_impl_init(smmu); > > if (smmu->model == ARM_MMU500) > smmu->impl = &arm_mmu500_impl; Maybe rework this so we do the calxeda detection first (and return if we match), followed by a switch on smmu->model to make it crystal clear that we match only one? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
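The subclassing idiom under discussion — a vendor struct embedding the generic device struct, recovered later via container_of — can be shown in a self-contained sketch. The struct names are simplified stand-ins for arm_smmu_device / cavium_smmu:

```c
#include <stddef.h>
#include <stdint.h>

/* Same definition the kernel uses: walk back from a member pointer
 * to the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic device state, embedded first in the vendor wrapper. */
struct base_smmu {
    int num_context_banks;
};

/* Vendor-specific wrapper, mirroring struct cavium_smmu. */
struct toy_cavium_smmu {
    struct base_smmu smmu; /* generic part; core code only sees this */
    uint32_t id_base;      /* Cavium-only field */
};

/* What a to_csmmu()-style helper expands to. */
static struct toy_cavium_smmu *to_toy_csmmu(struct base_smmu *b)
{
    return container_of(b, struct toy_cavium_smmu, smmu);
}
```

Core code passes around `struct base_smmu *`; only the Cavium hooks downcast, which is why Will suggests open-coding container_of at the two callsites rather than naming a helper.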
Re: [PATCH 00/15] Arm SMMU refactoring
Hi Robin, On Fri, Aug 09, 2019 at 06:07:37PM +0100, Robin Murphy wrote: > This is a big refactoring of arm-smmu in order to help cope with the > various divergent implementation details currently flying around. So > far we've been accruing various quirks and errata workarounds within > the main flow of the driver, but given that it's written to an > architecture rather than any particular hardware implementation, after > a point these start to become increasingly invasive and potentially > conflict with each other. > > These patches clean up the existing quirks handled by the driver to > lay a foundation on which we can continue to add more in a maintainable > fashion. The idea is that major vendor customisations can then be kept > in arm-smmu-.c implementation files out of each others' way. > > A branch is available at: > > git://linux-arm.org/linux-rm iommu/smmu-impl > > which I'll probably keep tweaking until I'm happy with the names of > things; I just didn't want to delay this initial posting any longer. Thanks, this all looks pretty decent to me. I've mainly left you a bunch of nits (hey, it's a refactoring series!) but I did spot one pre-existing howler that we should address. When do you think you'll have stopped tweaking this so that I can pick it up? I'd really like to see it in 5.4 so that others can start working on top of it. Cheers, Will
Re: [PATCH 08/15] iommu/arm-smmu: Abstract context bank accesses
On Fri, Aug 09, 2019 at 06:07:45PM +0100, Robin Murphy wrote: > Context bank accesses are fiddly enough to deserve a number of extra > helpers to keep the callsites looking sane, even though there are only > one or two of each. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu.c | 137 --- > 1 file changed, 72 insertions(+), 65 deletions(-) > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > index 72505647b77d..abdcc3f52e2e 100644 > --- a/drivers/iommu/arm-smmu.c > +++ b/drivers/iommu/arm-smmu.c > @@ -82,9 +82,6 @@ > ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) \ > ? 0x400 : 0)) > > -/* Translation context bank */ > -#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->cb_base + (n)) << > (smmu)->pgshift)) > - > #define MSI_IOVA_BASE 0x8000000 > #define MSI_IOVA_LENGTH 0x100000 > > @@ -265,9 +262,29 @@ static void arm_smmu_writel(struct arm_smmu_device > *smmu, int page, int offset, > writel_relaxed(val, arm_smmu_page(smmu, page) + offset); > } > > +static u64 arm_smmu_readq(struct arm_smmu_device *smmu, int page, int offset) > +{ > + return readq_relaxed(arm_smmu_page(smmu, page) + offset); > +} > + > +static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int > offset, > + u64 val) > +{ > + writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); > +} > + > #define arm_smmu_read_gr1(s, r) arm_smmu_readl((s), 1, (r)) > #define arm_smmu_write_gr1(s, r, v) arm_smmu_writel((s), 1, (r), (v)) > > +#define arm_smmu_read_cb(s, n, r)\ > + arm_smmu_readl((s), (s)->cb_base + (n), (r)) > +#define arm_smmu_write_cb(s, n, r, v)\ > + arm_smmu_writel((s), (s)->cb_base + (n), (r), (v)) > +#define arm_smmu_read_cb_q(s, n, r) \ > + arm_smmu_readq((s), (s)->cb_base + (n), (r)) > +#define arm_smmu_write_cb_q(s, n, r, v) \ > + arm_smmu_writeq((s), (s)->cb_base + (n), (r), (v)) 'r' for 'offset'? (maybe just rename offset => register in the helpers). 
> struct arm_smmu_option_prop { > u32 opt; > const char *prop; > @@ -423,15 +440,17 @@ static void __arm_smmu_free_bitmap(unsigned long *map, > int idx) > } > > /* Wait for any pending TLB invalidations to complete */ > -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, > - void __iomem *sync, void __iomem *status) > +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, > + int sync, int status) > { > unsigned int spin_cnt, delay; > + u32 reg; > > - writel_relaxed(QCOM_DUMMY_VAL, sync); > + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); > for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { > for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { > - if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE)) > + reg = arm_smmu_readl(smmu, page, status); > + if (!(reg & sTLBGSTATUS_GSACTIVE)) > return; > cpu_relax(); > } > @@ -443,12 +462,11 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device > *smmu, > > static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu) > { > - void __iomem *base = ARM_SMMU_GR0(smmu); > unsigned long flags; > > spin_lock_irqsave(&smmu->global_sync_lock, flags); > - __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC, > - base + ARM_SMMU_GR0_sTLBGSTATUS); > + __arm_smmu_tlb_sync(smmu, 0, ARM_SMMU_GR0_sTLBGSYNC, Can we have a #define for page zero, please? Will
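The page/offset addressing these helpers wrap can be sketched as a plain function. Names loosely mirror the patch (cb_base, pgshift), but this is an illustration of the arithmetic only, not driver code:

```c
/* Register r of context bank n lives in page (cb_base + n), and each
 * register page is (1 << pgshift) bytes: the offset from the device
 * base is just page-index << pgshift, plus the register offset. */
static unsigned long cb_reg_offset(unsigned int cb_base, unsigned int n,
                                   unsigned int pgshift, unsigned int r)
{
    return ((unsigned long)(cb_base + n) << pgshift) + r;
}
```

With 4 KiB register pages (pgshift = 12), four global pages (cb_base = 4), bank 2 and register 0x20, this yields (4 + 2) << 12 | 0x20 = 0x6020, matching what the old ARM_SMMU_CB() macro computed inline.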
Re: [PATCH 05/15] iommu/arm-smmu: Split arm_smmu_tlb_inv_range_nosync()
On Fri, Aug 09, 2019 at 06:07:42PM +0100, Robin Murphy wrote: > Since we now use separate iommu_gather_ops for stage 1 and stage 2 > contexts, we may as well divide up the monolithic callback into its > respective stage 1 and stage 2 parts. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu.c | 66 ++-- > 1 file changed, 37 insertions(+), 29 deletions(-) This will conflict with my iommu API batching stuff, but I can sort that out if/when it gets queued by Joerg. > - if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { > - iova &= ~12UL; > - iova |= cfg->asid; > - do { > - writel_relaxed(iova, reg); > - iova += granule; > - } while (size -= granule); > - } else { > - iova >>= 12; > - iova |= (u64)cfg->asid << 48; > - do { > - writeq_relaxed(iova, reg); > - iova += granule >> 12; > - } while (size -= granule); > - } > - } else { > - reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L : > - ARM_SMMU_CB_S2_TLBIIPAS2; > - iova >>= 12; > + if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { > + iova &= ~12UL; Oh baby. You should move code around more often, so I'm forced to take a second look! Can you cook a fix for this that we can route separately, please? I see it also made its way into qcom_iommu.c... Will
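For readers who miss why `iova &= ~12UL` is a howler: 12 is 0b1100, so ~12UL clears only bits 2 and 3, while the intent (assuming the usual 4 KiB TLBI granule) was to clear the low 12 bits. A minimal demonstration:

```c
/* What the code intended: mask off the low 12 bits (4 KiB page offset). */
static unsigned long page_base(unsigned long iova)
{
    return iova & ~0xfffUL;
}

/* What "iova &= ~12UL" actually does: clears only bits 2 and 3,
 * leaving most of the page-offset bits set. */
static unsigned long buggy_page_base(unsigned long iova)
{
    return iova & ~12UL;
}
```

For iova = 0x12345fff, the correct mask yields 0x12345000, while the buggy one yields 0x12345ff3 — garbage in the low bits of the TLBI payload, where the ASID is then OR-ed in.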
Re: [PATCH 7/8] parisc: don't set ARCH_NO_COHERENT_DMA_MMAP
On Thu, Aug 15, 2019 at 10:25:52AM +0100, James Bottomley wrote: > > which means exporting normally cachable memory to userspace is > > relatively dangerous due to cache aliasing. > > > > But normally cachable memory is only allocated by dma_alloc_coherent > > on parisc when using the sba_iommu or ccio_iommu drivers, so just > > remove the .mmap implementation for them so that we don't have to set > > ARCH_NO_COHERENT_DMA_MMAP, which I plan to get rid of. > > So I don't think this is quite right. We have three architectural > variants essentially (hidden behind about 12 cpu types): > >1. pa70xx: These can't turn off page caching, so they were the non > coherent problem case >2. pa71xx: These can manufacture coherent memory simply by turning off > the cache on a per page basis >3. pa8xxx: these have a full cache flush coherence mechanism. > > (I might have this slightly wrong: I vaguely remember the pa71xxlc > variants have some weird cache quirks for DMA as well) > > So I think pa70xx we can't mmap. pa71xx we can provided we mark the > page as uncached ... which should already have happened in the allocate > and pa8xxx which can always mmap dma memory without any special tricks. Except for the different naming scheme vs the code this matches my assumptions. In the code we have three cases (and a fourth EISA case mentioned in comments, but not actually implemented as far as I can tell): arch/parisc/kernel/pci-dma.c says in the top of file comments: ** AFAIK, all PA7100LC and PA7300LC platforms can use this code. and then handles two different cases: for cpu_type == pcxl or pcxl2 it maps the memory as uncached for dma_alloc_coherent, and for all other cpu types it fails the coherent allocations. In addition to that there are the ccio and sba iommu drivers, of which according to your above comment one is always present for pa8xxx. Which brings us back to this patch, which ensures that no cacheable memory is exported to userspace by removing ->mmap from ccio and sba. 
It then enables dma_mmap_coherent for the pcxl or pcxl2 case that allocates uncached memory. For the other cpu types dma_mmap_coherent cannot work, because dma_alloc_coherent already failed for the !pcxl && !pcxl2 case and thus there is no memory to mmap. So if the description is too confusing please suggest a better one, I'm a little lost between all these code names and product names (arch/parisc/include/asm/dma-mapping.h uses yet another set).
Re: [PATCH v8 05/14] media: rkisp1: add Rockchip ISP1 subdev driver
On Thu, Aug 15, 2019 at 07:29:59PM +0900, Tomasz Figa wrote: > On Thu, Aug 15, 2019 at 5:25 PM Sakari Ailus > wrote: > > > > Hi Helen, > > > > On Wed, Aug 14, 2019 at 09:58:05PM -0300, Helen Koike wrote: > > > > ... > > > > > >> +static int rkisp1_isp_sd_set_fmt(struct v4l2_subdev *sd, > > > >> + struct v4l2_subdev_pad_config *cfg, > > > >> + struct v4l2_subdev_format *fmt) > > > >> +{ > > > >> + struct rkisp1_device *isp_dev = sd_to_isp_dev(sd); > > > >> + struct rkisp1_isp_subdev *isp_sd = &isp_dev->isp_sdev; > > > >> + struct v4l2_mbus_framefmt *mf = &fmt->format; > > > >> + > > > > > > > > Note that for sub-device nodes, the driver is itself responsible for > > > > serialising the access to its data structures. > > > > > > But looking at subdev_do_ioctl_lock(), it seems that it serializes the > > > ioctl calls for subdevs, no? Or I'm misunderstanding something (which is > > > most probably) ? > > > > Good question. I had missed this change --- subdev_do_ioctl_lock() is > > relatively new. But setting that lock is still not possible as the struct > > is allocated in the framework and the device is registered before the > > driver gets hold of it. It's a good idea to provide the same serialisation > > for subdevs as well. > > > > I'll get back to this later. > > > > ... > > > > > >> +static int rkisp1_isp_sd_s_power(struct v4l2_subdev *sd, int on) > > > > > > > > If you support runtime PM, you shouldn't implement the s_power op. > > > > > > Is it ok to completely remove the usage of runtime PM then? > > > Like this http://ix.io/1RJb ? > > > > Please use runtime PM instead. In the long run we should get rid of the > > s_power op. Drivers themselves know better when the hardware they control > > should be powered on or off. > > > > One also needs to use runtime PM to handle power domains and power > dependencies on auxiliary devices, e.g. IOMMU. 
> > > > > > tbh I'm not that familiar with runtime PM and I'm not sure what is the > > > difference of it and using s_power op (and > > > Documentation/power/runtime_pm.rst > > > is not being that helpful tbh). > > > > You can find a simple example e.g. in > > drivers/media/platform/atmel/atmel-isi.c . > > > > > > > > > > > > > You'll still need to call s_power on external subdevs though. > > > > > > > >> +{ > > > >> + struct rkisp1_device *isp_dev = sd_to_isp_dev(sd); > > > >> + int ret; > > > >> + > > > >> + v4l2_dbg(1, rkisp1_debug, &isp_dev->v4l2_dev, "s_power: %d\n", on); > > > >> + > > > >> + if (on) { > > > >> + ret = pm_runtime_get_sync(isp_dev->dev); > > > > > > If this is not ok to remove support for runtime PM, then where should I put > > > the call to pm_runtime_get_sync() if not in this s_power op ? > > > > Basically the runtime_resume and runtime_suspend callbacks are where the > > device power state changes are implemented, and pm_runtime_get_sync and > > pm_runtime_put are how the driver controls the power state. > > > > So you no longer need the s_power() op at all. The op needs to be called on > > the pipeline however, as there are drivers that still use it. > > > > For this driver, I suppose we would _get_sync() when we start > streaming (in the hardware, i.e. we want the ISP to start capturing > frames) and _put() when we stop and the driver shouldn't perform any > access to the hardware when the streaming is not active. Agreed. -- Sakari Ailus sakari.ai...@linux.intel.com
Re: [PATCH v8 05/14] media: rkisp1: add Rockchip ISP1 subdev driver
On Thu, Aug 15, 2019 at 5:25 PM Sakari Ailus wrote: > > Hi Helen, > > On Wed, Aug 14, 2019 at 09:58:05PM -0300, Helen Koike wrote: > > ... > > > >> +static int rkisp1_isp_sd_set_fmt(struct v4l2_subdev *sd, > > >> + struct v4l2_subdev_pad_config *cfg, > > >> + struct v4l2_subdev_format *fmt) > > >> +{ > > >> + struct rkisp1_device *isp_dev = sd_to_isp_dev(sd); > > >> + struct rkisp1_isp_subdev *isp_sd = &isp_dev->isp_sdev; > > >> + struct v4l2_mbus_framefmt *mf = &fmt->format; > > >> + > > > > > > Note that for sub-device nodes, the driver is itself responsible for > > > serialising the access to its data structures. > > > > But looking at subdev_do_ioctl_lock(), it seems that it serializes the > > ioctl calls for subdevs, no? Or I'm misunderstanding something (which is > > most probably) ? > > Good question. I had missed this change --- subdev_do_ioctl_lock() is > relatively new. But setting that lock is still not possible as the struct > is allocated in the framework and the device is registered before the > driver gets hold of it. It's a good idea to provide the same serialisation > for subdevs as well. > > I'll get back to this later. > > ... > > > >> +static int rkisp1_isp_sd_s_power(struct v4l2_subdev *sd, int on) > > > > > > If you support runtime PM, you shouldn't implement the s_power op. > > > > Is it ok to completely remove the usage of runtime PM then? > > Like this http://ix.io/1RJb ? > > Please use runtime PM instead. In the long run we should get rid of the > s_power op. Drivers themselves know better when the hardware they control > should be powered on or off. > One also needs to use runtime PM to handle power domains and power dependencies on auxiliary devices, e.g. IOMMU. > > > > tbh I'm not that familiar with runtime PM and I'm not sure what is the > > difference of it and using s_power op (and > > Documentation/power/runtime_pm.rst > > is not being that helpful tbh). > > You can find a simple example e.g. in > drivers/media/platform/atmel/atmel-isi.c . 
> > > > > > > > > You'll still need to call s_power on external subdevs though. > > > > > >> +{ > > >> + struct rkisp1_device *isp_dev = sd_to_isp_dev(sd); > > >> + int ret; > > >> + > > >> + v4l2_dbg(1, rkisp1_debug, &isp_dev->v4l2_dev, "s_power: %d\n", on); > > >> + > > >> + if (on) { > > >> + ret = pm_runtime_get_sync(isp_dev->dev); > > > > If this is not ok to remove support for runtime PM, then where should I put > > the call to pm_runtime_get_sync() if not in this s_power op ? > > Basically the runtime_resume and runtime_suspend callbacks are where the > device power state changes are implemented, and pm_runtime_get_sync and > pm_runtime_put are how the driver controls the power state. > > So you no longer need the s_power() op at all. The op needs to be called on > the pipeline however, as there are drivers that still use it. > For this driver, I suppose we would _get_sync() when we start streaming (in the hardware, i.e. we want the ISP to start capturing frames) and _put() when we stop and the driver shouldn't perform any access to the hardware when the streaming is not active. Best regards, Tomasz
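The get/put pattern Sakari and Tomasz describe can be modelled in a few lines. This is a toy refcount illustrating the semantics of pm_runtime_get_sync()/pm_runtime_put(), not the real runtime-PM implementation:

```c
/* Toy model of runtime-PM refcounting: the device powers up on the
 * 0 -> 1 usage transition and down on 1 -> 0, which is what
 * pm_runtime_get_sync()/pm_runtime_put() arrange by invoking the
 * driver's runtime_resume/runtime_suspend callbacks. */
static int usage_count;
static int powered;

static void toy_pm_get_sync(void)
{
    if (usage_count++ == 0)
        powered = 1;   /* runtime_resume() would run here */
}

static void toy_pm_put(void)
{
    if (--usage_count == 0)
        powered = 0;   /* runtime_suspend() would run here */
}
```

A driver that calls the get in start-streaming and the put in stop-streaming gets correct power state even with multiple users (e.g. an IOMMU in the same power domain holding its own reference), which is why this is preferred over the s_power op.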
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Thu, 2019-08-15 at 10:51 +0100, Will Deacon wrote: > On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > > for all PTEs which means to enable bit32 of physical address. Here is > > > > the detailed remap relationship in the "4GB mode": > > > > CPU PA -> HW PA > > > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > > > 0x8000_0000 0x1_8000_0000 ... > > > > 0xc000_0000 0x1_c000_0000 ... > > > > 0x1_0000_0000 0x1_0000_0000 (No change) > > > > > > So in this example, there are no PAs below 0x4000_0000 yet you later > > > add code to deal with that: > > > > > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < > > > > 0x4000_0000. */ > > > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > > > + paddr |= BIT_ULL(32); > > > > > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > > > support as far as I can tell. In fact, we currently unconditionally set > > > bit 32 in the physical address returned by iova_to_phys() which wouldn't > > > match your CPU PAs listed above, so I'm confused about how this is > > > supposed > > > to work. > > > > Actually current mainline has a bug for this. So I tried to use another > > special patch[1] for it in v8. > > If you're fixing a bug in mainline, I'd prefer to see that as a separate > patch. > > > But the issue is not critical since MediaTek multimedia consumer(v4l2 > > and drm) don't call iommu_iova_to_phys currently. 
> > > > > > > > The way I would like this quirk to work is that the io-pgtable code > > > basically sets bit 9 in the pte when bit 32 is set in the physical > > > address, > > > and sets bit 4 in the pte when bit 33 is set in the physical address. It > > > would then do the opposite when converting a pte to a physical address. > > > > > > That way, your driver can call the page table code directly with the high > > > addresses and we don't have to do any manual offsetting or range checking > > > in the page table code. > > > > In this case, the mt8183 can work successfully while the "4gb > > mode"(mt8173/mt2712) can not. > > > > In the "4gb mode", As the remap relationship above, we should always add > > bit32 in pte as we did in [2]. and need add bit32 in the > > "iova_to_phys"(Not always add.). That means the "4gb mode" has a special > > flow: > > a. Always add bit32 in paddr_to_iopte. > > b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. > > I think this is probably at the heart of my misunderstanding. What is so > special about PAs (is this HW PA or CPU PA?) below 0x4000_0000? Is this RAM > or something else? SRAM and HW registers that the IOMMU cannot access. (Sorry, my mailbox has something wrong.)
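The decode rule Yong Wu describes (point b above) can be modelled as follows. This is a toy model of the remapping described in the thread — pte stores the low 32 bits of the PA, bit 32 is restored on decode only below the 1 GiB boundary — not the driver's actual code:

```c
#include <stdint.h>

#define BIT32 (1ULL << 32)

/* "4GB mode" decode sketch: CPU PAs run 0x4000_0000..0x1_3fff_ffff, and
 * every HW PA has bit 32 set. The pte keeps only the low 32 bits, so on
 * decode, low bits below 0x4000_0000 can only have come from a CPU PA
 * in 0x1_0000_0000..0x1_3fff_ffff (add bit 32 back), while low bits at
 * or above 0x4000_0000 came from a sub-4GB CPU PA (leave as-is). */
static uint64_t mtk_4gb_decode(uint32_t pte_low)
{
    return pte_low < 0x40000000u ? ((uint64_t)pte_low | BIT32)
                                 : (uint64_t)pte_low;
}
```

This makes the asymmetry in the thread concrete: encode always sets the bit-32 flag (bit 9 of the pte), but decode must be conditional to recover the CPU PA rather than the EMI-remapped HW PA.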
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Thu, Aug 15, 2019 at 06:03:30PM +0800, Yong Wu wrote: > On Thu, 2019-08-15 at 10:51 +0100, Will Deacon wrote: > > On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > > > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > > > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > > > for all PTEs which means to enable bit32 of physical address. Here is > > > > > the detailed remap relationship in the "4GB mode": > > > > > CPU PA -> HW PA > > > > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > > > > 0x8000_0000 0x1_8000_0000 ... > > > > > 0xc000_0000 0x1_c000_0000 ... > > > > > 0x1_0000_0000 0x1_0000_0000 (No change) > > > > > > > > So in this example, there are no PAs below 0x4000_0000 yet you later > > > > add code to deal with that: > > > > > > > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < > > > > > 0x4000_0000. */ > > > > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > > > > + paddr |= BIT_ULL(32); > > > > > > > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > > > > support as far as I can tell. In fact, we currently unconditionally set > > > > bit 32 in the physical address returned by iova_to_phys() which wouldn't > > > > match your CPU PAs listed above, so I'm confused about how this is > > > > supposed > > > > to work. > > > > > > Actually current mainline has a bug for this. So I tried to use another > > > special patch[1] for it in v8. > > > > If you're fixing a bug in mainline, I'd prefer to see that as a separate > > patch. 
> > > > > But the issue is not critical since MediaTek multimedia consumer(v4l2 > > and drm) don't call iommu_iova_to_phys currently. > > > > > > > > > > The way I would like this quirk to work is that the io-pgtable code > > > > basically sets bit 9 in the pte when bit 32 is set in the physical > > > > address, > > > > and sets bit 4 in the pte when bit 33 is set in the physical address. It > > > > would then do the opposite when converting a pte to a physical address. > > > > > > > > That way, your driver can call the page table code directly with the > > > > high > > > > addresses and we don't have to do any manual offsetting or range > > > > checking > > > > in the page table code. > > > > > > In this case, the mt8183 can work successfully while the "4gb > > > mode"(mt8173/mt2712) can not. > > > > > > In the "4gb mode", as the remap relationship above, we should always add > > > bit32 in pte as we did in [2]. and need add bit32 in the > > > "iova_to_phys"(Not always add.). That means the "4gb mode" has a special > > > flow: > > > a. Always add bit32 in paddr_to_iopte. > > > b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. > > > > I think this is probably at the heart of my misunderstanding. What is so > > special about PAs (is this HW PA or CPU PA?) below 0x4000_0000? Is this RAM > > or something else? > > SRAM and the HW registers. Do we actually need to be able to map those in the IOMMU? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Thu, 2019-08-15 at 10:51 +0100, Will Deacon wrote: > On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > > for all PTEs which means to enable bit32 of physical address. Here is > > > > the detailed remap relationship in the "4GB mode": > > > > CPU PA -> HW PA > > > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > > > 0x8000_0000 0x1_8000_0000 ... > > > > 0xc000_0000 0x1_c000_0000 ... > > > > 0x1_0000_0000 0x1_0000_0000 (No change) > > > > > > So in this example, there are no PAs below 0x4000_0000 yet you later > > > add code to deal with that: > > > > > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_0000. */ > > > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > > > + paddr |= BIT_ULL(32); > > > > > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > > > support as far as I can tell. In fact, we currently unconditionally set > > > bit 32 in the physical address returned by iova_to_phys() which wouldn't > > > match your CPU PAs listed above, so I'm confused about how this is > > > supposed > > > to work. > > > > Actually current mainline has a bug for this. So I tried to use another > > special patch[1] for it in v8. > > If you're fixing a bug in mainline, I'd prefer to see that as a separate > patch. > > > But the issue is not critical since MediaTek multimedia consumer(v4l2 > > and drm) don't call iommu_iova_to_phys currently. 
> > > > > > > > The way I would like this quirk to work is that the io-pgtable code > > > basically sets bit 9 in the pte when bit 32 is set in the physical > > > address, > > > and sets bit 4 in the pte when bit 33 is set in the physical address. It > > > would then do the opposite when converting a pte to a physical address. > > > > > > That way, your driver can call the page table code directly with the high > > > addresses and we don't have to do any manual offsetting or range checking > > > in the page table code. > > > > In this case, the mt8183 can work successfully while the "4gb > > mode"(mt8173/mt2712) can not. > > > > In the "4gb mode", as the remap relationship above, we should always add > > bit32 in pte as we did in [2]. and need add bit32 in the > > "iova_to_phys"(Not always add.). That means the "4gb mode" has a special > > flow: > > a. Always add bit32 in paddr_to_iopte. > > b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. > > I think this is probably at the heart of my misunderstanding. What is so > special about PAs (is this HW PA or CPU PA?) below 0x4000_0000? Is this RAM > or something else? SRAM and the HW registers. > > > > Please can you explain to me why the diff below doesn't work on top of > > > this series? > > > > The diff below is just I did in v8[3]. The difference is that I moved the > > "4gb mode" special flow in the mtk_iommu.c in v8, the code is like > > [4]below. When I sent v9, I found that I can distinguish the "4gb mode" > > with "oas == 33" in v7s. then I can "simply" add the 4gb special flow[5] > > based on your diff. > > > > > > > I'm happy to chat on IRC if you think it would be easier, > > > because I have a horrible feeling that we've been talking past each other > > > and I'd like to see this support merged for 5.4. > > > > Thanks very much for your view, I'm sorry that I don't have IRC. I will > > send the next version quickly if we have a conclusion here. Then which > > way is better? 
If you'd like to keep > > the "4gb mode" special flow into mtk_iommu.c. > > I mean, we could even talk on the phone if necessary because I can't accept > this code unless I understand how it works! > > To be blunt, I'd like to avoid the io-pgtable changes looking different to > what I suggested: > > > > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c > > > b/drivers/iommu/io-pgtable-arm-v7s.c > > > index ab12ef5f8b03..d8d84617c822 100644 > > > --- a/drivers/iommu/io-pgtable-arm-v7s.c > > > +++ b/drivers/iommu/io-pgtable-arm-v7s.c > > > @@ -184,7 +184,7 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t > > > paddr, int lvl, > > > arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl); > > > > > > if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT) { > > > - if ((paddr & BIT_ULL(32)) || cfg->oas == ARM_V7S_MTK_4GB_OAS) > > > + if (paddr & BIT_ULL(32)) > > > pte |= ARM_V7S_ATTR_MTK_PA_BIT32; > > >
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > for all PTEs which means to enable bit32 of physical address. Here is > > > the detailed remap relationship in the "4GB mode": > > > CPU PA -> HW PA > > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > > 0x8000_0000 0x1_8000_0000 ... > > > 0xc000_0000 0x1_c000_0000 ... > > > 0x1_0000_0000 0x1_0000_0000 (No change) > > > > So in this example, there are no PAs below 0x4000_0000 yet you later > > add code to deal with that: > > > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_0000. */ > > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > > + paddr |= BIT_ULL(32); > > > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > > support as far as I can tell. In fact, we currently unconditionally set > > bit 32 in the physical address returned by iova_to_phys() which wouldn't > > match your CPU PAs listed above, so I'm confused about how this is supposed > > to work. > > Actually current mainline has a bug for this. So I tried to use another > special patch[1] for it in v8. If you're fixing a bug in mainline, I'd prefer to see that as a separate patch. > But the issue is not critical since MediaTek multimedia consumer(v4l2 > and drm) don't call iommu_iova_to_phys currently. > > > > The way I would like this quirk to work is that the io-pgtable code > > basically sets bit 9 in the pte when bit 32 is set in the physical address, > > and sets bit 4 in the pte when bit 33 is set in the physical address. 
It > > would then do the opposite when converting a pte to a physical address. > > > > That way, your driver can call the page table code directly with the high > > addresses and we don't have to do any manual offsetting or range checking > > in the page table code. In this case, the mt8183 can work successfully while the "4gb mode"(mt8173/mt2712) can not. In the "4gb mode", as the remap relationship above, we should always add bit32 in pte as we did in [2]. and need add bit32 in the "iova_to_phys"(Not always add.). That means the "4gb mode" has a special flow: a. Always add bit32 in paddr_to_iopte. b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. I think this is probably at the heart of my misunderstanding. What is so special about PAs (is this HW PA or CPU PA?) below 0x4000_0000? Is this RAM or something else? > > Please can you explain to me why the diff below doesn't work on top of > > this series? > > The diff below is just I did in v8[3]. The difference is that I moved the > "4gb mode" special flow in the mtk_iommu.c in v8, the code is like > [4]below. When I sent v9, I found that I can distinguish the "4gb mode" > with "oas == 33" in v7s. then I can "simply" add the 4gb special flow[5] > based on your diff. > > > I'm happy to chat on IRC if you think it would be easier, > > because I have a horrible feeling that we've been talking past each other > > and I'd like to see this support merged for 5.4. > > Thanks very much for your view, I'm sorry that I don't have IRC. I will > send the next version quickly if we have a conclusion here. Then which > way is better? If you'd like to keep the pagetable code clean, I will add the "4gb mode" special flow into mtk_iommu.c. I mean, we could even talk on the phone if necessary because I can't accept this code unless I understand how it works! 
To be blunt, I'd like to avoid the io-pgtable changes looking different to what I suggested: > > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c > > b/drivers/iommu/io-pgtable-arm-v7s.c > > index ab12ef5f8b03..d8d84617c822 100644 > > --- a/drivers/iommu/io-pgtable-arm-v7s.c > > +++ b/drivers/iommu/io-pgtable-arm-v7s.c > > @@ -184,7 +184,7 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, > > int lvl, > > arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl); > > > > if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT) { > > - if ((paddr & BIT_ULL(32)) || cfg->oas == ARM_V7S_MTK_4GB_OAS) > > + if (paddr & BIT_ULL(32)) > > pte |= ARM_V7S_ATTR_MTK_PA_BIT32; > > if (paddr & BIT_ULL(33)) > > pte |= ARM_V7S_ATTR_MTK_PA_BIT33; > > @@ -206,17 +206,14 @@ static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, > > int lvl, > > mask = ARM_V7S_LVL_MASK(lvl); > > > > paddr = pte & mask; > > - if (cfg->oas == 32 || !(cfg->quirks &
Re: [PATCH 7/8] parisc: don't set ARCH_NO_COHERENT_DMA_MMAP
On Thu, 2019-08-08 at 19:00 +0300, Christoph Hellwig wrote: > parisc is the only architecture that sets ARCH_NO_COHERENT_DMA_MMAP > when an MMU is enabled. AFAIK this is because parisc CPUs use VIVT > caches, We're actually VIPT but the same principle applies. > which means exporting normally cachable memory to userspace is > relatively dangerous due to cache aliasing. > > But normally cachable memory is only allocated by dma_alloc_coherent > on parisc when using the sba_iommu or ccio_iommu drivers, so just > remove the .mmap implementation for them so that we don't have to set > ARCH_NO_COHERENT_DMA_MMAP, which I plan to get rid of. So I don't think this is quite right. We have three architectural variants essentially (hidden behind about 12 cpu types): 1. pa70xx: These can't turn off page caching, so they were the non-coherent problem case 2. pa71xx: These can manufacture coherent memory simply by turning off the cache on a per page basis 3. pa8xxx: these have a full cache flush coherence mechanism. (I might have this slightly wrong: I vaguely remember the pa71xxlc variants have some weird cache quirks for DMA as well) So I think pa70xx we can't mmap. pa71xx we can provided we mark the page as uncached ... which should already have happened in the allocate and pa8xxx which can always mmap dma memory without any special tricks. James
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > Hi Yong Wu, > > Sorry, but I'm still deeply confused by this patch. Sorry for this. the "4GB mode" really is a bit odd... > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > for all PTEs which means to enable bit32 of physical address. Here is > > the detailed remap relationship in the "4GB mode": > > CPU PA -> HW PA > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > 0x8000_0000 0x1_8000_0000 ... > > 0xc000_0000 0x1_c000_0000 ... > > 0x1_0000_0000 0x1_0000_0000 (No change) > > So in this example, there are no PAs below 0x4000_0000 yet you later > add code to deal with that: > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_0000. */ > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > + paddr |= BIT_ULL(32); > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > support as far as I can tell. In fact, we currently unconditionally set > bit 32 in the physical address returned by iova_to_phys() which wouldn't > match your CPU PAs listed above, so I'm confused about how this is supposed > to work. Actually current mainline has a bug for this. So I tried to use another special patch[1] for it in v8. But the issue is not critical since MediaTek multimedia consumer(v4l2 and drm) don't call iommu_iova_to_phys currently. > > The way I would like this quirk to work is that the io-pgtable code > basically sets bit 9 in the pte when bit 32 is set in the physical address, > and sets bit 4 in the pte when bit 33 is set in the physical address. It > would then do the opposite when converting a pte to a physical address. 
> > That way, your driver can call the page table code directly with the high > addresses and we don't have to do any manual offsetting or range checking > in the page table code. In this case, the mt8183 can work successfully while the "4gb mode"(mt8173/mt2712) can not. In the "4gb mode", as the remap relationship above, we should always add bit32 in pte as we did in [2]. and need add bit32 in the "iova_to_phys"(Not always add.). That means the "4gb mode" has a special flow: a. Always add bit32 in paddr_to_iopte. b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. > > Please can you explain to me why the diff below doesn't work on top of > this series? The diff below is just I did in v8[3]. The difference is that I moved the "4gb mode" special flow in the mtk_iommu.c in v8, the code is like [4]below. When I sent v9, I found that I can distinguish the "4gb mode" with "oas == 33" in v7s. then I can "simply" add the 4gb special flow[5] based on your diff. > I'm happy to chat on IRC if you think it would be easier, > because I have a horrible feeling that we've been talking past each other > and I'd like to see this support merged for 5.4. Thanks very much for your view, I'm sorry that I don't have IRC. I will send the next version quickly if we have a conclusion here. Then which way is better? If you'd like to keep the pagetable code clean, I will add the "4gb mode" special flow into mtk_iommu.c. Thanks. 
[1]http://lists.infradead.org/pipermail/linux-mediatek/2019-June/020988.html [2] https://elixir.bootlin.com/linux/v5.3-rc4/source/drivers/iommu/io-pgtable-arm-v7s.c#L299 [3]http://lists.infradead.org/pipermail/linux-mediatek/2019-June/020991.html [4]==4gb mode special flow in mtk_iommu.c== +#define MTK_IOMMU_4GB_MODE_REMAP_BASE 0x140000000UL @@ -380,12 +379,16 @@ static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) { struct mtk_iommu_domain *dom = to_mtk_domain(domain); + struct mtk_iommu_data *data = mtk_iommu_get_m4u_data(); unsigned long flags; int ret; + /* The "4GB mode" M4U physically can not use the lower remap of Dram. */ + if (data->enable_4GB) + paddr |= BIT_ULL(32); + spin_lock_irqsave(&dom->pgtlock, flags); - ret = dom->iop->map(dom->iop, iova, paddr & DMA_BIT_MASK(32), - size, prot); + ret = dom->iop->map(dom->iop, iova, paddr, size, prot); spin_unlock_irqrestore(&dom->pgtlock, flags); return ret; @@ -422,8 +425,8 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain, pa = dom->iop->iova_to_phys(dom->iop, iova); spin_unlock_irqrestore(&dom->pgtlock, flags); - if (data->enable_4GB && pa < MTK_IOMMU_4GB_MODE_REMAP_BASE) - pa |= BIT_ULL(32); +
Re: [PATCH v3 hmm 08/11] drm/radeon: use mmu_notifier_get/put for struct radeon_mn
On 07.08.19 at 01:15, Jason Gunthorpe wrote: From: Jason Gunthorpe radeon is using a device global hash table to track what mmu_notifiers have been registered on struct mm. This is better served with the new get/put scheme instead. radeon has a bug where it was not blocking notifier release() until all the BO's had been invalidated. This could result in a use after free of the pages backing the BOs. This is tied into a second bug where radeon left the notifiers running endlessly even once the interval tree became empty. This could result in a use after free with module unload. Both are fixed by changing the lifetime model, the BOs exist in the interval tree with their natural lifetimes independent of the mm_struct lifetime using the get/put scheme. The release runs synchronously and just does invalidate_start across the entire interval tree to create the required DMA fence. Additions to the interval tree after release are already impossible as only current->mm is used during the add. Signed-off-by: Jason Gunthorpe Acked-by: Christian König But I'm wondering if we shouldn't completely drop radeon userptr support. It's just too buggy, Christian. --- drivers/gpu/drm/radeon/radeon.h | 3 - drivers/gpu/drm/radeon/radeon_device.c | 2 - drivers/gpu/drm/radeon/radeon_drv.c | 2 + drivers/gpu/drm/radeon/radeon_mn.c | 157 ++--- 4 files changed, 38 insertions(+), 126 deletions(-) AMD team: I wonder if kfd has similar lifetime issues? 
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 32808e50be12f8..918164f90b114a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -2451,9 +2451,6 @@ struct radeon_device { /* tracking pinned memory */ u64 vram_pin_size; u64 gart_pin_size; - - struct mutex mn_lock; - DECLARE_HASHTABLE(mn_hash, 7); }; bool radeon_is_px(struct drm_device *dev); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index dceb554e567446..788b1d8a80e660 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1325,8 +1325,6 @@ int radeon_device_init(struct radeon_device *rdev, init_rwsem(&rdev->pm.mclk_lock); init_rwsem(&rdev->exclusive_lock); init_waitqueue_head(&rdev->irq.vblank_queue); - mutex_init(&rdev->mn_lock); - hash_init(rdev->mn_hash); r = radeon_gem_init(rdev); if (r) return r; diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index a6cbe11f79c611..b6535ac91fdb74 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -35,6 +35,7 @@ #include #include #include +#include <linux/mmu_notifier.h> #include #include @@ -624,6 +625,7 @@ static void __exit radeon_exit(void) { pci_unregister_driver(pdriver); radeon_unregister_atpx_handler(); + mmu_notifier_synchronize(); } module_init(radeon_init); diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c index 8c3871ed23a9f0..fc8254273a800b 100644 --- a/drivers/gpu/drm/radeon/radeon_mn.c +++ b/drivers/gpu/drm/radeon/radeon_mn.c @@ -37,17 +37,8 @@ #include "radeon.h" struct radeon_mn { - /* constant after initialisation */ - struct radeon_device *rdev; - struct mm_struct *mm; struct mmu_notifier mn; - /* only used on destruction */ - struct work_struct work; - - /* protected by rdev->mn_lock */ - struct hlist_node node; - /* objects protected by lock */ struct mutex lock; struct rb_root_cached objects; @@ -58,55 +49,6 @@ struct 
radeon_mn_node { struct list_head bos; }; -/** - * radeon_mn_destroy - destroy the rmn - * - * @work: previously scheduled work item - * - * Lazy destroys the notifier from a work item - */ -static void radeon_mn_destroy(struct work_struct *work) -{ - struct radeon_mn *rmn = container_of(work, struct radeon_mn, work); - struct radeon_device *rdev = rmn->rdev; - struct radeon_mn_node *node, *next_node; - struct radeon_bo *bo, *next_bo; - - mutex_lock(&rdev->mn_lock); - mutex_lock(&rmn->lock); - hash_del(&rmn->node); - rbtree_postorder_for_each_entry_safe(node, next_node, &rmn->objects.rb_root, it.rb) { - - interval_tree_remove(&node->it, &rmn->objects); - list_for_each_entry_safe(bo, next_bo, &node->bos, mn_list) { - bo->mn = NULL; - list_del_init(&bo->mn_list); - } - kfree(node); - } - mutex_unlock(&rmn->lock); - mutex_unlock(&rdev->mn_lock); - mmu_notifier_unregister(&rmn->mn, rmn->mm); - kfree(rmn); -} - -/** - * radeon_mn_release - callback to notify about mm destruction - * - * @mn: our notifier - *
Re: [PATCH 7/8] parisc: don't set ARCH_NO_COHERENT_DMA_MMAP
Helge, or other parisc folks: can you take a look at this patch in particular and the series in general? Thanks!
Re: [PATCH v9 0/5] treewide: improve R-Car SDHI performance
So, what are we going to do with this series? As said before I'd volunteer to pick this up through the dma-mapping tree, but I'd like to see ACKs from the other maintainers as well.
Re: [PATCH 06/10] iommu: Remember when default domain type was set on kernel command line
Hey Lu Baolu, thanks for your review! On Thu, Aug 15, 2019 at 01:01:57PM +0800, Lu Baolu wrote: > > +#define IOMMU_CMD_LINE_DMA_API (1 << 0) > > Prefer BIT() macro? Yes, I'll change that. > > + iommu_set_cmd_line_dma_api(); > > IOMMU command line is also set in other places, for example, > iommu_setup() (arch/x86/kernel/pci-dma.c). Need to call this there as > well? You are right, I'll better add a 'bool cmd_line' parameter to the iommu_set_default_*() functions and tell the IOMMU core this way. That will also fix iommu=pt/nopt. Thanks, Joerg
Re: [PATCH 08/10] iommu: Set default domain type at runtime
Hi, On 8/14/19 9:38 PM, Joerg Roedel wrote: From: Joerg Roedel Set the default domain-type at runtime, not at compile-time. This keeps default domain type setting in one place when we have to change it at runtime. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 233bc22b487e..96cc7cc8ab21 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -26,11 +26,8 @@ static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida); -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH -static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; -#else -static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA; -#endif + +static unsigned int iommu_def_domain_type __read_mostly; static bool iommu_dma_strict __read_mostly = true; static u32 iommu_cmd_line __read_mostly; @@ -76,7 +73,7 @@ static void iommu_set_cmd_line_dma_api(void) iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; } -static bool __maybe_unused iommu_cmd_line_dma_api(void) +static bool iommu_cmd_line_dma_api(void) { return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API); } @@ -115,8 +112,18 @@ static const char *iommu_domain_type_str(unsigned int t) static int __init iommu_subsys_init(void) { - pr_info("Default domain type: %s\n", - iommu_domain_type_str(iommu_def_domain_type)); + bool cmd_line = iommu_cmd_line_dma_api(); + + if (!cmd_line) { + if (IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH)) + iommu_set_default_passthrough(); + else + iommu_set_default_translated(); This overrides kernel parameters parsed in iommu_setup(), for example, iommu=pt won't work anymore. Best regards, Lu Baolu + } + + pr_info("Default domain type: %s %s\n", + iommu_domain_type_str(iommu_def_domain_type), + cmd_line ? "(set via kernel command line)" : ""); return 0; }
Re: [PATCH v6 5/8] iommu: Add bounce page APIs
Hi Joerg, On 8/14/19 4:38 PM, Joerg Roedel wrote: Hi Lu Baolu, On Tue, Jul 30, 2019 at 12:52:26PM +0800, Lu Baolu wrote: * iommu_bounce_map(dev, addr, paddr, size, dir, attrs) - Map a buffer start at DMA address @addr in bounce page manner. For buffer parts that doesn't cross a whole minimal IOMMU page, the bounce page policy is applied. A bounce page mapped by swiotlb will be used as the DMA target in the IOMMU page table. Otherwise, the physical address @paddr is mapped instead. * iommu_bounce_unmap(dev, addr, size, dir, attrs) - Unmap the buffer mapped with iommu_bounce_map(). The bounce page will be torn down after the bounced data get synced. * iommu_bounce_sync(dev, addr, size, dir, target) - Sync the bounced data in case the bounce mapped buffer is reused. I don't really get why this API extension is needed for your use-case. Can't this just be done using iommu_map/unmap operations? Can you please elaborate a bit why these functions are needed? iommu_map/unmap() APIs haven't parameters for dma direction and attributions. These parameters are elementary for DMA APIs. Say, after map, if the dma direction is TO_DEVICE and a bounce buffer is used, we must sync the data from the original dma buffer to the bounce buffer; In the opposite direction, if dma is FROM_DEVICE, before unmap, we need to sync the data from the bounce buffer onto the original buffer. The code in these functions are common to all iommu drivers which want to use bounce pages for untrusted devices. So I put them in the iommu.c. Or, maybe drivers/iommu/dma-iommu.c is more suitable? Best regards, Lu Baolu