RE: [PATCH V4 1/3] iommu: Add support to change default domain of an iommu group

2020-06-30 Thread Prakhya, Sai Praneeth
Hi Joerg,

> -----Original Message-----
> From: Joerg Roedel 
> Sent: Tuesday, June 30, 2020 2:16 AM
> To: Prakhya, Sai Praneeth 
> Cc: iommu@lists.linux-foundation.org; Christoph Hellwig ; Raj,
> Ashok ; Will Deacon ; Lu Baolu
> ; Mehta, Sohil ; Robin
> Murphy ; Jacob Pan 
> Subject: Re: [PATCH V4 1/3] iommu: Add support to change default domain of
> an iommu group
> 
> On Thu, Jun 04, 2020 at 06:32:06PM -0700, Sai Praneeth Prakhya wrote:
> > +static int iommu_change_dev_def_domain(struct iommu_group *group, int
> > +type) {
> > +   struct iommu_domain *prev_dom;
> > +   struct group_device *grp_dev;
> > +   const struct iommu_ops *ops;
> > +   int ret, dev_def_dom;
> > +   struct device *dev;
> > +
> > +   if (!group)
> > +   return -EINVAL;
> > +
> > +   mutex_lock(&group->mutex);
> > +
> > +   if (group->default_domain != group->domain) {
> > +   pr_err_ratelimited("Group assigned to user level for direct access\n");
> 
> Make this message: "Group not assigned to default domain\n".

Sure! I will change it

> > +   ret = -EBUSY;
> > +   goto out;
> > +   }
> > +
> > +   /*
> > +* iommu group wasn't locked while acquiring device lock in
> > +* iommu_group_store_type(). So, make sure that the device count hasn't
> > +* changed while acquiring device lock.
> > +*
> > +* Changing default domain of an iommu group with two or more devices
> > +* isn't supported because there could be a potential deadlock. Consider
> > +* the following scenario. T1 is trying to acquire device locks of all
> > +* the devices in the group and before it could acquire all of them,
> > +* there could be another thread T2 (from different sub-system and use
> > +* case) that has already acquired some of the device locks and might be
> > +* waiting for T1 to release other device locks.
> > +*/
> > +   if (iommu_group_device_count(group) != 1) {
> > +   pr_err_ratelimited("Cannot change default domain of a group with two or more devices\n");
> 
> "Can not change default domain: Group has more than one device\n"

Ok.. makes sense. I will change this.

> > +   ret = -EINVAL;
> > +   goto out;
> > +   }
> > +
> > +   /* Since group has only one device */
> > +   list_for_each_entry(grp_dev, &group->devices, list)
> > +   dev = grp_dev->dev;
> > +
> > +   prev_dom = group->default_domain;
> > +   if (!prev_dom || !prev_dom->ops || !prev_dom->ops->def_domain_type) {
> > +   pr_err_ratelimited("'def_domain_type' call back isn't registered\n");
> 
> This message isn't needed.

Ok. I will remove it.

> > +   ret = __iommu_attach_device(group->default_domain, dev);
> > +   if (ret)
> > +   goto free_new_domain;
> > +
> > +   group->domain = group->default_domain;
> > +
> > +   ret = iommu_create_device_direct_mappings(group, dev);
> > +   if (ret)
> > +   goto free_new_domain;
> 
> You need to create the direct mappings before you attach the device to the new
> domain. Otherwise there might be a short time-window where RMRR regions
> are not mapped.

Ok.. makes sense. I will change this accordingly.

> > +static ssize_t iommu_group_store_type(struct iommu_group *group,
> > + const char *buf, size_t count) {
> > +   struct group_device *grp_dev;
> > +   struct device *dev;
> > +   int ret, req_type;
> > +
> > +   if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
> > +   return -EACCES;
> > +
> > +   if (WARN_ON(!group))
> > +   return -EINVAL;
> > +
> > +   if (sysfs_streq(buf, "identity"))
> > +   req_type = IOMMU_DOMAIN_IDENTITY;
> > +   else if (sysfs_streq(buf, "DMA"))
> > +   req_type = IOMMU_DOMAIN_DMA;
> > +   else if (sysfs_streq(buf, "auto"))
> > +   req_type = 0;
> > +   else
> > +   return -EINVAL;
> > +
> > +   /*
> > +* Lock/Unlock the group mutex here before device lock to
> > +* 1. Make sure that the iommu group has only one device (this is a
> > +*prerequisite for step 2)
> > +* 2. Get struct *dev which is needed to lock device
> > +*/
> > +   mutex_lock(&group->mutex);
> > +   if (iommu_group_device_count(group) != 1) {
> > +   mutex_unlock(&group->mutex);
> > +   pr_err_ratelimited("Cannot change default domain of a group with two or more devices\n");
> > +   return -EINVAL;
> > +   }
> > +
> > +   /* Since group has only one device */
> > +   list_for_each_entry(grp_dev, &group->devices, list)
> > +   dev = grp_dev->dev;
> 
> Please use list_first_entry().

Ok.

> You also need to take a reference with get_device() and then drop the
> group->mutex.

Sure! I will change it.

> After device_lock() you need to verify that the device is still in the same
> group and that the group still has only one device in it.

Presently, iommu_change_dev_def_domain() checks if the iommu group still has
only one device or not. Hence, checking if iommu group has one device or 

Re: [PATCH 4/4] iommu/vt-d: Add page response ops support

2020-06-30 Thread Lu Baolu

Hi Kevin,

On 6/30/20 2:19 PM, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Sunday, June 28, 2020 8:34 AM

After a page request is handled, software must respond to the device which
raised the page request with the handling result. This is done through
the iommu ops.page_response if the request was reported to outside of
vendor iommu driver through iommu_report_device_fault(). This adds the
VT-d implementation of page_response ops.

Co-developed-by: Jacob Pan 
Signed-off-by: Jacob Pan 
Co-developed-by: Liu Yi L 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/iommu.c |  1 +
  drivers/iommu/intel/svm.c   | 73
+
  include/linux/intel-iommu.h |  3 ++
  3 files changed, 77 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index de17952ed133..7eb29167e8f9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -6057,6 +6057,7 @@ const struct iommu_ops intel_iommu_ops = {
.sva_bind   = intel_svm_bind,
.sva_unbind = intel_svm_unbind,
.sva_get_pasid  = intel_svm_get_pasid,
+   .page_response  = intel_svm_page_response,
  #endif
  };

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4800bb6f8794..003ea9579632 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1092,3 +1092,76 @@ int intel_svm_get_pasid(struct iommu_sva *sva)

return pasid;
  }
+
+int intel_svm_page_response(struct device *dev,
+   struct iommu_fault_event *evt,
+   struct iommu_page_response *msg)
+{
+   struct iommu_fault_page_request *prm;
+   struct intel_svm_dev *sdev;
+   struct intel_iommu *iommu;
+   struct intel_svm *svm;
+   bool private_present;
+   bool pasid_present;
+   bool last_page;
+   u8 bus, devfn;
+   int ret = 0;
+   u16 sid;
+
+   if (!dev || !dev_is_pci(dev))
+   return -ENODEV;
+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;


move to the place when iommu is referenced. This place is too early.


I took this as a sanity check. If the device has no iommu backed, we
should consider it as an invalid input.




+
+   if (!msg || !evt)
+   return -EINVAL;
+
+   mutex_lock(_mutex);
+
+   prm = &evt->fault.prm;
+   sid = PCI_DEVID(bus, devfn);
+   pasid_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+   private_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
+   last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+
+   if (pasid_present) {
+   /* VT-d supports devices with full 20 bit PASIDs only */
+   if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX) {
+   ret = -EINVAL;
+   goto out;
+   }


shouldn't we check prm->pasid here? Above is more reasonable to be
checked when page request is reported.


Yes. I will check the pasid in both places.




+
+   ret = pasid_to_svm_sdev(dev, prm->pasid, &svm, &sdev);
+   if (ret || !sdev)


if sdev==NULL, suppose an error (-ENODEV) should be returned here?


Yes. Good catch. I should return an error if sdev==NULL.




+   goto out;
+   }
+
+   /*
+* Per VT-d spec. v3.0 ch7.7, system software must respond
+* with page group response if private data is present (PDP)
+* or last page in group (LPIG) bit is set. This is an
+* additional VT-d feature beyond PCI ATS spec.


feature->requirement


Agreed.



Thanks
Kevin


Best regards,
baolu




+*/
+   if (last_page || private_present) {
+   struct qi_desc desc;
+
+   desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) |
+   QI_PGRP_PASID_P(pasid_present) |
+   QI_PGRP_PDP(private_present) |
+   QI_PGRP_RESP_CODE(msg->code) |
+   QI_PGRP_RESP_TYPE;
+   desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
+   desc.qw2 = 0;
+   desc.qw3 = 0;
+   if (private_present)
+   memcpy(, prm->private_data,
+  sizeof(prm->private_data));
+
+   qi_submit_sync(iommu, &desc, 1, 0);
+   }
+out:
+   mutex_unlock(_mutex);
+   return ret;
+}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index fc2cfc3db6e1..bf6009a344f5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -741,6 +741,9 @@ struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 void *drvdata);
  void intel_svm_unbind(struct iommu_sva *handle);
  int intel_svm_get_pasid(struct iommu_sva *handle);
+int 

Re: [PATCH v5 01/12] iommu: Change type of pasid to u32

2020-06-30 Thread Felix Kuehling
On 2020-06-30 at 7:44 p.m., Fenghua Yu wrote:
> PASID is defined as a few different types in iommu including "int",
> "u32", and "unsigned int". To be consistent and to match with uapi
> definitions, define PASID and its variations (e.g. max PASID) as "u32".
> "u32" is also shorter and a little more explicit than "unsigned int".

You didn't change the return types of amdgpu_pasid_alloc and
kfd_pasid_alloc. amdgpu_pasid_alloc returns int, because it can return
negative error codes. But kfd_pasid_alloc could be updated, because it
returns 0 for errors.
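
The distinction can be sketched in userspace C. The two allocators below are hypothetical stand-ins, not the amdgpu or amdkfd code; the names, the pool limit and the choice of -ENOSPC are assumptions made for illustration. One reports failure with a negative errno and therefore needs a signed return type; the other reserves 0 as its failure value, so an unsigned 32-bit return type loses nothing.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_PASID_LIMIT 4u            /* hypothetical pool size for the sketch */

static uint32_t demo_next_pasid = 1;   /* PASID 0 is kept as the "invalid" value */

/* Style 1: negative errno on failure, so the return type must stay signed. */
static int demo_pasid_alloc_errno(void)
{
	if (demo_next_pasid >= DEMO_PASID_LIMIT)
		return -ENOSPC;
	return (int)demo_next_pasid++;
}

/* Style 2: 0 means failure, so an unsigned 32-bit return type is safe. */
static uint32_t demo_pasid_alloc_zero(void)
{
	if (demo_next_pasid >= DEMO_PASID_LIMIT)
		return 0;
	return demo_next_pasid++;
}
```

If the first style were switched to an unsigned return type, a negative error code would wrap to a large positive number and could be mistaken for a valid PASID unless every caller were audited.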

Regards,
  Felix

>
> No PASID type change in uapi although it defines PASID as __u64 in
> some places.
>
> Suggested-by: Thomas Gleixner 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 
> Reviewed-by: Lu Baolu 
> ---
> v5:
> - Reviewed by Lu Baolu
>
> v4:
> - Change PASID type from "unsigned int" to "u32" (Christoph)
>
> v2:
> - Create this new patch to define PASID as "unsigned int" consistently in
>   iommu (Thomas)
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  4 +--
>  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  4 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   |  6 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h   |  4 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  8 ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  8 ++---
>  .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h   |  2 +-
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  8 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.h   |  4 +--
>  drivers/gpu/drm/amd/amdkfd/kfd_iommu.c|  6 ++--
>  drivers/gpu/drm/amd/amdkfd/kfd_pasid.c|  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 18 +--
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  2 +-
>  .../gpu/drm/amd/include/kgd_kfd_interface.h   |  2 +-
>  drivers/iommu/amd/amd_iommu.h | 10 +++---
>  drivers/iommu/amd/iommu.c | 31 ++-
>  drivers/iommu/amd/iommu_v2.c  | 20 ++--
>  drivers/iommu/intel/dmar.c|  7 +++--
>  drivers/iommu/intel/intel-pasid.h | 24 +++---
>  drivers/iommu/intel/iommu.c   |  4 +--
>  drivers/iommu/intel/pasid.c   | 31 +--
>  drivers/iommu/intel/svm.c | 12 +++
>  drivers/iommu/iommu.c |  2 +-
>  drivers/misc/uacce/uacce.c|  2 +-
>  include/linux/amd-iommu.h |  8 ++---
>  include/linux/intel-iommu.h   | 12 +++
>  include/linux/intel-svm.h |  2 +-
>  include/linux/iommu.h | 10 +++---
>  include/linux/uacce.h |  2 +-
>  38 files changed, 139 insertions(+), 139 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index ffe149aafc39..dfef5a7e0f5a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -207,11 +207,11 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
>   })
>  
>  /* GPUVM API */
> -int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
> +int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
>   void **vm, void **process_info,
>   struct dma_fence **ef);
>  int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
> - struct file *filp, unsigned int pasid,
> + struct file *filp, u32 pasid,
>   void **vm, void **process_info,
>   struct dma_fence **ef);
>  void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index bf927f432506..ee531c3988d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -105,7 +105,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
>   unlock_srbm(kgd);
>  }
>  
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev 

Re: [PATCH 3/4] iommu/vt-d: Report page request faults for guest SVA

2020-06-30 Thread Lu Baolu

Hi Kevin,

Thanks a lot for reviewing my patches.

On 6/30/20 2:01 PM, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Sunday, June 28, 2020 8:34 AM

A pasid might be bound to a page table from a VM guest via the iommu
ops.sva_bind_gpasid. In this case, when a DMA page fault is detected
on the physical IOMMU, we need to inject the page fault request into
the guest. After the guest completes handling the page fault, a page
response needs to be sent back via the iommu ops.page_response().

This adds support to report a page request fault. Any external module
which is interested in handling this fault should register a notifier
callback.

Co-developed-by: Jacob Pan 
Signed-off-by: Jacob Pan 
Co-developed-by: Liu Yi L 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/svm.c | 83
+--
  1 file changed, 80 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index c23167877b2b..4800bb6f8794 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -815,6 +815,69 @@ static void intel_svm_drain_prq(struct device *dev, int pasid)
}
  }

+static int prq_to_iommu_prot(struct page_req_dsc *req)
+{
+   int prot = 0;
+
+   if (req->rd_req)
+   prot |= IOMMU_FAULT_PERM_READ;
+   if (req->wr_req)
+   prot |= IOMMU_FAULT_PERM_WRITE;
+   if (req->exe_req)
+   prot |= IOMMU_FAULT_PERM_EXEC;
+   if (req->pm_req)
+   prot |= IOMMU_FAULT_PERM_PRIV;
+
+   return prot;
+}
+
+static int
+intel_svm_prq_report(struct intel_iommu *iommu, struct page_req_dsc *desc)
+{
+   struct iommu_fault_event event;
+   struct pci_dev *pdev;
+   u8 bus, devfn;
+   int ret = 0;
+
+   memset(&event, 0, sizeof(struct iommu_fault_event));
+   bus = PCI_BUS_NUM(desc->rid);
+   devfn = desc->rid & 0xff;
+   pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);


Is this step necessary? dev can be passed in (based on sdev), and more
importantly iommu_report_device_fault already handles the ref counting
e.g. get_device(dev) when fault handler is valid...


Yes, agreed. I will pass device in instead.
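
For reference, the bus/devfn split that the quoted hunk applies to desc->rid can be sketched in plain C. This is a userspace illustration with hypothetical helper names, assuming the conventional 16-bit requester ID layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0); it is not the kernel's PCI_BUS_NUM/PCI_SLOT/PCI_FUNC macros themselves.

```c
#include <stdint.h>

/* Decoded pieces of a 16-bit PCI requester ID. */
struct demo_bdf {
	uint8_t bus;    /* bits 15:8 */
	uint8_t devfn;  /* bits 7:0, device and function combined */
	uint8_t slot;   /* bits 7:3, device number */
	uint8_t func;   /* bits 2:0, function number */
};

/* Split a requester ID into bus, devfn, slot and function. */
static struct demo_bdf demo_decode_rid(uint16_t rid)
{
	struct demo_bdf b;

	b.bus = (uint8_t)(rid >> 8);
	b.devfn = (uint8_t)(rid & 0xff);
	b.slot = (uint8_t)((b.devfn >> 3) & 0x1f);
	b.func = (uint8_t)(b.devfn & 0x07);
	return b;
}

/* Inverse operation: rebuild the requester ID from bus and devfn. */
static uint16_t demo_encode_rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}
```

Passing the struct device in directly, as agreed above, makes this decode and the matching device lookup unnecessary in the report path.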




+
+   if (!pdev) {
+   pr_err("No PCI device found for PRQ [%02x:%02x.%d]\n",
+  bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+   return -ENODEV;
+   }
+
+   /* Fill in event data for device specific processing */
+   event.fault.type = IOMMU_FAULT_PAGE_REQ;
+   event.fault.prm.addr = desc->addr;
+   event.fault.prm.pasid = desc->pasid;
+   event.fault.prm.grpid = desc->prg_index;
+   event.fault.prm.perm = prq_to_iommu_prot(desc);
+
+   /*
+* Set last page in group bit if private data is present,
+* page response is required as it does for LPIG.
+*/
+   if (desc->lpig)
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+   if (desc->pasid_present)
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+   if (desc->priv_data_present) {
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;


why setting lpig under this condition?


/*
 * Per VT-d spec. v3.0 ch7.7, system software must
 * respond with page group response if private data
 * is present (PDP) or last page in group (LPIG) bit
 * is set. This is an additional VT-d feature beyond
 * PCI ATS spec.
 */
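
The rule in that comment (a response is required when either PDP or LPIG is set) can be written as a single predicate. The sketch below is a userspace illustration; the DEMO_* flag names and bit positions are assumptions standing in for the IOMMU_FAULT_PAGE_REQUEST_* flags and are not copied from the uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits, hypothetical mirrors of IOMMU_FAULT_PAGE_REQUEST_*. */
#define DEMO_PRQ_PASID_VALID (1u << 0)
#define DEMO_PRQ_LAST_PAGE   (1u << 1)
#define DEMO_PRQ_PRIV_DATA   (1u << 2)

/* A page group response is needed if LPIG or private data is present. */
static bool demo_needs_group_response(uint32_t flags)
{
	return (flags & (DEMO_PRQ_LAST_PAGE | DEMO_PRQ_PRIV_DATA)) != 0;
}
```

A consumer that tests both flags this way does not depend on the reporter also setting the last-page flag whenever private data is present.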




+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
+   memcpy(event.fault.prm.private_data, desc->priv_data,
+  sizeof(desc->priv_data));
+   }
+
+   ret = iommu_report_device_fault(&pdev->dev, &event);
+   pci_dev_put(pdev);
+
+   return ret;
+}
+
  static irqreturn_t prq_event_thread(int irq, void *d)
  {
struct intel_iommu *iommu = d;
@@ -874,6 +937,19 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (!is_canonical_address(address))
goto bad_req;

+   /*
+* If prq is to be handled outside iommu driver via receiver of
+* the fault notifiers, we skip the page response here.
+*/
+   if (svm->flags & SVM_FLAG_GUEST_MODE) {
+   int res = intel_svm_prq_report(iommu, req);
+
+   if (!res)
+   goto prq_advance;
+   else
+   goto bad_req;
+   }
+


I noted in bad_req there is another reporting logic:

 if (sdev && sdev->ops && sdev->ops->fault_cb) {
 int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
 (req->exe_req << 1) | (req->pm_req);
 sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr,
 req->priv_data, rwxp, result);

Re: [PATCH v5 02/10] iommu/mediatek: Rename the register STANDARD_AXI_MODE(0x48) to MISC_CTRL

2020-06-30 Thread Yong Wu
On Mon, 2020-06-29 at 15:13 +0800, Chao Hao wrote:
> For the IOMMU register at offset 0x48, only the earlier mt8173/mt8183 use
> the name STANDARD_AXI_MODE. All the latest SoCs extend the register with
> more features in different bits, for example: axi_mode, in_order_en,
> coherent_en and so on. So renaming it to REG_MMU_MISC_CTRL may be more
> appropriate.
>
> This patch only renames the register, no functional change.
> 
> Signed-off-by: Chao Hao 
> Reviewed-by: Yong Wu 
> Reviewed-by: Matthias Brugger 
> ---
>  drivers/iommu/mtk_iommu.c | 14 +++---
>  drivers/iommu/mtk_iommu.h |  2 +-
>  2 files changed, 8 insertions(+), 8 deletions(-)

...

> diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> index ea949a324e33..1b6ea839b92c 100644
> --- a/drivers/iommu/mtk_iommu.h
> +++ b/drivers/iommu/mtk_iommu.h
> @@ -18,7 +18,7 @@
>  #include 
>  
>  struct mtk_iommu_suspend_reg {
> - u32 standard_axi_mode;
> + u32 misc_ctrl;

This will cause a build failure for v1:

drivers/iommu/mtk_iommu_v1.c:675:20: error: 'struct
mtk_iommu_suspend_reg' has no member named 'standard_axi_mode'
  writel_relaxed(reg->standard_axi_mode,
^

We could change it to something like:

union {
u32 standard_axi_mode; /* only for v1 */
u32 misc_ctrl; /* only for v2 */
};
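
A minimal userspace sketch of how such an anonymous union behaves, assuming a C11 compiler. The struct and helper names are hypothetical and only the first few fields are modelled. Both names refer to the same 32-bit storage, so the v1 code keeps compiling against the old name while the v2 code uses the new one.

```c
#include <stdint.h>

/* Cut-down model of the suspend register block; names are illustrative. */
struct demo_suspend_reg {
	union {
		uint32_t standard_axi_mode; /* name used by the v1 driver */
		uint32_t misc_ctrl;         /* name used by the v2 driver */
	};
	uint32_t dcm_dis;
	uint32_t ctrl_reg;
};

/* v2-style writer: stores through the new member name. */
static void demo_v2_save(struct demo_suspend_reg *reg, uint32_t val)
{
	reg->misc_ctrl = val;
}

/* v1-style reader: still uses the old member name. */
static uint32_t demo_v1_restore(const struct demo_suspend_reg *reg)
{
	return reg->standard_axi_mode;
}
```

The union adds no storage: the struct stays the size of three 32-bit words, so the layout seen by both drivers is unchanged.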

>   u32 dcm_dis;
>   u32 ctrl_reg;
>   u32 int_control0;
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] iommu/vt-d: Move Kconfig and Makefile bits down into intel directory

2020-06-30 Thread Lu Baolu

Hi Jerry,

On 7/1/20 4:06 AM, Jerry Snitselaar wrote:

Move Intel Kconfig and Makefile bits down into intel directory
with the rest of the Intel specific files.

Cc: Joerg Roedel 
Cc: Lu Baolu 


Reviewed-by: Lu Baolu 

Best regards,
baolu


Signed-off-by: Jerry Snitselaar 
---
  drivers/iommu/Kconfig| 86 +---
  drivers/iommu/Makefile   |  8 +---
  drivers/iommu/intel/Kconfig  | 86 
  drivers/iommu/intel/Makefile |  7 +++
  4 files changed, 96 insertions(+), 91 deletions(-)
  create mode 100644 drivers/iommu/intel/Kconfig
  create mode 100644 drivers/iommu/intel/Makefile

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6dc49ed8377a..281cd6bd0fe0 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -176,91 +176,7 @@ config AMD_IOMMU_DEBUGFS
  This option is -NOT- intended for production environments, and should
  not generally be enabled.
  
-# Intel IOMMU support

-config DMAR_TABLE
-   bool
-
-config INTEL_IOMMU
-   bool "Support for Intel IOMMU using DMA Remapping Devices"
-   depends on PCI_MSI && ACPI && (X86 || IA64)
-   select IOMMU_API
-   select IOMMU_IOVA
-   select NEED_DMA_MAP_STATE
-   select DMAR_TABLE
-   select SWIOTLB
-   select IOASID
-   help
- DMA remapping (DMAR) devices support enables independent address
- translations for Direct Memory Access (DMA) from devices.
- These DMA remapping devices are reported via ACPI tables
- and include PCI device scope covered by these DMA
- remapping devices.
-
-config INTEL_IOMMU_DEBUGFS
-   bool "Export Intel IOMMU internals in Debugfs"
-   depends on INTEL_IOMMU && IOMMU_DEBUGFS
-   help
- !!!WARNING!!!
-
- DO NOT ENABLE THIS OPTION UNLESS YOU REALLY KNOW WHAT YOU ARE DOING!!!
-
- Expose Intel IOMMU internals in Debugfs.
-
- This option is -NOT- intended for production environments, and should
- only be enabled for debugging Intel IOMMU.
-
-config INTEL_IOMMU_SVM
-   bool "Support for Shared Virtual Memory with Intel IOMMU"
-   depends on INTEL_IOMMU && X86_64
-   select PCI_PASID
-   select PCI_PRI
-   select MMU_NOTIFIER
-   select IOASID
-   help
- Shared Virtual Memory (SVM) provides a facility for devices
- to access DMA resources through process address space by
- means of a Process Address Space ID (PASID).
-
-config INTEL_IOMMU_DEFAULT_ON
-   def_bool y
-   prompt "Enable Intel DMA Remapping Devices by default"
-   depends on INTEL_IOMMU
-   help
- Selecting this option will enable a DMAR device at boot time if
- one is found. If this option is not selected, DMAR support can
- be enabled by passing intel_iommu=on to the kernel.
-
-config INTEL_IOMMU_BROKEN_GFX_WA
-   bool "Workaround broken graphics drivers (going away soon)"
-   depends on INTEL_IOMMU && BROKEN && X86
-   help
- Current Graphics drivers tend to use physical address
- for DMA and avoid using DMA APIs. Setting this config
- option permits the IOMMU driver to set a unity map for
- all the OS-visible memory. Hence the driver can continue
- to use physical addresses for DMA, at least until this
- option is removed in the 2.6.32 kernel.
-
-config INTEL_IOMMU_FLOPPY_WA
-   def_bool y
-   depends on INTEL_IOMMU && X86
-   help
- Floppy disk drivers are known to bypass DMA API calls
- thereby failing to work when IOMMU is enabled. This
- workaround will setup a 1:1 mapping for the first
- 16MiB to make floppy (an ISA device) work.
-
-config INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
-   bool "Enable Intel IOMMU scalable mode by default"
-   depends on INTEL_IOMMU
-   help
- Selecting this option will enable by default the scalable mode if
- hardware presents the capability. The scalable mode is defined in
- VT-d 3.0. The scalable mode capability could be checked by reading
- /sys/devices/virtual/iommu/dmar*/intel-iommu/ecap. If this option
- is not selected, scalable mode support could also be enabled by
- passing intel_iommu=sm_on to the kernel. If not sure, please use
- the default value.
+source "drivers/iommu/intel/Kconfig"
  
  config IRQ_REMAP

bool "Support for Interrupt Remapping"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..71dd2f382e78 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,4 +1,5 @@
  # SPDX-License-Identifier: GPL-2.0
+obj-y += intel/
  obj-$(CONFIG_IOMMU_API) += iommu.o
  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
@@ -17,13 +18,8 @@ obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
  obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
  arm_smmu-objs += arm-smmu.o 

Re: [PATCH 6/7] iommu/vt-d: Warn on out-of-range invalidation address

2020-06-30 Thread Lu Baolu

Hi Jacob,

On 7/1/20 1:34 AM, Jacob Pan wrote:

On Thu, 25 Jun 2020 18:10:43 +0800
Lu Baolu  wrote:


Hi,

On 2020/6/23 23:43, Jacob Pan wrote:

For guest requested IOTLB invalidation, address and mask are
provided as part of the invalidation data. VT-d HW silently ignores
any address bits below the mask. SW shall also allow such a case but
give a warning if the address does not align with the mask. This patch
relaxes the fault handling from error to warning and proceeds with the
invalidation request using the given mask.

Signed-off-by: Jacob Pan
---
   drivers/iommu/intel/iommu.c | 7 +++
   1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 5ea5732d5ec4..50fc62413a35 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5439,13 +5439,12 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
    switch (BIT(cache_type)) {
    case IOMMU_CACHE_INV_TYPE_IOTLB:
+   /* HW will ignore LSB bits based on address mask */
    if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
    size &&
    (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
-   pr_err_ratelimited("Address out of range, 0x%llx, size order %llu\n",
-   inv_info->addr_info.addr, size);
-   ret = -ERANGE;
-   goto out_unlock;
+   WARN_ONCE(1, "Address out of range, 0x%llx, size order %llu\n",
+   inv_info->addr_info.addr, size);

I don't think WARN_ONCE() is suitable here. It makes users think it's
a kernel bug. How about pr_warn_ratelimited()?


I think pr_warn_ratelimited might still be too chatty. There is no
functional issue; we just don't want to silently ignore it. Perhaps just
say:
WARN_ONCE(1, "User provided address not page aligned, alignment forced")
?



WARN() is normally used for reporting a kernel bug. It dumps kernel
trace. And the users will report bug through bugzilla.kernel.org.

In this case, it's actually an unexpected user input, we shouldn't
treat it as a kernel bug and pr_err_ratelimited() is enough?
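
The condition being debated is the mask test from the quoted hunk. A userspace sketch of that test follows, with hypothetical helper names and a 4 KiB base page assumed for VTD_PAGE_SHIFT. It only models the arithmetic, not the logging policy under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumed 4 KiB base page, standing in for VTD_PAGE_SHIFT */

/* Mask of the address bits below the invalidation size boundary.
 * Callers must keep DEMO_PAGE_SHIFT + size_order below 64. */
static uint64_t demo_low_mask(unsigned int size_order)
{
	return (UINT64_C(1) << (DEMO_PAGE_SHIFT + size_order)) - 1;
}

/* True if the address has bits set below the boundary. */
static bool demo_addr_misaligned(uint64_t addr, unsigned int size_order)
{
	return (addr & demo_low_mask(size_order)) != 0;
}

/* Address that remains once the low bits are ignored. */
static uint64_t demo_effective_addr(uint64_t addr, unsigned int size_order)
{
	return addr & ~demo_low_mask(size_order);
}
```

With size order 1 (two pages), address 0x201000 is misaligned and the range that is effectively invalidated starts at 0x200000. Whether to report that case loudly or quietly is the open question in this thread.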

Best regards,
baolu


Re: [PATCH v7 31/36] staging: tegra-vde: fix common struct sg_table related issues

2020-06-30 Thread Dmitry Osipenko
30.06.2020 13:07, Marek Szyprowski wrote:
> On 21.06.2020 06:00, Dmitry Osipenko wrote:
>> On Fri, 19 Jun 2020 12:36:31 +0200
>> Marek Szyprowski  wrote:
>>
>>> The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg()
>>> function returns the number of the created entries in the DMA address
>>> space. However the subsequent calls to the
>>> dma_sync_sg_for_{device,cpu}() and dma_unmap_sg must be called with
>>> the original number of the entries passed to the dma_map_sg().
>>>
>>> struct sg_table is a common structure used for describing a
>>> non-contiguous memory buffer, used commonly in the DRM and graphics
>>> subsystems. It consists of a scatterlist with memory pages and DMA
>>> addresses (sgl entry), as well as the number of scatterlist entries:
>>> CPU pages (orig_nents entry) and DMA mapped pages (nents entry).
>>>
>>> It turned out that it was a common mistake to misuse nents and
>>> orig_nents entries, calling DMA-mapping functions with a wrong number
>>> of entries or ignoring the number of mapped entries returned by the
>>> dma_map_sg() function.
>>>
>>> To avoid such issues, let's use common dma-mapping wrappers operating
>>> directly on the struct sg_table objects and use scatterlist page
>>> iterators where possible. This, almost always, hides references to the
>>> nents and orig_nents entries, making the code robust, easier to follow
>>> and copy/paste safe.
>>>
>>> Signed-off-by: Marek Szyprowski 
>>> Reviewed-by: Dmitry Osipenko 
>>> ---
>>>   drivers/staging/media/tegra-vde/iommu.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/staging/media/tegra-vde/iommu.c b/drivers/staging/media/tegra-vde/iommu.c
>>> index 6af863d92123..adf8dc7ee25c 100644
>>> --- a/drivers/staging/media/tegra-vde/iommu.c
>>> +++ b/drivers/staging/media/tegra-vde/iommu.c
>>> @@ -36,8 +36,8 @@ int tegra_vde_iommu_map(struct tegra_vde *vde,
>>> addr = iova_dma_addr(&vde->iova, iova);
>>>   
>>> -   size = iommu_map_sg(vde->domain, addr, sgt->sgl, sgt->nents,
>>> -   IOMMU_READ | IOMMU_WRITE);
>>> +   size = iommu_map_sgtable(vde->domain, addr, sgt,
>>> +IOMMU_READ | IOMMU_WRITE);
>>> if (!size) {
>>> __free_iova(&vde->iova, iova);
>>> return -ENXIO;
>> Ahh, I saw the build failure report. You're changing the DMA API in
>> this series, while DMA API isn't used by this driver, it uses IOMMU
>> API. Hence there is no need to touch this code. Similar problem in the
>> host1x driver patch.
> 
> The issue is caused by the lack of iommu_map_sgtable() stub when no 
> IOMMU support is configured. I've posted a patch for this:
> 
> https://lore.kernel.org/lkml/20200630081756.18526-1-m.szyprow...@samsung.com/
> 
> The patch for this driver is fine, we have to wait until the above fix 
> gets merged and then it can be applied during the next release cycle.

Thank you for the clarification!
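
The kind of stub Marek refers to can be sketched as follows. This is a userspace illustration only: DEMO_HAVE_IOMMU is a hypothetical stand-in for a configuration symbol, and the function names and signatures are simplified assumptions rather than the real iommu_map_sgtable() prototype.

```c
#include <errno.h>
#include <stddef.h>

/* DEMO_HAVE_IOMMU is a hypothetical stand-in for a Kconfig symbol. */
#ifdef DEMO_HAVE_IOMMU
size_t demo_map_sgtable(unsigned long iova, size_t len);
#else
/* Stub: with no IOMMU support nothing can be mapped, so report 0 bytes. */
static inline size_t demo_map_sgtable(unsigned long iova, size_t len)
{
	(void)iova;
	(void)len;
	return 0;
}
#endif

/* The caller builds in both configurations and treats 0 as failure. */
static int demo_driver_map(unsigned long iova, size_t len)
{
	size_t mapped = demo_map_sgtable(iova, len);

	return mapped ? 0 : -ENXIO;
}
```

Because the stub returns 0, a caller that already checks for a zero-sized mapping, as the quoted hunk does with `if (!size)`, fails cleanly instead of breaking the build.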

Re: [PATCH v2 5/7] iommu/vt-d: Fix devTLB flush for vSVA

2020-06-30 Thread Lu Baolu

On 7/1/20 5:07 AM, Jacob Pan wrote:

From: Liu Yi L 

For guest SVA usage, in order to optimize for less VMEXIT, guest request
of IOTLB flush also includes device TLB.

On the host side, IOMMU driver performs IOTLB and implicit devTLB
invalidation. When PASID-selective granularity is requested by the guest
we need to derive the equivalent address range for devTLB instead of
using the address information in the UAPI data. The reason for that is, unlike
IOTLB flush, devTLB flush does not support PASID-selective granularity.
This is to say, we need to set the following in the PASID based devTLB
invalidation descriptor:
- entire 64 bit range in address ~(0x1 << 63)
- S bit = 1 (VT-d CH 6.5.2.6).

Without this fix, device TLB flush range is not set properly for PASID
selective granularity. This patch also merged devTLB flush code for both
implicit and explicit cases.
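
The range selection described above can be sketched in userspace C. The enum, struct and function names are hypothetical, and a 4 KiB base page is assumed for VTD_PAGE_SHIFT. The logic mirrors the description: PASID-selective requests are widened to the whole address space, address-selective requests pass the user-provided values through.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumed 4 KiB base page, standing in for VTD_PAGE_SHIFT */

enum demo_granu {
	DEMO_GRANU_PASID,    /* PASID-selective: no address supplied by the user */
	DEMO_GRANU_ADDR,     /* address-selective: user supplies addr and size */
};

struct demo_range {
	uint64_t addr;
	unsigned int size_order;   /* number of pages is 2^size_order */
};

/* Pick the devTLB flush range for a given invalidation granularity. */
static struct demo_range demo_devtlb_range(enum demo_granu granu,
					   uint64_t user_addr,
					   unsigned int user_size_order)
{
	struct demo_range r = { 0, 0 };

	if (granu == DEMO_GRANU_PASID) {
		/* devTLB has no PASID-selective flush: cover all 64 bits. */
		r.size_order = 64 - DEMO_PAGE_SHIFT;
		r.addr = 0;
	} else {
		r.size_order = user_size_order;
		r.addr = user_addr;
	}
	return r;
}
```

The key point is that for the PASID-selective case the address information in the user data is ignored rather than trusted.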

Fixes: 6ee1b77ba3ac ("iommu/vt-d: Add svm/sva invalidate function")
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 


Acked-by: Lu Baolu 

Best regards,
baolu


---
  drivers/iommu/intel/iommu.c | 28 ++--
  1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 96340da57075..6a0c62c7395c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5408,7 +5408,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
sid = PCI_DEVID(bus, devfn);
  
  	/* Size is only valid in address selective invalidation */

-   if (inv_info->granularity != IOMMU_INV_GRANU_PASID)
+   if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
size = to_vtd_size(inv_info->addr_info.granule_size,
   inv_info->addr_info.nb_granules);
  
@@ -5417,6 +5417,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,

 IOMMU_CACHE_INV_TYPE_NR) {
int granu = 0;
u64 pasid = 0;
+   u64 addr = 0;
  
  		granu = to_vtd_granularity(cache_type, inv_info->granularity);

if (granu == -EINVAL) {
@@ -5456,24 +5457,31 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
(granu == QI_GRAN_NONG_PASID) ? -1 : 1 << size,
inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
  
+			if (!info->ats_enabled)

+   break;
/*
 * Always flush device IOTLB if ATS is enabled. vIOMMU
 * in the guest may assume IOTLB flush is inclusive,
 * which is more efficient.
 */
-   if (info->ats_enabled)
-   qi_flush_dev_iotlb_pasid(iommu, sid,
-   info->pfsid, pasid,
-   info->ats_qdep,
-   inv_info->addr_info.addr,
-   size);
-   break;
+   fallthrough;
case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
+   /*
+* There is no PASID selective flush for device TLB, so
+* the equivalent of that is we set the size to be the
+* entire range of 64 bit. User only provides PASID info
+* without address info. So we set addr to 0.
+*/
+   if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+   size = 64 - VTD_PAGE_SHIFT;
+   addr = 0;
+   } else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
+   addr = inv_info->addr_info.addr;
+
if (info->ats_enabled)
qi_flush_dev_iotlb_pasid(iommu, sid,
info->pfsid, pasid,
-   info->ats_qdep,
-   inv_info->addr_info.addr,
+   info->ats_qdep, addr,
size);
else
pr_warn_ratelimited("Passdown device IOTLB flush w/o ATS!\n");


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 4/7] iommu/vt-d: Handle non-page aligned address

2020-06-30 Thread Lu Baolu

On 7/1/20 5:07 AM, Jacob Pan wrote:

From: Liu Yi L 

Address information for device TLB invalidation comes from userspace
when device is directly assigned to a guest with vIOMMU support.
VT-d requires a page aligned address. This patch checks and enforces that
the address is page aligned; otherwise reserved bits can be set in the
invalidation descriptor, and an unrecoverable fault will be reported due
to the non-zero value in the reserved bits.
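
As a rough illustration of the alignment check and of how a size can be folded into the address field, here is a standalone sketch. It follows the ATS-style convention in which, with the S bit set, the lowest zero bit of the page address marks the range size. All names are hypothetical, SKETCH_PAGE_SHIFT is an assumed stand-in for VTD_PAGE_SHIFT, and the exact bit positions used by the patch below may differ from this model.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK  (~((1ULL << SKETCH_PAGE_SHIFT) - 1))

/* Non-zero when addr has bits set below the page boundary. */
static int sketch_is_misaligned(uint64_t addr)
{
	return (addr & ~SKETCH_PAGE_MASK) != 0;
}

/*
 * Build the address field for a flush of 2^size_order pages.
 * For size_order > 0: set the bits below the size-marker bit to 1,
 * clear the marker bit itself, and report S = 1.
 */
static uint64_t sketch_encode_addr(uint64_t addr, unsigned int size_order,
				   int *s_bit)
{
	uint64_t page_addr = addr & SKETCH_PAGE_MASK;	/* drop sub-page bits */
	unsigned int zero_bit;
	uint64_t ones;

	assert(size_order <= 64 - SKETCH_PAGE_SHIFT);

	if (size_order == 0) {
		*s_bit = 0;	/* single page, no size encoding */
		return page_addr;
	}

	zero_bit = SKETCH_PAGE_SHIFT + size_order - 1;	/* at most bit 63 */
	ones = (1ULL << zero_bit) - (1ULL << SKETCH_PAGE_SHIFT);
	*s_bit = 1;
	return (page_addr | ones) & ~(1ULL << zero_bit);
}
```

Under these assumptions, address 0x4000 with size_order 2 encodes to 0x5000 (bit 12 set, bit 13 clear), and address 0 with size_order 52 encodes to 0x7ffffffffffff000, i.e. every page-address bit set except bit 63.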

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 


Fixes: 61a06a16e36d8 ("iommu/vt-d: Support flushing more translation cache types")

Acked-by: Lu Baolu 

Best regards,
baolu


---
  drivers/iommu/intel/dmar.c | 20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d9f973fa1190..3899f3161071 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1455,9 +1455,25 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 * Max Invs Pending (MIP) is set to 0 for now until we have DIT in
 * ECAP.
 */
-   desc.qw1 |= addr & ~mask;
-   if (size_order)
+   if (addr & ~VTD_PAGE_MASK)
+   pr_warn_ratelimited("Invalidate non-page aligned address %llx\n", addr);
+
+   /* Take page address */
+   desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr);
+
+   if (size_order) {
+   /*
+* Existing 0s in address below size_order may be the least
+* significant bit, we must set them to 1s to avoid having
+* smaller size than desired.
+*/
+   desc.qw1 |= GENMASK_ULL(size_order + VTD_PAGE_SHIFT,
+   VTD_PAGE_SHIFT);
+   /* Clear size_order bit to indicate size */
+   desc.qw1 &= ~mask;
+   /* Set the S bit to indicate flushing more than 1 page */
desc.qw1 |= QI_DEV_EIOTLB_SIZE;
+   }
  
  	qi_submit_sync(iommu, &desc, 1, 0);

  }




Re: [PATCH v2 3/7] iommu/vt-d: Fix PASID devTLB invalidation

2020-06-30 Thread Lu Baolu

Hi Jacob,

On 7/1/20 5:07 AM, Jacob Pan wrote:

DevTLB flush can be used for DMA requests both without and with PASIDs.
The former use PASID#0 (RID2PASID); the latter use a non-zero PASID for
SVA usage.

This patch adds a check on the PASID value so that devTLB flush with
PASID is used for the SVA case. This is more efficient because a single
device can use multiple PASIDs: when tearing down a PASID entry, we
flush only the devTLB entries specific to that PASID.

Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table")
Signed-off-by: Jacob Pan 
---
  drivers/iommu/intel/pasid.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c81f0f17c6ba..70d21209dd04 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
qdep = info->ats_qdep;
pfsid = info->pfsid;
  
-	qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);

+   /*
+* When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
+* devTLB flush w/o PASID should be used. For non-zero PASID under
+* SVA usage, device could do DMA with multiple PASIDs. It is more
+* efficient to flush devTLB specific to the PASID.
+*/
+   if (pasid == PASID_RID2PASID)
+   qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+   else
+   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);


The if/else logic is reversed.

	if (pasid == PASID_RID2PASID)
		qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
	else
		qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
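
The intended selection can also be sketched on its own, outside the driver. The names below are hypothetical, and the value 0 for RID2PASID is taken from the "PASID#0 (RID2PASID)" wording in the commit message.

```c
#include <stdint.h>

#define SKETCH_PASID_RID2PASID 0u	/* assumption: RID2PASID is PASID 0 */

/* Hypothetical labels for the two flush helpers discussed above. */
enum sketch_flush { SKETCH_FLUSH_NO_PASID, SKETCH_FLUSH_WITH_PASID };

/*
 * DMA requests without PASID are tagged with RID2PASID, so they need the
 * plain devTLB flush. Any other PASID gets the PASID-specific flush.
 */
static enum sketch_flush sketch_pick_devtlb_flush(uint32_t pasid)
{
	if (pasid == SKETCH_PASID_RID2PASID)
		return SKETCH_FLUSH_NO_PASID;
	return SKETCH_FLUSH_WITH_PASID;
}
```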


Best regards,
baolu


  }
  
  void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,





Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

2020-06-30 Thread Qian Cai
On Wed, Apr 29, 2020 at 03:36:38PM +0200, Joerg Roedel wrote:
> Hi,
> 
> here is the third version of this patch-set. Older versions can be found
> here:
> 
>   v1: https://lore.kernel.org/lkml/20200407183742.4344-1-j...@8bytes.org/
>   (Has some more introductory text)
> 
>   v2: https://lore.kernel.org/lkml/20200414131542.25608-1-j...@8bytes.org/
> 
> Changes v2 -> v3:
> 
>   * Rebased v5.7-rc3
> 
>   * Added a missing iommu_group_put() as reported by Lu Baolu.
> 
>   * Added a patch to consolidate more initialization work in
> __iommu_probe_device(), fixing a bug where no 'struct
> device_iommu' was allocated in the hotplug path.
> 
> There is also a git-branch available with these patches applied:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device-v3
> 
> Please review. If there are no objections I plan to put these patches
> into the IOMMU tree early next week.

Looks like this patchset introduced a use-after-free on arm-smmu-v3.

Reproduced using mlx5,

# echo 1 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
# echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs 

The .config,
https://github.com/cailca/linux-mm/blob/master/arm64.config

Looking at the free stack,

iommu_release_device->iommu_group_remove_device

was introduced in 07/34 ("iommu: Add probe_device() and release_device()
call-backs").

[ 9426.724641][ T3356] pci :0b:01.2: Removing from iommu group 3
[ 9426.731347][ T3356] 
==
[ 9426.739263][ T3356] BUG: KASAN: use-after-free in __lock_acquire+0x3458/0x4440
__lock_acquire at kernel/locking/lockdep.c:4250
[ 9426.746477][ T3356] Read of size 8 at addr 0089df1a6f68 by task bash/3356
[ 9426.753601][ T3356]
[ 9426.755782][ T3356] CPU: 5 PID: 3356 Comm: bash Not tainted 5.8.0-rc3-next-20200630 #2
[ 9426.763687][ T3356] Hardware name: HPE Apollo 70 /C01_APACHE_MB  
   , BIOS L50_5.13_1.11 06/18/2019
[ 9426.774111][ T3356] Call trace:
[ 9426.777245][ T3356]  dump_backtrace+0x0/0x398
[ 9426.781593][ T3356]  show_stack+0x14/0x20
[ 9426.785596][ T3356]  dump_stack+0x140/0x1b8
[ 9426.789772][ T3356]  print_address_description.isra.12+0x54/0x4a8
[ 9426.795855][ T3356]  kasan_report+0x134/0x1b8
[ 9426.800203][ T3356]  __asan_report_load8_noabort+0x2c/0x50
[ 9426.805679][ T3356]  __lock_acquire+0x3458/0x4440
[ 9426.810373][ T3356]  lock_acquire+0x204/0xf10
[ 9426.814722][ T3356]  _raw_spin_lock_irqsave+0xf8/0x180
[ 9426.819853][ T3356]  arm_smmu_detach_dev+0xd8/0x4a0
arm_smmu_detach_dev at drivers/iommu/arm-smmu-v3.c:2776
[ 9426.824721][ T3356]  arm_smmu_release_device+0xb4/0x1c8
arm_smmu_disable_pasid at drivers/iommu/arm-smmu-v3.c:2754
(inlined by) arm_smmu_release_device at drivers/iommu/arm-smmu-v3.c:3000
[ 9426.829937][ T3356]  iommu_release_device+0xc0/0x178
iommu_release_device at drivers/iommu/iommu.c:302
[ 9426.834892][ T3356]  iommu_bus_notifier+0x118/0x160
[ 9426.839762][ T3356]  notifier_call_chain+0xa4/0x128
[ 9426.844630][ T3356]  __blocking_notifier_call_chain+0x70/0xa8
[ 9426.850367][ T3356]  blocking_notifier_call_chain+0x14/0x20
[ 9426.855929][ T3356]  device_del+0x618/0xa00
[ 9426.860105][ T3356]  pci_remove_bus_device+0x108/0x2d8
[ 9426.865233][ T3356]  pci_stop_and_remove_bus_device+0x1c/0x28
[ 9426.870972][ T3356]  pci_iov_remove_virtfn+0x228/0x368
[ 9426.876100][ T3356]  sriov_disable+0x8c/0x348
[ 9426.880447][ T3356]  pci_disable_sriov+0x5c/0x70
[ 9426.885117][ T3356]  mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
[ 9426.891549][ T3356]  sriov_numvfs_store+0x240/0x318
[ 9426.896417][ T3356]  dev_attr_store+0x38/0x68
[ 9426.900766][ T3356]  sysfs_kf_write+0xdc/0x128
[ 9426.905200][ T3356]  kernfs_fop_write+0x23c/0x448
[ 9426.909897][ T3356]  __vfs_write+0x54/0xe8
[ 9426.913984][ T3356]  vfs_write+0x124/0x3f0
[ 9426.918070][ T3356]  ksys_write+0xe8/0x1b8
[ 9426.922157][ T3356]  __arm64_sys_write+0x68/0x98
[ 9426.926766][ T3356]  do_el0_svc+0x124/0x220
[ 9426.930941][ T3356]  el0_sync_handler+0x260/0x408
[ 9426.935634][ T3356]  el0_sync+0x140/0x180
[ 9426.939633][ T3356]
[ 9426.941810][ T3356] Allocated by task 3356:
[ 9426.945985][ T3356]  save_stack+0x24/0x50
[ 9426.949986][ T3356]  __kasan_kmalloc.isra.13+0xc4/0xe0
[ 9426.955114][ T3356]  kasan_kmalloc+0xc/0x18
[ 9426.959288][ T3356]  kmem_cache_alloc_trace+0x1ec/0x318
[ 9426.964503][ T3356]  arm_smmu_domain_alloc+0x54/0x148
[ 9426.969545][ T3356]  iommu_group_alloc_default_domain+0xc0/0x440
[ 9426.975541][ T3356]  iommu_probe_device+0x1c0/0x308
[ 9426.980409][ T3356]  iort_iommu_configure+0x434/0x518
[ 9426.985452][ T3356]  acpi_dma_configure+0xf0/0x128
[ 9426.990235][ T3356]  pci_dma_configure+0x114/0x160
[ 9426.995017][ T3356]  really_probe+0x124/0x6d8
[ 9426.999364][ T3356]  driver_probe_device+0xc4/0x180
[ 9427.004232][ T3356]

[PATCH v9 1/4] iommu/arm-smmu: move TLB timeout and spin count macros

2020-06-30 Thread Krishna Reddy
Move TLB timeout and spin count macros to header file to
allow using the same values from vendor specific implementations.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..d2054178df357 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   100 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be618..c7d0122a7c6ca 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2



[PATCH v9 3/4] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-06-30 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..662c46e16f07d 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvidia,tegra194-smmu
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - nvidia,tegra194-smmu
+then:
+  properties:
+reg:
+  minItems: 2
+  maxItems: 3
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2



[PATCH v9 0/4] NVIDIA ARM SMMUv2 Implementation

2020-06-30 Thread Krishna Reddy
Changes in v9:
- Move TLB Timeout and spin count macros to arm-smmu.h header to share with
  implementation.
- Set minItems and maxItems for reg property when compatible contains
  nvidia,tegra194-smmu.
- Update commit message for NVIDIA implementation patch.
- Fail single SMMU instance usage through NVIDIA implementation to limit the
  usage to two or three instances.
- Fix checkpatch warnings with --strict checking.

v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (4):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  18 ++
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 304 ++
 drivers/iommu/arm-smmu.c  |  20 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 349 insertions(+), 6 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2



[PATCH v9 2/4] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
Two of the ARM MMU-500s are used together to interleave IOVA accesses
across them, and they must be programmed identically.
The third SMMU instance is used as a regular ARM MMU-500 and can be
programmed either independently or identically to the other two
ARM MMU-500s.

This implementation supports programming two or three ARM MMU-500s
identically as per DT config.
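
A simplified model of "programming the instances identically" is sketched below: every register write is mirrored to all instances, and reads come from the first instance only. This is an assumption-laden illustration, not the driver code. The struct, the array-backed registers and all names are hypothetical, and the mirrored write loop is inferred from the description rather than copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_INST 3	/* mirrors the three-instance limit above */
#define SKETCH_NUM_REGS 4	/* arbitrary size for the illustration */

/* Each instance is modelled as a small array of 32-bit registers. */
struct sketch_smmu {
	unsigned int num_inst;
	uint32_t regs[SKETCH_MAX_INST][SKETCH_NUM_REGS];
};

/* Mirror one register write to every configured instance. */
static void sketch_write_reg(struct sketch_smmu *s, unsigned int reg,
			     uint32_t val)
{
	unsigned int i;

	assert(s->num_inst <= SKETCH_MAX_INST && reg < SKETCH_NUM_REGS);
	for (i = 0; i < s->num_inst; i++)
		s->regs[i][reg] = val;
}

/* Instances hold identical state, so reading instance 0 is enough. */
static uint32_t sketch_read_reg(const struct sketch_smmu *s, unsigned int reg)
{
	assert(reg < SKETCH_NUM_REGS);
	return s->regs[0][reg];
}
```

With num_inst set to 2, the third register bank is left untouched, which corresponds to the case where the third SMMU is described separately and programmed independently.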

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 206 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..f15571d05474e 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = &calxeda_impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..5c874912e1c1a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochronous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (i = 0; i < 

[PATCH v9 4/4] iommu/arm-smmu: add global/context fault implementation hooks

2020-06-30 Thread Krishna Reddy
Add global/context fault hooks to allow the NVIDIA SMMU implementation
to handle faults across multiple SMMUs.
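
The idea of one handler covering several SMMU instances can be modelled as a scan over per-instance fault status values. The sketch below is hypothetical throughout: the status registers are a plain array, clearing is modelled by zeroing the entry, and the enum only imitates the shape of irqreturn_t.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's irqreturn_t values. */
enum sketch_irqreturn { SKETCH_IRQ_NONE, SKETCH_IRQ_HANDLED };

/*
 * Scan the fault status of each instance in order. The first instance
 * with a pending fault is acknowledged and the scan stops there.
 */
static enum sketch_irqreturn sketch_global_fault(uint32_t gfsr[],
						 unsigned int num_inst)
{
	unsigned int inst;

	for (inst = 0; inst < num_inst; inst++) {
		if (gfsr[inst]) {
			gfsr[inst] = 0;	/* models acknowledging the fault */
			return SKETCH_IRQ_HANDLED;
		}
	}
	return SKETCH_IRQ_NONE;
}
```

Because the scan returns at the first handled instance, a fault pending on a later instance is only picked up on a subsequent call. In this model that means calling the function again. Whether real hardware re-raises the interrupt in that situation is outside what this sketch shows.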

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 98 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 5c874912e1c1a..d279788eab954 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -144,6 +144,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, 
GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, 
fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -151,6 +247,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2054178df357..161d68c8208a4 100644
--- a/drivers/iommu/arm-smmu.c
+++ 

[PATCH v5 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

2020-06-30 Thread Fenghua Yu
From: Ashok Raj 

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v3:
- Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
- Fix a couple of typos (Baolu)

v2:
- Fix the doc format and add the doc in toctree (Thomas)
- Modify the doc for better description (Thomas, Tony, Dave)

 Documentation/x86/index.rst |   1 +
 Documentation/x86/sva.rst   | 287 
 2 files changed, 288 insertions(+)
 create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..e5d5ff096685 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+   sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index ..7242a84169ef
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,287 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Shared Virtual Addressing (SVA) with ENQCMD
+===========================================
+
+Background
+==========
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses, avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM).
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU also is required
+to support PCIe features ATS and PRI. ATS allows devices to cache
+translations for the virtual address. IOMMU driver uses the mmu_notifier()
+support to keep the device tlb cache and the CPU cache in sync. PRI allows
+the device to request paging the virtual address before using if they are
+not paged in the CPU page tables.
+
+
+Shared Hardware Workqueues
+==========================
+
+Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VMs). This allows better hardware utilization than hard
+partitioning resources, which could result in underutilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20-bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+======
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement fairness or ensure forward progress can be made.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permit hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=============================
+
+A new thread-scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA capable device this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU specific API
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+iommu_sva_bind_device(), which will do the following.
+
+- Allocate the PASID, and program the process page-table (cr3) in the PASID
+  context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+  the device tlb in sync. For example, when a page-table entry is invalidated,
+  IOMMU propagates the invalidation to device tlb. This will force any
+  future access by the device to this virtual address to participate in
+  ATS. 
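
The ENQCMD section of the quoted document says the instruction returns a status so the submitter knows whether to retry. A bounded retry loop around such a status-returning submit could look like the sketch below. Nothing here is the real instruction or a real driver API: the callback type, the fake device and all names are hypothetical and exist only to show the control flow.

```c
#include <stdbool.h>

/* Hypothetical submit callback: returns true when the work was accepted. */
typedef bool (*sketch_submit_fn)(void *ctx, const void *desc);

/*
 * Try to submit a descriptor up to max_tries times.
 * Returns the number of attempts used, or -1 if none was accepted.
 */
static int sketch_submit_with_retry(sketch_submit_fn submit, void *ctx,
				    const void *desc, int max_tries)
{
	int tries;

	for (tries = 1; tries <= max_tries; tries++) {
		if (submit(ctx, desc))
			return tries;
	}
	return -1;
}

/* Fake device for illustration: rejects until *busy drops to zero. */
static bool sketch_fake_submit(void *ctx, const void *desc)
{
	int *busy = ctx;

	(void)desc;
	if (*busy > 0) {
		(*busy)--;
		return false;
	}
	return true;
}
```

A real submitter would probably back off between attempts or fall back to a device-specific mechanism, as the text hints. The bounded loop only shows why a returned status makes that decision possible.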

[PATCH v5 01/12] iommu: Change type of pasid to u32

2020-06-30 Thread Fenghua Yu
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".

No PASID type change in uapi although it defines PASID as __u64 in
some places.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Lu Baolu 
---
v5:
- Reviewed by Lu Baolu

v4:
- Change PASID type from "unsigned int" to "u32" (Christoph)

v2:
- Create this new patch to define PASID as "unsigned int" consistently in
  iommu (Thomas)

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  4 +--
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   |  6 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h   |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  8 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  8 ++---
 .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h   |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  8 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_events.h   |  4 +--
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c|  6 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_pasid.c|  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 18 +--
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  2 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |  2 +-
 drivers/iommu/amd/amd_iommu.h | 10 +++---
 drivers/iommu/amd/iommu.c | 31 ++-
 drivers/iommu/amd/iommu_v2.c  | 20 ++--
 drivers/iommu/intel/dmar.c|  7 +++--
 drivers/iommu/intel/intel-pasid.h | 24 +++---
 drivers/iommu/intel/iommu.c   |  4 +--
 drivers/iommu/intel/pasid.c   | 31 +--
 drivers/iommu/intel/svm.c | 12 +++
 drivers/iommu/iommu.c |  2 +-
 drivers/misc/uacce/uacce.c|  2 +-
 include/linux/amd-iommu.h |  8 ++---
 include/linux/intel-iommu.h   | 12 +++
 include/linux/intel-svm.h |  2 +-
 include/linux/iommu.h | 10 +++---
 include/linux/uacce.h |  2 +-
 38 files changed, 139 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ffe149aafc39..dfef5a7e0f5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -207,11 +207,11 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
})
 
 /* GPUVM API */
-int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
void **vm, void **process_info,
struct dma_fence **ef);
 int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
-   struct file *filp, unsigned int pasid,
+   struct file *filp, u32 pasid,
void **vm, void **process_info,
struct dma_fence **ef);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index bf927f432506..ee531c3988d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -105,7 +105,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
unlock_srbm(kgd);
 }
 
-static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
unsigned int vmid)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 744366c7ee85..4d41317b9292 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -139,7 +139,7 @@ static void 

[PATCH v5 08/12] fork: Clear PASID for new mm

2020-06-30 Thread Fenghua Yu
When a new mm is created, its PASID should be cleared, i.e. the PASID is
initialized to its init state 0 on both ARM and X86.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this patch to initialize PASID value for a new mm.

 include/linux/mm_types.h | 2 ++
 kernel/fork.c| 8 
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d61285cfe027..d60d2ec10881 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -22,6 +22,8 @@
 #endif
 #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
 
+/* Initial PASID value is 0. */
+#define INIT_PASID 0
 
 struct address_space;
 struct mem_cgroup;
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..43b5f112604d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1007,6 +1007,13 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 #endif
 }
 
+static void mm_init_pasid(struct mm_struct *mm)
+{
+#ifdef CONFIG_IOMMU_SUPPORT
+   mm->pasid = INIT_PASID;
+#endif
+}
+
 static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
@@ -1035,6 +1042,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
+   mm_init_pasid(mm);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 09/12] x86/process: Clear PASID state for a newly forked/cloned thread

2020-06-30 Thread Fenghua Yu
The PASID state has to be cleared on forks, since the child has a
different address space. The PASID is also cleared for thread clone. While
it would be correct to inherit the PASID in this case, it is unknown
whether the new task will use ENQCMD. Giving it the PASID "just in case"
would have the downside of increased context switch overhead from setting
the PASID MSR.

Since #GP faults have to be handled on any threads that were created before
the PASID was assigned to the mm of the process, newly created threads
might as well be treated in a consistent way.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Modify init_task_pasid().

 arch/x86/kernel/process.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f362ce0d5ac0..1b1492e337a6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -121,6 +121,21 @@ static int set_new_tls(struct task_struct *p, unsigned 
long tls)
return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
+/* Initialize the PASID state for the forked/cloned thread. */
+static void init_task_pasid(struct task_struct *task)
+{
+   struct ia32_pasid_state *ppasid;
+
+   /*
+* Initialize the PASID state so that the PASID MSR will be
+* initialized to its initial state (0) by XRSTORS when the task is
+* scheduled for the first time.
+*/
+   ppasid = get_xsave_addr(&task->thread.fpu.state.xsave, XFEATURE_PASID);
+   if (ppasid)
+   ppasid->pasid = INIT_PASID;
+}
+
 int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
unsigned long arg, struct task_struct *p, unsigned long tls)
 {
@@ -174,6 +189,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned 
long sp,
task_user_gs(p) = get_user_gs(current_pt_regs());
 #endif
 
+   if (static_cpu_has(X86_FEATURE_ENQCMD))
+   init_task_pasid(p);
+
/* Set a new TLS for the child thread? */
if (clone_flags & CLONE_SETTLS)
ret = set_new_tls(p, tls);
-- 
2.19.1



[PATCH v5 00/12] x86: tag application address space for devices

2020-06-30 Thread Fenghua Yu
Typical hardware devices require a driver stack to translate application
buffers to hardware addresses, and a kernel-user transition to notify the
hardware of new work. What if both the translation and transition overhead
could be eliminated? This is what Shared Virtual Address (SVA) and ENQCMD
enabled hardware like Data Streaming Accelerator (DSA) aims to achieve.
Applications map portals in their local-address-space and directly submit
work to them using a new instruction.

This series enables ENQCMD and associated management of the new MSR
(MSR_IA32_PASID). This new MSR allows an application address space to be
associated with what the PCIe spec calls a Process Address Space ID (PASID).
This PASID tag is carried along with all requests between applications and
devices and allows devices to interact with the process address space.

SVA and ENQCMD enabled device drivers need this series. The phase 2 DSA
patches with SVA and ENQCMD support were released on top of this series:
https://lore.kernel.org/patchwork/cover/1244060/

This series only provides simple and basic support for ENQCMD and the MSR:
1. Clean up type definitions (patches 1-2). These patches can be in a
   separate series.
   - Define "pasid" as "u32" consistently
   - Define "flags" as "unsigned int"
2. Explain various technical terms used in the series (patch 3).
3. Enumerate support for ENQCMD in the processor (patch 4).
4. Handle FPU PASID state and the MSR during context switch (patches 5-6).
5. Define "pasid" in mm_struct (patch 7).
6. Clear PASID state for new mm and forked and cloned thread (patches 8-9).
7. Allocate and free PASID for a process (patch 10).
8. Fix up the PASID MSR in #GP handler when one thread in a process
   executes ENQCMD for the first time (patches 11-12).

This patch series and the DSA phase 2 series are in
https://github.com/intel/idxd-driver/tree/idxd-stage2

References:
1. Detailed information on the ENQCMD/ENQCMDS instructions and the
IA32_PASID MSR can be found in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

2. Detailed information on DSA can be found in DSA specification:
https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

Change log:
v5:
- Mark ENQCMD disabled when configured out and use cpu_feature_enabled()
  to simplify the feature checking code in patch 10 and 12 (PeterZ and
  Dave Hansen)
- Add Reviewed-by: Lu Baolu to patch 1, 2, 10, and 12.

v4:
- Define PASID as "u32" instead of "unsigned int" in patch 1, 7, 10, 12.
  (Christoph)
- Drop v3 patch 2 which changes PASID type in ocxl because it's not related
  to x86 and was rejected by ocxl maintainer Frederic Barrat
- A split patch which changes PASID type to u32 in crypto/hisilicon/qm.c
  was released separately to linux-crypto mailing list because it's not
  related to x86 and is a standalone patch:

v3:
- Change names of bind_mm() and unbind_mm() to match to new APIs in
  patch 4 (Baolu)
- Change CONFIG_PCI_PASID to CONFIG_IOMMU_SUPPORT because non-PCI device
  can have PASID in ARM in patch 8 (Jean)
- Add a few sanity checks in __free_pasid() and alloc_pasid() in
  patch 11 (Baolu)
- Add patch 12 to define a new flag "has_valid_pasid" for a task and
  use the flag to identify if the task has a valid PASID MSR (PeterZ)
- Add fpu__pasid_write() to update the MSR in fixup() in patch 13
- Check if mm->pasid can be found in fixup() in patch 13

v2:
- Add patches 1-3 to define "pasid" and "flags" as "unsigned int"
  consistently (Thomas)
  (these 3 patches could be in a separate patch set)
- Add patch 8 to move "pasid" to generic mm_struct (Christoph).
  Jean-Philippe Brucker released a virtually identical patch. Upstream only
  needs one of the two.
- Add patch 9 to initialize PASID in a new mm.
- Plus other changes described in each patch (Thomas)

Ashok Raj (1):
  docs: x86: Add documentation for SVA (Shared Virtual Addressing)

Fenghua Yu (9):
  iommu: Change type of pasid to u32
  iommu/vt-d: Change flags type to unsigned int in binding mm
  x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
  x86/msr-index: Define IA32_PASID MSR
  mm: Define pasid in mm
  fork: Clear PASID for new mm
  x86/process: Clear PASID state for a newly forked/cloned thread
  x86/mmu: Allocate/free PASID
  x86/traps: Fix up invalid PASID

Peter Zijlstra (1):
  sched: Define and initialize a flag to identify valid PASID in the
task

Yu-cheng Yu (1):
  x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

 Documentation/x86/index.rst   |   1 +
 Documentation/x86/sva.rst | 287 ++
 arch/x86/include/asm/cpufeatures.h|   1 +
 arch/x86/include/asm/disabled-features.h  |   9 +-
 arch/x86/include/asm/fpu/types.h  |  10 +
 

[PATCH v5 10/12] x86/mmu: Allocate/free PASID

2020-06-30 Thread Fenghua Yu
A PASID is allocated for an "mm" the first time any thread attaches
to an SVM capable device. Later device attachments (whether to the same
device or another SVM device) will re-use the same PASID.

The PASID is freed when the process exits (so no need to keep
reference counts on how many SVM devices are sharing the PASID).
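
The lifecycle described above can be modelled in a few lines of user-space
C. This is only an illustrative sketch, not the code in this patch: the
names toy_mm, toy_bind() and toy_exit() are hypothetical, and the counter
stands in for the real ioasid allocator.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct mm_struct; 0 means "no PASID yet". */
struct toy_mm {
	uint32_t pasid;
};

/* Hypothetical allocator: hands out 1, 2, 3, ... */
static uint32_t toy_next_pasid = 1;

/* Return the mm's PASID, allocating one only on the first bind. */
static uint32_t toy_bind(struct toy_mm *mm)
{
	if (mm->pasid == 0)
		mm->pasid = toy_next_pasid++;
	return mm->pasid;
}

/* Called once when the address space goes away; no refcount needed. */
static void toy_exit(struct toy_mm *mm)
{
	mm->pasid = 0;
}
```

Binding the same toy_mm twice returns the same value, which is why no
per-device reference count is kept in this model.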

Currently the ENQCMD feature cannot be used if CONFIG_INTEL_IOMMU_SVM
is not set. Add X86_FEATURE_ENQCMD to the disabled features mask as
appropriate and use cpu_feature_enabled() to check the feature.
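
As a rough illustration of how a disabled-features mask interacts with a
feature check, the sketch below splits a feature number into a word index
and a bit, then consults a per-word "disabled" mask before the CPU
capability bits. The feature value 16*32+29 is taken from patch 04/12;
the demo_* helpers are hypothetical and only approximate what
cpu_feature_enabled() does at compile time.

```c
#include <stdint.h>

/* Feature number as defined in patch 04/12: word 16, bit 29. */
#define DEMO_FEATURE_ENQCMD (16 * 32 + 29)

/* Which 32-bit capability word holds the feature. */
static unsigned int demo_feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit mask of the feature inside its word, as in (1 << (f & 31)). */
static uint32_t demo_feature_mask(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31);
}

/* A feature counts as enabled only if it is not masked off and the CPU has it. */
static int demo_feature_enabled(const uint32_t *disabled_masks,
				const uint32_t *cpu_caps,
				unsigned int feature)
{
	unsigned int word = demo_feature_word(feature);
	uint32_t mask = demo_feature_mask(feature);

	if (disabled_masks[word] & mask)
		return 0;
	return (cpu_caps[word] & mask) != 0;
}
```

In the kernel the disabled mask is a compile-time constant, so a disabled
feature lets the compiler drop the guarded code entirely; the arrays here
only model that decision at run time.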

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Lu Baolu 
---
v5:
- Mark ENQCMD disabled when configured out and remove redundant
  CONFIG_INTEL_IOMMU_SVM check which is included in cpu_feature_enabled()
  in fixup_pasid_exception() (PeterZ and Dave Hansen)
- Reviewed by Lu Baolu

v4:
- Change PASID type to u32 (Christoph)

v3:
- Add sanity checks in alloc_pasid() and _free_pasid() (Baolu)
- Add a comment that the private PASID feature will be removed completely
  from IOMMU and don't track private PASID in mm (Thomas)

v2:
- Define a helper free_bind() to simplify error exit code in bind_mm()
  (Thomas)
- Fix a ret error code in bind_mm() (Thomas)
- Change pasid's type from "int" to "unsigned int" to have consistent
  pasid type in iommu (Thomas)
- Simplify alloc_pasid() a bit.

 arch/x86/include/asm/disabled-features.h |   9 +-
 arch/x86/include/asm/iommu.h |   2 +
 arch/x86/include/asm/mmu_context.h   |  11 ++
 drivers/iommu/intel/svm.c| 128 ---
 4 files changed, 137 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h 
b/arch/x86/include/asm/disabled-features.h
index 4ea8584682f9..588d83e9da49 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -56,6 +56,12 @@
 # define DISABLE_PTI   (1 << (X86_FEATURE_PTI & 31))
 #endif
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+# define DISABLE_ENQCMD 0
+#else
+# define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -75,7 +81,8 @@
 #define DISABLED_MASK13 0
 #define DISABLED_MASK14 0
 #define DISABLED_MASK15 0
-#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
+#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
+                         DISABLE_ENQCMD)
 #define DISABLED_MASK17 0
 #define DISABLED_MASK18 0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index bf1ed2ddc74b..ed41259fe7ac 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
return -EINVAL;
 }
 
+void __free_pasid(struct mm_struct *mm);
+
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 47562147e70b..e1e7f1df6829 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern atomic64_t last_mm_ctx_id;
 
@@ -117,9 +118,19 @@ static inline int init_new_context(struct task_struct *tsk,
init_new_context_ldt(mm);
return 0;
 }
+
+static inline void free_pasid(struct mm_struct *mm)
+{
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return;
+
+   __free_pasid(mm);
+}
+
 static inline void destroy_context(struct mm_struct *mm)
 {
destroy_context_ldt(mm);
+   free_pasid(mm);
 }
 
 extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 8a0cf2f0dd54..4c70b037 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -425,6 +425,69 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
return ret;
 }
 
+static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,
+ bool new_pasid)
+{
+   if (new_pasid)
+   ioasid_free(svm->pasid);
+   kfree(svm);
+   kfree(sdev);
+}
+
+/*
+ * If this mm already has a PASID, use it. Otherwise allocate a new one.
+ * Let the caller know if a new PASID is allocated via 'new_pasid'.
+ */
+static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
+  u32 pasid_max, bool *new_pasid,
+  unsigned int flags)
+{
+   u32 pasid;
+
+   *new_pasid = false;
+
+   /*
+* Reuse the PASID if the mm already has a PASID and not a private
+* PASID is requested.
+*/
+   if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
+   void *p;
+
+   /*
+* Since the mm has a PASID already, the PASID should be
+* bound 

[PATCH v5 04/12] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions

2020-06-30 Thread Fenghua Yu
Work submission instruction comes in two flavors. ENQCMD can be called
both in ring 3 and ring 0 and always uses the contents of PASID MSR when
shipping the command to the device. ENQCMDS allows a kernel driver to
submit commands on behalf of a user process. The driver supplies the
PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel
as of now.

The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Re-write commit message (Thomas)

 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 02dabc9e77b0..4469618c410f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -351,6 +351,7 @@
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI(16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B  (16*32+28) /* MOVDIR64B instruction */
+#define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS 
instructions */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery 
support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3cbe24ca80ab..3a02707c1f4d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_TOTAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_CQM_MBM_LOCAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_AVX512_BF16,  X86_FEATURE_AVX512VL  },
+   { X86_FEATURE_ENQCMD,   X86_FEATURE_XSAVES},
{}
 };
 
-- 
2.19.1



[PATCH v5 07/12] mm: Define pasid in mm

2020-06-30 Thread Fenghua Yu
PASID is shared by all threads in a process. So the logical place to keep
track of it is in the "mm". Both ARM and X86 need to use the PASID in the
"mm".

Suggested-by: Christoph Hellwig 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Change PASID type to u32 (Christoph)

v3:
- Change CONFIG_PCI_PASID to CONFIG_IOMMU_SUPPORT because non-PCI device
  can have PASID in ARM (Jean)

v2:
- This new patch moves "pasid" from x86 specific mm_context_t to generic
  struct mm_struct per Christoph's comment: 
https://lore.kernel.org/linux-iommu/20200414170252.714402-1-jean-phili...@linaro.org/T/#mb57110ffe1aaa24750eeea4f93b611f0d1913911
- Jean-Philippe Brucker released a virtually identical patch. I still put this
  patch in the series for better review. The upstream kernel only needs one
  of the two patches eventually.
https://lore.kernel.org/linux-iommu/20200519175502.2504091-2-jean-phili...@linaro.org/
- Change CONFIG_IOASID to CONFIG_PCI_PASID (Ashok)

 include/linux/mm_types.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64ede5f150dc..d61285cfe027 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -538,6 +538,10 @@ struct mm_struct {
atomic_long_t hugetlb_usage;
 #endif
struct work_struct async_put_work;
+
+#ifdef CONFIG_IOMMU_SUPPORT
+   u32 pasid;
+#endif
} __randomize_layout;
 
/*
-- 
2.19.1



[PATCH v5 02/12] iommu/vt-d: Change flags type to unsigned int in binding mm

2020-06-30 Thread Fenghua Yu
"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".
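
A small user-space sketch of why an unsigned type suits a bit mask: with
"unsigned int" every bit, including the top one, can be set, tested and
cleared with well-defined results. The DEMO_FLAG_* names and helpers are
hypothetical and are not the SVM flag definitions.

```c
#include <limits.h>

/* Hypothetical flag bits; the top bit is the case where signed int misbehaves. */
#define DEMO_FLAG_LOW  (1u << 0)
#define DEMO_FLAG_HIGH (1u << (sizeof(unsigned int) * CHAR_BIT - 1))

/* Test one flag in an unsigned mask. */
static int demo_has_flag(unsigned int flags, unsigned int flag)
{
	return (flags & flag) != 0;
}

/* Clear one flag without disturbing the others. */
static unsigned int demo_clear_flag(unsigned int flags, unsigned int flag)
{
	return flags & ~flag;
}
```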

Change its type to "unsigned int".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Lu Baolu 
---
v5:
- Reviewed by Lu Baolu

v2:
- Add this new patch per Thomas' comment.

 drivers/iommu/intel/svm.c   | 7 ---
 include/linux/intel-iommu.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 778089d198eb..8a0cf2f0dd54 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -427,7 +427,8 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
 
 /* Caller must hold pasid_mutex, mm reference */
 static int
-intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
+intel_svm_bind_mm(struct device *dev, unsigned int flags,
+ struct svm_dev_ops *ops,
  struct mm_struct *mm, struct intel_svm_dev **sd)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
@@ -954,7 +955,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 {
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
-   int flags = 0;
+   unsigned int flags = 0;
int ret;
 
/*
@@ -963,7 +964,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 * and intel_svm etc.
 */
if (drvdata)
-   flags = *(int *)drvdata;
+   flags = *(unsigned int *)drvdata;
mutex_lock(&pasid_mutex);
ret = intel_svm_bind_mm(dev, flags, NULL, mm, &sdev);
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 643951e28dd4..d129baf7e0b8 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -760,7 +760,7 @@ struct intel_svm {
struct mm_struct *mm;
 
struct intel_iommu *iommu;
-   int flags;
+   unsigned int flags;
u32 pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;
-- 
2.19.1



[PATCH v5 05/12] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

2020-06-30 Thread Fenghua Yu
From: Yu-cheng Yu 

ENQCMD instruction reads PASID from IA32_PASID MSR. The MSR is stored
in the task's supervisor FPU PASID state and is context switched by
XSAVES/XRSTORS.

Signed-off-by: Yu-cheng Yu 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Modify the commit message (Thomas)

 arch/x86/include/asm/fpu/types.h  | 10 ++
 arch/x86/include/asm/fpu/xstate.h |  2 +-
 arch/x86/kernel/fpu/xstate.c  |  4 
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f098f6cab94b..00f8efd4c07d 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -114,6 +114,7 @@ enum xfeature {
XFEATURE_Hi16_ZMM,
XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
XFEATURE_PKRU,
+   XFEATURE_PASID,
 
XFEATURE_MAX,
 };
@@ -128,6 +129,7 @@ enum xfeature {
 #define XFEATURE_MASK_Hi16_ZMM (1 << XFEATURE_Hi16_ZMM)
 #define XFEATURE_MASK_PT   (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
+#define XFEATURE_MASK_PASID(1 << XFEATURE_PASID)
 
 #define XFEATURE_MASK_FPSSE(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512   (XFEATURE_MASK_OPMASK \
@@ -229,6 +231,14 @@ struct pkru_state {
u32 pad;
 } __packed;
 
+/*
+ * State component 10 is supervisor state used for context-switching the
+ * PASID state.
+ */
+struct ia32_pasid_state {
+   u64 pasid;
+} __packed;
+
 struct xstate_header {
u64 xfeatures;
u64 xcomp_bv;
diff --git a/arch/x86/include/asm/fpu/xstate.h 
b/arch/x86/include/asm/fpu/xstate.h
index 422d8369012a..ab9833c57aaa 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -33,7 +33,7 @@
  XFEATURE_MASK_BNDCSR)
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (0)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
 
 /*
  * Unsupported supervisor features. When a supervisor feature in this mask is
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index bda2e5eaca0e..31629e43383c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -37,6 +37,7 @@ static const char *xfeature_names[] =
"AVX-512 ZMM_Hi256" ,
"Processor Trace (unused)"  ,
"Protection Keys User registers",
+   "PASID state",
"unknown xstate feature",
 };
 
@@ -51,6 +52,7 @@ static short xsave_cpuid_features[] __initdata = {
X86_FEATURE_AVX512F,
X86_FEATURE_INTEL_PT,
X86_FEATURE_PKU,
+   X86_FEATURE_ENQCMD,
 };
 
 /*
@@ -316,6 +318,7 @@ static void __init print_xstate_features(void)
print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
print_xstate_feature(XFEATURE_MASK_PKRU);
+   print_xstate_feature(XFEATURE_MASK_PASID);
 }
 
 /*
@@ -590,6 +593,7 @@ static void check_xstate_against_struct(int nr)
XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
XCHECK_SZ(sz, nr, XFEATURE_PKRU,  struct pkru_state);
+   XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state);
 
/*
 * Make *SURE* to add any feature numbers in below if
-- 
2.19.1



[PATCH v5 11/12] sched: Define and initialize a flag to identify valid PASID in the task

2020-06-30 Thread Fenghua Yu
From: Peter Zijlstra 

The flag is defined for the task to identify if the task has a valid
PASID. Its initial value is 0 when the task is forked/cloned. It will
be used shortly.

Signed-off-by: Peter Zijlstra 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
---
v2:
- Add this patch to define the flag to identify valid PASID MSR (PeterZ)

 include/linux/sched.h | 3 +++
 kernel/fork.c | 4 
 2 files changed, 7 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 692e327d7455..042d6f5cde6a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -800,6 +800,9 @@ struct task_struct {
/* Stalled due to lack of memory */
unsignedin_memstall:1;
 #endif
+#ifdef CONFIG_IOMMU_SUPPORT
+   unsignedhas_valid_pasid:1;
+#endif
 
unsigned long   atomic_flags; /* Flags requiring atomic 
access. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 43b5f112604d..0a962bebdf88 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
tsk->use_memdelay = 0;
 #endif
 
+#ifdef CONFIG_IOMMU_SUPPORT
+   tsk->has_valid_pasid = 0;
+#endif
+
 #ifdef CONFIG_MEMCG
tsk->active_memcg = NULL;
 #endif
-- 
2.19.1



[PATCH v5 12/12] x86/traps: Fix up invalid PASID

2020-06-30 Thread Fenghua Yu
A #GP fault is generated when ENQCMD instruction is executed without
a valid PASID value programmed in the current thread's PASID MSR. The
#GP fault handler will initialize the MSR if a PASID has been allocated
for this process.

Decoding the user instruction is ugly and sets a bad architecture
precedent. It may not function if the faulting instruction is modified
after #GP.

Thomas suggested providing a reason for the #GP caused by executing ENQCMD
without a valid PASID value programmed. #GP error codes are 16 bits and all
16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
when loading from the source operand, so that would not fully cover all
the reasons. Rather than special-casing ENQCMD, Intel may in future
choose a different fault mechanism for such cases if recovery is needed on
#GP.

The following heuristic is used to avoid decoding the user instructions
to determine the precise reason for the #GP fault:
1) If the mm for the process has not been allocated a PASID, this #GP
   cannot be fixed.
2) If the PASID MSR is already initialized, then the #GP was for some
   other reason
3) Try initializing the PASID MSR and returning. If the #GP was from
   an ENQCMD this will fix it. If not, the #GP fault will be repeated
   and will hit case "2".
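
The three cases above can be sketched as a small user-space model. This is
not the handler added by this patch: struct toy_task and toy_fixup_pasid()
are hypothetical, the flag mirrors has_valid_pasid from patch 11/12, and
the valid bit position follows the IA32_PASID description in patch 06/12.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PASID_VALID (UINT64_C(1) << 31)

/* Hypothetical per-thread view of the state the heuristic looks at. */
struct toy_task {
	uint32_t mm_pasid;     /* 0: the mm has no PASID allocated */
	bool has_valid_pasid;  /* PASID MSR already initialized for this thread */
	uint64_t pasid_msr;    /* models the thread's IA32_PASID MSR */
};

/* Return true if the faulting instruction should simply be retried. */
static bool toy_fixup_pasid(struct toy_task *t)
{
	if (t->mm_pasid == 0)
		return false;           /* case 1: nothing to fix up */
	if (t->has_valid_pasid)
		return false;           /* case 2: #GP had another cause */

	/* case 3: initialize the MSR and let the instruction run again */
	t->pasid_msr = t->mm_pasid | TOY_PASID_VALID;
	t->has_valid_pasid = true;
	return true;
}
```

If the retried instruction faults again, the second call takes the case 2
branch and the #GP is delivered as usual, so the model cannot loop.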

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Lu Baolu 
---
v5:
- Use cpu_feature_enabled() and remove redundant CONFIG_INTEL_IOMMU_SVM
  check which is included in cpu_feature_enabled() in
  fixup_pasid_exception() (PeterZ and Dave Hansen)
- Reviewed by Lu Baolu

v4:
- Change PASID type to u32 (Christoph)

v3:
- Check and set current->has_valid_pasid in fixup() (PeterZ)
- Add fpu__pasid_write() to update the MSR (PeterZ)
- Add ioasid_find() sanity check in fixup()

v2:
- Update the first paragraph of the commit message (Thomas)
- Add reasons why we don't decode the user instruction and don't use
  the #GP error code (Thomas)
- Change get_task_mm() to current->mm (Thomas)
- Add comments on why IRQ is disabled during PASID fixup (Thomas)
- Add comment in fixup() that the function is called when #GP is from
  user (so mm is not NULL) (Dave Hansen)

 arch/x86/include/asm/iommu.h |  1 +
 arch/x86/kernel/traps.c  | 12 ++
 drivers/iommu/intel/svm.c| 78 
 3 files changed, 91 insertions(+)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index ed41259fe7ac..e9365a5d6f7d 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -27,5 +27,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
 }
 
 void __free_pasid(struct mm_struct *mm);
+bool __fixup_pasid_exception(void);
 
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f58679e487f6..fe0f7d00523b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -518,6 +519,14 @@ static enum kernel_gp_hint get_kernel_gp_address(struct 
pt_regs *regs,
return GP_CANONICAL;
 }
 
+static bool fixup_pasid_exception(void)
+{
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return false;
+
+   return __fixup_pasid_exception();
+}
+
 #define GPFSTR "general protection fault"
 
 DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
@@ -530,6 +539,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 
cond_local_irq_enable(regs);
 
+   if (user_mode(regs) && fixup_pasid_exception())
+   goto exit;
+
if (static_cpu_has(X86_FEATURE_UMIP)) {
if (user_mode(regs) && fixup_umip_exception(regs))
goto exit;
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4c70b037..4a84c82a4f8c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1105,3 +1105,81 @@ void __free_pasid(struct mm_struct *mm)
 */
ioasid_free(pasid);
 }
+
+/*
+ * Write the current task's PASID MSR/state. This is called only when PASID
+ * is enabled.
+ */
+static void fpu__pasid_write(u32 pasid)
+{
+   u64 msr_val = pasid | MSR_IA32_PASID_VALID;
+
+   fpregs_lock();
+
+   /*
+* If the MSR is active and owned by the current task's FPU, it can
+* be directly written.
+*
+* Otherwise, write the fpstate.
+*/
+   if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+   wrmsrl(MSR_IA32_PASID, msr_val);
+   } else {
+   struct ia32_pasid_state *ppasid_state;
+
+   ppasid_state = get_xsave_addr(&current->thread.fpu.state.xsave,
+ XFEATURE_PASID);
+   /*
+* ppasid_state shouldn't be NULL because XFEATURE_PASID
+* is enabled.
+*/
+  

[PATCH v5 06/12] x86/msr-index: Define IA32_PASID MSR

2020-06-30 Thread Fenghua Yu
The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses PASID to identify process
address space and direct responses to the right address space.
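
As a sketch of the layout described above, the helpers below pack a PASID
and the valid bit into a 64-bit MSR value and unpack them again. The
demo_* names are hypothetical, and placing the 20-bit PASID in the low
bits is an assumption consistent with how patch 12/12 composes the value.

```c
#include <stdint.h>

#define DEMO_PASID_MASK  UINT64_C(0xFFFFF)      /* 20-bit PASID, assumed in bits 19:0 */
#define DEMO_PASID_VALID (UINT64_C(1) << 31)    /* bit 31 marks the value as valid */

/* Build the MSR value for a PASID; bits above the 20-bit field are dropped. */
static uint64_t demo_pasid_msr_encode(uint32_t pasid)
{
	return ((uint64_t)pasid & DEMO_PASID_MASK) | DEMO_PASID_VALID;
}

/* Report whether the MSR value carries a valid PASID. */
static int demo_pasid_msr_is_valid(uint64_t msr)
{
	return (msr & DEMO_PASID_VALID) != 0;
}

/* Extract the PASID field from an MSR value. */
static uint32_t demo_pasid_msr_pasid(uint64_t msr)
{
	return (uint32_t)(msr & DEMO_PASID_MASK);
}
```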

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change "identify process" to "identify process address space" in the
  commit message (Thomas)

 arch/x86/include/asm/msr-index.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e8370e64a155..e5f699ff1dd6 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -237,6 +237,9 @@
 #define MSR_IA32_LASTINTFROMIP 0x01dd
 #define MSR_IA32_LASTINTTOIP   0x01de
 
+#define MSR_IA32_PASID 0x0d93
+#define MSR_IA32_PASID_VALID   BIT_ULL(31)
+
 /* DEBUGCTLMSR bits (others vary by model): */
 #define DEBUGCTLMSR_LBR(1UL <<  0) /* last branch 
recording */
 #define DEBUGCTLMSR_BTF_SHIFT  1
-- 
2.19.1



[PATCH v2 0/3] iommu/amd: I/O VA address limits

2020-06-30 Thread Sebastian Ott via iommu
The IVRS ACPI table specifies maximum address sizes for I/O virtual
addresses that can be handled by the IOMMUs in the system. Parse that
data from the IVRS header to provide aperture information for DMA
mappings and users of the iommu API.

Changes for V2:
 - use limits in iommu_setup_dma_ops()
 - rebased to current upstream

Sebastian Ott (3):
  iommu/amd: Parse supported address sizes from IVRS
  iommu/amd: Restrict aperture for domains to conform with IVRS
  iommu/amd: Actually enforce geometry aperture

 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 26 ++
 drivers/iommu/amd/iommu.c   | 12 ++--
 3 files changed, 39 insertions(+), 2 deletions(-)

-- 
2.17.1




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879





[PATCH v2 3/3] iommu/amd: Actually enforce geometry aperture

2020-06-30 Thread Sebastian Ott via iommu
Add a check to enforce that I/O virtual addresses picked by iommu API
users stay within the domain's geometry aperture.
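
The check can be pictured with a stand-alone sketch. struct toy_geometry
and toy_iova_allowed() are hypothetical names; the field names mirror the
geometry fields used in the diff, and the zero-size and wrap-around guards
are an addition for the sketch that is not part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copy of the fields the check reads; aperture_end is inclusive. */
struct toy_geometry {
	uint64_t aperture_start;
	uint64_t aperture_end;
	bool force_aperture;
};

/* Return true if [iova, iova + size - 1] may be mapped in this domain. */
static bool toy_iova_allowed(const struct toy_geometry *g,
			     uint64_t iova, uint64_t size)
{
	if (!g->force_aperture)
		return true;
	/* Extra guard for the sketch: reject empty ranges and wrap-around. */
	if (size == 0 || iova + size - 1 < iova)
		return false;
	return iova >= g->aperture_start &&
	       iova + size - 1 <= g->aperture_end;
}
```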

Signed-off-by: Sebastian Ott 
Cc: Benjamin Serebrin 
Cc: Filippo Sironi 

CR: https://code.amazon.com/reviews/CR-26408388
---
 drivers/iommu/amd/iommu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b3f79820fd6d..bfa9c4a1fcf8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2159,11 +2159,13 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 static void amd_iommu_probe_finalize(struct device *dev)
 {
 	struct iommu_domain *domain;
+	u64 base = IOVA_START_PFN << PAGE_SHIFT;
+	u64 size = amd_iommu_max_va - base;
 
 	/* Domains are initialized for this device - have a look what we ended up with */
 	domain = iommu_get_domain_for_dev(dev);
 	if (domain->type == IOMMU_DOMAIN_DMA)
-		iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
+		iommu_setup_dma_ops(dev, base, size);
 }
 
 static void amd_iommu_release_device(struct device *dev)
@@ -2500,6 +2502,11 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 	if (pgtable.mode == PAGE_MODE_NONE)
 		return -EINVAL;
 
+	if (dom->geometry.force_aperture &&
+	    (iova < dom->geometry.aperture_start ||
+	     iova + page_size - 1 > dom->geometry.aperture_end))
+		return -EINVAL;
+
 	if (iommu_prot & IOMMU_READ)
 		prot |= IOMMU_PROT_IR;
 	if (iommu_prot & IOMMU_WRITE)
-- 
2.17.1
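
The range check added to amd_iommu_map() above can be tried out in
isolation. The following is only a userspace sketch: struct geometry and
iova_in_aperture() are hypothetical names standing in for the kernel's
domain geometry, and the comparison is written on lengths rather than on
end addresses so that iova + page_size - 1 cannot wrap around, which is a
choice of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the domain geometry used by the patch. */
struct geometry {
	uint64_t aperture_start;	/* first valid IOVA */
	uint64_t aperture_end;		/* last valid IOVA (inclusive) */
	bool force_aperture;		/* reject mappings outside the window */
};

/* True if [iova, iova + page_size - 1] may be mapped under geometry g. */
static bool iova_in_aperture(const struct geometry *g, uint64_t iova,
			     uint64_t page_size)
{
	assert(page_size > 0);

	if (!g->force_aperture)
		return true;
	if (iova < g->aperture_start || iova > g->aperture_end)
		return false;
	/* Compare lengths instead of end addresses so nothing can wrap. */
	return page_size - 1 <= g->aperture_end - iova;
}
```

With a 48-bit aperture, a 4K mapping that ends exactly at aperture_end is
accepted, while one starting a page later is rejected.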






[PATCH v2 1/3] iommu/amd: Parse supported address sizes from IVRS

2020-06-30 Thread Sebastian Ott via iommu
The IVRS ACPI table specifies maximum address sizes for I/O virtual
addresses that can be handled by the IOMMUs in the system. Parse that
data from the IVRS header so that it can be used to limit the
I/O aperture (in subsequent patches).

Based on prior work by Marius Hillenbrand.

Link: https://www.amd.com/system/files/TechDocs/48882_IOMMU_3.05_PUB.pdf

Signed-off-by: Sebastian Ott 
Cc: Benjamin Serebrin 
Cc: Filippo Sironi 

CR: https://code.amazon.com/reviews/CR-26408321
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 26 ++
 drivers/iommu/amd/iommu.c   |  1 +
 3 files changed, 30 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 30a5d412255a..0946638306d6 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -754,6 +754,9 @@ extern bool amd_iommu_force_isolation;
 /* Max levels of glxval supported */
 extern int amd_iommu_max_glx_val;
 
+/* Maximum virtual address supported */
+extern u64 amd_iommu_max_va;
+
 /*
  * This function flushes all internal caches of
  * the IOMMU used by this driver.
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 6ebd4825e320..ab9d226b4215 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -39,6 +40,7 @@
  * definitions for the ACPI scanning code
  */
 #define IVRS_HEADER_LENGTH 48
+#define IVRS_HEADER_IVINFO_OFFSET 36
 
 #define ACPI_IVHD_TYPE_MAX_SUPPORTED   0x40
 #define ACPI_IVMD_TYPE_ALL  0x20
@@ -2490,6 +2492,27 @@ static void __init free_dma_resources(void)
 	free_unity_maps();
 }
 
+static void __init get_ivrs_ivinfo(struct acpi_table_header *ivrs)
+{
+	u32 *ivinfo = (u32 *)((u8 *)ivrs + IVRS_HEADER_IVINFO_OFFSET);
+	u8 va_size = FIELD_GET(GENMASK(21, 15), *ivinfo);
+	u8 valid_va_sizes[] = {32, 40, 48, 64};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(valid_va_sizes); i++) {
+		if (va_size == valid_va_sizes[i]) {
+			amd_iommu_max_va = DMA_BIT_MASK(va_size);
+			break;
+		}
+	}
+
+	if (!amd_iommu_max_va) {
+		pr_warn("Invalid virtual address size %u in IVRS header, use most restrictive %u\n",
+			va_size, valid_va_sizes[0]);
+		amd_iommu_max_va = DMA_BIT_MASK(valid_va_sizes[0]);
+	}
+}
+
 /*
  * This is the hardware init function for AMD IOMMU in the system.
  * This function is called either from amd_iommu_init or from the interrupt
@@ -2544,6 +2567,9 @@ static int __init early_amd_iommu_init(void)
 	if (ret)
 		goto out;
 
+	get_ivrs_ivinfo(ivrs_base);
+	DUMP_printk("IVRS vasize=%llx\n", amd_iommu_max_va);
+
 	amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);
 	DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 74cca1757172..acab35220d98 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -88,6 +88,7 @@ const struct iommu_ops amd_iommu_ops;
 
 static ATOMIC_NOTIFIER_HEAD(ppr_notifier);
 int amd_iommu_max_glx_val = -1;
+u64 amd_iommu_max_va;
 
 /*
  * general struct to manage commands send to an IOMMU
-- 
2.17.1
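
The IVinfo parsing in get_ivrs_ivinfo() above can be sketched outside the
kernel as follows. This is a minimal illustration, not the driver code:
ivinfo_va_size(), va_mask() and max_va_from_ivinfo() are hypothetical
helpers, va_mask() stands in for DMA_BIT_MASK(), and the field position
(bits 21:15) is taken from the GENMASK() in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The VA size field is read from bits [21:15] of IVinfo, as in the patch. */
static unsigned int ivinfo_va_size(uint32_t ivinfo)
{
	return (ivinfo >> 15) & 0x7f;
}

/* Userspace stand-in for DMA_BIT_MASK(bits), valid for 1 <= bits <= 64. */
static uint64_t va_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* Returns the highest usable I/O virtual address for an IVinfo value. */
static uint64_t max_va_from_ivinfo(uint32_t ivinfo)
{
	static const unsigned int valid_va_sizes[] = { 32, 40, 48, 64 };
	unsigned int va_size = ivinfo_va_size(ivinfo);
	size_t i;

	for (i = 0; i < sizeof(valid_va_sizes) / sizeof(valid_va_sizes[0]); i++) {
		if (va_size == valid_va_sizes[i])
			return va_mask(va_size);
	}

	/* Unknown size: fall back to the most restrictive (32-bit) limit. */
	return va_mask(valid_va_sizes[0]);
}
```

A 64-bit size needs the special case in va_mask() because shifting a
64-bit value by 64 is undefined in C; DMA_BIT_MASK() handles this the same
way.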






[PATCH v2 2/3] iommu/amd: Restrict aperture for domains to conform with IVRS

2020-06-30 Thread Sebastian Ott via iommu
The IVRS ACPI table specifies maximum address sizes for I/O virtual
addresses. When allocating new protection domains that perform
translation, propagate these limits as the domain's geometry / aperture.

Based on prior work by Marius Hillenbrand.

Signed-off-by: Sebastian Ott 
Cc: Benjamin Serebrin 
Cc: Filippo Sironi 

CR: https://code.amazon.com/reviews/CR-26408353
---
 drivers/iommu/amd/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index acab35220d98..b3f79820fd6d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2382,7 +2382,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
 		return NULL;
 
 	domain->domain.geometry.aperture_start = 0;
-	domain->domain.geometry.aperture_end   = ~0ULL;
+	domain->domain.geometry.aperture_end   = amd_iommu_max_va;
 	domain->domain.geometry.force_aperture = true;
 
 	if (type == IOMMU_DOMAIN_DMA &&
-- 
2.17.1






Re: [PATCH v2 10/12] of/irq: Make of_msi_map_rid() PCI bus agnostic

2020-06-30 Thread Rob Herring
On Fri, Jun 19, 2020 at 2:20 AM Lorenzo Pieralisi
 wrote:
>
> There is nothing PCI bus specific in the of_msi_map_rid()
> implementation other than the requester ID tag for the input
> ID space. Rename requester ID to a more generic ID so that
> the translation code can be used by all busses that require
> input/output ID translations.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Bjorn Helgaas 
> Cc: Rob Herring 
> Cc: Marc Zyngier 
> ---
>  drivers/of/irq.c   | 28 ++--
>  drivers/pci/msi.c  |  2 +-
>  include/linux/of_irq.h |  8 
>  3 files changed, 19 insertions(+), 19 deletions(-)

Reviewed-by: Rob Herring 


Re: [PATCH v2 08/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus

2020-06-30 Thread Rob Herring
On Fri, Jun 19, 2020 at 2:20 AM Lorenzo Pieralisi
 wrote:
>
> From: Laurentiu Tudor 
>
> The existing bindings cannot be used to specify the relationship
> between fsl-mc devices and GIC ITSes.
> Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
> msi-map property.
> In addition, deprecate msi-parent property which no longer makes sense
> now that we support translating the MSIs.
>
> Signed-off-by: Laurentiu Tudor 
> Signed-off-by: Diana Craciun 
> Cc: Rob Herring 
> ---
>  .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 50 ---
>  1 file changed, 44 insertions(+), 6 deletions(-)

Reviewed-by: Rob Herring 


Re: [PATCH v2 09/12] of/irq: make of_msi_map_get_device_domain() bus agnostic

2020-06-30 Thread Rob Herring
On Fri, Jun 19, 2020 at 2:20 AM Lorenzo Pieralisi
 wrote:
>
> From: Diana Craciun 
>
> of_msi_map_get_device_domain() is PCI specific but it need not be and
> can be easily changed to be bus agnostic in order to be used by other
> busses by adding an IRQ domain bus token as an input parameter.
>
> Signed-off-by: Diana Craciun 
> Signed-off-by: Lorenzo Pieralisi 
> Acked-by: Bjorn Helgaas # pci/msi.c
> Cc: Bjorn Helgaas 
> Cc: Rob Herring 
> Cc: Marc Zyngier 
> ---
>  drivers/of/irq.c   | 8 +---
>  drivers/pci/msi.c  | 2 +-
>  include/linux/of_irq.h | 5 +++--
>  3 files changed, 9 insertions(+), 6 deletions(-)

Reviewed-by: Rob Herring 


Re: [PATCH v2 07/12] of/device: Add input id to of_dma_configure()

2020-06-30 Thread Rob Herring
On Fri, Jun 19, 2020 at 2:20 AM Lorenzo Pieralisi
 wrote:
>
> Devices sitting on proprietary busses have a device ID space that
> is owned by the respective bus and related firmware bindings. In order
> to let the generic OF layer handle the input translations to
> an IOMMU id, for such busses the current of_dma_configure() interface
> should be extended in order to allow the bus layer to provide the
> device input id parameter - that is retrieved/assigned in bus
> specific code and firmware.
>
> Augment of_dma_configure() to add an optional input_id parameter,
> leaving current functionality unchanged.
>
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Rob Herring 
> Cc: Robin Murphy 
> Cc: Joerg Roedel 
> Cc: Laurentiu Tudor 
> ---
>  drivers/bus/fsl-mc/fsl-mc-bus.c |  4 +-
>  drivers/iommu/of_iommu.c| 81 ++---
>  drivers/of/device.c |  8 ++--
>  include/linux/of_device.h   | 16 ++-
>  include/linux/of_iommu.h|  6 ++-
>  5 files changed, 70 insertions(+), 45 deletions(-)

Reviewed-by: Rob Herring 


[PATCH v2 5/7] iommu/vt-d: Fix devTLB flush for vSVA

2020-06-30 Thread Jacob Pan
From: Liu Yi L 

For guest SVA usage, a guest IOTLB flush request also covers the device
TLB, which reduces the number of VM exits.

On the host side, IOMMU driver performs IOTLB and implicit devTLB
invalidation. When PASID-selective granularity is requested by the guest
we need to derive the equivalent address range for devTLB instead of
using the address information in the UAPI data. The reason for that is, unlike
IOTLB flush, devTLB flush does not support PASID-selective granularity.
This is to say, we need to set the following in the PASID based devTLB
invalidation descriptor:
- entire 64 bit range in address ~(0x1 << 63)
- S bit = 1 (VT-d CH 6.5.2.6).

Without this fix, the device TLB flush range is not set properly for
PASID-selective granularity. This patch also merges the devTLB flush code
for the implicit and explicit cases.

Fixes: 6ee1b77ba3ac ("iommu/vt-d: Add svm/sva invalidate function")
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 96340da57075..6a0c62c7395c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5408,7 +5408,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 	sid = PCI_DEVID(bus, devfn);
 
 	/* Size is only valid in address selective invalidation */
-	if (inv_info->granularity != IOMMU_INV_GRANU_PASID)
+	if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
 		size = to_vtd_size(inv_info->addr_info.granule_size,
 				   inv_info->addr_info.nb_granules);
 
@@ -5417,6 +5417,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 			 IOMMU_CACHE_INV_TYPE_NR) {
 		int granu = 0;
 		u64 pasid = 0;
+		u64 addr = 0;
 
 		granu = to_vtd_granularity(cache_type, inv_info->granularity);
 		if (granu == -EINVAL) {
@@ -5456,24 +5457,31 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 					(granu == QI_GRAN_NONG_PASID) ? -1 : 1 << size,
 					inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
 
+			if (!info->ats_enabled)
+				break;
 			/*
 			 * Always flush device IOTLB if ATS is enabled. vIOMMU
 			 * in the guest may assume IOTLB flush is inclusive,
 			 * which is more efficient.
 			 */
-			if (info->ats_enabled)
-				qi_flush_dev_iotlb_pasid(iommu, sid,
-						info->pfsid, pasid,
-						info->ats_qdep,
-						inv_info->addr_info.addr,
-						size);
-			break;
+			fallthrough;
 		case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
+			/*
+			 * There is no PASID selective flush for device TLB, so
+			 * the equivalent of that is we set the size to be the
+			 * entire range of 64 bit. User only provides PASID info
+			 * without address info. So we set addr to 0.
+			 */
+			if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+				size = 64 - VTD_PAGE_SHIFT;
+				addr = 0;
+			} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
+				addr = inv_info->addr_info.addr;
+
 			if (info->ats_enabled)
 				qi_flush_dev_iotlb_pasid(iommu, sid,
 						info->pfsid, pasid,
-						info->ats_qdep,
-						inv_info->addr_info.addr,
+						info->ats_qdep, addr,
 						size);
 			else
 				pr_warn_ratelimited("Passdown device IOTLB flush w/o ATS!\n");
-- 
2.7.4
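
How the devTLB range is derived from the guest's granularity in the hunk
above can be summarised in a small standalone sketch. The enum values,
struct devtlb_range and devtlb_range_for() are hypothetical stand-ins for
the UAPI and driver types; only the selection rule mirrors the patch, and
VTD_PAGE_SHIFT is assumed to be 12 (4K pages).

```c
#include <stdint.h>

#define VTD_PAGE_SHIFT 12

/* Hypothetical stand-ins for the UAPI invalidation granularities. */
enum inv_granu { INV_GRANU_DOMAIN, INV_GRANU_PASID, INV_GRANU_ADDR };

struct devtlb_range {
	uint64_t addr;		/* start address passed to the devTLB flush */
	uint64_t size_order;	/* log2 of the number of 4K pages */
};

static struct devtlb_range devtlb_range_for(enum inv_granu granu,
					    uint64_t guest_addr,
					    uint64_t guest_size_order)
{
	struct devtlb_range r = { 0, 0 };

	if (granu == INV_GRANU_PASID) {
		/* No PASID-selective devTLB flush: cover the whole space. */
		r.addr = 0;
		r.size_order = 64 - VTD_PAGE_SHIFT;
	} else if (granu == INV_GRANU_ADDR) {
		/* Address-selective: use the range supplied by the guest. */
		r.addr = guest_addr;
		r.size_order = guest_size_order;
	}
	return r;
}
```

For PASID granularity the guest-supplied address is ignored on purpose,
since the UAPI data carries no valid address in that case.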



[PATCH v2 2/7] iommu/vt-d: Remove global page support in devTLB flush

2020-06-30 Thread Jacob Pan
Global page support was removed from VT-d spec 3.0 for devTLB
invalidation. This patch removes the corresponding bits for vSVA. A similar
change was already made for native SVA; see the link below.

Link: https://lkml.org/lkml/2019/8/26/651
Acked-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/dmar.c  | 4 +---
 drivers/iommu/intel/iommu.c | 4 ++--
 include/linux/intel-iommu.h | 3 +--
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index cc46dff98fa0..d9f973fa1190 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1437,8 +1437,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 
 /* PASID-based device IOTLB Invalidate */
 void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
-			      u32 pasid,  u16 qdep, u64 addr,
-			      unsigned int size_order, u64 granu)
+			      u32 pasid,  u16 qdep, u64 addr, unsigned int size_order)
 {
 	unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
 	struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
@@ -1446,7 +1445,6 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
 		QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
 		QI_DEV_IOTLB_PFSID(pfsid);
-	desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
 
 	/*
 	 * If S bit is 0, we only flush a single page. If S bit is set,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9129663a7406..96340da57075 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5466,7 +5466,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 						info->pfsid, pasid,
 						info->ats_qdep,
 						inv_info->addr_info.addr,
-						size, granu);
+						size);
 			break;
 		case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
 			if (info->ats_enabled)
@@ -5474,7 +5474,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 						info->pfsid, pasid,
 						info->ats_qdep,
 						inv_info->addr_info.addr,
-						size, granu);
+						size);
 			else
 				pr_warn_ratelimited("Passdown device IOTLB flush w/o ATS!\n");
 			break;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 729386ca8122..9a6614880773 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -380,7 +380,6 @@ enum {
 
 #define QI_DEV_EIOTLB_ADDR(a)  ((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
-#define QI_DEV_EIOTLB_GLOB(g)  ((u64)(g) & 0x1)
 #define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xf) << 32)
 #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0x) << 16)
 #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4)
@@ -704,7 +703,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 
 void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			      u32 pasid, u16 qdep, u64 addr,
-			      unsigned int size_order, u64 granu);
+			      unsigned int size_order);
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
 			  int pasid);
 
-- 
2.7.4



[PATCH v2 7/7] iommu/vt-d: Disable multiple GPASID-dev bind

2020-06-30 Thread Jacob Pan
For the unlikely use case where multiple aux domains from the same pdev
are attached to a single guest and then bound to a single process
(thus same PASID) within that guest, we cannot easily support this case
by refcounting the number of users, because there is only one SL page
table per PASID while multiple aux domains imply multiple SL page tables
for the same PASID.

An extra unbind of the guest PASID can happen due to a race between the
normal and exception cases. Termination of one aux domain may affect others
unless we actively track and switch aux domains to ensure the validity of SL
page tables and TLB states in the shared PASID entry.

Support for sharing second level PGDs across domains can reduce the
complexity, but this is not available due to limitations of the VFIO
container architecture. We can revisit this decision once sharing PGDs
is available.

Overall, the complexity and potential glitches do not warrant supporting
this unlikely use case, so this patch removes it.

Fixes: 56722a4398a30 ("iommu/vt-d: Add bind guest PASID support")
Acked-by: Lu Baolu 
Cc: Kevin Tian 
Cc: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 6c87c807a0ab..d386853121a2 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -277,20 +277,16 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 			goto out;
 		}
 
+		/*
+		 * Do not allow multiple bindings of the same device-PASID since
+		 * there is only one SL page tables per PASID. We may revisit
+		 * once sharing PGD across domains are supported.
+		 */
 		for_each_svm_dev(sdev, svm, dev) {
-			/*
-			 * For devices with aux domains, we should allow
-			 * multiple bind calls with the same PASID and pdev.
-			 */
-			if (iommu_dev_feature_enabled(dev,
-						      IOMMU_DEV_FEAT_AUX)) {
-				sdev->users++;
-			} else {
-				dev_warn_ratelimited(dev,
-						     "Already bound with PASID %u\n",
-						     svm->pasid);
-				ret = -EBUSY;
-			}
+			dev_warn_ratelimited(dev,
+					     "Already bound with PASID %u\n",
+					     svm->pasid);
+			ret = -EBUSY;
 			goto out;
 		}
 	} else {
-- 
2.7.4



[PATCH v2 0/7] iommu/vt-d: Misc tweaks and fixes for vSVA

2020-06-30 Thread Jacob Pan
Hi Baolu and all,

This is a series to address some of the issues we found in vSVA support.
Most of the patches deal with exception handling; we also removed some bits
that are not currently supported.

Many thanks to Kevin Tian for the review.

Jacob & Yi


Changelog:

v2 Address reviews from Baolu
- Fixed addr field in devTLB flush (5/7)
- Assign address for single page devTLB invalidation (4/7)
- Coding style tweaks

Jacob Pan (4):
  iommu/vt-d: Remove global page support in devTLB flush
  iommu/vt-d: Fix PASID devTLB invalidation
  iommu/vt-d: Warn on out-of-range invalidation address
  iommu/vt-d: Disable multiple GPASID-dev bind

Liu Yi L (3):
  iommu/vt-d: Enforce PASID devTLB field mask
  iommu/vt-d: Handle non-page aligned address
  iommu/vt-d: Fix devTLB flush for vSVA

 drivers/iommu/intel/dmar.c  | 24 +++-
 drivers/iommu/intel/iommu.c | 37 ++---
 drivers/iommu/intel/pasid.c | 11 ++-
 drivers/iommu/intel/svm.c   | 22 +-
 include/linux/intel-iommu.h |  5 ++---
 5 files changed, 62 insertions(+), 37 deletions(-)

-- 
2.7.4



[PATCH v2 4/7] iommu/vt-d: Handle non-page aligned address

2020-06-30 Thread Jacob Pan
From: Liu Yi L 

Address information for device TLB invalidation comes from userspace
when a device is directly assigned to a guest with vIOMMU support.
VT-d requires a page-aligned address. This patch checks for and enforces
page alignment; otherwise reserved bits can be set in the invalidation
descriptor, and an unrecoverable fault is reported due to the non-zero
value in the reserved bits.

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/dmar.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d9f973fa1190..3899f3161071 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1455,9 +1455,25 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	 * Max Invs Pending (MIP) is set to 0 for now until we have DIT in
 	 * ECAP.
 	 */
-	desc.qw1 |= addr & ~mask;
-	if (size_order)
+	if (addr & ~VTD_PAGE_MASK)
+		pr_warn_ratelimited("Invalidate non-page aligned address %llx\n", addr);
+
+	/* Take page address */
+	desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr);
+
+	if (size_order) {
+		/*
+		 * Existing 0s in address below size_order may be the least
+		 * significant bit, we must set them to 1s to avoid having
+		 * smaller size than desired.
+		 */
+		desc.qw1 |= GENMASK_ULL(size_order + VTD_PAGE_SHIFT,
+					VTD_PAGE_SHIFT);
+		/* Clear size_order bit to indicate size */
+		desc.qw1 &= ~mask;
+		/* Set the S bit to indicate flushing more than 1 page */
 		desc.qw1 |= QI_DEV_EIOTLB_SIZE;
+	}
 
 	qi_submit_sync(iommu, , 1, 0);
 }
-- 
2.7.4
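
The address and size encoding built in the hunk above can be worked through
in userspace. The sketch below is an illustration under stated assumptions,
not the driver code: genmask_ull() and dev_eiotlb_qw1() are hypothetical
helpers, VTD_PAGE_SHIFT is assumed to be 12, and the upper bound of the
mask is size_order + VTD_PAGE_SHIFT - 1 rather than the
size_order + VTD_PAGE_SHIFT used in the diff, so that the full-range case
(size_order = 64 - VTD_PAGE_SHIFT) stays within 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT		12
#define VTD_PAGE_MASK		(~UINT64_C(0) << VTD_PAGE_SHIFT)
#define QI_DEV_EIOTLB_SIZE	(UINT64_C(1) << 11)	/* the S bit */

/* Bits h..l set (inclusive), 0 <= l <= h <= 63. */
static uint64_t genmask_ull(unsigned int h, unsigned int l)
{
	return (~UINT64_C(0) << l) & (~UINT64_C(0) >> (63 - h));
}

/* Build the address/size word for a devTLB flush of 2^size_order pages. */
static uint64_t dev_eiotlb_qw1(uint64_t addr, unsigned int size_order)
{
	uint64_t qw1 = addr & VTD_PAGE_MASK;	/* drop sub-page bits */

	assert(size_order <= 64 - VTD_PAGE_SHIFT);

	if (size_order) {
		uint64_t mask = UINT64_C(1) << (VTD_PAGE_SHIFT + size_order - 1);

		/* Set every address bit below the size boundary to 1 ... */
		qw1 |= genmask_ull(size_order + VTD_PAGE_SHIFT - 1, VTD_PAGE_SHIFT);
		/* ... then clear the highest of them: the lowest 0 encodes the size. */
		qw1 &= ~mask;
		/* S bit: more than one page is flushed. */
		qw1 |= QI_DEV_EIOTLB_SIZE;
	}
	return qw1;
}
```

For size_order = 52 and addr = 0 this yields all address bits set except
bit 63, plus the S bit, which corresponds to the "entire 64 bit range"
encoding described in patch 5/7.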



[PATCH v2 3/7] iommu/vt-d: Fix PASID devTLB invalidation

2020-06-30 Thread Jacob Pan
DevTLB flush can be used for DMA requests both with and without PASIDs.
Requests without a PASID use PASID#0 (RID2PASID); requests with a PASID
use a non-zero PASID for SVA usage.

This patch adds a check on the PASID value so that devTLB flush with
PASID is used for the SVA case. This is more efficient because a single
device can use multiple PASIDs, and when tearing down a PASID entry
we only need to flush the devTLB entries specific to that PASID.

Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table")
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/pasid.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c81f0f17c6ba..70d21209dd04 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	qdep = info->ats_qdep;
 	pfsid = info->pfsid;
 
-	qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+	/*
+	 * When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
+	 * devTLB flush w/o PASID should be used. For non-zero PASID under
+	 * SVA usage, device could do DMA with multiple PASIDs. It is more
+	 * efficient to flush devTLB specific to the PASID.
+	 */
+	if (pasid == PASID_RID2PASID)
+		qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+	else
+		qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
 }
 
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
-- 
2.7.4
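
The selection rule stated in the code comment above (PASID 0 means a
request without PASID, anything else is an SVA PASID) can be written as a
tiny standalone function. This is only a sketch: enum devtlb_flush and
pick_devtlb_flush() are hypothetical names, and PASID_RID2PASID is assumed
to be 0 as the comment describes.

```c
#include <stdint.h>

#define PASID_RID2PASID 0	/* PASID used for requests without PASID */

/* Which of the two flush primitives should be issued. */
enum devtlb_flush {
	FLUSH_DEV_IOTLB,	/* flush without PASID */
	FLUSH_DEV_IOTLB_PASID	/* PASID-based flush */
};

static enum devtlb_flush pick_devtlb_flush(uint32_t pasid)
{
	/* RID2PASID traffic carries no PASID, so flush without one. */
	if (pasid == PASID_RID2PASID)
		return FLUSH_DEV_IOTLB;
	/* Non-zero PASID (SVA): flush only that PASID's entries. */
	return FLUSH_DEV_IOTLB_PASID;
}
```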



[PATCH v2 6/7] iommu/vt-d: Warn on out-of-range invalidation address

2020-06-30 Thread Jacob Pan
For guest requested IOTLB invalidation, address and mask are provided as
part of the invalidation data. VT-d HW silently ignores any address bits
below the mask. SW shall also allow such cases but give a warning if the
address does not align with the mask. This patch relaxes the fault
handling from error to warning and proceeds with the invalidation request
using the given mask.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6a0c62c7395c..88e75be5ea76 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5439,13 +5439,12 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 
 		switch (BIT(cache_type)) {
 		case IOMMU_CACHE_INV_TYPE_IOTLB:
+			/* HW will ignore LSB bits based on address mask */
 			if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
 			    size &&
 			    (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
-				pr_err_ratelimited("Address out of range, 0x%llx, size order %llu\n",
-						   inv_info->addr_info.addr, size);
-				ret = -ERANGE;
-				goto out_unlock;
+				WARN_ONCE(1, "Address out of range, 0x%llx, size order %llu\n",
+					  inv_info->addr_info.addr, size);
 			}
 
 			/*
-- 
2.7.4
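
The alignment test in the hunk above, and what "HW will ignore LSB bits"
means for the address that is actually invalidated, can be illustrated with
a short userspace sketch. inv_addr_aligned() and inv_effective_addr() are
hypothetical helpers, and VTD_PAGE_SHIFT is assumed to be 12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT 12

/* Mask of the address bits below a 2^size_order page boundary. */
static uint64_t inv_low_bits(unsigned int size_order)
{
	assert(size_order < 64 - VTD_PAGE_SHIFT);
	return (UINT64_C(1) << (VTD_PAGE_SHIFT + size_order)) - 1;
}

/* True if addr is aligned to the invalidation size (no warning needed). */
static bool inv_addr_aligned(uint64_t addr, unsigned int size_order)
{
	return (addr & inv_low_bits(size_order)) == 0;
}

/* Address effectively used once the low bits are ignored. */
static uint64_t inv_effective_addr(uint64_t addr, unsigned int size_order)
{
	return addr & ~inv_low_bits(size_order);
}
```

With size_order = 2 (four 4K pages), 0x5000 is misaligned and the
invalidation effectively starts at 0x4000.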



[PATCH v2 1/7] iommu/vt-d: Enforce PASID devTLB field mask

2020-06-30 Thread Jacob Pan
From: Liu Yi L 

Set proper masks to avoid invalid input spillover to reserved bits.

Acked-by: Lu Baolu 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 include/linux/intel-iommu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4100bd224f5c..729386ca8122 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -380,8 +380,8 @@ enum {
 
 #define QI_DEV_EIOTLB_ADDR(a)  ((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
-#define QI_DEV_EIOTLB_GLOB(g)  ((u64)g)
-#define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32)
+#define QI_DEV_EIOTLB_GLOB(g)  ((u64)(g) & 0x1)
+#define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xf) << 32)
 #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0x) << 16)
 #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4)
 #define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \
-- 
2.7.4
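
Why the field macros need a mask can be seen with a small standalone
example. The macro names below are hypothetical, and the 20-bit PASID width
is an assumption of this sketch (the mask constants in the quoted diff are
truncated, so they are not reproduced here).

```c
#include <stdint.h>

/* Unmasked form, as on the removed lines: out-of-range bits leak upward. */
#define EIOTLB_PASID_UNMASKED(p)	(((uint64_t)(p)) << 32)

/* Masked form; the 20-bit width is this sketch's assumption. */
#define EIOTLB_PASID_MASKED(p)		((uint64_t)((p) & 0xfffff) << 32)

/* Place a PASID into the descriptor word without touching other fields. */
static uint64_t qw0_pasid_field(uint32_t pasid)
{
	return EIOTLB_PASID_MASKED(pasid);
}
```

With an out-of-range input such as 0x1234567, the unmasked macro sets bits
above the 20-bit field, whereas the masked one keeps only 0x34567.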



RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> The driver intend to support up to 3 instances. It doesn't really mandate 
>> that all three instances be present in same DT node.
>> Each mmio aperture in "reg" property is an instance here. reg = 
>> , , ; The reg can have 
>> all three or less and driver just configures based on reg and it works fine.

>So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we have 
>up to 3 (for Tegra194). So the question is do we have a use-case where we only 
>use 2 and not 3? If not, then it still seems that we should require that all 3 
>are present.

It can be either 2 SMMUs (for non-iso) or 3 SMMUs (for non-iso and iso). Let
me fail the single-instance case, as it can use the regular ARM SMMU
implementation and doesn't need the NVIDIA implementation explicitly.
 
>The other problem I see here is that currently the arm-smmu binding defines 
>the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe that this will 
>get caught by the 'dt_binding_check' when we try to populate the binding.

Thanks for pointing it out! Will update the binding doc.

-KR

--
nvpublic


[PATCH v2 2/2] iommu/amd: Move Kconfig and Makefile bits down into amd directory

2020-06-30 Thread Jerry Snitselaar
Move AMD Kconfig and Makefile bits down into the amd directory
with the rest of the AMD specific files.

Cc: Joerg Roedel 
Cc: Suravee Suthikulpanit 
Signed-off-by: Jerry Snitselaar 
---
 drivers/iommu/Kconfig  | 45 +-
 drivers/iommu/Makefile |  5 +
 drivers/iommu/amd/Kconfig  | 44 +
 drivers/iommu/amd/Makefile |  4 
 4 files changed, 50 insertions(+), 48 deletions(-)
 create mode 100644 drivers/iommu/amd/Kconfig
 create mode 100644 drivers/iommu/amd/Makefile

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 281cd6bd0fe0..24000e7ed0fa 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -132,50 +132,7 @@ config IOMMU_PGTABLES_L2
def_bool y
depends on MSM_IOMMU && MMU && SMP && CPU_DCACHE_DISABLE=n
 
-# AMD IOMMU support
-config AMD_IOMMU
-   bool "AMD IOMMU support"
-   select SWIOTLB
-   select PCI_MSI
-   select PCI_ATS
-   select PCI_PRI
-   select PCI_PASID
-   select IOMMU_API
-   select IOMMU_IOVA
-   select IOMMU_DMA
-   depends on X86_64 && PCI && ACPI
-   help
- With this option you can enable support for AMD IOMMU hardware in
- your system. An IOMMU is a hardware component which provides
- remapping of DMA memory accesses from devices. With an AMD IOMMU you
- can isolate the DMA memory of different devices and protect the
- system from misbehaving device drivers or hardware.
-
- You can find out if your system has an AMD IOMMU if you look into
- your BIOS for an option to enable it or if you have an IVRS ACPI
- table.
-
-config AMD_IOMMU_V2
-   tristate "AMD IOMMU Version 2 driver"
-   depends on AMD_IOMMU
-   select MMU_NOTIFIER
-   help
- This option enables support for the AMD IOMMUv2 features of the IOMMU
- hardware. Select this option if you want to use devices that support
- the PCI PRI and PASID interface.
-
-config AMD_IOMMU_DEBUGFS
-   bool "Enable AMD IOMMU internals in DebugFS"
-   depends on AMD_IOMMU && IOMMU_DEBUGFS
-   help
- !!!WARNING!!!  !!!WARNING!!!  !!!WARNING!!!  !!!WARNING!!!
-
- DO NOT ENABLE THIS OPTION UNLESS YOU REALLY, -REALLY- KNOW WHAT YOU ARE DOING!!!
- Exposes AMD IOMMU device internals in DebugFS.
-
- This option is -NOT- intended for production environments, and should
- not generally be enabled.
-
+source "drivers/iommu/amd/Kconfig"
 source "drivers/iommu/intel/Kconfig"
 
 config IRQ_REMAP
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 71dd2f382e78..f356bc12b1c7 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-y += intel/
+obj-y += amd/ intel/
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
@@ -12,9 +12,6 @@ obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU) += of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
-obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
-obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
-obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
 arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
new file mode 100644
index ..1f061d91e0b8
--- /dev/null
+++ b/drivers/iommu/amd/Kconfig
@@ -0,0 +1,44 @@
+# SPDX-License-Identifier: GPL-2.0-only
+# AMD IOMMU support
+config AMD_IOMMU
+   bool "AMD IOMMU support"
+   select SWIOTLB
+   select PCI_MSI
+   select PCI_ATS
+   select PCI_PRI
+   select PCI_PASID
+   select IOMMU_API
+   select IOMMU_IOVA
+   select IOMMU_DMA
+   depends on X86_64 && PCI && ACPI
+   help
+ With this option you can enable support for AMD IOMMU hardware in
+ your system. An IOMMU is a hardware component which provides
+ remapping of DMA memory accesses from devices. With an AMD IOMMU you
+ can isolate the DMA memory of different devices and protect the
+ system from misbehaving device drivers or hardware.
+
+ You can find out if your system has an AMD IOMMU if you look into
+ your BIOS for an option to enable it or if you have an IVRS ACPI
+ table.
+
+config AMD_IOMMU_V2
+   tristate "AMD IOMMU Version 2 driver"
+   depends on AMD_IOMMU
+   select MMU_NOTIFIER
+   help
+ This option enables support for the AMD IOMMUv2 features of the IOMMU
+ hardware. Select this option if you want to use devices that support
+ the PCI PRI and PASID interface.
+
+config AMD_IOMMU_DEBUGFS
+   bool "Enable AMD IOMMU internals in DebugFS"
+   depends on AMD_IOMMU && IOMMU_DEBUGFS
+   

[PATCH v2 0/2] iommu: Move AMD and Intel Kconfig + Makefile bits into their directories

2020-06-30 Thread Jerry Snitselaar
This patchset implements the suggestion from Linus to move the
Kconfig and Makefile bits for AMD and Intel into their respective
directories.

v2: Rebase against v5.8-rc3. Dropped ---help--- changes from Kconfig as that was
dealt with in systemwide cleanup.

Jerry Snitselaar (2):
  iommu/vt-d: Move Kconfig and Makefile bits down into intel directory
  iommu/amd: Move Kconfig and Makefile bits down into amd directory


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/2] iommu/vt-d: Move Kconfig and Makefile bits down into intel directory

2020-06-30 Thread Jerry Snitselaar
Move Intel Kconfig and Makefile bits down into intel directory
with the rest of the Intel specific files.

Cc: Joerg Roedel 
Cc: Lu Baolu 
Signed-off-by: Jerry Snitselaar 
---
 drivers/iommu/Kconfig| 86 +---
 drivers/iommu/Makefile   |  8 +---
 drivers/iommu/intel/Kconfig  | 86 
 drivers/iommu/intel/Makefile |  7 +++
 4 files changed, 96 insertions(+), 91 deletions(-)
 create mode 100644 drivers/iommu/intel/Kconfig
 create mode 100644 drivers/iommu/intel/Makefile

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6dc49ed8377a..281cd6bd0fe0 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -176,91 +176,7 @@ config AMD_IOMMU_DEBUGFS
  This option is -NOT- intended for production environments, and should
  not generally be enabled.
 
-# Intel IOMMU support
-config DMAR_TABLE
-   bool
-
-config INTEL_IOMMU
-   bool "Support for Intel IOMMU using DMA Remapping Devices"
-   depends on PCI_MSI && ACPI && (X86 || IA64)
-   select IOMMU_API
-   select IOMMU_IOVA
-   select NEED_DMA_MAP_STATE
-   select DMAR_TABLE
-   select SWIOTLB
-   select IOASID
-   help
- DMA remapping (DMAR) devices support enables independent address
- translations for Direct Memory Access (DMA) from devices.
- These DMA remapping devices are reported via ACPI tables
- and include PCI device scope covered by these DMA
- remapping devices.
-
-config INTEL_IOMMU_DEBUGFS
-   bool "Export Intel IOMMU internals in Debugfs"
-   depends on INTEL_IOMMU && IOMMU_DEBUGFS
-   help
- !!!WARNING!!!
-
- DO NOT ENABLE THIS OPTION UNLESS YOU REALLY KNOW WHAT YOU ARE DOING!!!
-
- Expose Intel IOMMU internals in Debugfs.
-
- This option is -NOT- intended for production environments, and should
- only be enabled for debugging Intel IOMMU.
-
-config INTEL_IOMMU_SVM
-   bool "Support for Shared Virtual Memory with Intel IOMMU"
-   depends on INTEL_IOMMU && X86_64
-   select PCI_PASID
-   select PCI_PRI
-   select MMU_NOTIFIER
-   select IOASID
-   help
- Shared Virtual Memory (SVM) provides a facility for devices
- to access DMA resources through process address space by
- means of a Process Address Space ID (PASID).
-
-config INTEL_IOMMU_DEFAULT_ON
-   def_bool y
-   prompt "Enable Intel DMA Remapping Devices by default"
-   depends on INTEL_IOMMU
-   help
- Selecting this option will enable a DMAR device at boot time if
- one is found. If this option is not selected, DMAR support can
- be enabled by passing intel_iommu=on to the kernel.
-
-config INTEL_IOMMU_BROKEN_GFX_WA
-   bool "Workaround broken graphics drivers (going away soon)"
-   depends on INTEL_IOMMU && BROKEN && X86
-   help
- Current Graphics drivers tend to use physical address
- for DMA and avoid using DMA APIs. Setting this config
- option permits the IOMMU driver to set a unity map for
- all the OS-visible memory. Hence the driver can continue
- to use physical addresses for DMA, at least until this
- option is removed in the 2.6.32 kernel.
-
-config INTEL_IOMMU_FLOPPY_WA
-   def_bool y
-   depends on INTEL_IOMMU && X86
-   help
- Floppy disk drivers are known to bypass DMA API calls
- thereby failing to work when IOMMU is enabled. This
- workaround will setup a 1:1 mapping for the first
- 16MiB to make floppy (an ISA device) work.
-
-config INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
-   bool "Enable Intel IOMMU scalable mode by default"
-   depends on INTEL_IOMMU
-   help
- Selecting this option will enable by default the scalable mode if
- hardware presents the capability. The scalable mode is defined in
- VT-d 3.0. The scalable mode capability could be checked by reading
- /sys/devices/virtual/iommu/dmar*/intel-iommu/ecap. If this option
- is not selected, scalable mode support could also be enabled by
- passing intel_iommu=sm_on to the kernel. If not sure, please use
- the default value.
+source "drivers/iommu/intel/Kconfig"
 
 config IRQ_REMAP
bool "Support for Interrupt Remapping"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..71dd2f382e78 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+obj-y += intel/
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
@@ -17,13 +18,8 @@ obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
 arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
-obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o

Re: [PATCH] iommu/amd: Fix event counter availability check

2020-06-30 Thread Alexander Monakov
On Tue, 16 Jun 2020, Suravee Suthikulpanit wrote:

> > > On 6/1/20 4:01 PM, Alexander Monakov wrote:
> > > > On Mon, 1 Jun 2020, Suravee Suthikulpanit wrote:
> > > > 
> > > > > > Moving init_iommu_perf_ctr just after iommu_flush_all_caches
> > > > > > resolves the issue. This is the earliest point in amd_iommu_init_pci
> > > > > > where the call succeeds on my laptop.
> > > > > According to your description, it should just need to be anywhere
> > > > > after the pci_enable_device() is called for the IOMMU device, isn't
> > > > > it? So, on your system, what if we just move the init_iommu_perf_ctr()
> > > > > here:
> > > > No, this doesn't work, as I already said in the paragraph you are
> > > > responding to. See my last sentence in the quoted part.
> > > > 
> > > > So the implication is init_device_table_dma together with subsequent
> > > > cache flush is also setting up something that is necessary for counters
> > > > to be writable.
> > > > 
> > > > Alexander
> > > > 
> > > Instead of blindly moving the code around to a spot that would just work,
> > > I am trying to understand what might be required here. In this case,
> > > the init_device_table_dma()should not be needed. I suspect it's the IOMMU
> > > invalidate all command that's also needed here.
> > > 
> > > I'm also checking with the HW and BIOS team. Meanwhile, could you please
> > > give the following change a try:
> > Hello. Can you give any update please?
> > 
> > Alexander
> > 
> 
> Sorry for late reply. I have a reproducer and working with the HW team to
> understand the issue.
> I should be able to provide update with solution by the end of this week.

Hi, can you share any information (two more weeks have passed)?

Alexander


Re: the XSK buffer pool needs be to reverted

2020-06-30 Thread Jonathan Lemon
On Mon, Jun 29, 2020 at 02:15:16PM +0100, Robin Murphy wrote:
> On 2020-06-27 08:02, Christoph Hellwig wrote:
> > On Fri, Jun 26, 2020 at 01:54:12PM -0700, Jonathan Lemon wrote:
> > > On Fri, Jun 26, 2020 at 09:47:25AM +0200, Christoph Hellwig wrote:
> > > > 
> > > > Note that this is somewhat urgent, as various of the APIs that the code
> > > > is abusing are slated to go away for Linux 5.9, so this addition comes
> > > > at a really bad time.
> > > 
> > > Could you elaborate on what is upcoming here?
> > 
> > Moving all these calls out of line, and adding a bypass flag to avoid
> > the indirect function call for IOMMUs in direct mapped mode.
> > 
> > > Also, on a semi-related note, are there limitations on how many pages
> > > can be left mapped by the iommu?  Some of the page pool work involves
> > > leaving the pages mapped instead of constantly mapping/unmapping them.
> > 
> > There are, but I think for all modern IOMMUs they are so big that they
> > > don't matter.  Maintainers of the individual IOMMU drivers might know
> > more.
> 
> Right - I don't know too much about older and more esoteric stuff like POWER
> TCE, but for modern pagetable-based stuff like Intel VT-d, AMD-Vi, and Arm
> SMMU, the only "limits" are such that legitimate DMA API use should never
> get anywhere near them (you'd run out of RAM for actual buffers long
> beforehand). The most vaguely-realistic concern might be a pathological
> system topology where some old 32-bit PCI device doesn't have ACS isolation
> from your high-performance NIC such that they have to share an address
> space, where the NIC might happen to steal all the low addresses and prevent
> the soundcard or whatever from being able to map a usable buffer.
> 
> With an IOMMU, you typically really *want* to keep a full working set's
> worth of pages mapped, since dma_map/unmap are expensive while dma_sync is
> somewhere between relatively cheap and free. With no IOMMU it makes no real
> difference from the DMA API perspective since map/unmap are effectively no
> more than the equivalent sync operations anyway (I'm assuming we're not
> talking about the kind of constrained hardware that might need SWIOTLB).
> 
> > > On a heavily loaded box with iommu enabled, it seems that quite often
> > > there is contention on the iova_lock.  Are there known issues in this
> > > area?
> > 
> > I'll have to defer to the IOMMU maintainers, and for that you'll need
> > > to say what code you are using.  Current mainline doesn't even have
> > an iova_lock anywhere.
> 
> Again I can't speak for non-mainstream stuff outside drivers/iommu, but it's
> been over 4 years now since merging the initial scalability work for the
> generic IOVA allocator there that focused on minimising lock contention, and
> it's had considerable evaluation and tweaking since. But if we can achieve
> the goal of efficiently recycling mapped buffers then we shouldn't need to
> go anywhere near IOVA allocation either way except when expanding the pool.


I'm running a set of patches which uses the page pool to try and keep
all the RX buffers mapped as the skb goes up the stack, returning the
pages to the pool when the skb is freed.

On a dual-socket 12-core Intel machine (48 processors), and 256G of
memory, when iommu is enabled, I see the following from 'perf top -U',
as the hottest function being run:

-   43.42%  worker  [k] queued_spin_lock_slowpath
   - 43.42% queued_spin_lock_slowpath
  - 41.69% _raw_spin_lock_irqsave
 + 41.39% alloc_iova 
 + 0.28% iova_magazine_free_pfns
  + 1.07% lock_sock_nested

This is likely heavy contention on iovad->iova_rbtree_lock.
(This is on a 5.6 based system, BTW).  More scripts and data are below.
Is there a way to reduce the contention here?



The following quick and dirty [and possibly wrong] .bpf script was used
to try and find the time spent in __alloc_and_insert_iova_range():

kprobe:alloc_iova_fast
{
    @fast = count();
}

kprobe:alloc_iova
{
    @iova_start[tid] = nsecs;
    @iova = count();
}

kretprobe:alloc_iova / @iova_start[tid] /
{
    @alloc_h = hist(nsecs - @iova_start[tid] - @mem_delta[tid]);
    delete(@iova_start[tid]);
    delete(@mem_delta[tid]);
}

kprobe:alloc_iova_mem / @iova_start[tid] /
{
    @mem_start[tid] = nsecs;
}

kretprobe:alloc_iova_mem / @mem_start[tid] /
{
    @mem_delta[tid] = nsecs - @mem_start[tid];
    delete(@mem_start[tid]);
}

kprobe:iova_insert_rbtree / @iova_start[tid] /
{
    @rb_start[tid] = nsecs;
    @rbtree = count();
}

kretprobe:iova_insert_rbtree / @rb_start[tid] /
{
    @insert_h = hist(nsecs - @rb_start[tid]);
    delete(@rb_start[tid]);
}

interval:s:2
{
    print(@fast);
    print(@iova);
    print(@rbtree);
    print(@alloc_h);
    print(@insert_h);
    printf("\n");
}

I see the following results.

@fast: 1989223
@iova: 725269
@rbtree: 689306

@alloc_h:
[64, 128)  2 |

Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter


On 30/06/2020 18:16, Krishna Reddy wrote:
>> OK, well I see what you are saying, but if we intended to support all 3 for 
>> Tegra194, then we should ensure all 3 are initialised correctly.
> 
> The driver intend to support up to 3 instances. It doesn't really mandate 
> that all three instances be present in same DT node.
> Each mmio aperture in "reg" property is an instance here. reg =  size>, , ;
> The reg can have all three or less and driver just configures based on reg 
> and it works fine.

So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we
have up to 3 (for Tegra194). So the question is do we have a use-case
where we only use 2 and not 3? If not, then it still seems that we
should require that all 3 are present.

The other problem I see here is that currently the arm-smmu binding
defines the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe
that this will get caught by the 'dt_binding_check' when we try to
populate the binding.

>> It would be better to query the number of SMMUs populated in device-tree and 
>> then ensure that all are initialised correctly.
> 
> Getting the IORESOURCE_MEM is the way to count the instances driver need to 
> support.  
> In a way, It is already querying through IORESOURCE_MEM here. 

Yes I was wondering that. I think we just need to decide if the 3rd SMMU
is optional or not. The DT binding should detail the min and max supported.

Jon

-- 
nvpublic


Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-06-30 Thread Saravana Kannan via iommu
On Mon, Jun 29, 2020 at 9:49 PM Rajat Jain  wrote:
>
> Add a new (optional) field to denote the physical location of a device
> in the system, and expose it in sysfs. This was discussed here:
> https://lore.kernel.org/linux-acpi/20200618184621.ga446...@kroah.com/
>
> (The primary choice for attribute name i.e. "location" is already
> exposed as an ABI elsewhere, so settled for "site"). Individual buses
> that want to support this new attribute can opt-in by setting a flag in
> bus_type, and then populating the location of device while enumerating
> it.
>
> Signed-off-by: Rajat Jain 
> ---
> v2: (Initial version)
>
>  drivers/base/core.c| 35 +++
>  include/linux/device.h | 42 ++
>  include/linux/device/bus.h |  8 
>  3 files changed, 85 insertions(+)
>

 I'm not CC'ed in 4/7, so just replying

> diff --git a/include/linux/device.h b/include/linux/device.h
> index 15460a5ac024a..a4143735ae712 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -428,6 +428,31 @@ enum dl_dev_state {
> DL_DEV_UNBINDING,
>  };
>
> +/**
> + * enum device_site - Physical location of the device in the system.
> + * The semantics of values depend on subsystem / bus:
> + *
> + * @SITE_UNKNOWN:  Location is Unknown (default)
> + *
> + * @SITE_INTERNAL: Device is internal to the system, and cannot be (easily)
> + * removed. E.g. SoC internal devices, onboard soldered
> + * devices, internal M.2 cards (that cannot be removed
> + * without opening the chassis).
> + * @SITE_EXTENDED: Device sits an extension of the system. E.g. devices
> + * on external PCIe trays, docking stations etc. These
> + * devices may be removable, but are generally housed
> + * internally on an extension board, so they are removed
> + * only when that whole extension board is removed.
> + * @SITE_EXTERNAL: Devices truly external to the system (i.e. plugged on
> + * an external port) that may be removed or added frequently.
> + */
> +enum device_site {
> +   SITE_UNKNOWN = 0,
> +   SITE_INTERNAL,
> +   SITE_EXTENDED,
> +   SITE_EXTERNAL,
> +};
> +
>  /**
>   * struct dev_links_info - Device data related to device links.
>   * @suppliers: List of links to supplier devices.
> @@ -513,6 +538,7 @@ struct dev_links_info {
>   * device (i.e. the bus driver that discovered the device).
>   * @iommu_group: IOMMU group the device belongs to.
>   * @iommu: Per device generic IOMMU runtime data
> + * @site:  Physical location of the device w.r.t. the system
>   *
>   * @offline_disabled: If set, the device is permanently online.
>   * @offline:   Set after successful invocation of bus type's .offline().
> @@ -613,6 +639,8 @@ struct device {
> struct iommu_group  *iommu_group;
> struct dev_iommu*iommu;
>
> +   enum device_sitesite;   /* Device physical location */
> +
> booloffline_disabled:1;
> booloffline:1;
> boolof_node_reused:1;
> @@ -806,6 +834,20 @@ static inline bool dev_has_sync_state(struct device *dev)
> return false;
>  }
>
> +static inline int dev_set_site(struct device *dev, enum device_site site)
> +{
> +   if (site < SITE_UNKNOWN || site > SITE_EXTERNAL)
> +   return -EINVAL;
> +
> +   dev->site = site;
> +   return 0;
> +}
> +
> +static inline bool dev_is_external(struct device *dev)
> +{
> +   return dev->site == SITE_EXTERNAL;
> +}

I'm not CC'ed in the rest of the patches in this series, so just
responding here. I see you use this function in patch 6/7 to decide if
the PCI device is trusted. Anything other than EXTERNAL is being
treated as trusted. I'd argue that anything that's not internal should
be distrusted. For example, I can have a hacked up laptop dock that I
can share with you when you visit my home/office and now you are
trusting it when you shouldn't be.

Also, "UNKNOWN" is treated as trusted in patch 6/7. I'm guessing this
is because some of the devices might not have the info in their
firmware? At which point, this feature isn't even protecting all the
PCI ports properly? This adds to Greg's point that this should be a
userspace policy so that it can override whatever is wrong/missing in
the firmware.
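
To illustrate the stricter policy suggested above, here is a minimal
standalone sketch (plain C, not kernel code). The enum mirrors the one in
the quoted patch; site_is_trusted() is a hypothetical helper name that does
not appear anywhere in the series:

```c
#include <stdbool.h>

/* Mirrors enum device_site from the quoted patch. */
enum device_site {
	SITE_UNKNOWN = 0,
	SITE_INTERNAL,
	SITE_EXTENDED,
	SITE_EXTERNAL,
};

/*
 * Hypothetical helper (not part of the series): trust a device only when
 * it is positively reported as internal. UNKNOWN, EXTENDED and EXTERNAL
 * are all treated as untrusted.
 */
static bool site_is_trusted(enum device_site site)
{
	return site == SITE_INTERNAL;
}
```

With a check shaped like this, a device whose firmware reports nothing
(SITE_UNKNOWN) or that sits in a dock (SITE_EXTENDED) starts out untrusted,
and userspace could then relax that where appropriate.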

-Saravana


Re: [PATCH v3 1/5] docs: IOMMU user API

2020-06-30 Thread Jacob Pan
On Tue, 30 Jun 2020 02:52:45 +
"Tian, Kevin"  wrote:

> > From: Jacob Pan
> > Sent: Tuesday, June 30, 2020 7:05 AM
> > 
> > On Fri, 26 Jun 2020 16:19:23 -0600
> > Alex Williamson  wrote:
> >   
> > > On Tue, 23 Jun 2020 10:03:53 -0700
> > > Jacob Pan  wrote:
> > >  
> > > > IOMMU UAPI is newly introduced to support communications between
> > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > discussions on how it should work with VFIO UAPI and userspace
> > > > in general.
> > > >
> > > > This document is indended to clarify the UAPI design and usage.
> > > > The mechenics of how future extensions should be achieved are
> > > > also covered in this documentation.
> > > >
> > > > Signed-off-by: Liu Yi L 
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > >  Documentation/userspace-api/iommu.rst | 244
> > > > ++ 1 file changed, 244
> > > > insertions(+) create mode 100644
> > > > Documentation/userspace-api/iommu.rst
> > > >
> > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > index ..f9e4ed90a413
> > > > --- /dev/null
> > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > @@ -0,0 +1,244 @@
> > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > +.. iommu:
> > > > +
> > > > +=
> > > > +IOMMU Userspace API
> > > > +=
> > > > +
> > > > +IOMMU UAPI is used for virtualization cases where
> > > > communications are +needed between physical and virtual IOMMU
> > > > drivers. For native +usage, IOMMU is a system device which does
> > > > not need to communicate +with user space directly.
> > > > +
> > > > +The primary use cases are guest Shared Virtual Address (SVA)
> > > > and +guest IO virtual address (IOVA), wherein a virtual IOMMU
> > > > (vIOMMU) is +required to communicate with the physical IOMMU in
> > > > the host. +
> > > > +.. contents:: :local:
> > > > +
> > > > +Functionalities
> > > > +===
> > > > +Communications of user and kernel involve both directions. The
> > > > +supported user-kernel APIs are as follows:
> > > > +
> > > > +1. Alloc/Free PASID
> > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > +4. Invalidate IOMMU caches
> > > > +5. Service page requests
> > > > +
> > > > +Requirements
> > > > +
> > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > following +requirements:
> > > > +
> > > > +1. Emulated and para-virtualised vIOMMUs
> > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > +3. Extensions to the UAPI shall not break existing user space
> > > > +
> > > > +Interfaces
> > > > +==
> > > > +Although the data structures defined in IOMMU UAPI are
> > > > self-contained, +there is no user API functions introduced.
> > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > driver frameworks such as VFIO. +
> > > > +Extension Rules & Precautions
> > > > +-
> > > > +When IOMMU UAPI gets extended, the data structures can *only*
> > > > be +modified in two ways:
> > > > +
> > > > +1. Adding new fields by re-purposing the padding[] field. No
> > > > size change. +2. Adding new union members at the end. May
> > > > increase in size. +
> > > > +No new fields can be added *after* the variable sized union in
> > > > that it +will break backward compatibility when offset moves. In
> > > > both cases, a +new flag must be accompanied with a new field
> > > > such that the IOMMU +driver can process the data based on the
> > > > new flag. Version field is +only reserved for the unlikely
> > > > event of UAPI upgrade at its entirety. +
> > > > +It's *always* the caller's responsibility to indicate the size
> > > > of the +structure passed by setting argsz appropriately.
> > > > +Though at the same time, argsz is user provided data which is
> > > > not +trusted. The argsz field allows the user to indicate how
> > > > much data +they're providing, it's still the kernel's
> > > > responsibility to validate +whether it's correct and sufficient
> > > > for the requested operation. +
> > > > +Compatibility Checking
> > > > +--
> > > > +When IOMMU UAPI extension results in size increase, user such
> > > > as VFIO +has to handle the following cases:
> > > > +
> > > > +1. User and kernel has exact size match
> > > > +2. An older user with older kernel header (smaller UAPI size)
> > > > running on a
> > > > +   newer kernel (larger UAPI size)
> > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > running
> > > > +   on an older kernel.
> > > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > > within
> > > > +   range. The data may contain garbage.  
> > >
> > > What exactly does vfio need to do to handle these?
> > >  
> > VFIO does nothing other than returning 

Re: [PATCH 6/7] iommu/vt-d: Warn on out-of-range invalidation address

2020-06-30 Thread Jacob Pan
On Thu, 25 Jun 2020 18:10:43 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 2020/6/23 23:43, Jacob Pan wrote:
> > For guest requested IOTLB invalidation, address and mask are
> > provided as part of the invalidation data. VT-d HW silently ignores
> > any address bits below the mask. SW shall also allow such case but
> > give warning if address does not align with the mask. This patch
> > relax the fault handling from error to warning and proceed with
> > invalidation request with the given mask.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/intel/iommu.c | 7 +++
> >   1 file changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/iommu.c
> > b/drivers/iommu/intel/iommu.c index 5ea5732d5ec4..50fc62413a35
> > 100644 --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -5439,13 +5439,12 @@ intel_iommu_sva_invalidate(struct
> > iommu_domain *domain, struct device *dev, 
> > switch (BIT(cache_type)) {
> > case IOMMU_CACHE_INV_TYPE_IOTLB:
> > +   /* HW will ignore LSB bits based on
> > address mask */ if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
> > size &&
> > (inv_info->addr_info.addr &
> > ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
> > -   pr_err_ratelimited("Address out of
> > range, 0x%llx, size order %llu\n",
> > -
> > inv_info->addr_info.addr, size);
> > -   ret = -ERANGE;
> > -   goto out_unlock;
> > +   WARN_ONCE(1, "Address out of
> > range, 0x%llx, size order %llu\n",
> > +
> > inv_info->addr_info.addr, size);  
> 
> I don't think WARN_ONCE() is suitable here. It makes users think it's
> a kernel bug. How about pr_warn_ratelimited()?
> 
I think pr_warn_ratelimited might still be too chatty. There are no
functional issues; we just don't want to silently ignore it. Perhaps just
say:
WARN_ONCE(1, "User provided address not page aligned, alignment forced")
?
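
For reference, the alignment condition being discussed can be sketched as
standalone C (not the driver code). inv_addr_misaligned() is a hypothetical
helper name, and VTD_PAGE_SHIFT is assumed to be 12 as in the Intel IOMMU
driver. Unlike the quoted hunk, the helper does not skip the check when
size is 0:

```c
#include <stdbool.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT 12	/* assumed: 4KiB VT-d pages */

/*
 * Hypothetical helper: returns true when addr has bits set below the
 * invalidation range of 2^size_order pages, i.e. bits the hardware
 * would silently ignore. Assumes VTD_PAGE_SHIFT + size_order < 64.
 */
static bool inv_addr_misaligned(uint64_t addr, unsigned int size_order)
{
	uint64_t low_bits = (UINT64_C(1) << (VTD_PAGE_SHIFT + size_order)) - 1;

	return (addr & low_bits) != 0;
}
```

The caller would then only warn and carry on with the invalidation using
the given mask, rather than returning -ERANGE.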

> Best regards,
> baolu
> 
> > }
> >   
> > /*
> >   

[Jacob Pan]


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>OK, well I see what you are saying, but if we intended to support all 3 for 
>Tegra194, then we should ensure all 3 are initialised correctly.

The driver intends to support up to 3 instances. It doesn't really mandate that 
all three instances be present in the same DT node.
Each mmio aperture in "reg" property is an instance here. reg = , , ;
The reg can have all three or fewer, and the driver just configures based on 
reg, and it works fine.

>It would be better to query the number of SMMUs populated in device-tree and 
>then ensure that all are initialised correctly.

Getting the IORESOURCE_MEM resources is the way to count the instances the 
driver needs to support.
In a way, it is already querying through IORESOURCE_MEM here.


-KR



Re: [PATCH 4/7] iommu/vt-d: Handle non-page aligned address

2020-06-30 Thread Jacob Pan
On Thu, 25 Jun 2020 18:05:52 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 2020/6/23 23:43, Jacob Pan wrote:
> > From: Liu Yi L 
> > 
> > Address information for device TLB invalidation comes from userspace
> > when device is directly assigned to a guest with vIOMMU support.
> > VT-d requires page aligned address. This patch checks and enforce
> > address to be page aligned, otherwise reserved bits can be set in
> > the invalidation descriptor. Unrecoverable fault will be reported
> > due to non-zero value in the reserved bits.
> > 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/intel/dmar.c | 19 +--
> >   1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index d9f973fa1190..53f4e5003620 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -1455,9 +1455,24 @@ void qi_flush_dev_iotlb_pasid(struct
> > intel_iommu *iommu, u16 sid, u16 pfsid,
> >  * Max Invs Pending (MIP) is set to 0 for now until we
> > have DIT in
> >  * ECAP.
> >  */
> > -   desc.qw1 |= addr & ~mask;
> > -   if (size_order)
> > +   if (addr & ~VTD_PAGE_MASK)
> > +   pr_warn_ratelimited("Invalidate non-page aligned
> > address %llx\n", addr); +
> > +   if (size_order) {
> > +   /* Take page address */
> > +   desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr);  
> 
> If size_order == 0 (that means only a single page is about to be
> invalidated), do you still need to set ADDR field of the descriptor?
> 
Good catch! We should always set addr. I will move the addr assignment out
of the if condition.
> Best regards,
> baolu
> 
> > +   /*
> > +* Existing 0s in address below size_order may be
> > the least
> > +* significant bit, we must set them to 1s to
> > avoid having
> > +* smaller size than desired.
> > +*/
> > +   desc.qw1 |= GENMASK_ULL(size_order +
> > VTD_PAGE_SHIFT,
> > +   VTD_PAGE_SHIFT);
> > +   /* Clear size_order bit to indicate size */
> > +   desc.qw1 &= ~mask;
> > +   /* Set the S bit to indicate flushing more than 1
> > page */ desc.qw1 |= QI_DEV_EIOTLB_SIZE;
> > +   }
> >   
> > qi_submit_sync(iommu, , 1, 0);
> >   }
> >   

[Jacob Pan]


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave 
>> IOVA accesses across them.
>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible 
>> string for Tegra194 SoC SMMU topology.

>There is no description here of the 3rd SMMU that you mention below.
>I think that we should describe the full picture here.

This driver is primarily for the dual SMMU config, so the 3rd SMMU is left out 
of the commit message.
However, the implementation supports an option to configure 3 instances 
identically with one SMMU DT node, and this is documented in the 
implementation.

>> +
>> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>> +   unsigned int inst, int page)

>If you run checkpatch --strict on these you will get a lot of ...

>CHECK: Alignment should match open parenthesis
>#116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
>+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>+ unsigned int inst, int page)

>We should fix these.

I will fix these if I need to push a new patch set.

>> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
>> +int page, int offset, u32 val) {
>> +unsigned int i;
>> +struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;

>Personally, I would declare 'reg' outside of the loop as I feel it will make 
>the code cleaner and easier to read.

It was like that before and was updated to its current form to limit the scope 
of variables as per Thierry's comments in v6. 
We can just leave it as it is, as there is no technical issue here.

-KR

--
nvpublic


Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-06-30 Thread Greg Kroah-Hartman
On Tue, Jun 30, 2020 at 06:08:31PM +0200, Rafael J. Wysocki wrote:
> On Tue, Jun 30, 2020 at 5:38 PM Greg Kroah-Hartman
>  wrote:
> >
> > On Tue, Jun 30, 2020 at 03:00:34PM +0200, Rafael J. Wysocki wrote:
> > > On Tue, Jun 30, 2020 at 2:52 PM Greg Kroah-Hartman
> > >  wrote:
> > > >
> > > > On Tue, Jun 30, 2020 at 01:49:48PM +0300, Heikki Krogerus wrote:
> > > > > On Mon, Jun 29, 2020 at 09:49:41PM -0700, Rajat Jain wrote:
> > > > > > > Add a new (optional) field to denote the physical location of a device
> > > > > > > in the system, and expose it in sysfs. This was discussed here:
> > > > > > > https://lore.kernel.org/linux-acpi/20200618184621.ga446...@kroah.com/
> > > > > > >
> > > > > > > (The primary choice for attribute name i.e. "location" is already
> > > > > > > exposed as an ABI elsewhere, so settled for "site"). Individual buses
> > > > > > > that want to support this new attribute can opt-in by setting a flag in
> > > > > > > bus_type, and then populating the location of device while enumerating
> > > > > > > it.
> > > > >
> > > > > So why not just call it "physical_location"?
> > > >
> > > > That's better, and will allow us to put "3rd blue plug from the left,
> > > > 4th row down" in there someday :)
> > > >
> > > > All of this is "relative" to the CPU, right?  But what CPU?  Again, how
> > > > are the systems with drawers of PCI and CPUs and memory that can be
> > > > added/removed at any point in time being handled here?  What is
> > > > "internal" and "external" for them?
> > > >
> > > > > What exactly is the physical boundary here that is attempting to be
> > > > described?
> > >
> > > Also, where is the "physical location" information going to come from?
> >
> > Who knows?  :)
> >
> > Some BIOS seem to provide this, but do you trust that?
> >
> > > If that is the platform firmware (which I suspect is the anticipated
> > > case), there may be problems with reliability related to that.
> >
> > s/may/will/
> >
> > which means making the kernel enact a policy like this patch series
> > tries to add, will result in a lot of broken systems, which is why I
> > keep saying that it needs to be done in userspace.
> >
> > It's as if some of us haven't been down this road before and just keep
> > being ignored...
> >
> > {sigh}
> 
> Well, to be honest, if you are a "vertical" vendor and you control the
> entire stack, *including* the platform firmware, it would be kind of
> OK for you to do that in a product kernel.
> 
> However, this is not a practical thing to do in the mainline kernel
> which must work for everybody, including people who happen to use
> systems with broken or even actively unfriendly firmware on them.
> 
> So I'm inclined to say that IMO this series "as is" would not be an
> improvement from the mainline perspective.

It can be, we have been using this for USB devices for many many years
now, quite successfully.  The key is not to trust that the platform
firmware got it right :)

> I guess it would make sense to have an attribute for user space to
> write to in order to make the kernel reject device plug-in events
> coming from a given port or connector, but the kernel has no reliable
> means to determine *which* ports or connectors are "safe", and even if
> there was a way for it to do that, it still may not agree with user
> space on which ports or connectors should be regarded as "safe".

Again, we have been doing this for USB devices for a very long time, PCI
shouldn't be any different.  Why people keep ignoring working solutions
is beyond me, there's nothing "special" about PCI devices here for this
type of "worry" or reasoning to try to create new solutions.

So, again, I ask, go do what USB does, and to do that, take the logic
out of the USB core, make it bus-agnostic, and _THEN_ add it to the PCI
code.  Why the original submitter keeps ignoring my request to do this
is beyond me, I guess they like making patches that will get rejected :(
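
For readers unfamiliar with the USB approach: assuming the mechanism referred
to here is the USB sysfs authorization interface (the per-device "authorized"
attribute), user space decides which devices may be used by writing to sysfs.
A minimal userspace sketch follows; the helper names are hypothetical and the
device name in the usage note is only an example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/bus/usb/devices/<dev>/authorized" into buf.
 * Returns 0 on success, -1 if the path would not fit.
 */
static int usb_authorized_path(char *buf, size_t len, const char *dev)
{
	int n = snprintf(buf, len, "/sys/bus/usb/devices/%s/authorized", dev);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "1" (authorize) or "0" (deauthorize) to a device's attribute.
 * Returns 0 on success, -1 on any failure (no such device, no permission).
 */
static int usb_set_authorized(const char *dev, int allow)
{
	char path[256];
	FILE *f;
	int ok;

	if (usb_authorized_path(path, sizeof(path), dev))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(allow ? "1" : "0", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A policy daemon could call usb_set_authorized("1-2", 0) to reject a device on
a port it does not trust; the policy itself stays in user space.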

thanks,

greg k-h


Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()

2020-06-30 Thread Robin Murphy

On 2020-06-30 02:03, Lu Baolu wrote:

Hi Robin,

On 6/29/20 7:56 PM, Robin Murphy wrote:

On 2020-06-27 04:15, Lu Baolu wrote:

The hardware assistant vfio mediated device is a use case of iommu
aux-domain. The interactions between vfio/mdev and iommu during mdev
creation and passthr are:

- Create a group for mdev with iommu_group_alloc();
- Add the device to the group with
 group = iommu_group_alloc();
 if (IS_ERR(group))
 return PTR_ERR(group);

 ret = iommu_group_add_device(group, >dev);
 if (!ret)
 dev_info(>dev, "MDEV: group_id = %d\n",
  iommu_group_id(group));
- Allocate an aux-domain
iommu_domain_alloc()
- Attach the aux-domain to the physical device from which the mdev is
   created.
iommu_aux_attach_device()

In the whole process, an iommu group was allocated for the mdev and an
iommu domain was attached to the group, but group->domain is left
NULL. As a result, iommu_get_domain_for_dev() doesn't work anymore.

This adds iommu_group_get/set_domain() so that group->domain could be
managed whenever a domain is attached or detached through the aux-domain
api's.


Letting external callers poke around directly in the internals of 
iommu_group doesn't look right to me.


Unfortunately, it seems that the vfio iommu abstraction is deeply bound
to the IOMMU subsystem. We can easily find other examples:

iommu_group_get/set_iommudata()
iommu_group_get/set_name()
...


Sure, but those are ways for users of a group to attach useful 
information of their own to it, that doesn't matter to the IOMMU 
subsystem itself. The interface you've proposed gives callers rich new 
opportunities to fundamentally break correct operation of the API:


dom = iommu_domain_alloc();
iommu_attach_group(dom, grp);
...
iommu_group_set_domain(grp, NULL);
// oops, leaked and can't ever detach properly now

or perhaps:

grp = iommu_group_alloc();
iommu_group_add_device(grp, dev);
iommu_group_set_domain(grp, dom);
...
iommu_detach_group(dom, grp);
// oops, IOMMU driver might not handle this

If a regular device is attached to one or more aux domains for PASID 
use, iommu_get_domain_for_dev() is still going to return the primary 
domain, so why should it be expected to behave differently for mediated


Unlike the normal device attach, we will encounter two devices when it
comes to aux-domain.

- Parent physical device - this might be, for example, a PCIe device
with PASID feature support, hence it is able to tag a unique PASID
for DMA transfers originated from its subset. The device driver hence
is able to wrap this subset into an isolated:

- Mediated device - a fake device created by the device driver mentioned
above.

Yes. All you mentioned are right for the parent device. But for mediated
device, iommu_get_domain_for_dev() doesn't work even if it has a valid
iommu_group and iommu_domain.

iommu_get_domain_for_dev() is a necessary interface for device drivers
which want to support aux-domain. For example,


Only if they want to follow this very specific notion of using made-up 
devices and groups to represent aux attachments. Even if a driver 
managing its own aux domains entirely privately does create child 
devices for them, it's not like it can't keep its domain pointers in 
drvdata if it wants to ;)


Let's not conflate the current implementation of vfio_mdev with the 
general concepts involved here.



   struct iommu_domain *domain;
   struct device *dev = mdev_dev(mdev);
   unsigned long pasid;

   domain = iommu_get_domain_for_dev(dev);
   if (!domain)
   return -ENODEV;

   pasid = iommu_aux_get_pasid(domain, dev->parent);
   if (pasid == IOASID_INVALID)
   return -EINVAL;

   /* Program the device context with the PASID value */
   

Without this fix, iommu_get_domain_for_dev() always returns NULL and the
device driver has no means to support aux-domain.


So either the IOMMU API itself is missing the ability to do the right 
thing internally, or the mdev layer isn't using it appropriately. Either 
way, simply punching holes in the API for mdev to hack around its own 
mess doesn't seem like the best thing to do.


The initial impression I got was that it's implicitly assumed here that 
the mdev itself is attached to exactly one aux domain and nothing else, 
at which point I would wonder why it's using aux at all, but are you 
saying that in fact no attach happens with the mdev group either way, 
only to the parent device?


I'll admit I'm not hugely familiar with any of this, but it seems to me 
that the logical flow should be:


- allocate domain
- attach as aux to parent
- retrieve aux domain PASID
- create mdev child based on PASID
- attach mdev to domain (normally)

Of course that might require giving the 

Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter


On 30/06/2020 17:32, Jon Hunter wrote:
> On 30/06/2020 17:23, Krishna Reddy wrote:
>>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device
>>>> +*smmu) {
>>>> +  unsigned int i;
>>
>>>> +  for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>>> +  struct resource *res;
>>>> +
>>>> +  res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>>> +  if (!res)
>>>> +  break;
>>
>>> Currently this driver is only supported for Tegra194 which I understand has 
>>> 3 SMMUs. Therefore, I don't feel that we should fail silently here, I think 
>>> it is better to return an error if all 3 cannot be initialised.
>>
>> Initialization of all the three SMMU instances is not necessary here.
> 
> That is not what I am saying.
> 
>> The driver can work with all the possible number of instances 1, 2 and 3 
>> based on the DT config though it doesn't make much sense to use it with 1 
>> instance.
>> There is no silent failure here from driver point of view. If there is 
>> misconfig in DT, SMMU faults would catch issues.
> 
> I disagree and you should return a proper error here.

OK, well I see what you are saying, but if we intended to support all 3
for Tegra194, then we should ensure all 3 are initialised correctly.

My concern here is testing, because when things break in upstream I am
usually the one that tracks it down. Not having clear warning/error
messages when something is not initialised as expected makes it harder.

It would be better to query the number of SMMUs populated in device-tree
and then ensure that all are initialised correctly.
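
A rough userspace sketch of that idea, under stated assumptions: struct
fake_pdev and init_instances() are hypothetical stand-ins, not the driver's
code. The count comes from the description, and any described instance that
cannot be set up is reported as an error instead of being skipped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_INSTANCES 3

/* Hypothetical stand-in for a platform device's memory resources. */
struct fake_pdev {
	const void *mem[MAX_INSTANCES]; /* NULL models a failed mapping */
	unsigned int num_mem;           /* regions described in device tree */
};

/*
 * Initialise every described instance or fail loudly.
 * Returns 0 and stores the instance count, or a negative errno.
 */
static int init_instances(const struct fake_pdev *pdev, unsigned int *num_inst)
{
	unsigned int i;

	/* A description with no regions, or too many, is an error. */
	if (pdev->num_mem == 0 || pdev->num_mem > MAX_INSTANCES)
		return -EINVAL;

	for (i = 0; i < pdev->num_mem; i++) {
		/* A described instance that failed to map is an error too. */
		if (!pdev->mem[i])
			return -ENOMEM;
	}

	*num_inst = pdev->num_mem;
	return 0;
}
```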

Jon

-- 
nvpublic


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter



On 30/06/2020 17:23, Krishna Reddy wrote:
>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>>> +*smmu) {
>>> +   unsigned int i;
> 
>>> +   for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> +   struct resource *res;
>>> +
>>> +   res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> +   if (!res)
>>> +   break;
> 
>> Currently this driver is only supported for Tegra194 which I understand has 
>> 3 SMMUs. Therefore, I don't feel that we should fail silently here, I think 
>> it is better to return an error if all 3 cannot be initialised.
> 
> Initialization of all the three SMMU instances is not necessary here.

That is not what I am saying.

> The driver can work with all the possible number of instances 1, 2 and 3 
> based on the DT config though it doesn't make much sense to use it with 1 
> instance.
> There is no silent failure here from driver point of view. If there is 
> misconfig in DT, SMMU faults would catch issues.

I disagree and you should return a proper error here.

>>> +   nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> +   if (IS_ERR(nvidia_smmu->bases[i]))
>>> +   return ERR_CAST(nvidia_smmu->bases[i]);
> 
>> You want to use PTR_ERR() here.
> 
> PTR_ERR() returns long integer. 
> This function returns a pointer. ERR_CAST is the right one to use here. 

Ah yes, indeed. OK that's fine.

Jon

-- 
nvpublic


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>> +*smmu) {
>> +unsigned int i;

>> +for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>> +struct resource *res;
>> +
>> +res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>> +if (!res)
>> +break;

>Currently this driver is only supported for Tegra194 which I understand has 3 
>SMMUs. Therefore, I don't feel that we should fail silently here, I think it 
>is better to return an error if all 3 cannot be initialised.

Initialization of all the three SMMU instances is not necessary here.
The driver can work with all the possible number of instances 1, 2 and 3 based 
on the DT config though it doesn't make much sense to use it with 1 instance.
There is no silent failure here from driver point of view. If there is 
misconfig in DT, SMMU faults would catch issues.

>> +nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>> +if (IS_ERR(nvidia_smmu->bases[i]))
>> +return ERR_CAST(nvidia_smmu->bases[i]);

>You want to use PTR_ERR() here.

PTR_ERR() returns long integer. 
This function returns a pointer. ERR_CAST is the right one to use here. 
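
To illustrate the distinction, here is a userspace sketch with simplified
stand-ins for the kernel's error-pointer helpers. The helper bodies below are
approximations written for this example, and struct smmu_device,
map_instance() and impl_init() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* ERR_CAST keeps the encoded error and only changes the pointer type. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

struct smmu_device { int id; }; /* hypothetical */

/* Hypothetical mapper: returns a valid pointer or an encoded error. */
static void *map_instance(int ok)
{
	static int fake_regs;

	return ok ? (void *)&fake_regs : ERR_PTR(-ENOMEM);
}

/* Returns a pointer, so an error must stay a pointer: use ERR_CAST. */
static struct smmu_device *impl_init(struct smmu_device *smmu, int map_ok)
{
	void *base = map_instance(map_ok);

	if (IS_ERR(base))
		return ERR_CAST(base);

	return smmu;
}
```

PTR_ERR() would be the right call only in the caller, at the point where the
error has to become an integer return value.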


--
nvpublic


Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-06-30 Thread Rafael J. Wysocki
On Tue, Jun 30, 2020 at 5:38 PM Greg Kroah-Hartman
 wrote:
>
> On Tue, Jun 30, 2020 at 03:00:34PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Jun 30, 2020 at 2:52 PM Greg Kroah-Hartman
> >  wrote:
> > >
> > > On Tue, Jun 30, 2020 at 01:49:48PM +0300, Heikki Krogerus wrote:
> > > > On Mon, Jun 29, 2020 at 09:49:41PM -0700, Rajat Jain wrote:
> > > > > Add a new (optional) field to denote the physical location of a device
> > > > > in the system, and expose it in sysfs. This was discussed here:
> > > > > https://lore.kernel.org/linux-acpi/20200618184621.ga446...@kroah.com/
> > > > >
> > > > > (The primary choice for attribute name i.e. "location" is already
> > > > > exposed as an ABI elsewhere, so settled for "site"). Individual buses
> > > > > > that want to support this new attribute can opt-in by setting a flag in
> > > > > bus_type, and then populating the location of device while enumerating
> > > > > it.
> > > >
> > > > So why not just call it "physical_location"?
> > >
> > > That's better, and will allow us to put "3rd blue plug from the left,
> > > 4th row down" in there someday :)
> > >
> > > All of this is "relative" to the CPU, right?  But what CPU?  Again, how
> > > are the systems with drawers of PCI and CPUs and memory that can be
> > > added/removed at any point in time being handled here?  What is
> > > "internal" and "external" for them?
> > >
> > > > What exactly is the physical boundary here that is attempting to be
> > > described?
> >
> > Also, where is the "physical location" information going to come from?
>
> Who knows?  :)
>
> Some BIOS seem to provide this, but do you trust that?
>
> > If that is the platform firmware (which I suspect is the anticipated
> > case), there may be problems with reliability related to that.
>
> s/may/will/
>
> which means making the kernel enact a policy like this patch series
> tries to add, will result in a lot of broken systems, which is why I
> keep saying that it needs to be done in userspace.
>
> It's as if some of us haven't been down this road before and just keep
> being ignored...
>
> {sigh}

Well, to be honest, if you are a "vertical" vendor and you control the
entire stack, *including* the platform firmware, it would be kind of
OK for you to do that in a product kernel.

However, this is not a practical thing to do in the mainline kernel
which must work for everybody, including people who happen to use
systems with broken or even actively unfriendly firmware on them.

So I'm inclined to say that IMO this series "as is" would not be an
improvement from the mainline perspective.

I guess it would make sense to have an attribute for user space to
write to in order to make the kernel reject device plug-in events
coming from a given port or connector, but the kernel has no reliable
means to determine *which* ports or connectors are "safe", and even if
there was a way for it to do that, it still may not agree with user
space on which ports or connectors should be regarded as "safe".

Cheers!


Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-06-30 Thread Greg Kroah-Hartman
On Tue, Jun 30, 2020 at 03:00:34PM +0200, Rafael J. Wysocki wrote:
> On Tue, Jun 30, 2020 at 2:52 PM Greg Kroah-Hartman
>  wrote:
> >
> > On Tue, Jun 30, 2020 at 01:49:48PM +0300, Heikki Krogerus wrote:
> > > On Mon, Jun 29, 2020 at 09:49:41PM -0700, Rajat Jain wrote:
> > > > Add a new (optional) field to denote the physical location of a device
> > > > in the system, and expose it in sysfs. This was discussed here:
> > > > https://lore.kernel.org/linux-acpi/20200618184621.ga446...@kroah.com/
> > > >
> > > > (The primary choice for attribute name i.e. "location" is already
> > > > exposed as an ABI elsewhere, so settled for "site"). Individual buses
> > > > that want to support this new attribute can opt-in by setting a flag in
> > > > bus_type, and then populating the location of device while enumerating
> > > > it.
> > >
> > > So why not just call it "physical_location"?
> >
> > That's better, and will allow us to put "3rd blue plug from the left,
> > 4th row down" in there someday :)
> >
> > All of this is "relative" to the CPU, right?  But what CPU?  Again, how
> > are the systems with drawers of PCI and CPUs and memory that can be
> > added/removed at any point in time being handled here?  What is
> > "internal" and "external" for them?
> >
> > What exactly is the physical boundary here that is attempting to be
> > described?
> 
> Also, where is the "physical location" information going to come from?

Who knows?  :)

Some BIOS seem to provide this, but do you trust that?

> If that is the platform firmware (which I suspect is the anticipated
> case), there may be problems with reliability related to that.

s/may/will/

which means making the kernel enact a policy like this patch series
tries to add, will result in a lot of broken systems, which is why I
keep saying that it needs to be done in userspace.

It's as if some of us haven't been down this road before and just keep
being ignored...

{sigh}

greg k-h


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter

On 30/06/2020 15:53, Robin Murphy wrote:
> On 2020-06-30 09:19, Jon Hunter wrote:
>>
>> On 30/06/2020 01:10, Krishna Reddy wrote:
>>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
>>> IOVA accesses across them.
>>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
>>> string for Tegra194 SoC SMMU topology.
>>
>> There is no description here of the 3rd SMMU that you mention below.
>> I think that we should describe the full picture here.
>>  
>>> Signed-off-by: Krishna Reddy 

...

>>> +static void nvidia_smmu_tlb_sync(struct arm_smmu_device *smmu, int
>>> page,
>>> +   int sync, int status)
>>> +{
>>> +    unsigned int delay;
>>> +
>>> +    arm_smmu_writel(smmu, page, sync, 0);
>>> +
>>> +    for (delay = 1; delay < TLB_LOOP_TIMEOUT_IN_US; delay *= 2) {
>>
>> So we are doubling the delay every time? Is this better than just using
>> the same on each loop?
> 
> This is the same logic as the main driver (see 8513c8930069) - the sync
> is expected to complete relatively quickly, hence why we have the inner
> spin loop to avoid the delay entirely in the typical case, and the
> longer it's taking, the more likely it is that something's wrong and it
> will never complete anyway. Realistically, a heavily loaded SMMU at a
> modest clock rate might take us through a couple of iterations of the
> outer loop, but beyond that we're pretty much just killing time until we
> declare it wedged and give up, and by then there's not much point in
> burning power frantically hammering on the interconnect.

Ah OK. Then maybe we should move the definitions for TLB_LOOP_TIMEOUT
and TLB_SPIN_COUNT into the arm-smmu.h so that we can use them directly
in this file instead of redefining them. Then it may be clear that these
are part of the main driver.
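
For reference, the spin-then-back-off shape Robin describes can be modelled in
userspace as below. This is a sketch, not the driver code: struct fake_sync
and sync_active() are hypothetical, and the udelay() call is replaced by a
counter so the behaviour can be checked without hardware.

```c
#include <assert.h>
#include <stdbool.h>

#define SPIN_COUNT      10
#define LOOP_TIMEOUT_US 1000000 /* 1s */

/* Hypothetical model: the sync completes after polls_needed status reads. */
struct fake_sync {
	unsigned int polls_needed;
	unsigned int polls_done;
	unsigned long waited_us; /* total simulated delay */
};

/* One status read; true while the sync is still in flight. */
static bool sync_active(struct fake_sync *s)
{
	return ++s->polls_done < s->polls_needed;
}

/* Returns 0 once the sync completes, -1 if the timeout is reached. */
static int wait_for_sync(struct fake_sync *s)
{
	unsigned int delay, spin;

	for (delay = 1; delay < LOOP_TIMEOUT_US; delay *= 2) {
		/* Fast path: poll a few times with no delay at all. */
		for (spin = SPIN_COUNT; spin > 0; spin--) {
			if (!sync_active(s))
				return 0;
		}
		/* Slow path: back off, doubling the wait each round. */
		s->waited_us += delay; /* stands in for udelay(delay) */
	}

	return -1;
}
```

A sync that finishes within the first ten polls never sleeps; one that never
finishes costs 20 rounds and roughly one second in total before giving up.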

>>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device
>>> *smmu)
>>> +{
>>> +    unsigned int i;
>>> +    struct nvidia_smmu *nvidia_smmu;
>>> +    struct platform_device *pdev = to_platform_device(smmu->dev);
>>> +
>>> +    nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu),
>>> GFP_KERNEL);
>>> +    if (!nvidia_smmu)
>>> +    return ERR_PTR(-ENOMEM);
>>> +
>>> +    nvidia_smmu->smmu = *smmu;
>>> +    /* Instance 0 is ioremapped by arm-smmu.c after this function
>>> returns */
>>> +    nvidia_smmu->num_inst = 1;
>>> +
>>> +    for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>>> +    struct resource *res;
>>> +
>>> +    res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>>> +    if (!res)
>>> +    break;
>>> +
>>> +    nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>>> +    if (IS_ERR(nvidia_smmu->bases[i]))
>>> +    return ERR_CAST(nvidia_smmu->bases[i]);
>>> +
>>> +    nvidia_smmu->num_inst++;
>>> +    }
>>> +
>>> +    nvidia_smmu->smmu.impl = _smmu_impl;
>>> +    /*
>>> + * Free the arm_smmu_device struct allocated in arm-smmu.c.
>>> + * Once this function returns, arm-smmu.c would use arm_smmu_device
>>> + * allocated as part of nvidia_smmu struct.
>>> + */
>>> +    devm_kfree(smmu->dev, smmu);
>>
>> Why don't we just store the pointer of the smmu struct passed to this
>> function
>> in the nvidia_smmu struct and then we do not need to free this here.
>> In other
>> words make ...
>>
>>   struct nvidia_smmu {
>> struct arm_smmu_device    *smmu;
>> unsigned int    num_inst;
>> void __iomem    *bases[MAX_SMMU_INSTANCES];
>>   };
>>
>> This seems more appropriate, than copying the struct and freeing memory
>> allocated else-where.
> 
> But then how do you get back to struct nvidia_smmu given just a pointer
> to struct arm_smmu_device?

Ah yes of course that is what I was missing. I wondered what was going
on here. So I think we should add a nice comment in the above function
of why we are copying this and cannot simply store the pointer.
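
As a small illustration of why the struct has to be embedded by value, here is
a userspace sketch of the container_of() pattern. The macro below is a
simplified version written for this example, and base_dev, wrapper_dev and
to_wrapper() are hypothetical stand-ins for arm_smmu_device, nvidia_smmu and
to_nvidia_smmu().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INSTANCES 3

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct base_dev { int num_context_banks; }; /* stand-in for arm_smmu_device */

struct wrapper_dev {                        /* stand-in for nvidia_smmu */
	struct base_dev base;               /* embedded by value, not a pointer */
	unsigned int num_inst;
	void *bases[MAX_INSTANCES];
};

/* Works only because base lives inside wrapper_dev at a fixed offset. */
static struct wrapper_dev *to_wrapper(struct base_dev *b)
{
	return container_of(b, struct wrapper_dev, base);
}
```

If wrapper_dev held only a pointer to a separately allocated base_dev, there
would be no fixed offset to subtract and the core driver's pointer could not
be turned back into the wrapper.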

Cheers
Jon

-- 
nvpublic

Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Robin Murphy

On 2020-06-30 09:19, Jon Hunter wrote:


On 30/06/2020 01:10, Krishna Reddy wrote:

NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.


There is no description here of the 3rd SMMU that you mention below.
I think that we should describe the full picture here.
  

Signed-off-by: Krishna Reddy 
---
  MAINTAINERS |   2 +
  drivers/iommu/Makefile  |   2 +-
  drivers/iommu/arm-smmu-impl.c   |   3 +
  drivers/iommu/arm-smmu-nvidia.c | 196 
  drivers/iommu/arm-smmu.h|   1 +
  5 files changed, 203 insertions(+), 1 deletion(-)
  create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
  
  TEGRA IOMMU DRIVERS

  M:Thierry Reding 
+R: Krishna Reddy 
  L:linux-te...@vger.kernel.org
  S:Supported
+F: drivers/iommu/arm-smmu-nvidia.c
  F:drivers/iommu/tegra*
  
  TEGRA KBC DRIVER

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
  obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
  obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
  obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
  obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
  obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = _impl;
  
+	if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))


Nit: please use "np" like all the surrounding code does.


+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..1124f0ac1823a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochronous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.


I don't understand the "When" there - the driver has always supported 
multiple independent SMMUs, and it's not something that could be 
configured out or otherwise disabled. Plus I really don't see why you 
would ever want to force unrelated SMMUs to be programmed together - 
beyond the TLB thing mentioned it would also waste precious context bank 
resources and might lead to weird device grouping via false stream ID 
aliasing, with no obvious upside at all.



+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US 1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned int            num_inst;
+   void __iomem            *bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)


If you run checkpatch --strict on these you will get a lot 

Re: [PATCH net] xsk: remove cheap_dma optimization

2020-06-30 Thread Daniel Borkmann

On 6/30/20 7:07 AM, Christoph Hellwig wrote:

On Mon, Jun 29, 2020 at 05:18:38PM +0200, Daniel Borkmann wrote:

On 6/29/20 5:10 PM, Björn Töpel wrote:

On 2020-06-29 15:52, Daniel Borkmann wrote:


Ok, fair enough, please work with DMA folks to get this properly integrated and
restored then. Applied, thanks!


Daniel, you were too quick! Please revert this one; Christoph just submitted a 
4-patch-series that addresses both the DMA API, and the perf regression!


Nice, tossed from bpf tree then! (Looks like it didn't land on the bpf list yet,
but seems other mails are currently stuck as well on vger. I presume it will be
routed to Linus via Christoph?)


I sent the patches to the bpf list, did you get them now that vger
is unclogged?  Thinking about it the best route might be through
bpf/net, so if that works for you please pick it up.


Yeah, that's fine, I just applied your series to the bpf tree. Thanks!

Re: [PATCH v2 01/12] ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC

2020-06-30 Thread Hanjun Guo

On 2020/6/30 18:24, Lorenzo Pieralisi wrote:

On Tue, Jun 30, 2020 at 11:06:41AM +0800, Hanjun Guo wrote:

[...]


For devices that aren't described in the DSDT - IORT translations
are determined by their ACPI parent device. Do you see/Have you
found any issue with this approach ?


The spec says "Describes the IO relationships between devices
represented in the ACPI namespace.", and in section 3.1.1.3 Named
component node, it says:


PCI devices aren't necessarily described in the ACPI namespace and we
still use IORT to describe them - through the RC node.


"Named component nodes are used to describe devices that are also
included in the Differentiated System Description Table (DSDT). See
[ACPI]."

So from my understanding, the IORT spec for now, can only do ID
translations for devices in the DSDT.


I think you can read this multiple ways but this patch does not
change this concept. What changes, is applying parent's node IORT
mapping to child nodes with no associated DSDT nodes, it is the
same thing we do with PCI and the _DMA method - we could update
the wording in the specs if that clarifies but I don't think this
deliberately disregards the specifications.


I agree, but it's better to update the wording of the spec.




For a platform device, if I use its parent's full path name for
its named component entry, then it will match, but this will violate
the IORT spec.


Can you elaborate on this please I don't get the point you
are making.


For example, device A is not described in DSDT so can't represent
as a NC node in IORT. Device B can be described in DSDT and it
is the parent of device A, so device B can be represented in IORT
with memory access properties and node flags with Substream width
and Stall supported info.

When we are trying to translate device A's ID, we reuse all the memory
access properties and node flags from its parent (device B), but
will they be the same?


I assume so, why wouldn't it be ? Why would we describe them in
a parent-child relationship if that's not how the system looks like
in HW ?


The point I'm making is that I'm not sure all the memory access and
stall properties are the same for the parent and the device itself.



Do you have a specific example in mind that we should be aware of ?


So the IORT spec doesn't support this, at least it's pretty vague
I think.


I think that's a matter of wording, it can be updated if it needs be,
reach out if you see any issue with the current approach please.


If all the properties for parent and device itself are the same,
I have no strong opinion for this patch, but it's better to update
the wording of the spec as well.

Thanks
Hanjun



Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-06-30 Thread Rafael J. Wysocki
On Tue, Jun 30, 2020 at 2:52 PM Greg Kroah-Hartman
 wrote:
>
> On Tue, Jun 30, 2020 at 01:49:48PM +0300, Heikki Krogerus wrote:
> > On Mon, Jun 29, 2020 at 09:49:41PM -0700, Rajat Jain wrote:
> > > Add a new (optional) field to denote the physical location of a device
> > > in the system, and expose it in sysfs. This was discussed here:
> > > https://lore.kernel.org/linux-acpi/20200618184621.ga446...@kroah.com/
> > >
> > > (The primary choice for attribute name i.e. "location" is already
> > > exposed as an ABI elsewhere, so settled for "site"). Individual buses
> > > that want to support this new attribute can opt-in by setting a flag in
> > > bus_type, and then populating the location of device while enumerating
> > > it.
> >
> > So why not just call it "physical_location"?
>
> That's better, and will allow us to put "3rd blue plug from the left,
> 4th row down" in there someday :)
>
> All of this is "relative" to the CPU, right?  But what CPU?  Again, how
> are the systems with drawers of PCI and CPUs and memory that can be
> added/removed at any point in time being handled here?  What is
> "internal" and "external" for them?
>
> > What exactly is the physical boundary here that is attempting to be
> described?

Also, where is the "physical location" information going to come from?

If that is the platform firmware (which I suspect is the anticipated
case), there may be problems with reliability related to that.


Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-06-30 Thread Greg Kroah-Hartman
On Tue, Jun 30, 2020 at 01:49:48PM +0300, Heikki Krogerus wrote:
> On Mon, Jun 29, 2020 at 09:49:41PM -0700, Rajat Jain wrote:
> > Add a new (optional) field to denote the physical location of a device
> > in the system, and expose it in sysfs. This was discussed here:
> > https://lore.kernel.org/linux-acpi/20200618184621.ga446...@kroah.com/
> > 
> > (The primary choice for attribute name i.e. "location" is already
> > exposed as an ABI elsewhere, so settled for "site"). Individual buses
> > that want to support this new attribute can opt-in by setting a flag in
> > bus_type, and then populating the location of device while enumerating
> > it.
> 
> So why not just call it "physical_location"?

That's better, and will allow us to put "3rd blue plug from the left,
4th row down" in there someday :)

All of this is "relative" to the CPU, right?  But what CPU?  Again, how
are the systems with drawers of PCI and CPUs and memory that can be
added/removed at any point in time being handled here?  What is
"internal" and "external" for them?

What exactly is the physical boundary here that is attempting to be
described?

thanks,

greg "not all the world is your laptop" k-h


[PATCH] iommu/amd: Make amd_iommu_apply_ivrs_quirks() static inline

2020-06-30 Thread Joerg Roedel
From: Joerg Roedel 

At least the version in the header file to fix a compile warning about
the function being unused.

Reported-by: Borislav Petkov 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/amd_iommu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index f892992c8744..57309716fd18 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -102,7 +102,7 @@ extern int __init add_special_device(u8 type, u8 id, u16 *devid,
 #ifdef CONFIG_DMI
 void amd_iommu_apply_ivrs_quirks(void);
 #else
-static void amd_iommu_apply_ivrs_quirks(void) { }
+static inline void amd_iommu_apply_ivrs_quirks(void) { }
 #endif
 
 #endif
-- 
2.27.0
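
For readers outside the kernel, the effect of the one-word change can be
sketched in a single standalone file. This is only an illustration under
stated assumptions: HAVE_QUIRK_TABLE is a hypothetical stand-in for
CONFIG_DMI, and apply_quirks()/init_with_quirks() are made-up names, not
the AMD IOMMU functions. A plain "static" stub in a header gives every
includer its own copy, and GCC can warn (-Wunused-function) in files that
never call it; marking the stub "static inline" avoids that warning.

```c
/*
 * Header-style fragment, shown inline so the example is one file.
 * HAVE_QUIRK_TABLE is a hypothetical config switch (assumption).
 */
#ifdef HAVE_QUIRK_TABLE
void apply_quirks(void);                   /* real version lives elsewhere */
#else
static inline void apply_quirks(void) { }  /* empty stub, no unused warning */
#endif

static int init_count;

/* Hypothetical caller: always safe to call, with or without the feature. */
static int init_with_quirks(void)
{
	apply_quirks();
	return ++init_count;
}
```

Callers do not need their own #ifdef; the stub compiles away when the
feature is disabled.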



Re: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-30 Thread Jon Hunter

On 30/06/2020 13:13, Robin Murphy wrote:
> On 2020-06-30 09:37, Jon Hunter wrote:
>>
>> On 30/06/2020 01:10, Krishna Reddy wrote:
>>> Add global/context fault hooks to allow the NVIDIA SMMU implementation
>>> to handle faults across multiple SMMUs.
>>
>> Nit ... this is not just for NVIDIA, but this allows anyone to add
>> custom global/context and fault hooks. So I think that the changelog
>> should be clear that this change permits custom fault hooks and that
>> custom fault hooks are needed for the Tegra194 SMMU. You may also want
>> to say why.
>>
>>>
>>> Signed-off-by: Krishna Reddy 
>>> ---
>>>   drivers/iommu/arm-smmu-nvidia.c | 98 +
>>>   drivers/iommu/arm-smmu.c    | 17 +-
>>>   drivers/iommu/arm-smmu.h    |  3 +
>>>   3 files changed, 116 insertions(+), 2 deletions(-)

...

>>> @@ -835,7 +836,13 @@ static int arm_smmu_init_domain_context(struct
>>> iommu_domain *domain,
>>>    * handler seeing a half-initialised domain state.
>>>    */
>>>   irq = smmu->irqs[smmu->num_global_irqs + cfg->irptndx];
>>> -    ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault,
>>> +
>>> +    if (smmu->impl && smmu->impl->context_fault)
>>> +    context_fault = smmu->impl->context_fault;
>>> +    else
>>> +    context_fault = arm_smmu_context_fault;
>>
>> Why not set the default smmu->impl->context_fault to
>> arm_smmu_context_fault in arm_smmu_impl_init() and then allow the
>> various implementations to override as necessary? Then you can get rid
>> of this context_fault variable here and just use
>> smmu->impl->context_fault below.
> 
> Because the default smmu->impl is NULL. And as I've said before, NAK to
> forcing the common case to allocate a set of "quirks" purely to override
> the default IRQ handler with the default IRQ handler ;)


Ah OK, makes sense. Sorry I am a bit late to the review :-)
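
For reference, the selection logic being discussed can be sketched as
standalone C. The type and function names below are hypothetical
stand-ins, not the arm-smmu driver's definitions; the point is only that
a NULL impl (the common case) falls back to the default handler without
allocating anything.

```c
#include <stddef.h>

/* Hypothetical handler type; the driver uses irq_handler_t. */
typedef int (*fault_handler_t)(int irq, void *dev);

struct smmu_impl {
	fault_handler_t context_fault;  /* optional override, may be NULL */
};

struct smmu_device {
	const struct smmu_impl *impl;   /* NULL when there are no quirks */
};

static int default_context_fault(int irq, void *dev)
{
	(void)irq;
	(void)dev;
	return 1;
}

/* Example override, standing in for an implementation-specific hook. */
static int custom_context_fault(int irq, void *dev)
{
	(void)irq;
	(void)dev;
	return 2;
}

/* Use the implementation hook if one exists, else the default handler. */
static fault_handler_t pick_context_fault(const struct smmu_device *smmu)
{
	if (smmu->impl && smmu->impl->context_fault)
		return smmu->impl->context_fault;
	return default_context_fault;
}
```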

Jon

-- 
nvpublic

Re: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-30 Thread Robin Murphy

On 2020-06-30 01:10, Krishna Reddy wrote:

Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
  1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..5b2586ac715ed 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
- qcom,sc7180-smmu-500
- qcom,sdm845-smmu-500
- const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"


Hmm, there must be a better way to word that to express that it only 
applies to the sets of SMMUs that must be programmed identically, and 
not any other independent MMU-500s that might also happen to be in the 
same SoC.



+items:
+  - enum:
+  - nvidia,tegra194-smmu
+  - const: arm,mmu-500


Is the fallback compatible appropriate here? If software treats this as 
a standard MMU-500 it will only program the first instance (because the 
second isn't presented as a separate MMU-500) - is there any way that 
isn't going to blow up?


Robin.


- items:
- const: arm,mmu-500
- const: arm,smmu-v2




Re: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-30 Thread Robin Murphy

On 2020-06-30 09:37, Jon Hunter wrote:


On 30/06/2020 01:10, Krishna Reddy wrote:

Add global/context fault hooks to allow the NVIDIA SMMU implementation
to handle faults across multiple SMMUs.


Nit ... this is not just for NVIDIA, but this allows anyone to add
custom global/context and fault hooks. So I think that the changelog
should be clear that this change permits custom fault hooks and that
custom fault hooks are needed for the Tegra194 SMMU. You may also want
to say why.



Signed-off-by: Krishna Reddy 
---
  drivers/iommu/arm-smmu-nvidia.c | 98 +
  drivers/iommu/arm-smmu.c| 17 +-
  drivers/iommu/arm-smmu.h|  3 +
  3 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 1124f0ac1823a..c9423b4199c65 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
  }
  
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)

+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;


Should be unsigned


+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;


Any chance there could be more than one SMMU faulting by the time we
service the interrupt?


It certainly seems plausible if the interconnect is automatically 
load-balancing requests across the SMMU instances - say a driver bug 
caused a buffer to be unmapped too early, there could be many in-flight 
accesses to parts of that buffer that aren't all taking the same path 
and thus could now fault in parallel.


[ And anyone inclined to nitpick global vs. context faults, s/unmap a 
buffer/tear down a domain/ ;) ]


Either way I think it would be easier to reason about if we just handled 
these like a typical shared interrupt and always checked all the instances.
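
To make that concrete, here is a minimal standalone sketch of the
"always check every instance" pattern. It is not the Tegra194 code:
NUM_INST, fault_status[] and the IRQ_RES_* values are hypothetical
stand-ins for the per-instance fault registers and irqreturn_t.

```c
/* Hypothetical stand-ins for IRQ_NONE / IRQ_HANDLED. */
enum irq_result { IRQ_RES_NONE = 0, IRQ_RES_HANDLED = 1 };

#define NUM_INST 2

/* Simulated per-instance "fault pending" status (assumption). */
static unsigned int fault_status[NUM_INST];

static enum irq_result handle_one(unsigned int inst)
{
	if (!fault_status[inst])
		return IRQ_RES_NONE;
	fault_status[inst] = 0;         /* acknowledge the fault */
	return IRQ_RES_HANDLED;
}

/* Visit all instances; report HANDLED if any of them had a fault. */
static enum irq_result handle_all(void)
{
	enum irq_result ret = IRQ_RES_NONE;
	unsigned int inst;

	for (inst = 0; inst < NUM_INST; inst++)
		if (handle_one(inst) == IRQ_RES_HANDLED)
			ret = IRQ_RES_HANDLED;

	return ret;
}
```

With this shape, two instances faulting at once are both acknowledged in
a single pass instead of needing a second interrupt.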



+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;


Unsigned


+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; 

Re: [PATCH v5 03/10] iommu/mediatek: Modify the usage of mtk_iommu_plat_data structure

2020-06-30 Thread chao hao
On Tue, 2020-06-30 at 18:56 +0800, Yong Wu wrote:
> Hi Chao,
> 
> This is also ok for me. Only two format nitpicks.
> 
> On Mon, 2020-06-29 at 15:13 +0800, Chao Hao wrote:
> > Given the fact that we are adding more and more plat_data bool values,
> > it would make sense to use a u32 flags register and add the appropriate
> > macro definitions to set and check for a flag present.
> > No functional change.
> > 
> > Suggested-by: Matthias Brugger 
> > Signed-off-by: Chao Hao 
> > ---
> 
> [snip]
> 
> >  static const struct mtk_iommu_plat_data mt2712_data = {
> > .m4u_plat = M4U_MT2712,
> > -   .has_4gb_mode = true,
> > -   .has_bclk = true,
> > -   .has_vld_pa_rng   = true,
> > +   .flags= HAS_4GB_MODE |
> > +   HAS_BCLK |
> > +   HAS_VLD_PA_RNG,
> 
> Short enough; can we put it on one line?

OK, I will try to put it on one line in the next version, thanks.

> 
> > .larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
> >  };
> >  
> >  static const struct mtk_iommu_plat_data mt8173_data = {
> > .m4u_plat = M4U_MT8173,
> > -   .has_4gb_mode = true,
> > -   .has_bclk = true,
> > -   .reset_axi= true,
> > +   .flags= HAS_4GB_MODE |
> > +   HAS_BCLK |
> > +   RESET_AXI,
> > .larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. */
> >  };
> >  
> >  static const struct mtk_iommu_plat_data mt8183_data = {
> > .m4u_plat = M4U_MT8183,
> > -   .reset_axi= true,
> > +   .flags= RESET_AXI,
> > .larbid_remap = {0, 4, 5, 6, 7, 2, 3, 1},
> >  };
> >  
> > diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> > index 1b6ea839b92c..7cc39f729263 100644
> > --- a/drivers/iommu/mtk_iommu.h
> > +++ b/drivers/iommu/mtk_iommu.h
> > @@ -17,6 +17,15 @@
> >  #include 
> >  #include 
> >  
> > +#define HAS_4GB_MODE   BIT(0)
> > +/* HW will use the EMI clock if there isn't the "bclk". */
> > +#define HAS_BCLK   BIT(1)
> > +#define HAS_VLD_PA_RNG BIT(2)
> > +#define RESET_AXI  BIT(3)
> > +
> > +#define MTK_IOMMU_HAS_FLAG(pdata, _x) \
> > +   ((((pdata)->flags) & (_x)) == (_x))
> 
> If these definitions are not used in mtk_iommu_v1.c (and there is no plan to),
> then we can put them in mtk_iommu.c.
> 

OK, mtk_iommu_v1.c doesn't use these definitions.
I will move them to mtk_iommu.c in the next version, thanks.

> 
> BTW, the patch title "modify the usage of mtk_iommu_plat_data structure"
> isn't very clear; it could say what the modification actually is,
> something like:
> iommu/mediatek: Use a u32 flags to describe different HW features
> 
Got it, thanks for your advice.


> > +
> >  struct mtk_iommu_suspend_reg {
> > u32 misc_ctrl;
> > u32 dcm_dis;
> > @@ -36,12 +45,7 @@ enum mtk_iommu_plat {
> >  
> >  struct mtk_iommu_plat_data {
> > enum mtk_iommu_plat m4u_plat;
> > -   boolhas_4gb_mode;
> > -
> > -   /* HW will use the EMI clock if there isn't the "bclk". */
> > -   boolhas_bclk;
> > -   boolhas_vld_pa_rng;
> > -   boolreset_axi;
> > +   u32 flags;
> > unsigned char   larbid_remap[MTK_LARB_NR_MAX];
> >  };
> >  
> 
> 



Re: [PATCH v5 06/10] iommu/mediatek: Add sub_comm id in translation fault

2020-06-30 Thread chao hao
On Tue, 2020-06-30 at 18:55 +0800, Yong Wu wrote:
> On Mon, 2020-06-29 at 15:13 +0800, Chao Hao wrote:
> > The maximum number of larbs that an iommu HW supports is 8 (larb0~larb7 in
> > the diagram below).
> > If there are more than 8 larbs, we use a sub_common to merge
> > several larbs into one larb. In this case, we extend larb_id:
> > bit[11:9] means common-id;
> > bit[8:7] means subcommon-id;
> > From these two variables, we can get the real larb number when a
> > translation fault happens.
> > The diagram is as below:
> >                  EMI
> >                   |
> >                 IOMMU
> >                   |
> >           -----------------
> >           |               |
> >        common1         common0
> >           |               |
> >           -----------------
> >                   |
> >              smi common
> >                   |
> >   ------------------------------------
> >   |       |       |       |     |    |
> >  3'd0    3'd1    3'd2    3'd3  ...  3'd7   <-common_id(max is 8)
> >   |       |       |       |     |    |
> > Larb0   Larb1     |     Larb3  ... Larb7
> >                   |
> >            smi sub common
> >                   |
> >      --------------------------
> >      |        |       |       |
> >     2'd0     2'd1    2'd2    2'd3   <-sub_common_id(max is 4)
> >      |        |       |       |
> >    Larb8    Larb9   Larb10  Larb11
> > 
> > In this patch we extend larb_remap[] to larb_remap[8][4] for this.
> > larb_remap[x][y]: x means common-id above, y means subcommon_id above.
> > 
> > We can also distinguish if the M4U HW has sub_common by HAS_SUB_COMM
> > macro.
> > 
> > Cc: Matthias Brugger 
> > Signed-off-by: Chao Hao 
> > Reviewed-by: Yong Wu 
> > ---
> >  drivers/iommu/mtk_iommu.c  | 20 +---
> >  drivers/iommu/mtk_iommu.h  |  3 ++-
> >  include/soc/mediatek/smi.h |  2 ++
> >  3 files changed, 17 insertions(+), 8 deletions(-)
> 
> [snip]
> 
> > @@ -48,7 +49,7 @@ struct mtk_iommu_plat_data {
> > enum mtk_iommu_plat m4u_plat;
> > u32 flags;
> > u32 inv_sel_reg;
> > -   unsigned char   larbid_remap[MTK_LARB_NR_MAX];
> > +   unsigned char   larbid_remap[MTK_LARB_COM_MAX][MTK_LARB_SUBCOM_MAX];
> >  };
> >  
> >  struct mtk_iommu_domain;
> > diff --git a/include/soc/mediatek/smi.h b/include/soc/mediatek/smi.h
> > index 5a34b87d89e3..fa65a55468e2 100644
> > --- a/include/soc/mediatek/smi.h
> > +++ b/include/soc/mediatek/smi.h
> > @@ -12,6 +12,8 @@
> >  #ifdef CONFIG_MTK_SMI
> >  
> >  #define MTK_LARB_NR_MAX16
> > +#define MTK_LARB_COM_MAX   8
> > +#define MTK_LARB_SUBCOM_MAX4
> 
> > Both are only used in mtk_iommu.h, and I don't think smi has any plan to use
> > them, so we could move them into mtk_iommu.h.
> 
ok, got it. Thanks for your advice.

> >  
> >  #define MTK_SMI_MMU_EN(port)   BIT(port)
> >  
> 
> 



Re: [PATCH v5 09/10] iommu/mediatek: Modify MMU_CTRL register setting

2020-06-30 Thread chao hao
On Mon, 2020-06-29 at 12:28 +0200, Matthias Brugger wrote:
> 
> On 29/06/2020 09:13, Chao Hao wrote:
> > MT8173 is different from other SoCs for the MMU_CTRL register.
> > For mt8173, bit[9] is in_order_write_en and it doesn't use its
> > default value 1'b1.
> > For other SoCs, bit[12] represents the victim_tlb_en feature and
> > victim_tlb is enabled by default (bit[12]=1). If we use
> > "regval = F_MMU_TF_PROT_TO_PROGRAM_ADDR", victim_tlb will be
> > disabled and iommu performance will drop.
> > So we need to handle the setting of MMU_CTRL separately
> > for mt8173 and others.
> > 
> 
> My proposal to rewrite the commit message:
> 
> The MMU_CTRL register of MT8173 is different from other SoCs. The in_order_wr_en
> is bit[9] which is zero by default.
> Other SoCs have the victim_tlb_en feature mapped to bit[12]. This bit is set to
> one by default. We need to preserve the bit when setting
> F_MMU_TF_PROT_TO_PROGRAM_ADDR as otherwise the bit will be cleared and IOMMU
> performance will drop.

Got it, thanks very much for your advice.

> 
> 
> > Suggested-by: Matthias Brugger 
> > Suggested-by: Yong Wu 
> > Signed-off-by: Chao Hao 
> > ---
> >  drivers/iommu/mtk_iommu.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 8299a3299090..e46e2deee3fd 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -543,11 +543,12 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
> > return ret;
> > }
> >  
> > +   regval = readl_relaxed(data->base + REG_MMU_CTRL_REG);
> 
> The read is only needed in the else branch.
> 
ok, thanks
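
For clarity, the shape I understand is being asked for can be sketched
in standalone C as below. This is only an illustration: fake_reg,
reg_read() and reg_write() simulate the MMIO accessors, the "overwrite"
argument stands in for the MT8173 check, and VICTIM_TLB_EN is a
hypothetical name for bit[12]. Only the (2 << 4) value is taken from the
driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define TF_PROT_TO_PROGRAM_ADDR  (2u << 4)   /* value from the driver */
#define VICTIM_TLB_EN            (1u << 12)  /* hypothetical name for bit[12] */

/* Simulated register plus a read counter, for illustration only. */
static uint32_t fake_reg;
static unsigned int reads;

static uint32_t reg_read(void)
{
	reads++;
	return fake_reg;
}

static void reg_write(uint32_t val)
{
	fake_reg = val;
}

static void program_mmu_ctrl(bool overwrite, uint32_t overwrite_val)
{
	uint32_t regval;

	if (overwrite) {
		/* Full overwrite: the old value is irrelevant, so no read. */
		regval = overwrite_val;
	} else {
		/* Read-modify-write: keep every bit that is already set. */
		regval = reg_read();
		regval |= TF_PROT_TO_PROGRAM_ADDR;
	}
	reg_write(regval);
}
```

The else branch keeps bit[12] set, which is the behaviour the commit
message says is needed on the other SoCs.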

> > if (data->plat_data->m4u_plat == M4U_MT8173)
> > regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
> >  F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;
> > else
> > -   regval = F_MMU_TF_PROT_TO_PROGRAM_ADDR;
> > +   regval |= F_MMU_TF_PROT_TO_PROGRAM_ADDR;
> > writel_relaxed(regval, data->base + REG_MMU_CTRL_REG);
> >  
> > regval = F_L2_MULIT_HIT_EN |
> > 



Re: [PATCH v5 06/10] iommu/mediatek: Add sub_comm id in translation fault

2020-06-30 Thread Yong Wu
On Mon, 2020-06-29 at 15:13 +0800, Chao Hao wrote:
> The maximum number of larbs that an iommu HW supports is 8 (larb0~larb7 in
> the diagram below).
> If there are more than 8 larbs, we use a sub_common to merge
> several larbs into one larb. In this case, we extend larb_id:
> bit[11:9] means common-id;
> bit[8:7] means subcommon-id;
> From these two variables, we can get the real larb number when a
> translation fault happens.
> The diagram is as below:
>                  EMI
>                   |
>                 IOMMU
>                   |
>           -----------------
>           |               |
>        common1         common0
>           |               |
>           -----------------
>                   |
>              smi common
>                   |
>   ------------------------------------
>   |       |       |       |     |    |
>  3'd0    3'd1    3'd2    3'd3  ...  3'd7   <-common_id(max is 8)
>   |       |       |       |     |    |
> Larb0   Larb1     |     Larb3  ... Larb7
>                   |
>            smi sub common
>                   |
>      --------------------------
>      |        |       |       |
>     2'd0     2'd1    2'd2    2'd3   <-sub_common_id(max is 4)
>      |        |       |       |
>    Larb8    Larb9   Larb10  Larb11
> 
> In this patch we extend larb_remap[] to larb_remap[8][4] for this.
> larb_remap[x][y]: x means common-id above, y means subcommon_id above.
> 
> We can also distinguish if the M4U HW has sub_common by HAS_SUB_COMM
> macro.
> 
> Cc: Matthias Brugger 
> Signed-off-by: Chao Hao 
> Reviewed-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c  | 20 +---
>  drivers/iommu/mtk_iommu.h  |  3 ++-
>  include/soc/mediatek/smi.h |  2 ++
>  3 files changed, 17 insertions(+), 8 deletions(-)

[snip]

> @@ -48,7 +49,7 @@ struct mtk_iommu_plat_data {
>   enum mtk_iommu_plat m4u_plat;
>   u32 flags;
>   u32 inv_sel_reg;
> - unsigned char   larbid_remap[MTK_LARB_NR_MAX];
> + unsigned char   larbid_remap[MTK_LARB_COM_MAX][MTK_LARB_SUBCOM_MAX];
>  };
>  
>  struct mtk_iommu_domain;
> diff --git a/include/soc/mediatek/smi.h b/include/soc/mediatek/smi.h
> index 5a34b87d89e3..fa65a55468e2 100644
> --- a/include/soc/mediatek/smi.h
> +++ b/include/soc/mediatek/smi.h
> @@ -12,6 +12,8 @@
>  #ifdef CONFIG_MTK_SMI
>  
>  #define MTK_LARB_NR_MAX  16
> +#define MTK_LARB_COM_MAX 8
> +#define MTK_LARB_SUBCOM_MAX  4

Both are only used in mtk_iommu.h, and I don't think smi has any plan to use
them, so we could move them into mtk_iommu.h.
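
As an aside, the bit layout from the commit message (bit[11:9] is the
common-id, bit[8:7] the subcommon-id) and the two-dimensional lookup can
be sketched in standalone C. The helper names and the example table are
hypothetical and simply follow the diagram above; they are not the
driver's macros or any SoC's real remap data.

```c
#include <stdint.h>

#define LARB_COM_MAX     8
#define LARB_SUBCOM_MAX  4

/* bit[11:9] of the fault id: common-id (0..7). */
static unsigned int common_id(uint32_t fault_id)
{
	return (fault_id >> 9) & 0x7;
}

/* bit[8:7] of the fault id: subcommon-id (0..3). */
static unsigned int sub_common_id(uint32_t fault_id)
{
	return (fault_id >> 7) & 0x3;
}

/* Hypothetical table following the diagram: common 2 fans out to 8..11. */
static const unsigned char example_remap[LARB_COM_MAX][LARB_SUBCOM_MAX] = {
	[0] = {0},
	[1] = {1},
	[2] = {8, 9, 10, 11},
	[3] = {3},
};

/* Look up the real larb number as remap[common-id][subcommon-id]. */
static unsigned int real_larb(const unsigned char remap[LARB_COM_MAX][LARB_SUBCOM_MAX],
			      uint32_t fault_id)
{
	return remap[common_id(fault_id)][sub_common_id(fault_id)];
}
```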

>  
>  #define MTK_SMI_MMU_EN(port) BIT(port)
>  



Re: [PATCH v5 07/10] iommu/mediatek: Add REG_MMU_WR_LEN register definition

2020-06-30 Thread chao hao
On Mon, 2020-06-29 at 12:16 +0200, Matthias Brugger wrote:
> 
> On 29/06/2020 09:13, Chao Hao wrote:
> > Some platforms (e.g. mt6779) need to improve performance by setting the
> > REG_MMU_WR_LEN register, and we can use the WR_THROT_EN macro to control
> > whether we need to set the register. If the register keeps its default value,
> > the iommu will send commands to EMI without restriction; as the number of
> > commands grows, EMI performance will drop. So when more than ten commands
> > (the default value) have not been handled by EMI, the iommu will
> > stop sending commands to EMI to preserve EMI's performance, by enabling the
> > write throttling mechanism (bit[5][21]=0) in the MMU_WR_LEN_CTRL register.
> > 
> > Cc: Matthias Brugger 
> > Signed-off-by: Chao Hao 
> > ---
> >  drivers/iommu/mtk_iommu.c | 10 ++
> >  drivers/iommu/mtk_iommu.h |  2 ++
> >  2 files changed, 12 insertions(+)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index ec1f86913739..92316c4175a9 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -46,6 +46,8 @@
> >  #define F_MMU_STANDARD_AXI_MODE_BIT(BIT(3) | BIT(19))
> >  
> >  #define REG_MMU_DCM_DIS0x050
> > +#define REG_MMU_WR_LEN 0x054
> 
> The register name is confusing. For me it seems to describe the length of a
> write but it is used for controlling the write throttling. Is this the name
> that's used in the datasheet?
> 

Thanks for your careful review, we can rename it to REG_MMU_WR_LEN_CTRL.


> > +#define F_MMU_WR_THROT_DIS_BIT (BIT(5) |  BIT(21))
> 
> There are two spaces between '|' and 'BIT(21)', should be one.
> 
> Regarding the name of the define, what does the 'F_' stand for?

F_ is used to describe some bits in a register and doesn't have any other
meaning. The format follows the other bit definitions.

> Also I think
> it should be called '_MASK' instead of '_BIT' as it defines a mask of bits.
> 

Thanks for your advice.
For F_MMU_WR_THROT_DIS_BIT:
1'b0: Enable write throttling mechanism
1'b1: Disable write throttling mechanism
So I think we can name it "F_MMU_WR_THROT_DIS  BIT(5) | BIT(21)" directly,
which may be clearer.
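
To illustrate the polarity (1'b1 disables, 1'b0 enables), here is a
small standalone sketch. WR_THROT_DIS_MASK and the two helpers are
hypothetical names used only for this example; the bit positions are the
ones from the patch.

```c
#include <stdint.h>

#define BIT(n)             (1u << (n))
/* Both bits set means throttling is disabled; name is an assumption. */
#define WR_THROT_DIS_MASK  (BIT(5) | BIT(21))

/* Clearing both "disable" bits turns write throttling on. */
static uint32_t enable_wr_throttle(uint32_t regval)
{
	return regval & ~WR_THROT_DIS_MASK;
}

static int wr_throttle_enabled(uint32_t regval)
{
	return (regval & WR_THROT_DIS_MASK) == 0;
}
```

Whatever the final name is, the macro body needs the outer parentheses
as in the patch, otherwise "~" would only apply to BIT(5).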

> Regards,
> Matthias
> 
> >  
> >  #define REG_MMU_CTRL_REG   0x110
> >  #define F_MMU_TF_PROT_TO_PROGRAM_ADDR  (2 << 4)
> > @@ -582,6 +584,12 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
> > writel_relaxed(regval, data->base + REG_MMU_VLD_PA_RNG);
> > }
> > writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
> > +   if (MTK_IOMMU_HAS_FLAG(data->plat_data, WR_THROT_EN)) {
> > +   /* write command throttling mode */
> > +   regval = readl_relaxed(data->base + REG_MMU_WR_LEN);
> > +   regval &= ~F_MMU_WR_THROT_DIS_BIT;
> > +   writel_relaxed(regval, data->base + REG_MMU_WR_LEN);
> > +   }
> >  
> > regval = readl_relaxed(data->base + REG_MMU_MISC_CTRL);
> > if (MTK_IOMMU_HAS_FLAG(data->plat_data, RESET_AXI)) {
> > @@ -737,6 +745,7 @@ static int __maybe_unused mtk_iommu_suspend(struct device *dev)
> > struct mtk_iommu_suspend_reg *reg = &data->reg;
> > void __iomem *base = data->base;
> >  
> > +   reg->wr_len = readl_relaxed(base + REG_MMU_WR_LEN);
> > reg->misc_ctrl = readl_relaxed(base + REG_MMU_MISC_CTRL);
> > reg->dcm_dis = readl_relaxed(base + REG_MMU_DCM_DIS);
> > reg->ctrl_reg = readl_relaxed(base + REG_MMU_CTRL_REG);
> > @@ -761,6 +770,7 @@ static int __maybe_unused mtk_iommu_resume(struct device *dev)
> > dev_err(data->dev, "Failed to enable clk(%d) in resume\n", ret);
> > return ret;
> > }
> > +   writel_relaxed(reg->wr_len, base + REG_MMU_WR_LEN);
> > writel_relaxed(reg->misc_ctrl, base + REG_MMU_MISC_CTRL);
> > writel_relaxed(reg->dcm_dis, base + REG_MMU_DCM_DIS);
> > writel_relaxed(reg->ctrl_reg, base + REG_MMU_CTRL_REG);
> > diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> > index be6d32ee5bda..ce4f4e8f03aa 100644
> > --- a/drivers/iommu/mtk_iommu.h
> > +++ b/drivers/iommu/mtk_iommu.h
> > @@ -24,6 +24,7 @@
> >  #define RESET_AXI  BIT(3)
> >  #define OUT_ORDER_EN   BIT(4)
> >  #define HAS_SUB_COMM   BIT(5)
> > +#define WR_THROT_ENBIT(6)
> >  
> >  #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
> > ((((pdata)->flags) & (_x)) == (_x))
> > @@ -36,6 +37,7 @@ struct mtk_iommu_suspend_reg {
> > u32 int_main_control;
> > u32 ivrp_paddr;
> > u32 vld_pa_rng;
> > +   u32 wr_len;
> >  };
> >  
> >  enum mtk_iommu_plat {
> > 



Re: [PATCH v5 03/10] iommu/mediatek: Modify the usage of mtk_iommu_plat_data structure

2020-06-30 Thread Yong Wu
Hi Chao,

This is also ok for me. Only two format nitpicks.

On Mon, 2020-06-29 at 15:13 +0800, Chao Hao wrote:
> Given the fact that we are adding more and more plat_data bool values,
> it would make sense to use a u32 flags register and add the appropriate
> macro definitions to set and check for a flag present.
> No functional change.
> 
> Suggested-by: Matthias Brugger 
> Signed-off-by: Chao Hao 
> ---

[snip]

>  static const struct mtk_iommu_plat_data mt2712_data = {
>   .m4u_plat = M4U_MT2712,
> - .has_4gb_mode = true,
> - .has_bclk = true,
> - .has_vld_pa_rng   = true,
> + .flags= HAS_4GB_MODE |
> + HAS_BCLK |
> + HAS_VLD_PA_RNG,

Short enough; can we put it on one line?

>   .larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
>  };
>  
>  static const struct mtk_iommu_plat_data mt8173_data = {
>   .m4u_plat = M4U_MT8173,
> - .has_4gb_mode = true,
> - .has_bclk = true,
> - .reset_axi= true,
> + .flags= HAS_4GB_MODE |
> + HAS_BCLK |
> + RESET_AXI,
>   .larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. */
>  };
>  
>  static const struct mtk_iommu_plat_data mt8183_data = {
>   .m4u_plat = M4U_MT8183,
> - .reset_axi= true,
> + .flags= RESET_AXI,
>   .larbid_remap = {0, 4, 5, 6, 7, 2, 3, 1},
>  };
>  
> diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> index 1b6ea839b92c..7cc39f729263 100644
> --- a/drivers/iommu/mtk_iommu.h
> +++ b/drivers/iommu/mtk_iommu.h
> @@ -17,6 +17,15 @@
>  #include 
>  #include 
>  
> +#define HAS_4GB_MODE BIT(0)
> +/* HW will use the EMI clock if there isn't the "bclk". */
> +#define HAS_BCLK BIT(1)
> +#define HAS_VLD_PA_RNG   BIT(2)
> +#define RESET_AXIBIT(3)
> +
> +#define MTK_IOMMU_HAS_FLAG(pdata, _x) \
> + ((((pdata)->flags) & (_x)) == (_x))

If these definitions are not used in mtk_iommu_v1.c (and there is no plan to),
then we can put them in mtk_iommu.c.
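
For anyone following along, the bool-to-flags conversion under review
can be sketched in standalone C as below. The flag bit values mirror the
patch; struct plat_data, HAS_FLAG and mt2712_like are hypothetical,
trimmed-down stand-ins for the driver's names.

```c
#include <stdint.h>

#define BIT(n)          (1u << (n))

#define HAS_4GB_MODE    BIT(0)
#define HAS_BCLK        BIT(1)
#define HAS_VLD_PA_RNG  BIT(2)
#define RESET_AXI       BIT(3)

/* True only when every bit in _x is set in pdata->flags. */
#define HAS_FLAG(pdata, _x) \
	((((pdata)->flags) & (_x)) == (_x))

/* Hypothetical, reduced version of the platform data structure. */
struct plat_data {
	uint32_t flags;
};

static const struct plat_data mt2712_like = {
	.flags = HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG,
};
```

Comparing against (_x) rather than testing for non-zero means a
multi-bit query such as HAS_BCLK | RESET_AXI only matches when both
features are present.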


BTW, the patch title "modify the usage of mtk_iommu_plat_data structure"
isn't very clear; it could say what the modification actually is,
something like:
iommu/mediatek: Use a u32 flags to describe different HW features

> +
>  struct mtk_iommu_suspend_reg {
>   u32 misc_ctrl;
>   u32 dcm_dis;
> @@ -36,12 +45,7 @@ enum mtk_iommu_plat {
>  
>  struct mtk_iommu_plat_data {
>   enum mtk_iommu_plat m4u_plat;
> - boolhas_4gb_mode;
> -
> - /* HW will use the EMI clock if there isn't the "bclk". */
> - boolhas_bclk;
> - boolhas_vld_pa_rng;
> - boolreset_axi;
> + u32 flags;
>   unsigned char   larbid_remap[MTK_LARB_NR_MAX];
>  };
>  



Re: [PATCH v5 04/10] iommu/mediatek: Setting MISC_CTRL register

2020-06-30 Thread chao hao
On Mon, 2020-06-29 at 11:28 +0200, Matthias Brugger wrote:
> 
> On 29/06/2020 09:13, Chao Hao wrote:
> > Add F_MMU_IN_ORDER_WR_EN and F_MMU_STANDARD_AXI_MODE_BIT definition
> > in MISC_CTRL register.
> > F_MMU_STANDARD_AXI_MODE_BIT:
> >   If we clear F_MMU_STANDARD_AXI_MODE_BIT (bit[3][19] = 0, i.e. do not follow
> > the standard AXI protocol), the iommu will send urgent read commands ahead
> > of normal read commands to improve performance.
> 
> > Can you please help me understand the phrase? Sorry, I'm not an AXI specialist.
> > Does this mean that you will send an 'urgent read command', which is not
> > described in the specifications, instead of a normal read command?

ok.
Normally the iommu sends a read command to the next bus_node (call it
cmd1). If, before cmd1 has been handled by the next bus_node, the iommu has
an urgent read command to send (call it cmd2), the iommu will send
cmd2 ahead of cmd1. So cmd2 is handled by the next bus_node first and
cmd1 is handled second.
But with the standard AXI protocol, the priority of read commands is
ignored and they are only handled in order. So cmd2 is handled by the next
bus_node after cmd1 is done.
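
If it helps, the difference in ordering can be shown with a toy model in
standalone C. This is only a sketch of the behaviour described above and
says nothing about how the hardware arbitrates internally; struct
read_cmd and next_cmd() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending read command; index 0 is the oldest. */
struct read_cmd {
	int id;
	bool urgent;
};

/*
 * Return the index of the command to issue next, or -1 if none is pending.
 * In standard AXI mode commands go strictly in order; otherwise the oldest
 * urgent command is allowed to go first.
 */
static int next_cmd(const struct read_cmd *pending, size_t n, bool standard_axi)
{
	size_t i;

	if (n == 0)
		return -1;

	if (!standard_axi)
		for (i = 0; i < n; i++)
			if (pending[i].urgent)
				return (int)i;

	return 0;
}
```

With cmd1 (normal) queued before cmd2 (urgent), the model picks cmd2
first when the standard AXI mode bit is cleared, and cmd1 first when it
is set.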

> 
> > F_MMU_IN_ORDER_WR_EN:
> >   If we clear F_MMU_IN_ORDER_WR_EN (bit[1][17] = 0, out-of-order write), the
> > iommu will re-order write commands and send higher priority write commands
> > first instead of sending write commands in order. The feature is controlled
> > by the OUT_ORDER_EN macro definition.
> > 
> > Cc: Matthias Brugger 
> > Suggested-by: Yong Wu 
> > Signed-off-by: Chao Hao 
> > ---
> >  drivers/iommu/mtk_iommu.c | 12 +++-
> >  drivers/iommu/mtk_iommu.h |  1 +
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 8f81df6cbe51..67b46b5d83d9 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -42,6 +42,9 @@
> >  #define F_INVLD_EN1BIT(1)
> >  
> >  #define REG_MMU_MISC_CTRL  0x048
> > +#define F_MMU_IN_ORDER_WR_EN   (BIT(1) | BIT(17))
> > +#define F_MMU_STANDARD_AXI_MODE_BIT(BIT(3) | BIT(19))
> 
> Wouldn't it make more sense to name it F_MMU_STANDARD_AXI_MODE_EN?
ok, you are right.
1'b1: follow standard axi protocol

> 
> > +
> >  #define REG_MMU_DCM_DIS0x050
> >  
> >  #define REG_MMU_CTRL_REG   0x110
> > @@ -574,10 +577,17 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data)
> > }
> > writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
> >  
> > +   regval = readl_relaxed(data->base + REG_MMU_MISC_CTRL);
> 
> We only need to read regval in the else branch.

ok, I got it. thanks

> 
> > if (MTK_IOMMU_HAS_FLAG(data->plat_data, RESET_AXI)) {
> > /* The register is called STANDARD_AXI_MODE in this case */
> > -   writel_relaxed(0, data->base + REG_MMU_MISC_CTRL);
> > +   regval = 0;
> > +   } else {
> > +   /* For mm_iommu, it can improve performance by the setting */
> > +   regval &= ~F_MMU_STANDARD_AXI_MODE_BIT;
> > +   if (MTK_IOMMU_HAS_FLAG(data->plat_data, OUT_ORDER_EN))
> > +   regval &= ~F_MMU_IN_ORDER_WR_EN;
> > }
> > +   writel_relaxed(regval, data->base + REG_MMU_MISC_CTRL);
> >  
> > if (devm_request_irq(data->dev, data->irq, mtk_iommu_isr, 0,
> >  dev_name(data->dev), (void *)data)) {
> > diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> > index 7cc39f729263..4b780b651ef4 100644
> > --- a/drivers/iommu/mtk_iommu.h
> > +++ b/drivers/iommu/mtk_iommu.h
> > @@ -22,6 +22,7 @@
> >  #define HAS_BCLK   BIT(1)
> >  #define HAS_VLD_PA_RNG BIT(2)
> >  #define RESET_AXI  BIT(3)
> > +#define OUT_ORDER_EN   BIT(4)
> 
> Maybe something like OUT_ORDER_WR_EN, to make clear that it's about the
> write path.
> 
ok, thanks for your advice.

> >  
> >  #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
> > 		((((pdata)->flags) & (_x)) == (_x))
> > 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-06-30 Thread Heikki Krogerus
On Mon, Jun 29, 2020 at 09:49:41PM -0700, Rajat Jain wrote:
> Add a new (optional) field to denote the physical location of a device
> in the system, and expose it in sysfs. This was discussed here:
> https://lore.kernel.org/linux-acpi/20200618184621.ga446...@kroah.com/
> 
> (The primary choice for attribute name i.e. "location" is already
> exposed as an ABI elsewhere, so settled for "site"). Individual buses
> that want to support this new attribute can opt-in by setting a flag in
> bus_type, and then populating the location of device while enumerating
> it.

So why not just call it "physical_location"?


thanks,

-- 
heikki


Re: [PATCH] iommu: SUN50I_IOMMU should depend on HAS_DMA

2020-06-30 Thread Robin Murphy

On 2020-06-30 11:09, Joerg Roedel wrote:
> On Mon, Jun 29, 2020 at 05:29:36PM +0100, Robin Murphy wrote:
>> On 2020-06-29 13:11, Geert Uytterhoeven wrote:
>>> If NO_DMA=y (e.g. Sun-3 all{mod,yes}-config):
>>>
>>>   drivers/iommu/dma-iommu.o: In function `iommu_dma_mmap':
>>>   dma-iommu.c:(.text+0x92e): undefined reference to `dma_pgprot'
>>>
>>> IOMMU_DMA must not be selected, unless HAS_DMA=y.
>>
>> Wait, no, IOMMU_DMA should not be selected by drivers at all - it's for arch
>> code to choose.
>
> Okay, but that is a different fix, right? I queued this patch for v5.8
> for now.

If the driver didn't select IOMMU_DMA (completely unnecessarily, I might
add), there wouldn't be any problem to fix in the first place ;)


Robin.


Re: [PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Robin Murphy

On 2020-06-30 11:23, Jon Hunter wrote:
> On 29/06/2020 23:49, Krishna Reddy wrote:
>>>> + if (!nvidia_smmu->bases[0])
>>>> + nvidia_smmu->bases[0] = smmu->base;
>>>> +
>>>> + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); }
>>
>>> Not critical -- just a nit: why not put the bases[0] in init()?
>>
>> smmu->base is not available during nvidia_smmu_impl_init() call. It is set
>> afterwards in arm-smmu.c.
>> It can't be avoided without changing the devm_ioremap() and impl_init() call
>> order in arm-smmu.c.
>
> Why don't we move the call to devm_ioremap_resource() to before
> arm_smmu_impl_init() in arm_smmu_device_probe()? From a quick look I
> don't see why we cannot do this and seems better than what we are
> currently doing which is quite confusing and hard to understand.


Yeah, I don't see any problem with adding a patch to do that. 
impl_init() does need to happen before generic probe starts touching any 
registers, but it wouldn't have any business overriding the platform 
resources or anything that would affect the ioremap itself. Plus it's 
reasonable that some theoretical future impl_init() might want to check 
registers for, say, a particular hardware revision, so having 
smmu->base mapped and valid at that point would be no bad thing.
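
To make the two orderings concrete, here is a rough user-space sketch, not the actual driver. All `demo_*` names and types are hypothetical stand-ins. The first helper mirrors the lazy fill of bases[0] from the patch; the second pair shows what becomes possible if the base is already mapped when the implementation init runs.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_INSTANCES 3

/* Hypothetical stand-in for the generic SMMU state. */
struct demo_smmu {
	uint8_t *base;		/* mapped by the generic probe code */
	unsigned int pgshift;
};

/* Hypothetical stand-in for the NVIDIA wrapper with per-instance bases. */
struct demo_nvidia_smmu {
	struct demo_smmu smmu;
	uint8_t *bases[DEMO_MAX_INSTANCES];
};

/*
 * Current ordering: the generic base is not known at init time, so
 * bases[0] is filled in lazily on the first page lookup.
 */
static uint8_t *demo_page_lazy(struct demo_nvidia_smmu *n,
			       unsigned int inst, unsigned int page)
{
	if (!n->bases[0])
		n->bases[0] = n->smmu.base;

	return n->bases[inst] + ((size_t)page << n->smmu.pgshift);
}

/*
 * Proposed ordering: the base is mapped before init runs, so init can
 * record it once and the lookup needs no special case.
 */
static void demo_impl_init(struct demo_nvidia_smmu *n)
{
	n->bases[0] = n->smmu.base;
}

static uint8_t *demo_page(const struct demo_nvidia_smmu *n,
			  unsigned int inst, unsigned int page)
{
	return n->bases[inst] + ((size_t)page << n->smmu.pgshift);
}
```

Both variants compute the same address; the difference is only where bases[0] gets set, which is the readability point raised above.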


Robin.


[PATCH v8] videobuf2: use sgtable-based scatterlist wrappers

2020-06-30 Thread Marek Szyprowski
Use recently introduced common wrappers operating directly on the struct
sg_table objects and scatterlist page iterators to make the code a bit
more compact, robust, easier to follow and copy/paste safe.

No functional change, because the code already properly did all the
scatterlist related calls.

Signed-off-by: Marek Szyprowski 
---
v8:
- rebased after recent changes in the code
---
 .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---
 .../media/common/videobuf2/videobuf2-dma-sg.c | 32 +++--
 .../common/videobuf2/videobuf2-vmalloc.c  | 12 +++
 3 files changed, 31 insertions(+), 47 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index ec3446cc45b8..1b242d844dde 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -58,10 +58,10 @@ static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt)
unsigned int i;
unsigned long size = 0;
 
-   for_each_sg(sgt->sgl, s, sgt->nents, i) {
+   for_each_sgtable_dma_sg(sgt, s, i) {
if (sg_dma_address(s) != expected)
break;
-   expected = sg_dma_address(s) + sg_dma_len(s);
+   expected += sg_dma_len(s);
size += sg_dma_len(s);
}
return size;
@@ -103,8 +103,7 @@ static void vb2_dc_prepare(void *buf_priv)
if (!sgt)
return;
 
-   dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
-  buf->dma_dir);
+   dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
 }
 
 static void vb2_dc_finish(void *buf_priv)
@@ -115,7 +114,7 @@ static void vb2_dc_finish(void *buf_priv)
if (!sgt)
return;
 
-   dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);
+   dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
 }
 
 /*/
@@ -275,8 +274,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
 * memory locations do not require any explicit cache
 * maintenance prior or after being used by the device.
 */
-   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
-  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sgt);
kfree(attach);
db_attach->priv = NULL;
@@ -301,8 +300,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
 
/* release any previous cache */
if (attach->dma_dir != DMA_NONE) {
-   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
-  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
attach->dma_dir = DMA_NONE;
}
 
@@ -310,9 +309,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
 * mapping to the client with new direction, no cache sync
 * required see comment in vb2_dc_dmabuf_ops_detach()
 */
-   sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
- dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
-   if (!sgt->nents) {
+   if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
+   DMA_ATTR_SKIP_CPU_SYNC)) {
pr_err("failed to map scatterlist\n");
mutex_unlock(lock);
return ERR_PTR(-EIO);
@@ -455,8 +453,8 @@ static void vb2_dc_put_userptr(void *buf_priv)
 * No need to sync to CPU, it's already synced to the CPU
 * since the finish() memop will have been called before this.
 */
-   dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
-  buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+   dma_unmap_sgtable(buf->dev, sgt, buf->dma_dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
pages = frame_vector_pages(buf->vec);
/* sgt should exist only if vector contains pages... */
BUG_ON(IS_ERR(pages));
@@ -553,9 +551,8 @@ static void *vb2_dc_get_userptr(struct device *dev, unsigned long vaddr,
 * No need to sync to the device, this will happen later when the
 * prepare() memop is called.
 */
-   sgt->nents = dma_map_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
- buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
-   if (sgt->nents <= 0) {
+   if (dma_map_sgtable(buf->dev, sgt, buf->dma_dir,
+   DMA_ATTR_SKIP_CPU_SYNC)) {

Re: [PATCH v2 01/12] ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC

2020-06-30 Thread Lorenzo Pieralisi
On Tue, Jun 30, 2020 at 11:06:41AM +0800, Hanjun Guo wrote:

[...]

> > For devices that aren't described in the DSDT - IORT translations
> > are determined by their ACPI parent device. Do you see/Have you
> > found any issue with this approach ?
> 
> The spec says "Describes the IO relationships between devices
> represented in the ACPI namespace.", and in section 3.1.1.3 Named
> component node, it says:

PCI devices aren't necessarily described in the ACPI namespace and we
still use IORT to describe them - through the RC node.

> "Named component nodes are used to describe devices that are also
> included in the Differentiated System Description Table (DSDT). See
> [ACPI]."
> 
> So from my understanding, the IORT spec for now, can only do ID
> translations for devices in the DSDT.

I think you can read this multiple ways but this patch does not
change this concept. What changes, is applying parent's node IORT
mapping to child nodes with no associated DSDT nodes, it is the
same thing we do with PCI and the _DMA method - we could update
the wording in the specs if that clarifies but I don't think this
deliberately disregards the specifications.

> > > For a platform device, if I use its parent's full path name for
> > > its named component entry, then it will match, but this will violate
> > > the IORT spec.
> > 
> > Can you elaborate on this please I don't get the point you
> > are making.
> 
> For example, device A is not described in DSDT so can't represent
> as a NC node in IORT. Device B can be described in DSDT and it
> is the parent of device A, so device B can be represented in IORT
> with memory access properties and node flags with Substream width
> and Stall supported info.
> 
> When we try to translate device A's ID, we reuse all the memory
> access properties and node flags from its parent (device B), but
> will they be the same?

I assume so, why wouldn't it be ? Why would we describe them in
a parent-child relationship if that's not how the system looks like
in HW ?

Do you have a specific example in mind that we should be aware of ?

> So the IORT spec don't support this, at least it's pretty vague
> I think.

I think that's a matter of wording, it can be updated if it needs be,
reach out if you see any issue with the current approach please.

Thanks,
Lorenzo


Re: [PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter


On 29/06/2020 23:49, Krishna Reddy wrote:
>>> + if (!nvidia_smmu->bases[0])
>>> + nvidia_smmu->bases[0] = smmu->base;
>>> +
>>> + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); }
> 
>> Not critical -- just a nit: why not put the bases[0] in init()?
> 
> smmu->base is not available during nvidia_smmu_impl_init() call. It is set 
> afterwards in arm-smmu.c.
> It can't be avoided without changing the devm_ioremap() and impl_init() call 
> order in arm-smmu.c.


Why don't we move the call to devm_ioremap_resource() to before
arm_smmu_impl_init() in arm_smmu_device_probe()? From a quick look I
don't see why we cannot do this and seems better than what we are
currently doing which is quite confusing and hard to understand.

Jon


-- 
nvpublic


Re: [PATCH] iommu: add include/uapi/linux/iommu.h to MAINTAINERS file

2020-06-30 Thread Joerg Roedel
On Fri, Jun 05, 2020 at 12:00:25AM -0700, Jerry Snitselaar wrote:
> When include/uapi/linux/iommu.h was created it was never
> added to the file list in MAINTAINERS.
> 
> Cc: Joerg Roedel 
> Signed-off-by: Jerry Snitselaar 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)

Applied, thanks.


Re: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Jon Hunter


On 30/06/2020 01:10, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
> IOVA accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.
> 
> Signed-off-by: Krishna Reddy 
> ---
>  MAINTAINERS |   2 +
>  drivers/iommu/Makefile  |   2 +-
>  drivers/iommu/arm-smmu-impl.c   |   3 +
>  drivers/iommu/arm-smmu-nvidia.c | 196 
>  drivers/iommu/arm-smmu.h|   1 +
>  5 files changed, 203 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/arm-smmu-nvidia.c

...

> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
> +{
> + unsigned int i;
> + struct nvidia_smmu *nvidia_smmu;
> + struct platform_device *pdev = to_platform_device(smmu->dev);
> +
> + nvidia_smmu = devm_kzalloc(smmu->dev, sizeof(*nvidia_smmu), GFP_KERNEL);
> + if (!nvidia_smmu)
> + return ERR_PTR(-ENOMEM);
> +
> + nvidia_smmu->smmu = *smmu;
> + /* Instance 0 is ioremapped by arm-smmu.c after this function returns */
> + nvidia_smmu->num_inst = 1;
> +
> + for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
> + struct resource *res;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, i);
> + if (!res)
> + break;

Currently this driver is only supported for Tegra194 which I understand
has 3 SMMUs. Therefore, I don't feel that we should fail silently here,
I think it is better to return an error if all 3 cannot be initialised.
In the future if there is an SoC that has less (hopefully not more) than
Tegra194 then we should handle this via the DT compatible string. In
other words, we should always know how many SMMUs there are for a given
SoC and how many we should initialise.

> +
> + nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
> + if (IS_ERR(nvidia_smmu->bases[i]))
> + return ERR_CAST(nvidia_smmu->bases[i]);

You want to use PTR_ERR() here.

Jon

-- 
nvpublic


Re: [PATCH] iommu: move sg_table wrapper out of CONFIG_IOMMU_SUPPORT

2020-06-30 Thread Joerg Roedel
On Tue, Jun 30, 2020 at 10:17:56AM +0200, Marek Szyprowski wrote:
> Move the recently added sg_table wrapper out of CONFIG_IOMMU_SUPPORT to
> let the client code compile also when IOMMU support is disabled.
> 
> Fixes: 48530d9fab0d ("iommu: add generic helper for mapping sgtable objects")
> Signed-off-by: Marek Szyprowski 
> ---
>  include/linux/iommu.h | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)

Applied, thanks (not for v5.8, as there seem to be no users yet).



Re: [PATCH] iommu: SUN50I_IOMMU should depend on HAS_DMA

2020-06-30 Thread Joerg Roedel
On Mon, Jun 29, 2020 at 05:29:36PM +0100, Robin Murphy wrote:
> On 2020-06-29 13:11, Geert Uytterhoeven wrote:
> > If NO_DMA=y (e.g. Sun-3 all{mod,yes}-config):
> > 
> >  drivers/iommu/dma-iommu.o: In function `iommu_dma_mmap':
> >  dma-iommu.c:(.text+0x92e): undefined reference to `dma_pgprot'
> > 
> > IOMMU_DMA must not be selected, unless HAS_DMA=y.
> 
> Wait, no, IOMMU_DMA should not be selected by drivers at all - it's for arch
> code to choose.

Okay, but that is a different fix, right? I queued this patch for v5.8
for now.


Re: [PATCH v7 31/36] staging: tegra-vde: fix common struct sg_table related issues

2020-06-30 Thread Marek Szyprowski
On 21.06.2020 06:00, Dmitry Osipenko wrote:
> On Fri, 19 Jun 2020 12:36:31 +0200
> Marek Szyprowski wrote:
>
>> The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg()
>> function returns the number of the created entries in the DMA address
>> space. However the subsequent calls to the
>> dma_sync_sg_for_{device,cpu}() and dma_unmap_sg must be called with
>> the original number of the entries passed to the dma_map_sg().
>>
>> struct sg_table is a common structure used for describing a
>> non-contiguous memory buffer, used commonly in the DRM and graphics
>> subsystems. It consists of a scatterlist with memory pages and DMA
>> addresses (sgl entry), as well as the number of scatterlist entries:
>> CPU pages (orig_nents entry) and DMA mapped pages (nents entry).
>>
>> It turned out that it was a common mistake to misuse nents and
>> orig_nents entries, calling DMA-mapping functions with a wrong number
>> of entries or ignoring the number of mapped entries returned by the
>> dma_map_sg() function.
>>
>> To avoid such issues, lets use a common dma-mapping wrappers operating
>> directly on the struct sg_table objects and use scatterlist page
>> iterators where possible. This, almost always, hides references to the
>> nents and orig_nents entries, making the code robust, easier to follow
>> and copy/paste safe.
>>
>> Signed-off-by: Marek Szyprowski 
>> Reviewed-by: Dmitry Osipenko 
>> ---
>>   drivers/staging/media/tegra-vde/iommu.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/staging/media/tegra-vde/iommu.c b/drivers/staging/media/tegra-vde/iommu.c
>> index 6af863d92123..adf8dc7ee25c 100644
>> --- a/drivers/staging/media/tegra-vde/iommu.c
>> +++ b/drivers/staging/media/tegra-vde/iommu.c
>> @@ -36,8 +36,8 @@ int tegra_vde_iommu_map(struct tegra_vde *vde,
>>  addr = iova_dma_addr(&vde->iova, iova);
>>   
>> -size = iommu_map_sg(vde->domain, addr, sgt->sgl, sgt->nents,
>> -IOMMU_READ | IOMMU_WRITE);
>> +size = iommu_map_sgtable(vde->domain, addr, sgt,
>> + IOMMU_READ | IOMMU_WRITE);
>>  if (!size) {
>>  __free_iova(&vde->iova, iova);
>>  return -ENXIO;
> Ahh, I saw the build failure report. You're changing the DMA API in
> this series, while DMA API isn't used by this driver, it uses IOMMU
> API. Hence there is no need to touch this code. Similar problem in the
> host1x driver patch.

The issue is caused by the lack of iommu_map_sgtable() stub when no 
IOMMU support is configured. I've posted a patch for this:

https://lore.kernel.org/lkml/20200630081756.18526-1-m.szyprow...@samsung.com/

The patch for this driver is fine, we have to wait until the above fix 
gets merged and then it can be applied during the next release cycle.
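
A simplified sketch of the header layout that fix is aiming for, under stated assumptions: every `demo_*` name and the `DEMO_IOMMU_SUPPORT` switch are hypothetical stand-ins, and the wrapper is assumed to pass `orig_nents` to the underlying mapping call. The point is only that the low-level call has a stub when support is compiled out, while the sg_table wrapper lives outside the conditional block so callers always build.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sg_table. */
struct demo_sg_table {
	void *sgl;			/* the scatterlist */
	unsigned int nents;		/* number of DMA-mapped entries */
	unsigned int orig_nents;	/* number of CPU page entries */
};

#ifdef DEMO_IOMMU_SUPPORT
/* Real implementation lives elsewhere when support is enabled. */
size_t demo_map_sg(unsigned long iova, void *sgl, unsigned int nents);
#else
/* Stub used when support is disabled: nothing gets mapped. */
static inline size_t demo_map_sg(unsigned long iova, void *sgl,
				 unsigned int nents)
{
	(void)iova;
	(void)sgl;
	(void)nents;
	return 0;
}
#endif

/*
 * The wrapper sits outside the #ifdef, so client code compiles in
 * both configurations and simply sees a zero-sized mapping from the stub.
 */
static inline size_t demo_map_sgtable(unsigned long iova,
				      struct demo_sg_table *sgt)
{
	return demo_map_sg(iova, sgt->sgl, sgt->orig_nents);
}
```

With the switch undefined, a caller's existing `if (!size)` error path handles the stub's zero return without any extra conditional code.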

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Re: [PATCH 1/2] iommu/sun50i: Change the readl timeout to the atomic variant

2020-06-30 Thread Joerg Roedel
On Sun, Jun 28, 2020 at 08:08:43PM +0200, Maxime Ripard wrote:
> The flush_all_tlb call back can be called from an atomic context, so using
> readl_poll_timeout that embeds a udelay doesn't work.
> 
> Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver")
> Signed-off-by: Maxime Ripard 

Applied both for v5.8, thanks.



Re: [PATCH 00/13] iommu: Remove usage of dev->archdata.iommu

2020-06-30 Thread Joerg Roedel
On Thu, Jun 25, 2020 at 03:08:23PM +0200, Joerg Roedel wrote:
> Joerg Roedel (13):
>   iommu/exynos: Use dev_iommu_priv_get/set()
>   iommu/vt-d: Use dev_iommu_priv_get/set()
>   iommu/msm: Use dev_iommu_priv_get/set()
>   iommu/omap: Use dev_iommu_priv_get/set()
>   iommu/rockchip: Use dev_iommu_priv_get/set()
>   iommu/tegra: Use dev_iommu_priv_get/set()
>   iommu/pamu: Use dev_iommu_priv_get/set()
>   iommu/mediatek: Do no use dev->archdata.iommu
>   x86: Remove dev->archdata.iommu pointer
>   ia64: Remove dev->archdata.iommu pointer
>   arm: Remove dev->archdata.iommu pointer
>   arm64: Remove dev->archdata.iommu pointer
>   powerpc/dma: Remove dev->archdata.iommu_domain

Applied.


Re: [PATCH v2 0/2] iommu/amd: Don't use atomic64_t for domain->pt_root

2020-06-30 Thread Joerg Roedel
On Fri, Jun 26, 2020 at 08:30:21AM -0400, Qian Cai wrote:
> BTW, from the previous discussion, Linus mentioned,
>  
> “
> The thing is, the 64-bit atomic reads/writes are very expensive on
> 32-bit x86. If it was just a native pointer, it would be much cheaper
> than an "atomic64_t".
> “
> 
> However, here we have AMD_IOMMU depend on x86_64, so I am wondering if
> it makes any sense to run this code on 32-bit x86 at all?

No, it doesn't, the driver is not supported on 32bit and probably never
will be. I'll skip this patch and only apply the first one, as it is an
improvement in itself.

Regards,

Joerg

Re: [PATCH] iommu/amd: Print extended features in one line to fix divergent log levels

2020-06-30 Thread Jörg Rödel
On Wed, Jun 17, 2020 at 12:04:20AM +0200, Paul Menzel wrote:
>  drivers/iommu/amd_iommu_init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.


Re: [PATCH] iommu: Allow page responses without PASID

2020-06-30 Thread Joerg Roedel
On Tue, Jun 16, 2020 at 04:47:14PM +0200, Jean-Philippe Brucker wrote:
> Some PCIe devices do not expect a PASID value in PRI Page Responses.
> If the "PRG Response PASID Required" bit in the PRI capability is zero,
> then the OS should not set the PASID field. Similarly on Arm SMMU,
> responses to stall events do not have a PASID.
> 
> Currently iommu_page_response() systematically checks that the PASID in
> the page response corresponds to the one in the page request. This can't
> work with virtualization because a page response coming from a guest OS
> won't have a PASID if the passed-through device does not require one.
> 
> Add a flag to page requests that declares whether the corresponding
> response needs to have a PASID. When this flag isn't set, allow page
> responses without PASID.
> 
> Reported-by: Shameerali Kolothum Thodi 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  include/uapi/linux/iommu.h |  6 +-
>  drivers/iommu/iommu.c  | 23 +--
>  2 files changed, 22 insertions(+), 7 deletions(-)

Applied, thanks.
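
For readers following along, the check described in the commit message could look roughly like the sketch below. This is an illustration only: the `demo_*` structures, flag names and `demo_response_matches()` are hypothetical and not the identifiers used in include/uapi/linux/iommu.h or drivers/iommu/iommu.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: the request declares that its response needs a PASID. */
#define DEMO_REQ_NEEDS_PASID	(1u << 0)
/* Hypothetical flag: the response carries a valid PASID. */
#define DEMO_RESP_PASID_VALID	(1u << 0)

struct demo_page_request {
	uint32_t flags;
	uint32_t pasid;
};

struct demo_page_response {
	uint32_t flags;
	uint32_t pasid;
};

/*
 * Returns true when the response is acceptable for the request.
 * The PASID is only compared when the request asked for one;
 * otherwise a response without PASID is allowed.
 */
static bool demo_response_matches(const struct demo_page_request *req,
				  const struct demo_page_response *resp)
{
	bool needs_pasid = (req->flags & DEMO_REQ_NEEDS_PASID) != 0;
	bool has_pasid = (resp->flags & DEMO_RESP_PASID_VALID) != 0;

	if (needs_pasid)
		return has_pasid && resp->pasid == req->pasid;

	return true;
}
```

The old behaviour corresponds to always taking the first branch, which is what rejects responses from a guest whose passed-through device does not require a PASID.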


Re: [PATCH 1/2] iommu/vt-d: Move Kconfig and Makefile bits down into intel directory

2020-06-30 Thread Joerg Roedel
Hi Jerry,

On Fri, Jun 12, 2020 at 04:10:59PM -0700, Jerry Snitselaar wrote:
> Move Intel Kconfig and Makefile bits down into intel directory
> with the rest of the Intel specific files.
> 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Signed-off-by: Jerry Snitselaar 
> ---
>  drivers/iommu/Kconfig| 86 +---
>  drivers/iommu/Makefile   |  8 +---
>  drivers/iommu/intel/Kconfig  | 86 
>  drivers/iommu/intel/Makefile |  7 +++

The patches do not apply to v5.8-rc3, can you please rebase them and
resend?

Thanks,

Joerg

