RE: [PATCH v2 4/4] iommu/vt-d: Add page response ops support

2020-07-05 Thread Tian, Kevin
> From: Lu Baolu
> Sent: Monday, July 6, 2020 8:26 AM
> 
> After a page request is handled, software must response the device which
> raised the page request with the handling result. This is done through

'response' is a noun. 

> the iommu ops.page_response if the request was reported to outside of
> vendor iommu driver through iommu_report_device_fault(). This adds the
> VT-d implementation of page_response ops.
> 
> Co-developed-by: Jacob Pan 
> Signed-off-by: Jacob Pan 
> Co-developed-by: Liu Yi L 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c |  1 +
>  drivers/iommu/intel/svm.c   | 74
> +
>  include/linux/intel-iommu.h |  3 ++
>  3 files changed, 78 insertions(+)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index de17952ed133..7eb29167e8f9 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -6057,6 +6057,7 @@ const struct iommu_ops intel_iommu_ops = {
>   .sva_bind   = intel_svm_bind,
>   .sva_unbind = intel_svm_unbind,
>   .sva_get_pasid  = intel_svm_get_pasid,
> + .page_response  = intel_svm_page_response,
>  #endif
>  };
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 08c58c2b1a06..1c7d8a9ea124 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -1078,3 +1078,77 @@ int intel_svm_get_pasid(struct iommu_sva *sva)
> 
>   return pasid;
>  }
> +
> +int intel_svm_page_response(struct device *dev,
> + struct iommu_fault_event *evt,
> + struct iommu_page_response *msg)
> +{
> + struct iommu_fault_page_request *prm;
> + struct intel_svm_dev *sdev;
> + struct intel_iommu *iommu;
> + struct intel_svm *svm;
> + bool private_present;
> + bool pasid_present;
> + bool last_page;
> + u8 bus, devfn;
> + int ret = 0;
> + u16 sid;
> +
> + if (!dev || !dev_is_pci(dev))
> + return -ENODEV;

but we didn't do same check when reporting fault?

> +
> + iommu = device_to_iommu(dev, , );
> + if (!iommu)
> + return -ENODEV;
> +
> + if (!msg || !evt)
> + return -EINVAL;
> +
> + mutex_lock(_mutex);
> +
> + prm = >fault.prm;
> + sid = PCI_DEVID(bus, devfn);
> + pasid_present = prm->flags &
> IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> + private_present = prm->flags &
> IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
> + last_page = prm->flags &
> IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> +
> + if (pasid_present) {
> + if (prm->pasid == 0 || prm->pasid >= PASID_MAX) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + ret = pasid_to_svm_sdev(dev, prm->pasid, , );
> + if (ret || !sdev) {
> + ret = -ENODEV;
> + goto out;
> + }
> + }

what about pasid_present==0? Do we support recoverable fault now
with this patch?

and who guarantees that the external fault handler (e.g. guest)
cannot do bad thing with this interface, e.g. by specifying a PASID
belonging to other guests (when Scalable IOV is enabled)?

Thanks
Kevin

> +
> + /*
> +  * Per VT-d spec. v3.0 ch7.7, system software must respond
> +  * with page group response if private data is present (PDP)
> +  * or last page in group (LPIG) bit is set. This is an
> +  * additional VT-d requirement beyond PCI ATS spec.
> +  */
> + if (last_page || private_present) {
> + struct qi_desc desc;
> +
> + desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid)
> |
> + QI_PGRP_PASID_P(pasid_present) |
> + QI_PGRP_PDP(private_present) |
> + QI_PGRP_RESP_CODE(msg->code) |
> + QI_PGRP_RESP_TYPE;
> + desc.qw1 = QI_PGRP_IDX(prm->grpid) |
> QI_PGRP_LPIG(last_page);
> + desc.qw2 = 0;
> + desc.qw3 = 0;
> + if (private_present)
> + memcpy(, prm->private_data,
> +sizeof(prm->private_data));
> +
> + qi_submit_sync(iommu, , 1, 0);
> + }
> +out:
> + mutex_unlock(_mutex);
> + return ret;
> +}
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index fc2cfc3db6e1..bf6009a344f5 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -741,6 +741,9 @@ struct iommu_sva *intel_svm_bind(struct device
> *dev, struct mm_struct *mm,
>void *drvdata);
>  void intel_svm_unbind(struct iommu_sva *handle);
>  int intel_svm_get_pasid(struct iommu_sva *handle);
> +int intel_svm_page_response(struct device *dev, struct iommu_fault_event
> *evt,
> + struct iommu_page_response *msg);
> +
>  struct 

RE: [PATCH v2 3/4] iommu/vt-d: Report page request faults for guest SVA

2020-07-05 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Monday, July 6, 2020 9:30 AM
> 
> > From: Lu Baolu 
> > Sent: Monday, July 6, 2020 8:26 AM
> >
> > A pasid might be bound to a page table from a VM guest via the iommu
> > ops.sva_bind_gpasid. In this case, when a DMA page fault is detected
> > on the physical IOMMU, we need to inject the page fault request into
> > the guest. After the guest completes handling the page fault, a page
> > response need to be sent back via the iommu ops.page_response().
> >
> > This adds support to report a page request fault. Any external module
> > which is interested in handling this fault should regiester a notifier
> > callback.
> 
> be specific on which notifier to be registered...
> 
> >
> > Co-developed-by: Jacob Pan 
> > Signed-off-by: Jacob Pan 
> > Co-developed-by: Liu Yi L 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Lu Baolu 
> > ---
> >  drivers/iommu/intel/svm.c | 99 -
> --
> >  1 file changed, 81 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index c23167877b2b..08c58c2b1a06 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -815,6 +815,57 @@ static void intel_svm_drain_prq(struct device *dev,
> > int pasid)
> > }
> >  }
> >
> > +static int prq_to_iommu_prot(struct page_req_dsc *req)
> > +{
> > +   int prot = 0;
> > +
> > +   if (req->rd_req)
> > +   prot |= IOMMU_FAULT_PERM_READ;
> > +   if (req->wr_req)
> > +   prot |= IOMMU_FAULT_PERM_WRITE;
> > +   if (req->exe_req)
> > +   prot |= IOMMU_FAULT_PERM_EXEC;
> > +   if (req->pm_req)
> > +   prot |= IOMMU_FAULT_PERM_PRIV;
> > +
> > +   return prot;
> > +}
> > +
> > +static int
> > +intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc)
> > +{
> > +   struct iommu_fault_event event;
> > +   u8 bus, devfn;
> > +
> > +   memset(, 0, sizeof(struct iommu_fault_event));
> > +   bus = PCI_BUS_NUM(desc->rid);
> > +   devfn = desc->rid & 0xff;
> 
> not required.
> 
> > +
> > +   /* Fill in event data for device specific processing */
> > +   event.fault.type = IOMMU_FAULT_PAGE_REQ;
> > +   event.fault.prm.addr = desc->addr;
> > +   event.fault.prm.pasid = desc->pasid;
> > +   event.fault.prm.grpid = desc->prg_index;
> > +   event.fault.prm.perm = prq_to_iommu_prot(desc);
> > +
> > +   /*
> > +* Set last page in group bit if private data is present,
> > +* page response is required as it does for LPIG.
> > +*/
> 
> move to priv_data_present check?
> 
> > +   if (desc->lpig)
> > +   event.fault.prm.flags |=
> > IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> > +   if (desc->pasid_present)
> > +   event.fault.prm.flags |=
> > IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> > +   if (desc->priv_data_present) {
> > +   event.fault.prm.flags |=
> > IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;

btw earlier comment is more about the behavior of the fault
handler (e.g. the guest), but not about why we need convert
to last_page prm flag. Let's make it clear that doing so is 
because iommu_report_device_fault doesn't understand this
vt-d specific requirement thus we set last_page as a workaround.

Thanks
Kevin

> > +   event.fault.prm.flags |=
> > IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
> > +   memcpy(event.fault.prm.private_data, desc->priv_data,
> > +  sizeof(desc->priv_data));
> > +   }
> > +
> > +   return iommu_report_device_fault(dev, );
> > +}
> > +
> >  static irqreturn_t prq_event_thread(int irq, void *d)
> >  {
> > struct intel_iommu *iommu = d;
> > @@ -828,7 +879,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> > tail = dmar_readq(iommu->reg + DMAR_PQT_REG) &
> > PRQ_RING_MASK;
> > head = dmar_readq(iommu->reg + DMAR_PQH_REG) &
> > PRQ_RING_MASK;
> > while (head != tail) {
> > -   struct intel_svm_dev *sdev;
> > +   struct intel_svm_dev *sdev = NULL;
> 
> move to outside of the loop, otherwise later check always hit "if (!sdev)"
> 
> > struct vm_area_struct *vma;
> > struct page_req_dsc *req;
> > struct qi_desc resp;
> > @@ -864,6 +915,20 @@ static irqreturn_t prq_event_thread(int irq, void
> *d)
> > }
> > }
> >
> > +   if (!sdev || sdev->sid != req->rid) {
> > +   struct intel_svm_dev *t;
> > +
> > +   sdev = NULL;
> > +   rcu_read_lock();
> > +   list_for_each_entry_rcu(t, >devs, list) {
> > +   if (t->sid == req->rid) {
> > +   sdev = t;
> > +   break;
> > +   }
> > +   }
> > +   rcu_read_unlock();
> > +   }
> > +
> > result = QI_RESP_INVALID;
> > /* Since we're using init_mm.pgd directly, we should never
> > take
> >  * any faults on kernel addresses. 

RE: [PATCH v2 3/4] iommu/vt-d: Report page request faults for guest SVA

2020-07-05 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Monday, July 6, 2020 8:26 AM
> 
> A pasid might be bound to a page table from a VM guest via the iommu
> ops.sva_bind_gpasid. In this case, when a DMA page fault is detected
> on the physical IOMMU, we need to inject the page fault request into
> the guest. After the guest completes handling the page fault, a page
> response need to be sent back via the iommu ops.page_response().
> 
> This adds support to report a page request fault. Any external module
> which is interested in handling this fault should regiester a notifier
> callback.

be specific on which notifier to be registered...

> 
> Co-developed-by: Jacob Pan 
> Signed-off-by: Jacob Pan 
> Co-developed-by: Liu Yi L 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/svm.c | 99 ---
>  1 file changed, 81 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index c23167877b2b..08c58c2b1a06 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -815,6 +815,57 @@ static void intel_svm_drain_prq(struct device *dev,
> int pasid)
>   }
>  }
> 
> +static int prq_to_iommu_prot(struct page_req_dsc *req)
> +{
> + int prot = 0;
> +
> + if (req->rd_req)
> + prot |= IOMMU_FAULT_PERM_READ;
> + if (req->wr_req)
> + prot |= IOMMU_FAULT_PERM_WRITE;
> + if (req->exe_req)
> + prot |= IOMMU_FAULT_PERM_EXEC;
> + if (req->pm_req)
> + prot |= IOMMU_FAULT_PERM_PRIV;
> +
> + return prot;
> +}
> +
> +static int
> +intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc)
> +{
> + struct iommu_fault_event event;
> + u8 bus, devfn;
> +
> + memset(, 0, sizeof(struct iommu_fault_event));
> + bus = PCI_BUS_NUM(desc->rid);
> + devfn = desc->rid & 0xff;

not required.

> +
> + /* Fill in event data for device specific processing */
> + event.fault.type = IOMMU_FAULT_PAGE_REQ;
> + event.fault.prm.addr = desc->addr;
> + event.fault.prm.pasid = desc->pasid;
> + event.fault.prm.grpid = desc->prg_index;
> + event.fault.prm.perm = prq_to_iommu_prot(desc);
> +
> + /*
> +  * Set last page in group bit if private data is present,
> +  * page response is required as it does for LPIG.
> +  */

move to priv_data_present check?

> + if (desc->lpig)
> + event.fault.prm.flags |=
> IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> + if (desc->pasid_present)
> + event.fault.prm.flags |=
> IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> + if (desc->priv_data_present) {
> + event.fault.prm.flags |=
> IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> + event.fault.prm.flags |=
> IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
> + memcpy(event.fault.prm.private_data, desc->priv_data,
> +sizeof(desc->priv_data));
> + }
> +
> + return iommu_report_device_fault(dev, );
> +}
> +
>  static irqreturn_t prq_event_thread(int irq, void *d)
>  {
>   struct intel_iommu *iommu = d;
> @@ -828,7 +879,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>   tail = dmar_readq(iommu->reg + DMAR_PQT_REG) &
> PRQ_RING_MASK;
>   head = dmar_readq(iommu->reg + DMAR_PQH_REG) &
> PRQ_RING_MASK;
>   while (head != tail) {
> - struct intel_svm_dev *sdev;
> + struct intel_svm_dev *sdev = NULL;

move to outside of the loop, otherwise later check always hit "if (!sdev)"

>   struct vm_area_struct *vma;
>   struct page_req_dsc *req;
>   struct qi_desc resp;
> @@ -864,6 +915,20 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>   }
>   }
> 
> + if (!sdev || sdev->sid != req->rid) {
> + struct intel_svm_dev *t;
> +
> + sdev = NULL;
> + rcu_read_lock();
> + list_for_each_entry_rcu(t, >devs, list) {
> + if (t->sid == req->rid) {
> + sdev = t;
> + break;
> + }
> + }
> + rcu_read_unlock();
> + }
> +
>   result = QI_RESP_INVALID;
>   /* Since we're using init_mm.pgd directly, we should never
> take
>* any faults on kernel addresses. */
> @@ -874,6 +939,17 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>   if (!is_canonical_address(address))
>   goto bad_req;
> 
> + /*
> +  * If prq is to be handled outside iommu driver via receiver of
> +  * the fault notifiers, we skip the page response here.
> +  */
> + if (svm->flags & SVM_FLAG_GUEST_MODE) {
> + if (sdev && !intel_svm_prq_report(sdev->dev, req))
> +

RE: [PATCH v2 2/4] iommu/vt-d: Add a helper to get svm and sdev for pasid

2020-07-05 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Monday, July 6, 2020 8:26 AM
> 
> There are several places in the code that need to get the pointers of
> svm and sdev according to a pasid and device. Add a helper to achieve
> this for code consolidation and readability.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/svm.c | 121 +-
>  1 file changed, 68 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 25dd74f27252..c23167877b2b 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -228,6 +228,50 @@ static LIST_HEAD(global_svm_list);
>   list_for_each_entry((sdev), &(svm)->devs, list) \
>   if ((d) != (sdev)->dev) {} else
> 
> +static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
> +  struct intel_svm **rsvm,
> +  struct intel_svm_dev **rsdev)
> +{
> + struct intel_svm_dev *d, *sdev = NULL;
> + struct intel_svm *svm;
> +
> + /* The caller should hold the pasid_mutex lock */
> + if (WARN_ON(!mutex_is_locked(_mutex)))
> + return -EINVAL;
> +
> + if (pasid == INVALID_IOASID || pasid >= PASID_MAX)
> + return -EINVAL;
> +
> + svm = ioasid_find(NULL, pasid, NULL);
> + if (IS_ERR(svm))
> + return PTR_ERR(svm);
> +
> + if (!svm)
> + goto out;
> +
> + /*
> +  * If we found svm for the PASID, there must be at least one device
> +  * bond.
> +  */
> + if (WARN_ON(list_empty(>devs)))
> + return -EINVAL;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(d, >devs, list) {
> + if (d->dev == dev) {
> + sdev = d;
> + break;
> + }
> + }
> + rcu_read_unlock();
> +
> +out:
> + *rsvm = svm;
> + *rsdev = sdev;
> +
> + return 0;
> +}
> +
>  int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device
> *dev,
> struct iommu_gpasid_bind_data *data)
>  {
> @@ -261,39 +305,27 @@ int intel_svm_bind_gpasid(struct iommu_domain
> *domain, struct device *dev,
>   dmar_domain = to_dmar_domain(domain);
> 
>   mutex_lock(_mutex);
> - svm = ioasid_find(NULL, data->hpasid, NULL);
> - if (IS_ERR(svm)) {
> - ret = PTR_ERR(svm);
> + ret = pasid_to_svm_sdev(dev, data->hpasid, , );
> + if (ret)
>   goto out;
> - }
> 
> - if (svm) {
> + if (sdev) {
>   /*
> -  * If we found svm for the PASID, there must be at
> -  * least one device bond, otherwise svm should be freed.
> +  * For devices with aux domains, we should allow
> +  * multiple bind calls with the same PASID and pdev.
>*/
> - if (WARN_ON(list_empty(>devs))) {
> - ret = -EINVAL;
> - goto out;
> + if (iommu_dev_feature_enabled(dev,
> IOMMU_DEV_FEAT_AUX)) {
> + sdev->users++;
> + } else {
> + dev_warn_ratelimited(dev,
> +  "Already bound with PASID %u\n",
> +  svm->pasid);
> + ret = -EBUSY;
>   }
> + goto out;
> + }
> 
> - for_each_svm_dev(sdev, svm, dev) {
> - /*
> -  * For devices with aux domains, we should allow
> -  * multiple bind calls with the same PASID and pdev.
> -  */
> - if (iommu_dev_feature_enabled(dev,
> -   IOMMU_DEV_FEAT_AUX))
> {
> - sdev->users++;
> - } else {
> - dev_warn_ratelimited(dev,
> -  "Already bound with
> PASID %u\n",
> -  svm->pasid);
> - ret = -EBUSY;
> - }
> - goto out;
> - }
> - } else {
> + if (!svm) {
>   /* We come here when PASID has never been bond to a
> device. */
>   svm = kzalloc(sizeof(*svm), GFP_KERNEL);
>   if (!svm) {
> @@ -376,25 +408,17 @@ int intel_svm_unbind_gpasid(struct device *dev,
> int pasid)
>   struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
>   struct intel_svm_dev *sdev;
>   struct intel_svm *svm;
> - int ret = -EINVAL;
> + int ret;
> 
>   if (WARN_ON(!iommu))
>   return -EINVAL;
> 
>   mutex_lock(_mutex);
> - svm = ioasid_find(NULL, pasid, NULL);
> - if (!svm) {
> - ret = -EINVAL;
> - goto out;
> - }
> -
> - if (IS_ERR(svm)) {
> - ret = PTR_ERR(svm);
> + ret = 

RE: [PATCH v2 1/4] iommu/vt-d: Refactor device_to_iommu() helper

2020-07-05 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Monday, July 6, 2020 8:26 AM
> 
> It is refactored in two ways:
> 
> - Make it global so that it could be used in other files.
> 
> - Make bus/devfn optional so that callers could ignore these two returned
> values when they only want to get the coresponding iommu pointer.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 55 +++--
>  drivers/iommu/intel/svm.c   |  8 +++---
>  include/linux/intel-iommu.h |  3 +-
>  3 files changed, 21 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e98..de17952ed133 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -778,16 +778,16 @@ is_downstream_to_pci_bridge(struct device *dev,
> struct device *bridge)
>   return false;
>  }
> 
> -static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8
> *devfn)
> +struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8
> *devfn)
>  {
>   struct dmar_drhd_unit *drhd = NULL;
> + struct pci_dev *pdev = NULL;
>   struct intel_iommu *iommu;
>   struct device *tmp;
> - struct pci_dev *pdev = NULL;
>   u16 segment = 0;
>   int i;
> 
> - if (iommu_dummy(dev))
> + if (!dev || iommu_dummy(dev))
>   return NULL;
> 
>   if (dev_is_pci(dev)) {
> @@ -818,8 +818,10 @@ static struct intel_iommu *device_to_iommu(struct
> device *dev, u8 *bus, u8 *devf
>   if (pdev && pdev->is_virtfn)
>   goto got_pdev;
> 
> - *bus = drhd->devices[i].bus;
> - *devfn = drhd->devices[i].devfn;
> + if (bus && devfn) {
> + *bus = drhd->devices[i].bus;
> + *devfn = drhd->devices[i].devfn;
> + }
>   goto out;
>   }
> 
> @@ -829,8 +831,10 @@ static struct intel_iommu *device_to_iommu(struct
> device *dev, u8 *bus, u8 *devf
> 
>   if (pdev && drhd->include_all) {
>   got_pdev:
> - *bus = pdev->bus->number;
> - *devfn = pdev->devfn;
> + if (bus && devfn) {
> + *bus = pdev->bus->number;
> + *devfn = pdev->devfn;
> + }
>   goto out;
>   }
>   }
> @@ -5146,11 +5150,10 @@ static int aux_domain_add_dev(struct
> dmar_domain *domain,
> struct device *dev)
>  {
>   int ret;
> - u8 bus, devfn;
>   unsigned long flags;
>   struct intel_iommu *iommu;
> 
> - iommu = device_to_iommu(dev, , );
> + iommu = device_to_iommu(dev, NULL, NULL);
>   if (!iommu)
>   return -ENODEV;
> 
> @@ -5236,9 +5239,8 @@ static int prepare_domain_attach_device(struct
> iommu_domain *domain,
>   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>   struct intel_iommu *iommu;
>   int addr_width;
> - u8 bus, devfn;
> 
> - iommu = device_to_iommu(dev, , );
> + iommu = device_to_iommu(dev, NULL, NULL);
>   if (!iommu)
>   return -ENODEV;
> 
> @@ -5658,9 +5660,8 @@ static bool intel_iommu_capable(enum
> iommu_cap cap)
>  static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>  {
>   struct intel_iommu *iommu;
> - u8 bus, devfn;
> 
> - iommu = device_to_iommu(dev, , );
> + iommu = device_to_iommu(dev, NULL, NULL);
>   if (!iommu)
>   return ERR_PTR(-ENODEV);
> 
> @@ -5673,9 +5674,8 @@ static struct iommu_device
> *intel_iommu_probe_device(struct device *dev)
>  static void intel_iommu_release_device(struct device *dev)
>  {
>   struct intel_iommu *iommu;
> - u8 bus, devfn;
> 
> - iommu = device_to_iommu(dev, , );
> + iommu = device_to_iommu(dev, NULL, NULL);
>   if (!iommu)
>   return;
> 
> @@ -5825,37 +5825,14 @@ static struct iommu_group
> *intel_iommu_device_group(struct device *dev)
>   return generic_device_group(dev);
>  }
> 
> -#ifdef CONFIG_INTEL_IOMMU_SVM
> -struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
> -{
> - struct intel_iommu *iommu;
> - u8 bus, devfn;
> -
> - if (iommu_dummy(dev)) {
> - dev_warn(dev,
> -  "No IOMMU translation for device; cannot enable
> SVM\n");
> - return NULL;
> - }
> -
> - iommu = device_to_iommu(dev, , );
> - if ((!iommu)) {
> - dev_err(dev, "No IOMMU for device; cannot enable SVM\n");
> - return NULL;
> - }
> -
> - return iommu;
> -}
> -#endif /* CONFIG_INTEL_IOMMU_SVM */
> -
>  static int intel_iommu_enable_auxd(struct device *dev)
>  {
>   struct device_domain_info *info;
>   struct intel_iommu *iommu;
>  

[PATCH v2 2/4] iommu/vt-d: Add a helper to get svm and sdev for pasid

2020-07-05 Thread Lu Baolu
There are several places in the code that need to get the pointers of
svm and sdev according to a pasid and device. Add a helper to achieve
this for code consolidation and readability.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/svm.c | 121 +-
 1 file changed, 68 insertions(+), 53 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 25dd74f27252..c23167877b2b 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -228,6 +228,50 @@ static LIST_HEAD(global_svm_list);
list_for_each_entry((sdev), &(svm)->devs, list) \
if ((d) != (sdev)->dev) {} else
 
+static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
+struct intel_svm **rsvm,
+struct intel_svm_dev **rsdev)
+{
+   struct intel_svm_dev *d, *sdev = NULL;
+   struct intel_svm *svm;
+
+   /* The caller should hold the pasid_mutex lock */
+   if (WARN_ON(!mutex_is_locked(_mutex)))
+   return -EINVAL;
+
+   if (pasid == INVALID_IOASID || pasid >= PASID_MAX)
+   return -EINVAL;
+
+   svm = ioasid_find(NULL, pasid, NULL);
+   if (IS_ERR(svm))
+   return PTR_ERR(svm);
+
+   if (!svm)
+   goto out;
+
+   /*
+* If we found svm for the PASID, there must be at least one device
+* bond.
+*/
+   if (WARN_ON(list_empty(>devs)))
+   return -EINVAL;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(d, >devs, list) {
+   if (d->dev == dev) {
+   sdev = d;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+out:
+   *rsvm = svm;
+   *rsdev = sdev;
+
+   return 0;
+}
+
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
  struct iommu_gpasid_bind_data *data)
 {
@@ -261,39 +305,27 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
dmar_domain = to_dmar_domain(domain);
 
mutex_lock(_mutex);
-   svm = ioasid_find(NULL, data->hpasid, NULL);
-   if (IS_ERR(svm)) {
-   ret = PTR_ERR(svm);
+   ret = pasid_to_svm_sdev(dev, data->hpasid, , );
+   if (ret)
goto out;
-   }
 
-   if (svm) {
+   if (sdev) {
/*
-* If we found svm for the PASID, there must be at
-* least one device bond, otherwise svm should be freed.
+* For devices with aux domains, we should allow
+* multiple bind calls with the same PASID and pdev.
 */
-   if (WARN_ON(list_empty(>devs))) {
-   ret = -EINVAL;
-   goto out;
+   if (iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)) {
+   sdev->users++;
+   } else {
+   dev_warn_ratelimited(dev,
+"Already bound with PASID %u\n",
+svm->pasid);
+   ret = -EBUSY;
}
+   goto out;
+   }
 
-   for_each_svm_dev(sdev, svm, dev) {
-   /*
-* For devices with aux domains, we should allow
-* multiple bind calls with the same PASID and pdev.
-*/
-   if (iommu_dev_feature_enabled(dev,
- IOMMU_DEV_FEAT_AUX)) {
-   sdev->users++;
-   } else {
-   dev_warn_ratelimited(dev,
-"Already bound with PASID 
%u\n",
-svm->pasid);
-   ret = -EBUSY;
-   }
-   goto out;
-   }
-   } else {
+   if (!svm) {
/* We come here when PASID has never been bond to a device. */
svm = kzalloc(sizeof(*svm), GFP_KERNEL);
if (!svm) {
@@ -376,25 +408,17 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
struct intel_svm_dev *sdev;
struct intel_svm *svm;
-   int ret = -EINVAL;
+   int ret;
 
if (WARN_ON(!iommu))
return -EINVAL;
 
mutex_lock(_mutex);
-   svm = ioasid_find(NULL, pasid, NULL);
-   if (!svm) {
-   ret = -EINVAL;
-   goto out;
-   }
-
-   if (IS_ERR(svm)) {
-   ret = PTR_ERR(svm);
+   ret = pasid_to_svm_sdev(dev, pasid, , );
+   if (ret)
goto out;
-   }
 
-   for_each_svm_dev(sdev, svm, dev) {
-   ret = 0;
+  

[PATCH v2 4/4] iommu/vt-d: Add page response ops support

2020-07-05 Thread Lu Baolu
After a page request is handled, software must response the device which
raised the page request with the handling result. This is done through
the iommu ops.page_response if the request was reported to outside of
vendor iommu driver through iommu_report_device_fault(). This adds the
VT-d implementation of page_response ops.

Co-developed-by: Jacob Pan 
Signed-off-by: Jacob Pan 
Co-developed-by: Liu Yi L 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c |  1 +
 drivers/iommu/intel/svm.c   | 74 +
 include/linux/intel-iommu.h |  3 ++
 3 files changed, 78 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index de17952ed133..7eb29167e8f9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -6057,6 +6057,7 @@ const struct iommu_ops intel_iommu_ops = {
.sva_bind   = intel_svm_bind,
.sva_unbind = intel_svm_unbind,
.sva_get_pasid  = intel_svm_get_pasid,
+   .page_response  = intel_svm_page_response,
 #endif
 };
 
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 08c58c2b1a06..1c7d8a9ea124 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1078,3 +1078,77 @@ int intel_svm_get_pasid(struct iommu_sva *sva)
 
return pasid;
 }
+
+int intel_svm_page_response(struct device *dev,
+   struct iommu_fault_event *evt,
+   struct iommu_page_response *msg)
+{
+   struct iommu_fault_page_request *prm;
+   struct intel_svm_dev *sdev;
+   struct intel_iommu *iommu;
+   struct intel_svm *svm;
+   bool private_present;
+   bool pasid_present;
+   bool last_page;
+   u8 bus, devfn;
+   int ret = 0;
+   u16 sid;
+
+   if (!dev || !dev_is_pci(dev))
+   return -ENODEV;
+
+   iommu = device_to_iommu(dev, , );
+   if (!iommu)
+   return -ENODEV;
+
+   if (!msg || !evt)
+   return -EINVAL;
+
+   mutex_lock(_mutex);
+
+   prm = >fault.prm;
+   sid = PCI_DEVID(bus, devfn);
+   pasid_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+   private_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
+   last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+
+   if (pasid_present) {
+   if (prm->pasid == 0 || prm->pasid >= PASID_MAX) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   ret = pasid_to_svm_sdev(dev, prm->pasid, , );
+   if (ret || !sdev) {
+   ret = -ENODEV;
+   goto out;
+   }
+   }
+
+   /*
+* Per VT-d spec. v3.0 ch7.7, system software must respond
+* with page group response if private data is present (PDP)
+* or last page in group (LPIG) bit is set. This is an
+* additional VT-d requirement beyond PCI ATS spec.
+*/
+   if (last_page || private_present) {
+   struct qi_desc desc;
+
+   desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) |
+   QI_PGRP_PASID_P(pasid_present) |
+   QI_PGRP_PDP(private_present) |
+   QI_PGRP_RESP_CODE(msg->code) |
+   QI_PGRP_RESP_TYPE;
+   desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
+   desc.qw2 = 0;
+   desc.qw3 = 0;
+   if (private_present)
+   memcpy(, prm->private_data,
+  sizeof(prm->private_data));
+
+   qi_submit_sync(iommu, , 1, 0);
+   }
+out:
+   mutex_unlock(_mutex);
+   return ret;
+}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index fc2cfc3db6e1..bf6009a344f5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -741,6 +741,9 @@ struct iommu_sva *intel_svm_bind(struct device *dev, struct 
mm_struct *mm,
 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
 int intel_svm_get_pasid(struct iommu_sva *handle);
+int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
+   struct iommu_page_response *msg);
+
 struct svm_dev_ops;
 
 struct intel_svm_dev {
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 3/4] iommu/vt-d: Report page request faults for guest SVA

2020-07-05 Thread Lu Baolu
A pasid might be bound to a page table from a VM guest via the iommu
ops.sva_bind_gpasid. In this case, when a DMA page fault is detected
on the physical IOMMU, we need to inject the page fault request into
the guest. After the guest completes handling the page fault, a page
response need to be sent back via the iommu ops.page_response().

This adds support to report a page request fault. Any external module
which is interested in handling this fault should regiester a notifier
callback.

Co-developed-by: Jacob Pan 
Signed-off-by: Jacob Pan 
Co-developed-by: Liu Yi L 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/svm.c | 99 ---
 1 file changed, 81 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index c23167877b2b..08c58c2b1a06 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -815,6 +815,57 @@ static void intel_svm_drain_prq(struct device *dev, int 
pasid)
}
 }
 
+static int prq_to_iommu_prot(struct page_req_dsc *req)
+{
+   int prot = 0;
+
+   if (req->rd_req)
+   prot |= IOMMU_FAULT_PERM_READ;
+   if (req->wr_req)
+   prot |= IOMMU_FAULT_PERM_WRITE;
+   if (req->exe_req)
+   prot |= IOMMU_FAULT_PERM_EXEC;
+   if (req->pm_req)
+   prot |= IOMMU_FAULT_PERM_PRIV;
+
+   return prot;
+}
+
+static int
+intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc)
+{
+   struct iommu_fault_event event;
+   u8 bus, devfn;
+
+   memset(, 0, sizeof(struct iommu_fault_event));
+   bus = PCI_BUS_NUM(desc->rid);
+   devfn = desc->rid & 0xff;
+
+   /* Fill in event data for device specific processing */
+   event.fault.type = IOMMU_FAULT_PAGE_REQ;
+   event.fault.prm.addr = desc->addr;
+   event.fault.prm.pasid = desc->pasid;
+   event.fault.prm.grpid = desc->prg_index;
+   event.fault.prm.perm = prq_to_iommu_prot(desc);
+
+   /*
+* Set last page in group bit if private data is present,
+* page response is required as it does for LPIG.
+*/
+   if (desc->lpig)
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+   if (desc->pasid_present)
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+   if (desc->priv_data_present) {
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
+   memcpy(event.fault.prm.private_data, desc->priv_data,
+  sizeof(desc->priv_data));
+   }
+
+   return iommu_report_device_fault(dev, );
+}
+
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
struct intel_iommu *iommu = d;
@@ -828,7 +879,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
while (head != tail) {
-   struct intel_svm_dev *sdev;
+   struct intel_svm_dev *sdev = NULL;
struct vm_area_struct *vma;
struct page_req_dsc *req;
struct qi_desc resp;
@@ -864,6 +915,20 @@ static irqreturn_t prq_event_thread(int irq, void *d)
}
}
 
+   if (!sdev || sdev->sid != req->rid) {
+   struct intel_svm_dev *t;
+
+   sdev = NULL;
+   rcu_read_lock();
+   list_for_each_entry_rcu(t, >devs, list) {
+   if (t->sid == req->rid) {
+   sdev = t;
+   break;
+   }
+   }
+   rcu_read_unlock();
+   }
+
result = QI_RESP_INVALID;
/* Since we're using init_mm.pgd directly, we should never take
 * any faults on kernel addresses. */
@@ -874,6 +939,17 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (!is_canonical_address(address))
goto bad_req;
 
+   /*
+* If prq is to be handled outside iommu driver via receiver of
+* the fault notifiers, we skip the page response here.
+*/
+   if (svm->flags & SVM_FLAG_GUEST_MODE) {
+   if (sdev && !intel_svm_prq_report(sdev->dev, req))
+   goto prq_advance;
+   else
+   goto bad_req;
+   }
+
/* If the mm is already defunct, don't handle faults. */
if (!mmget_not_zero(svm->mm))
goto bad_req;
@@ -892,24 +968,10 @@ static irqreturn_t prq_event_thread(int irq, void 

[PATCH v2 1/4] iommu/vt-d: Refactor device_to_iommu() helper

2020-07-05 Thread Lu Baolu
It is refactored in two ways:

- Make it global so that it could be used in other files.

- Make bus/devfn optional so that callers could ignore these two returned
values when they only want to get the coresponding iommu pointer.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 55 +++--
 drivers/iommu/intel/svm.c   |  8 +++---
 include/linux/intel-iommu.h |  3 +-
 3 files changed, 21 insertions(+), 45 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e98..de17952ed133 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -778,16 +778,16 @@ is_downstream_to_pci_bridge(struct device *dev, struct 
device *bridge)
return false;
 }
 
-static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 
*devfn)
+struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 {
struct dmar_drhd_unit *drhd = NULL;
+   struct pci_dev *pdev = NULL;
struct intel_iommu *iommu;
struct device *tmp;
-   struct pci_dev *pdev = NULL;
u16 segment = 0;
int i;
 
-   if (iommu_dummy(dev))
+   if (!dev || iommu_dummy(dev))
return NULL;
 
if (dev_is_pci(dev)) {
@@ -818,8 +818,10 @@ static struct intel_iommu *device_to_iommu(struct device 
*dev, u8 *bus, u8 *devf
if (pdev && pdev->is_virtfn)
goto got_pdev;
 
-   *bus = drhd->devices[i].bus;
-   *devfn = drhd->devices[i].devfn;
+   if (bus && devfn) {
+   *bus = drhd->devices[i].bus;
+   *devfn = drhd->devices[i].devfn;
+   }
goto out;
}
 
@@ -829,8 +831,10 @@ static struct intel_iommu *device_to_iommu(struct device 
*dev, u8 *bus, u8 *devf
 
if (pdev && drhd->include_all) {
got_pdev:
-   *bus = pdev->bus->number;
-   *devfn = pdev->devfn;
+   if (bus && devfn) {
+   *bus = pdev->bus->number;
+   *devfn = pdev->devfn;
+   }
goto out;
}
}
@@ -5146,11 +5150,10 @@ static int aux_domain_add_dev(struct dmar_domain 
*domain,
  struct device *dev)
 {
int ret;
-   u8 bus, devfn;
unsigned long flags;
struct intel_iommu *iommu;
 
-   iommu = device_to_iommu(dev, , );
+   iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu)
return -ENODEV;
 
@@ -5236,9 +5239,8 @@ static int prepare_domain_attach_device(struct 
iommu_domain *domain,
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct intel_iommu *iommu;
int addr_width;
-   u8 bus, devfn;
 
-   iommu = device_to_iommu(dev, , );
+   iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu)
return -ENODEV;
 
@@ -5658,9 +5660,8 @@ static bool intel_iommu_capable(enum iommu_cap cap)
 static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 {
struct intel_iommu *iommu;
-   u8 bus, devfn;
 
-   iommu = device_to_iommu(dev, , );
+   iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu)
return ERR_PTR(-ENODEV);
 
@@ -5673,9 +5674,8 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
 static void intel_iommu_release_device(struct device *dev)
 {
struct intel_iommu *iommu;
-   u8 bus, devfn;
 
-   iommu = device_to_iommu(dev, , );
+   iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu)
return;
 
@@ -5825,37 +5825,14 @@ static struct iommu_group 
*intel_iommu_device_group(struct device *dev)
return generic_device_group(dev);
 }
 
-#ifdef CONFIG_INTEL_IOMMU_SVM
-struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
-{
-   struct intel_iommu *iommu;
-   u8 bus, devfn;
-
-   if (iommu_dummy(dev)) {
-   dev_warn(dev,
-"No IOMMU translation for device; cannot enable 
SVM\n");
-   return NULL;
-   }
-
-   iommu = device_to_iommu(dev, , );
-   if ((!iommu)) {
-   dev_err(dev, "No IOMMU for device; cannot enable SVM\n");
-   return NULL;
-   }
-
-   return iommu;
-}
-#endif /* CONFIG_INTEL_IOMMU_SVM */
-
 static int intel_iommu_enable_auxd(struct device *dev)
 {
struct device_domain_info *info;
struct intel_iommu *iommu;
unsigned long flags;
-   u8 bus, devfn;
int ret;
 
-   iommu = device_to_iommu(dev, , );
+   iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu 

[PATCH v2 0/4] iommu/vt-d: Add prq report and response support

2020-07-05 Thread Lu Baolu
Hi,

This series adds page request event reporting and response support to
the Intel IOMMU driver. This is necessary when the page requests must
be processed by any component other than the vendor IOMMU driver. For
example, when a guest page table was bound to a PASID through the
iommu_ops->sva_bind_gpasid() api, the page requests should be routed to
the guest, and after the page is served, the device should be responded
with the result.

Your review comments are very appreciated.

Best regards,
baolu

Change log:
v1->v2:
  - v1 posted at https://lkml.org/lkml/2020/6/27/387
  - Remove unnecessary pci_get_domain_bus_and_slot()
  - Return error when sdev == NULL in intel_svm_page_response()

Lu Baolu (4):
  iommu/vt-d: Refactor device_to_iommu() helper
  iommu/vt-d: Add a helper to get svm and sdev for pasid
  iommu/vt-d: Report page request faults for guest SVA
  iommu/vt-d: Add page response ops support

 drivers/iommu/intel/iommu.c |  56 ++-
 drivers/iommu/intel/svm.c   | 302 +++-
 include/linux/intel-iommu.h |   6 +-
 3 files changed, 248 insertions(+), 116 deletions(-)

-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node

2020-07-05 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Will Deacon [mailto:w...@kernel.org]
> Sent: Saturday, July 4, 2020 4:22 AM
> To: Song Bao Hua (Barry Song) 
> Cc: h...@lst.de; m.szyprow...@samsung.com; robin.mur...@arm.com;
> linux-arm-ker...@lists.infradead.org; iommu@lists.linux-foundation.org;
> Linuxarm 
> Subject: Re: [PATCH] iommu/arm-smmu-v3: allocate the memory of queues in
> local numa node
> 
> On Mon, Jun 01, 2020 at 11:31:41PM +1200, Barry Song wrote:
> > dmam_alloc_coherent() will usually allocate memory from the default CMA.
> For
> > a common arm64 defconfig without reserved memory in device tree, there is
> only
> > one CMA close to address 0.
> > dma_alloc_contiguous() will allocate memory without any idea of  NUMA
> and smmu
> > has no customized per-numa cma_area.
> > struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t 
> > gfp)
> > {
> > size_t count = size >> PAGE_SHIFT;
> > struct page *page = NULL;
> > struct cma *cma = NULL;
> >
> > if (dev && dev->cma_area)
> > cma = dev->cma_area;
> > else if (count > 1)
> > cma = dma_contiguous_default_area;
> >
> > ...
> > return page;
> > }
> >
> > if there are N numa nodes, N-1 nodes will put command/evt queues etc in a
> > remote node the default CMA belongs to,probably node 0. Tests show, after
> > sending CMD_SYNC in an empty command queue,
> > on Node2, without this patch, it takes 550ns to wait for the completion
> > of CMD_SYNC; with this patch, it takes 250ns to wait for the completion
> > of CMD_SYNC.
> >
> > Signed-off-by: Barry Song 
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 63
> -
> >  1 file changed, 48 insertions(+), 15 deletions(-)
> 
> I would prefer that the coherent DMA allocator learned about NUMA, rather
> than we bodge drivers to use the streaming API where it doesn't really
> make sense.
> 
> I see that you've posted other patches to do that (thanks!), so I'll
> disregard this series.

Thanks for taking a look, Will. For sure I am using the per-numa cma patchset to
replace this patch. So it is ok to ignore this one.


> 
> Cheers,
> 
> Will

Thanks
Barry

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to users in sysfs

2020-07-05 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Will Deacon [mailto:w...@kernel.org]
> Sent: Saturday, July 4, 2020 4:22 AM
> To: Song Bao Hua (Barry Song) 
> Cc: robin.mur...@arm.com; h...@lst.de; m.szyprow...@samsung.com;
> iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org;
> Linuxarm 
> Subject: Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to
> users in sysfs
> 
> On Sat, May 30, 2020 at 09:15:05PM +1200, Barry Song wrote:
> > As tests show the latency of dma_unmap can increase dramatically while
> > calling them cross NUMA nodes, especially cross CPU packages, eg.
> > 300ns vs 800ns while waiting for the completion of CMD_SYNC in an
> > empty command queue. The large latency causing by remote node will
> > in turn make contention of the command queue more serious, and enlarge
> > the latency of DMA users within local NUMA nodes.
> >
> > Users might intend to enforce NUMA locality with the consideration of
> > the position of SMMU. The patch provides minor benefit by presenting
> > this information to users directly, as they might want to know it without
> > checking hardware spec at all.
> 
> I don't think that's a very good reason to expose things to userspace.
> I know sysfs shouldn't be treated as ABI, but the grim reality is that
> once somebody relies on this stuff then we can't change it, so I'd
> rather avoid exposing it unless it's absolutely necessary.

Will, thanks for taking a look!

I am not sure if it is absolutely necessary, but it is useful to users. The 
whole story started
from some users who wanted to know the hardware topology very clear by reading 
some
sysfs node just like they are able to do that for pci devices. The intention is 
that users can
know hardware topology of various devices easily from linux since they maybe 
don't know
all the hardware details.

For pci devices, kernel has done that. And there are some other drivers out of 
pci
exposing numa_node as well. It seems it is hard to say it is absolutely 
necessary
for them too since sysfs shouldn't be treated as ABI. 

I got some input from Linux users who also wanted to know the numa node for
other devices which are not PCI, for example, platform devices. And I thought 
the
requirement is kind of reasonable. So I also had another patch to generally 
support
this kind of requirements, with the below patch, this smmu patch is not 
necessary
any more:
https://lkml.org/lkml/2020/6/18/1257

for platform device created by ARM ACPI/IORT and general 
acpi_create_platform_device()
drivers/acpi/scan.c:
static void acpi_default_enumeration(struct acpi_device *device)
{
...
if (!device->flags.enumeration_by_parent) {
acpi_create_platform_device(device, NULL);
acpi_device_set_enumerated(device);
}
}

struct platform_device *acpi_create_platform_device(struct acpi_device *adev,
struct property_entry *properties)
{
...

pdev = platform_device_register_full();
if (IS_ERR(pdev))
...
else {
set_dev_node(>dev, acpi_get_node(adev->handle));
...
}
...
}
numa_node is set for this kind of devices.

Anyway, just want to explain to you the background some people want to know the 
hardware topology from Linux in same simple way. And it seems it is a reasonable
requirement to me :-)

> 
> Thanks,
> 
> Will

Thanks
barry
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu