[PATCH 1/2] dma-mapping: add a dma_addressing_limited helper

2019-07-16 Thread Christoph Hellwig
This helper returns whether the device has trouble addressing all memory
present in the system.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 8d13e28a8e07..e11b115dd0e4 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -679,6 +679,20 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
return dma_set_mask_and_coherent(dev, mask);
 }
 
+/**
+ * dma_addressing_limited - return whether the device is addressing limited
+ * @dev:   device to check
+ *
+ * Return %true if the device's DMA mask is too small to address all memory in
+ * the system, else %false.  Lack of addressing bits is the prime reason for
+ * bounce buffering, but might not be the only one.
+ */
+static inline bool dma_addressing_limited(struct device *dev)
+{
+   return min_not_zero(*dev->dma_mask, dev->bus_dma_mask) <
+   dma_get_required_mask(dev);
+}
+
 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] dma-direct: only limit the mapping size if swiotlb could be used

2019-07-16 Thread Christoph Hellwig
Don't just check whether a swiotlb buffer exists; also check whether
bounce buffering might be required for this particular device.

Fixes: 133d624b1cee ("dma: Introduce dma_max_mapping_size()")
Reported-by: Benjamin Herrenschmidt 
Signed-off-by: Christoph Hellwig 
Tested-by: Benjamin Herrenschmidt 
---
 kernel/dma/direct.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index d7cec866d16b..e269b6f9b444 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -399,11 +399,9 @@ int dma_direct_supported(struct device *dev, u64 mask)
 
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
-   size_t size = SIZE_MAX;
-
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active())
-   size = swiotlb_max_mapping_size(dev);
-
-   return size;
+   if (is_swiotlb_active() &&
+   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   return swiotlb_max_mapping_size(dev);
+   return SIZE_MAX;
 }
-- 
2.20.1



fix nvme performance regression due to dma_max_mapping_size()

2019-07-16 Thread Christoph Hellwig
Hi all,

the new dma_max_mapping_size function is a little too eager to limit
the I/O size when the swiotlb buffer is present but the device is
not addressing limited.  Fix this by adding an additional check.


x86-64 kernel dma issue; bisected

2019-07-16 Thread Al Farleigh

re: the dma-direct code commit below

I have bisected the kernel to isolate a PCI board problem on my AMD
x86-64 ASRock system. The board worked on (Fedora kernel) 4.18.16 but
stopped working when moving to (Fedora kernel) 5.0. I then used
(git/torvalds/linux) 4.20-rc4 or so to locate the fault via bisect.


I now have two kernels, good/bad, that straddle the commit.

I was asked to try v5.2 just in case it was fixed; I compiled it and the 
fault appears to still be present.


Simply put, MPEG video does not stream from the board; no errors, but no video.

My work is documented at
https://bugs.kde.org/show_bug.cgi?id=408004

including dmesgs, lspcis, and (second) git bisect log.

Could I please ask whether and where I should submit a bug report? I am
willing to assist if I can.


Thank You!


-

commit 55897af63091ebc2c3f239c6af748113ac50
Author: Christoph Hellwig 
Date:   Mon Dec 3 11:43:54 2018 +0100

dma-direct: merge swiotlb_dma_ops into the dma_direct code

While the dma-direct code is (relatively) clean and simple we
actually have to use the swiotlb ops for the mapping on many
architectures due to devices with addressing limits.  Instead
of keeping two implementations around this commit allows the
dma-direct implementation to call the swiotlb bounce buffering
functions and thus share the guts of the mapping implementation.
This also simplified the dma-mapping setup on a few architectures
where we don't have to differentiate which implementation to use.







Re: [PATCH 1/2] iommu/vt-d: Don't queue_iova() if there is no flush queue

2019-07-16 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 13cf01744608 iommu/vt-d: Make use of iova deferred flushing.

The bot has tested the following trees: v5.2.1, v5.1.18, v4.19.59, v4.14.133.

v5.2.1: Build OK!
v5.1.18: Build OK!
v4.19.59: Failed to apply! Possible dependencies:
02b4da5f84d1 ("intel-iommu: mark intel_dma_ops static")
0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables")
524a669bdd5f ("iommu/vt-d: remove the mapping_error dma_map_ops method")
932a6523ce39 ("iommu/vt-d: Use dev_printk() when possible")
964f2311a686 ("iommu/intel: small map_page cleanup")
ef848b7e5a6a ("iommu/vt-d: Setup pasid entry for RID2PASID support")
f7b0c4ce8cb3 ("iommu/vt-d: Flush IOTLB for untrusted device in time")

v4.14.133: Failed to apply! Possible dependencies:
0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables")
2e2e35d51279 ("iommu/vt-d: Missing checks for pasid tables if allocation fails")
2f13eb7c580f ("iommu/vt-d: Enable 5-level paging mode in the PASID entry")
3e781fcafedb ("iommu/vt-d: Remove unnecessary WARN_ON()")
4774cc524570 ("iommu/vt-d: Apply per pci device pasid table in SVA")
4fa064b26c2e ("iommu/vt-d: Clear pasid table entry when memory unbound")
524a669bdd5f ("iommu/vt-d: remove the mapping_error dma_map_ops method")
562831747f62 ("iommu/vt-d: Global PASID name space")
7ec916f82c48 ("Revert "iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()"")
85319dcc8955 ("iommu/vt-d: Add for_each_device_domain() helper")
932a6523ce39 ("iommu/vt-d: Use dev_printk() when possible")
964f2311a686 ("iommu/intel: small map_page cleanup")
971401015d14 ("iommu/vt-d: Use real PASID for flush in caching mode")
9ddbfb42138d ("iommu/vt-d: Move device_domain_info to header")
a7fc93fed94b ("iommu/vt-d: Allocate and free pasid table")
af39507305fb ("iommu/vt-d: Apply global PASID in SVA")
be9e6598aeb0 ("iommu/vt-d: Handle memory shortage on pasid table allocation")
cc580e41260d ("iommu/vt-d: Per PCI device pasid table interfaces")
d657c5c73ca9 ("iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()")
ef848b7e5a6a ("iommu/vt-d: Setup pasid entry for RID2PASID support")
f7b0c4ce8cb3 ("iommu/vt-d: Flush IOTLB for untrusted device in time")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks,
Sasha


[PATCH 2/2] iommu/vt-d: Check if domain->pgd was allocated

2019-07-16 Thread Dmitry Safonov
There are a couple of places where domain_exit() is called after a
domain_init() failure, while currently domain_init() can fail only if
alloc_pgtable_page() has failed.

Make domain_exit() check whether domain->pgd is present before calling
domain_unmap(), as domain_unmap() would otherwise crash while clearing
pte entries in dma_pte_clear_level().

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/intel-iommu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6d1510284d21..698cc40355ef 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1835,7 +1835,6 @@ static inline int guestwidth_to_adjustwidth(int gaw)
 
 static void domain_exit(struct dmar_domain *domain)
 {
-   struct page *freelist;
 
/* Remove associated devices and clear attached or cached domains */
domain_remove_dev_info(domain);
@@ -1843,9 +1842,12 @@ static void domain_exit(struct dmar_domain *domain)
/* destroy iovas */
put_iova_domain(&domain->iovad);
 
-   freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw));
+   if (domain->pgd) {
+   struct page *freelist;
 
-   dma_free_pagelist(freelist);
+   freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw));
+   dma_free_pagelist(freelist);
+   }
 
free_domain_mem(domain);
 }
-- 
2.22.0



[PATCH 1/2] iommu/vt-d: Don't queue_iova() if there is no flush queue

2019-07-16 Thread Dmitry Safonov
The Intel VT-d driver was reworked to use the common deferred flushing
implementation. Previously there was one global per-cpu flush queue;
afterwards, there is one per domain.

Before deferring a flush, the queue should be allocated and initialized.

Currently only domains of type IOMMU_DOMAIN_DMA initialize their flush
queue. It's probably worth initializing it for static or unmanaged
domains too, but that may be arguable - I'm leaving it to the iommu folks.

Prevent queuing an iova flush if the domain doesn't have a queue.
The defensive check seems worth keeping even if the queue were
initialized for all kinds of domains, and it is easily backportable.

On the 4.19.43 stable kernel this has a user-visible effect: previously
there were crashes for devices in the si domain, e.g. on sata devices:

 BUG: spinlock bad magic on CPU#6, swapper/0/1
  lock: 0x88844f582008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
 Call Trace:
  
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x45/0x115
  intel_unmap+0x107/0x113
  intel_unmap_sg+0x6b/0x76
  __ata_qc_complete+0x7f/0x103
  ata_qc_complete+0x9b/0x26a
  ata_qc_complete_multiple+0xd0/0xe3
  ahci_handle_port_interrupt+0x3ee/0x48a
  ahci_handle_port_intr+0x73/0xa9
  ahci_single_level_irq_intr+0x40/0x60
  __handle_irq_event_percpu+0x7f/0x19a
  handle_irq_event_percpu+0x32/0x72
  handle_irq_event+0x38/0x56
  handle_edge_irq+0x102/0x121
  handle_irq+0x147/0x15c
  do_IRQ+0x66/0xf2
  common_interrupt+0xf/0xf
 RIP: 0010:__do_softirq+0x8c/0x2df

The same for usb devices that use ehci-pci:
 BUG: spinlock bad magic on CPU#0, swapper/0/1
  lock: 0x88844f402008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
 Call Trace:
  
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x77/0x145
  intel_unmap+0x107/0x113
  intel_unmap_page+0xe/0x10
  usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
  usb_hcd_unmap_urb_for_dma+0x17/0x100
  unmap_urb_for_dma+0x22/0x24
  __usb_hcd_giveback_urb+0x51/0xc3
  usb_giveback_urb_bh+0x97/0xde
  tasklet_action_common.isra.4+0x5f/0xa1
  tasklet_action+0x2d/0x30
  __do_softirq+0x138/0x2df
  irq_exit+0x7d/0x8b
  smp_apic_timer_interrupt+0x10f/0x151
  apic_timer_interrupt+0xf/0x20
  
 RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39

Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc:  # 4.14+
Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Dmitry Safonov 
---
 drivers/iommu/intel-iommu.c |  3 ++-
 drivers/iommu/iova.c| 18 ++
 include/linux/iova.h|  6 ++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ac4172c02244..6d1510284d21 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3564,7 +3564,8 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
 
freelist = domain_unmap(domain, start_pfn, last_pfn);
 
-   if (intel_iommu_strict || (pdev && pdev->untrusted)) {
+   if (intel_iommu_strict || (pdev && pdev->untrusted) ||
+   !has_iova_flush_queue(&domain->iovad)) {
iommu_flush_iotlb_psi(iommu, domain, start_pfn,
  nrpages, !freelist, 0);
/* free iova */
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index d499b2621239..8413ae54904a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -54,9 +54,14 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
+bool has_iova_flush_queue(struct iova_domain *iovad)
+{
+   return !!iovad->fq;
+}
+
 static void free_iova_flush_queue(struct iova_domain *iovad)
 {
-   if (!iovad->fq)
+   if (!has_iova_flush_queue(iovad))
return;
 
if (timer_pending(&iovad->fq_timer))
@@ -74,13 +79,14 @@ static void free_iova_flush_queue(struct iova_domain *iovad)
 int init_iova_flush_queue(struct iova_domain *iovad,
  iova_flush_cb flush_cb, iova_entry_dtor entry_dtor)
 {
+   struct iova_fq __percpu *queue;
int cpu;
 
atomic64_set(&iovad->fq_flush_start_cnt,  0);
atomic64_set(&iovad->fq_flush_finish_cnt, 0);
 
-   iovad->fq = alloc_percpu(struct iova_fq);
-   if (!iovad->fq)
+   queue = alloc_percpu(struct iova_fq);
+   if (!queue)
return -ENOMEM;
 
iovad->flush_cb   = flush_cb;
@@ -89,13 +95,17 @@ int init_iova_flush_queue(struct iova_domain *iovad,
for_each_possible_cpu(cpu) {
struct iova_fq *fq;
 
-   fq = per_cpu_ptr(iovad->fq, cpu);
+   fq = per_cpu_ptr(queue, cpu);
fq->head = 0;
fq->tail = 0;

Re: [RFC v1 3/4] vfio/type1: VFIO_IOMMU_PASID_REQUEST(alloc/free)

2019-07-16 Thread Auger Eric
Hi Yi,

On 7/5/19 1:06 PM, Liu, Yi L wrote:
> From: Liu Yi L 
> 
> This patch adds a VFIO_IOMMU_PASID_REQUEST ioctl which aims
> to pass down PASID allocation/free requests from the virtual
> iommu. This is required to get PASIDs managed system-wide.
> 
> Cc: Kevin Tian 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Yi Sun 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 125 
> 
>  include/uapi/linux/vfio.h   |  25 
>  2 files changed, 150 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 6fda4fb..d5e0c01 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1832,6 +1832,94 @@ static int vfio_cache_inv_fn(struct device *dev, void 
> *data)
>   return iommu_cache_invalidate(dc->domain, dev, &ustruct->info);
>  }
>  
> +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
> +  int min_pasid,
> +  int max_pasid)
> +{
> + int ret;
> + ioasid_t pasid;
> + struct mm_struct *mm = NULL;
> +
> + mutex_lock(&iommu->lock);
> + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
Is this check really mandated and do you really need to hold the iommu lock?
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> + mm = get_task_mm(current);
> + /* Jacob: track ioasid allocation owner by mm */
> + pasid = ioasid_alloc((struct ioasid_set *)mm, min_pasid,
> + max_pasid, NULL);
Shouldn't we have a PASID number limit per mm to prevent a guest from
consuming all PASIDs and inducing a DoS?
> + if (pasid == INVALID_IOASID) {
> + ret = -ENOSPC;
> + goto out_unlock;
> + }
> + ret = pasid;
> +out_unlock:
> + mutex_unlock(&iommu->lock);
> + if (mm)
> + mmput(mm);
> + return ret;
> +}
> +
> +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, int pasid)
> +{
> + struct mm_struct *mm = NULL;
> + void *pdata;
> + int ret = 0;
> +
> + mutex_lock(&iommu->lock);
> + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
same here
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> + pr_debug("%s: pasid: %d\n", __func__, pasid);
> +
> + /**
> +  * TODO:
> +  * a) for pasid free, needs to return error if free failed
> +  * b) Sanity check: check if the pasid is allocated to the
> +  *  current process such check may be in
> +  *  vendor specific pasid_free callback or
> +  *  in generic layer
> +  * c) clean up device list and free p_alloc structure
> +  *
> +  * Jacob:
> +  * There are two cases free could fail:
> +  * 1. free pasid by non-owner, we can use ioasid_set to track mm, if
> +  * the set does not match, caller is not permitted to free.
> +  * 2. free before unbind all devices, we can check if ioasid private
> +  * data, if data != NULL, then fail to free.
> +  */
Who is going to do the garbage collection of PASIDs used by the guest,
given that we cannot rely on userspace to do that in general?
> +
> + mm = get_task_mm(current);
> + pdata = ioasid_find((struct ioasid_set *)mm, pasid, NULL);
> + if (IS_ERR(pdata)) {
> + if (pdata == ERR_PTR(-ENOENT))
> + pr_debug("pasid %d is not allocated\n", pasid);
> + else if (pdata == ERR_PTR(-EACCES))
> + pr_debug("Not owner of pasid %d,"
> +  "no pasid free allowed\n", pasid);
> + else
> + pr_debug("error happened during searching"
> +  " pasid: %d\n", pasid);
> + ret = -EPERM;
return actual pdata error?
> + goto out_unlock;
> + }
> + if (pdata) {
> + pr_debug("Cannot free pasid %d with private data\n", pasid);
> + /* Expect PASID to have no private data if not bound */
> + ret = -EBUSY;
> + goto out_unlock;
> + }
> + ioasid_free(pasid);
> +
> +out_unlock:
> + if (mm)
> + mmput(mm);
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1936,6 +2024,43 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   &ustruct);
>   mutex_unlock(&iommu->lock);
>   return ret;
> +
> + } else if (cmd == VFIO_IOMMU_PASID_REQUEST) {
> + struct vfio_iommu_type1_pasid_request req;
> + int min_pasid, max_pasid, pasid;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_pasid_request,
> + flag);
> +
> + if (

Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support

2019-07-16 Thread Raj, Ashok
Hi Eric

Jacob is on sabbatical, so I'll give it my best shot :-)

Yi/Kevin can jump in...

On Tue, Jul 16, 2019 at 06:45:51PM +0200, Auger Eric wrote:
> Hi Jacob,
> 
> On 6/9/19 3:44 PM, Jacob Pan wrote:
> > When supporting guest SVA with emulated IOMMU, the guest PASID
> > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > will result in PASID cache flush which will be passed down to
> > the host as bind guest PASID calls.
> > 
> > For the SL page tables, it will be harvested from device's
> > default domain (request w/o PASID), or aux domain in case of
> > mediated device.
> > 
> > .-.  .---.
> > |   vIOMMU|  | Guest process CR3, FL only|
> > | |  '---'
> > ./
> > | PASID Entry |--- PASID cache flush -
> > '-'   |
> > | |   V
> > | |CR3 in GPA
> > '-'
> > Guest
> > --| Shadow |--|
> >   vv  v
> > Host
> > .-.  .--.
> > |   pIOMMU|  | Bind FL for GVA-GPA  |
> > | |  '--'
> > ./  |
> > | PASID Entry | V (Nested xlate)
> > '\.--.
> > | |   |SL for GPA-HPA, default domain|
> > | |   '--'
> > '-'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  drivers/iommu/intel-iommu.c |   4 +
> >  drivers/iommu/intel-svm.c   | 187 
> > 
> >  include/linux/intel-iommu.h |  13 ++-
> >  include/linux/intel-svm.h   |  17 
> >  4 files changed, 219 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 7cfa0eb..3b4d712 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
> > .dev_enable_feat= intel_iommu_dev_enable_feat,
> > .dev_disable_feat   = intel_iommu_dev_disable_feat,
> > .pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +   .sva_bind_gpasid= intel_svm_bind_gpasid,
> > +   .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
> > +#endif
> >  };
> >  
> >  static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 66d98e1..f06a82f 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
> > list_for_each_entry(sdev, &svm->devs, list) \
> > if (dev == sdev->dev)   \
> >  
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +   struct device *dev,
> > +   struct gpasid_bind_data *data)
> > +{
> > +   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +   struct intel_svm_dev *sdev;
> > +   struct intel_svm *svm = NULL;
> not requested
> > +   struct dmar_domain *ddomain;
> > +   int ret = 0;
> > +
> > +   if (WARN_ON(!iommu) || !data)
> > +   return -EINVAL;
> > +
> > +   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > +   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > +   return -EINVAL;
> > +
> > +   if (dev_is_pci(dev)) {
> > +   /* VT-d supports devices with full 20 bit PASIDs only */
> > +   if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > +   return -EINVAL;
> > +   }
> > +
> > +   /*
> > +* We only check host PASID range, we have no knowledge to check
> > +* guest PASID range nor do we use the guest PASID.
> guest pasid is set below in svm->gpasid.
> So I am confused, do you handle gpasid or not? If you don't use gpasid
> at the moment, then you may return -EINVAL if data->flags &
> IOMMU_SVA_GPASID_VAL.
> 
> I confess I don't really understand gpasid as I thought you use
> enlightened PASID allocation to use a system wide PASID allocator on
> host so in which case guest and host PASID do differ?

Correct: from within the guest we only use the enlightened allocator.

Guest PASIDs can be managed via QEMU, which will also associate
guest PASIDs with the appropriate host PASIDs, sort of how guest BDF
and host BDF are managed for page requests etc.



Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support

2019-07-16 Thread Auger Eric
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
> 
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
> 
> .-.  .---.
> |   vIOMMU|  | Guest process CR3, FL only|
> | |  '---'
> ./
> | PASID Entry |--- PASID cache flush -
> '-'   |
> | |   V
> | |CR3 in GPA
> '-'
> Guest
> --| Shadow |--|
>   vv  v
> Host
> .-.  .--.
> |   pIOMMU|  | Bind FL for GVA-GPA  |
> | |  '--'
> ./  |
> | PASID Entry | V (Nested xlate)
> '\.--.
> | |   |SL for GPA-HPA, default domain|
> | |   '--'
> '-'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> ---
>  drivers/iommu/intel-iommu.c |   4 +
>  drivers/iommu/intel-svm.c   | 187 
> 
>  include/linux/intel-iommu.h |  13 ++-
>  include/linux/intel-svm.h   |  17 
>  4 files changed, 219 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 7cfa0eb..3b4d712 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
>   .dev_enable_feat= intel_iommu_dev_enable_feat,
>   .dev_disable_feat   = intel_iommu_dev_disable_feat,
>   .pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + .sva_bind_gpasid= intel_svm_bind_gpasid,
> + .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
> +#endif
>  };
>  
>  static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 66d98e1..f06a82f 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
>   list_for_each_entry(sdev, &svm->devs, list) \
>   if (dev == sdev->dev)   \
>  
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> + struct device *dev,
> + struct gpasid_bind_data *data)
> +{
> + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> + struct intel_svm_dev *sdev;
> + struct intel_svm *svm = NULL;
not requested
> + struct dmar_domain *ddomain;
> + int ret = 0;
> +
> + if (WARN_ON(!iommu) || !data)
> + return -EINVAL;
> +
> + if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> + data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> + return -EINVAL;
> +
> + if (dev_is_pci(dev)) {
> + /* VT-d supports devices with full 20 bit PASIDs only */
> + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> + return -EINVAL;
> + }
> +
> + /*
> +  * We only check host PASID range, we have no knowledge to check
> +  * guest PASID range nor do we use the guest PASID.
guest pasid is set below in svm->gpasid.
So I am confused, do you handle gpasid or not? If you don't use gpasid
at the moment, then you may return -EINVAL if data->flags &
IOMMU_SVA_GPASID_VAL.

I confess I don't really understand gpasid as I thought you use
enlightened PASID allocation to use a system wide PASID allocator on
host so in which case guest and host PASID do differ?
> +  */
> + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> + return -EINVAL;
> +
> + ddomain = to_dmar_domain(domain);
> + /* REVISIT:
> +  * Sanity check address width and paging mode support
> +  * width matching in two dimensions:
> +  * 1. paging mode CPU <= IOMMU
> +  * 2. address width Guest <= Host.
> +  */
> + mutex_lock(&pasid_mutex);
> + svm = ioasid_find(NULL, data->hpasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> + goto out;
> + }
> + if (svm) {
> + /*
> +  * If we found svm for the PASID, there must be at
> +  * least one device bond, otherwise svm should be freed.
> +  */
> + BUG_ON(list_empty(&svm->devs));
> +
> + for_each_svm_dev() {
> + 

Re: [PATCH v4 11/22] iommu: Introduce guest PASID bind function

2019-07-16 Thread Auger Eric
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> Guest shared virtual address (SVA) may require host to shadow guest
> PASID tables. Guest PASID can also be allocated from the host via
> enlightened interfaces. In this case, guest needs to bind the guest
> mm, i.e. cr3 in guest physical address to the actual PASID table in
> the host IOMMU. Nesting will be turned on such that guest virtual
> address can go through a two level translation:
> - 1st level translates GVA to GPA
> - 2nd level translates GPA to HPA
> This patch introduces APIs to bind guest PASID data to the assigned
> device entry in the physical IOMMU. See the diagram below for usage
> explanation.
> 
> .-.  .---.
> |   vIOMMU|  | Guest process mm, FL only |
> | |  '---'
> ./
> | PASID Entry |--- PASID cache flush -
> '-'   |
> | |   V
> | |  GP
> '-'
> Guest
> --| Shadow |--- GP->HP* -
>   vv  |
> Host  v
> .-.  .--.
> |   pIOMMU|  | Bind FL for GVA-GPA  |
> | |  '--'
> ./  |
> | PASID Entry | V (Nested xlate)
> '\.-.
> | |   |Set SL to GPA-HPA|
> | |   '-'
> '-'
> 
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
>  - GP = Guest PASID
>  - HP = Host PASID
> * Conversion needed if non-identity GP-HP mapping option is chosen.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu Yi L 
> ---
>  drivers/iommu/iommu.c  | 20 
>  include/linux/iommu.h  | 21 +
>  include/uapi/linux/iommu.h | 58 
> ++
>  3 files changed, 99 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 1758b57..d0416f60 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1648,6 +1648,26 @@ int iommu_cache_invalidate(struct iommu_domain 
> *domain, struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
>  
> +int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> + struct device *dev, struct gpasid_bind_data *data)
> +{
> + if (unlikely(!domain->ops->sva_bind_gpasid))
> + return -ENODEV;
> +
> + return domain->ops->sva_bind_gpasid(domain, dev, data);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
> +
> +int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid)
> +{
> + if (unlikely(!domain->ops->sva_unbind_gpasid))
> + return -ENODEV;
> +
> + return domain->ops->sva_unbind_gpasid(dev, pasid);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 8d766a8..560c8c8 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #define IOMMU_READ   (1 << 0)
> @@ -267,6 +268,8 @@ struct page_response_msg {
>   * @detach_pasid_table: detach the pasid table
>   * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @sva_bind_gpasid: bind guest pasid and mm
> + * @sva_unbind_gpasid: unbind guest pasid and mm
comment vs field ordering as pointed out by Jonathan on other patches
>   */
>  struct iommu_ops {
>   bool (*capable)(enum iommu_cap);
> @@ -332,6 +335,10 @@ struct iommu_ops {
>   int (*page_response)(struct device *dev, struct page_response_msg *msg);
>   int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
>   struct iommu_cache_invalidate_info *inv_info);
> + int (*sva_bind_gpasid)(struct iommu_domain *domain,
> + struct device *dev, struct gpasid_bind_data *data);
> +
> + int (*sva_unbind_gpasid)(struct device *dev, int pasid);
>  
>   unsigned long pgsize_bitmap;
>  };
> @@ -447,6 +454,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain 
> *domain);
>  extern int iommu_cache_invalidate(struct iommu_domain *domain,
> struct device *dev,
> struct iommu_cache_invalidate_info *inv_info);
> +extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> + struct device *dev, struct gpasid_bind_data *data);
> +extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> +  

Re: cma_remap when using dma_alloc_attr :- DMA_ATTR_NO_KERNEL_MAPPING

2019-07-16 Thread Christoph Hellwig
On Tue, Jul 16, 2019 at 01:02:19PM +0100, Robin Murphy wrote:
>> Lets say 4k video allocation required 300MB cma memory but not required
>> virtual mapping for all the 300MB, its require only 20MB virtually mapped
>> at some specific use case/point of video, and unmap virtual mapping after
>> uses, at that time this functions will be useful, it works like ioremap()
>> for cma_alloc() using dma apis.
>
> Hmm, is there any significant reason that this case couldn't be handled 
> with just get_vm_area() plus dma_mmap_attrs()? I know it's only *intended* 
> for userspace mappings, but since the basic machinery is there...

Because the DMA helpers really are a black-box abstraction.

That being said DMA_ATTR_NO_KERNEL_MAPPING and DMA_ATTR_NON_CONSISTENT
have been a constant pain in the b**t.  I've been toying with replacing
them with a dma_alloc_pages or similar abstraction that just returns
a struct page that is guaranteed to be dma addressable by the passed
in device.  Then the driver can call dma_map_page / dma_unmap_page /
dma_sync_* on it at will.  This would replace DMA_ATTR_NON_CONSISTENT
with a sensible API, and also DMA_ATTR_NO_KERNEL_MAPPING when called
with PageHighmem, while providing an easy to understand API and
something that can easily be fed into the various page based APIs
in the kernel.

That being said until we get arm moved over the common dma direct
and dma-iommu code, and x86 fully moved over to dma-iommu it just
seems way too much work to even get it into the various architectures
that matter, never mind all the fringe IOMMUs.  So for now I've just
been trying to contain the DMA_ATTR_NON_CONSISTENT and
DMA_ATTR_NO_KERNEL_MAPPING in fewer places while also killing bogus
or pointless users of these APIs.


Re: cma_remap when using dma_alloc_attr :- DMA_ATTR_NO_KERNEL_MAPPING

2019-07-16 Thread Robin Murphy

On 12/07/2019 19:30, Pankaj Suryawanshi wrote:

Hello,

When we allocate CMA memory using dma_alloc_attrs() with the
DMA_ATTR_NO_KERNEL_MAPPING attribute, it returns a physical address
without a kernel virtual mapping; that is the point of the attribute.
But some vpu/gpu drivers need a virtual mapping for part of the
allocation, and there is currently no way to remap that allocated
memory into kernel virtual memory on demand. On a 32-bit system it is
difficult for platforms like Android to keep everything virtually
mapped all the time, and doing so degrades performance.

For Example :

Say a 4K video use case allocates 300MB of CMA memory but does not need
a virtual mapping for all 300MB; it only needs 20MB virtually mapped at
a specific point of the video pipeline, and the mapping can be torn
down after use. That is where these functions are useful: they work
like ioremap() for cma_alloc() memory, using the DMA APIs.


Hmm, is there any significant reason that this case couldn't be handled 
with just get_vm_area() plus dma_mmap_attrs(). I know it's only 
*intended* for userspace mappings, but since the basic machinery is there...


Robin.


/*
 * Create a kernel virtual mapping of the given physical memory
 * range [base, base + size) of CMA memory.
 */
void *cma_remap(__u32 base, __u32 size)
{
	struct page *page = phys_to_page(base);
	/* DMA_ATTR stands in for the attrs used at allocation time */
	pgprot_t prot = get_dma_pgprot(DMA_ATTR, PAGE_KERNEL);
	void *virt;

	pr_debug("cma: request to map 0x%08x for size 0x%08x\n",
		 base, size);

	size = PAGE_ALIGN(size);

	if (PageHighMem(page)) {
		virt = dma_alloc_remap(page, size, GFP_KERNEL, prot,
				       __builtin_return_address(0));
	} else {
		dma_remap(page, size, prot);
		virt = page_address(page);
	}

	if (!virt)
		pr_err("cma: failed to map 0x%08x\n", base);
	else
		pr_debug("cma: 0x%08x is virtually mapped to 0x%08x\n",
			 base, (__u32)virt);

	return virt;
}

/*
 * Remove the kernel virtual mapping of the given virtual memory
 * range [virt, virt + size) of CMA memory.
 */
void cma_unmap(void *virt, __u32 size)
{
	struct page *page = pfn_to_page(virt_to_pfn(virt));

	size = PAGE_ALIGN(size);

	if (PageHighMem(page))
		dma_free_remap(virt, size);
	else
		dma_remap(page, size, PAGE_KERNEL);

	pr_debug("cma: virtual address 0x%08x is unmapped\n",
		 (__u32)virt);
}
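
For illustration, a caller would use the pair like this (the function
and variable names here are made up; phys_base is the physical address
of a region previously allocated with
dma_alloc_attrs(..., DMA_ATTR_NO_KERNEL_MAPPING)):

```c
/* Sketch: map only the window we need, then drop the mapping again. */
static int touch_video_window(__u32 phys_base)
{
	void *win;

	/* Map only the ~20MB window we actually need right now. */
	win = cma_remap(phys_base, 20 * SZ_1M);
	if (!win)
		return -ENOMEM;

	/* ... CPU accesses the window through 'win' ... */

	cma_unmap(win, 20 * SZ_1M);
	return 0;
}
```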

These functions would be added to arch/arm/mm/dma-mapping.c.

Please let me know if I am missing anything.

Regards,
Pankaj




Re: [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup

2019-07-16 Thread Auger Eric
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> After each setup for PASID entry, related translation caches must be flushed.
> We can combine duplicated code into one function which is less error prone.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-pasid.c | 48 
> +
>  1 file changed, 18 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 1e25539..1ff2ecc 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -522,6 +522,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu 
> *iommu,
>   devtlb_invalidation_with_pasid(iommu, dev, pasid);
>  }
>  
> +static inline void pasid_flush_caches(struct intel_iommu *iommu,
> + struct pasid_entry *pte,
> + int pasid, u16 did)
> +{
> + if (!ecap_coherent(iommu->ecap))
> + clflush_cache_range(pte, sizeof(*pte));
> +
> + if (cap_caching_mode(iommu->cap)) {
> + pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> + iotlb_invalidation_with_pasid(iommu, did, pasid);
> + } else
> + iommu_flush_write_buffer(iommu);
Besides the style issue reported by Jonathan and the fact that this
function may not deserve the inline (chapter 15 of
https://www.kernel.org/doc/html/v5.1/process/coding-style.html),

Reviewed-by: Eric Auger 

Thanks

Eric
> +
> +}
> +
>  /*
>   * Set up the scalable mode pasid table entry for first only
>   * translation type.
> @@ -567,16 +582,7 @@ int intel_pasid_setup_first_level(struct intel_iommu 
> *iommu,
>   /* Setup Present and PASID Granular Transfer Type: */
>   pasid_set_translation_type(pte, 1);
>   pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>  
>   return 0;
>  }
> @@ -640,16 +646,7 @@ int intel_pasid_setup_second_level(struct intel_iommu 
> *iommu,
>*/
>   pasid_set_sre(pte);
>   pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>  
>   return 0;
>  }
> @@ -683,16 +680,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu 
> *iommu,
>*/
>   pasid_set_sre(pte);
>   pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>  
>   return 0;
>  }
> 


Re: [PATCH v4 16/22] iommu/vt-d: Move domain helper to header

2019-07-16 Thread Auger Eric
Hi Jacob,
On 6/9/19 3:44 PM, Jacob Pan wrote:
> Move domain helper to header to be used by SVA code.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-iommu.c | 6 --
>  include/linux/intel-iommu.h | 6 ++
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 39b63fe..7cfa0eb 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -427,12 +427,6 @@ static void init_translation_status(struct intel_iommu 
> *iommu)
>   iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
>  }
>  
> -/* Convert generic 'struct iommu_domain to private struct dmar_domain */
> -static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
> -{
> - return container_of(dom, struct dmar_domain, domain);
> -}
> -
>  static int __init intel_iommu_setup(char *str)
>  {
>   if (!str)
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 8605c74..b75f17d 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -597,6 +597,12 @@ static inline void __iommu_flush_cache(
>   clflush_cache_range(addr, size);
>  }
>  
> +/* Convert generic 'struct iommu_domain to private struct dmar_domain */
fix the unmatched single quote?
> +static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
> +{
> + return container_of(dom, struct dmar_domain, domain);
> +}
> +
>  /*
>   * 0: readable
>   * 1: writable
> 
Reviewed-by: Eric Auger 

Thanks

Eric



Re: [PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID

2019-07-16 Thread Auger Eric
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> When VT-d driver runs in the guest, PASID allocation must be
> performed via virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray based allocator. The resulting IOASID allocation will always
> come from the host. This ensures that PASID namespace is system-
> wide.
> 
> Signed-off-by: Lu Baolu 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/Kconfig   |  1 +
>  drivers/iommu/intel-iommu.c | 60 
> +
>  include/linux/intel-iommu.h |  2 ++
>  3 files changed, 63 insertions(+)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index c40c4b5..5d1bc2a 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -213,6 +213,7 @@ config INTEL_IOMMU_SVM
>   bool "Support for Shared Virtual Memory with Intel IOMMU"
>   depends on INTEL_IOMMU && X86
>   select PCI_PASID
> + select IOASID
>   select MMU_NOTIFIER
>   select IOASID
>   help
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 09b8ff0..5b84994 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1711,6 +1711,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>   if (ecap_prs(iommu->ecap))
>   intel_svm_finish_prq(iommu);
>   }
> + ioasid_unregister_allocator(&iommu->pasid_allocator);
> +
>  #endif
>  }
>  
> @@ -4820,6 +4822,46 @@ static int __init platform_optin_force_iommu(void)
>   return 1;
>  }
>  
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> +{
> + struct intel_iommu *iommu = data;
> + ioasid_t ioasid;
> +
> + /*
> +  * VT-d virtual command interface always uses the full 20 bit
> +  * PASID range. Host can partition guest PASID range based on
> +  * policies but it is out of guest's control.
> +  */
> + if (min < PASID_MIN || max > PASID_MAX)
> + return -EINVAL;
ioasid_alloc() does not handle that error value, use INVALID_IOASID?
> +
> + if (vcmd_alloc_pasid(iommu, &ioasid))
> + return INVALID_IOASID;
> +
> + return ioasid;
> +}
> +
> +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> + struct iommu_pasid_alloc_info *svm;
> + struct intel_iommu *iommu = data;
> +
> + if (!iommu)
> + return;
> + /*
> +  * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> +  * We can only free the PASID when all the devices are unbond.
> +  */
> + svm = ioasid_find(NULL, ioasid, NULL);
> + if (!svm) {
> + pr_warn("Freeing unbond IOASID %d\n", ioasid);
> + return;
> + }
> + vcmd_free_pasid(iommu, ioasid);
> +}
> +#endif
> +
>  int __init intel_iommu_init(void)
>  {
>   int ret = -ENODEV;
> @@ -4924,6 +4966,24 @@ int __init intel_iommu_init(void)
>  "%s", iommu->name);
>   iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
>   iommu_device_register(&iommu->iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) {
> + /*
> +  * Register a custom ASID allocator if we are running
> +  * in a guest, the purpose is to have a system wide 
> PASID
> +  * namespace among all PASID users.
> +  * There can be multiple vIOMMUs in each guest but only
> +  * one allocator is active. All vIOMMU allocators will
> +  * eventually be calling the same host allocator.
> +  */
> + iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> + iommu->pasid_allocator.free = intel_ioasid_free;
> + iommu->pasid_allocator.pdata = (void *)iommu;
> + ret = 
> ioasid_register_allocator(&iommu->pasid_allocator);
> + if (ret)
> + pr_warn("Custom PASID allocator registeration 
> failed\n");
what if it fails, don't you want a tear down path?

Thanks

Eric
> + }
> +#endif
>   }
>  
>   bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index bff907b..8605c74 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -549,6 +550,7 @@ struct intel_iommu {
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>   struct page_req_dsc *prq;
>   unsigned char prq_name[16];/* Name for PRQ interrupt */
> + struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for 
> PASIDs */
>  #endif
>

Re: [PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation

2019-07-16 Thread Auger Eric
Hi Jacob,
On 6/9/19 3:44 PM, Jacob Pan wrote:
> From: Lu Baolu 
> 
> If Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the
> IOMMU driver should rely on the emulation software to allocate
> and free PASID IDs. The Intel vt-d spec revision 3.0 defines a
> register set to support this. This includes a capability register,
> a virtual command register and a virtual response register. Refer
> to section 10.4.42, 10.4.43, 10.4.44 for more information.
> 
> This patch adds the enlightened PASID allocation/free interfaces
> via the virtual command register.>
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Lu Baolu 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel-pasid.c | 76 
> +
>  drivers/iommu/intel-pasid.h | 13 +++-
>  include/linux/intel-iommu.h |  2 ++
>  3 files changed, 90 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 2fefeaf..69fddd3 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -63,6 +63,82 @@ void *intel_pasid_lookup_id(int pasid)
>   return p;
>  }
>  
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> +{
> + u64 res;
> + u64 cap;
> + u8 status_code;
> + unsigned long flags;
> + int ret = 0;
> +
> + if (!ecap_vcs(iommu->ecap)) {
> + pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
> + iommu->name);
> + return -ENODEV;
> + }
> +
> + cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
> + if (!(cap & DMA_VCS_PAS)) {
> + pr_warn("IOMMU: %s: Emulation software doesn't support PASID 
> allocation\n",
> + iommu->name);
> + return -ENODEV;
> + }
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +   !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + *pasid = VCMD_VRSP_RESULT(res);
> + break;
> + case VCMD_VRSP_SC_NO_PASID_AVAIL:
> + pr_info("IOMMU: %s: No PASID available\n", iommu->name);
> + ret = -ENOMEM;
> + break;
> + default:
> + ret = -ENODEV;
> + pr_warn("IOMMU: %s: Unkonwn error code %d\n",
s/Unkonwn/Unexpected
> + iommu->name, status_code);
> + }
> +
> + return ret;
> +}
> +
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> +{
> + u64 res;
> + u8 status_code;
> + unsigned long flags;
> +
> + if (!ecap_vcs(iommu->ecap)) {
> + pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
> + iommu->name);
> + return;
> + }
Logically shouldn't you also check DMAR_VCCAP_REG as well?
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE);
> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +   !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + break;
> + case VCMD_VRSP_SC_INVALID_PASID:
> + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> + break;
> + default:
> + pr_warn("IOMMU: %s: Unkonwn error code %d\n",
> + iommu->name, status_code);
s/Unkonwn/Unexpected
> + }
> +}
> +
>  /*
>   * Per device pasid table management:
>   */
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 23537b3..4b26ab5 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -19,6 +19,16 @@
>  #define PASID_PDE_SHIFT  6
>  #define MAX_NR_PASID_BITS20
>  
> +/* Virtual command interface for enlightened pasid management. */
> +#define VCMD_CMD_ALLOC   0x1
> +#define VCMD_CMD_FREE0x2
> +#define VCMD_VRSP_IP 0x1
> +#define VCMD_VRSP_SC(e)  (((e) >> 1) & 0x3)
> +#define VCMD_VRSP_SC_SUCCESS 0
> +#define VCMD_VRSP_SC_NO_PASID_AVAIL  1
> +#define VCMD_VRSP_SC_INVALID_PASID   1
> +#define VCMD_VRSP_RESULT(e)  (((e) >> 8) & 0xf)
> +
>  /*
>   * Domain ID reserved for pasid entries programmed for first-level
>   * only and pass-through transfer modes.
> @@ -69,5 +79,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu 
> *iommu,
>  struct device *dev, int pasid);
>  voi