[PATCH 1/2] dma-mapping: add a dma_addressing_limited helper
This helper returns whether the device has issues addressing all present
memory in the system.

Signed-off-by: Christoph Hellwig
---
 include/linux/dma-mapping.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 8d13e28a8e07..e11b115dd0e4 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -679,6 +679,20 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
 	return dma_set_mask_and_coherent(dev, mask);
 }
 
+/**
+ * dma_addressing_limited - return if the device is addressing limited
+ * @dev:	device to check
+ *
+ * Return %true if the device's DMA mask is too small to address all memory in
+ * the system, else %false.  Lack of addressing bits is the prime reason for
+ * bounce buffering, but might not be the only one.
+ */
+static inline bool dma_addressing_limited(struct device *dev)
+{
+	return min_not_zero(*dev->dma_mask, dev->bus_dma_mask) <
+		dma_get_required_mask(dev);
+}
+
 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		const struct iommu_ops *iommu, bool coherent);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/2] dma-direct: only limit the mapping size if swiotlb could be used
Don't just check for a swiotlb buffer, but also whether buffering might
be required for this particular device.

Fixes: 133d624b1cee ("dma: Introduce dma_max_mapping_size()")
Reported-by: Benjamin Herrenschmidt
Signed-off-by: Christoph Hellwig
Tested-by: Benjamin Herrenschmidt
---
 kernel/dma/direct.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index d7cec866d16b..e269b6f9b444 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -399,11 +399,9 @@ int dma_direct_supported(struct device *dev, u64 mask)
 
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
-	size_t size = SIZE_MAX;
-
-	/* If SWIOTLB is active, use its maximum mapping size */
-	if (is_swiotlb_active())
-		size = swiotlb_max_mapping_size(dev);
-
-	return size;
+	if (is_swiotlb_active() &&
+	    (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+		return swiotlb_max_mapping_size(dev);
+	return SIZE_MAX;
 }
-- 
2.20.1
fix nvme performance regression due to dma_max_mapping_size()
Hi all,

the new dma_max_mapping_size function is a little too eager to limit the
I/O size if a swiotlb buffer is present, even when the device is not
addressing limited.  Fix this by adding an additional check.
x86-64 kernel dma issue; bisected
re: the dma-direct code commit below

I have bisected the kernel to isolate a PCI board problem on my AMD
x86-64 ASROCK system.  The board worked at (Fedora kernel) 4.18.16 but
stopped when moving to (Fedora kernel) 5.0.  I then used
(git/torvalds/linux) 4.20-rc4 or so to locate the fault via bisect.  I
now have two kernels, good/bad, that straddle the commit.  I was asked
to try v5.2 in case it had been fixed; I compiled it and the fault is
still present.  Simply put, MPEG video does not stream from the board;
no errors, but no video.

My work is documented at https://bugs.kde.org/show_bug.cgi?id=408004
including dmesgs, lspcis, and the (second) git bisect log.

Could I please ask if and where I should submit a bug?  I am willing to
assist if I can.  Thank you!

-

commit 55897af63091ebc2c3f239c6af748113ac50
Author: Christoph Hellwig
Date:   Mon Dec 3 11:43:54 2018 +0100

    dma-direct: merge swiotlb_dma_ops into the dma_direct code

    While the dma-direct code is (relatively) clean and simple we
    actually have to use the swiotlb ops for the mapping on many
    architectures due to devices with addressing limits.  Instead of
    keeping two implementations around this commit allows the dma-direct
    implementation to call the swiotlb bounce buffering functions and
    thus share the guts of the mapping implementation.  This also
    simplified the dma-mapping setup on a few architectures where we
    don't have to differentiate which implementation to use.
Re: [PATCH 1/2] iommu/vt-d: Don't queue_iova() if there is no flush queue
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 13cf01744608 iommu/vt-d: Make use of iova deferred flushing.

The bot has tested the following trees: v5.2.1, v5.1.18, v4.19.59, v4.14.133.

v5.2.1: Build OK!
v5.1.18: Build OK!

v4.19.59: Failed to apply! Possible dependencies:
    02b4da5f84d1 ("intel-iommu: mark intel_dma_ops static")
    0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables")
    524a669bdd5f ("iommu/vt-d: remove the mapping_error dma_map_ops method")
    932a6523ce39 ("iommu/vt-d: Use dev_printk() when possible")
    964f2311a686 ("iommu/intel: small map_page cleanup")
    ef848b7e5a6a ("iommu/vt-d: Setup pasid entry for RID2PASID support")
    f7b0c4ce8cb3 ("iommu/vt-d: Flush IOTLB for untrusted device in time")

v4.14.133: Failed to apply! Possible dependencies:
    0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables")
    2e2e35d51279 ("iommu/vt-d: Missing checks for pasid tables if allocation fails")
    2f13eb7c580f ("iommu/vt-d: Enable 5-level paging mode in the PASID entry")
    3e781fcafedb ("iommu/vt-d: Remove unnecessary WARN_ON()")
    4774cc524570 ("iommu/vt-d: Apply per pci device pasid table in SVA")
    4fa064b26c2e ("iommu/vt-d: Clear pasid table entry when memory unbound")
    524a669bdd5f ("iommu/vt-d: remove the mapping_error dma_map_ops method")
    562831747f62 ("iommu/vt-d: Global PASID name space")
    7ec916f82c48 ("Revert "iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()"")
    85319dcc8955 ("iommu/vt-d: Add for_each_device_domain() helper")
    932a6523ce39 ("iommu/vt-d: Use dev_printk() when possible")
    964f2311a686 ("iommu/intel: small map_page cleanup")
    971401015d14 ("iommu/vt-d: Use real PASID for flush in caching mode")
    9ddbfb42138d ("iommu/vt-d: Move device_domain_info to header")
    a7fc93fed94b ("iommu/vt-d: Allocate and free pasid table")
    af39507305fb ("iommu/vt-d: Apply global PASID in SVA")
    be9e6598aeb0 ("iommu/vt-d: Handle memory shortage on pasid table allocation")
    cc580e41260d ("iommu/vt-d: Per PCI device pasid table interfaces")
    d657c5c73ca9 ("iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()")
    ef848b7e5a6a ("iommu/vt-d: Setup pasid entry for RID2PASID support")
    f7b0c4ce8cb3 ("iommu/vt-d: Flush IOTLB for untrusted device in time")

NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks,
Sasha
[PATCH 2/2] iommu/vt-d: Check if domain->pgd was allocated
There are a couple of places where, on domain_init() failure,
domain_exit() is called.  Currently domain_init() can fail only if
alloc_pgtable_page() fails.

Make domain_exit() check whether domain->pgd is present before calling
domain_unmap(), as it would otherwise crash on clearing pte entries in
dma_pte_clear_level().

Cc: David Woodhouse
Cc: Joerg Roedel
Cc: Lu Baolu
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov
---
 drivers/iommu/intel-iommu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6d1510284d21..698cc40355ef 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1835,7 +1835,6 @@ static inline int guestwidth_to_adjustwidth(int gaw)
 
 static void domain_exit(struct dmar_domain *domain)
 {
-	struct page *freelist;
 
 	/* Remove associated devices and clear attached or cached domains */
 	domain_remove_dev_info(domain);
@@ -1843,9 +1842,12 @@ static void domain_exit(struct dmar_domain *domain)
 	/* destroy iovas */
 	put_iova_domain(&domain->iovad);
 
-	freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw));
+	if (domain->pgd) {
+		struct page *freelist;
 
-	dma_free_pagelist(freelist);
+		freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw));
+		dma_free_pagelist(freelist);
+	}
 
 	free_domain_mem(domain);
 }
-- 
2.22.0
[PATCH 1/2] iommu/vt-d: Don't queue_iova() if there is no flush queue
The Intel VT-d driver was reworked to use the common deferred flushing
implementation.  Previously there was one global per-cpu flush queue,
afterwards one per domain.

Before deferring a flush, the queue should be allocated and initialized.
Currently only domains with IOMMU_DOMAIN_DMA type initialize their flush
queue.  It's probably worth initializing it for static or unmanaged
domains too, but it may be arguable - I'm leaving it to the iommu folks.

Prevent queuing an iova flush if the domain doesn't have a queue.  The
defensive check seems worth keeping even if the queue were initialized
for all kinds of domains, and it is easily backportable.

On the 4.19.43 stable kernel it has a user-visible effect: previously
for devices in the si domain there were crashes, on sata devices:

 BUG: spinlock bad magic on CPU#6, swapper/0/1
  lock: 0x88844f582008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
 Call Trace:
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x45/0x115
  intel_unmap+0x107/0x113
  intel_unmap_sg+0x6b/0x76
  __ata_qc_complete+0x7f/0x103
  ata_qc_complete+0x9b/0x26a
  ata_qc_complete_multiple+0xd0/0xe3
  ahci_handle_port_interrupt+0x3ee/0x48a
  ahci_handle_port_intr+0x73/0xa9
  ahci_single_level_irq_intr+0x40/0x60
  __handle_irq_event_percpu+0x7f/0x19a
  handle_irq_event_percpu+0x32/0x72
  handle_irq_event+0x38/0x56
  handle_edge_irq+0x102/0x121
  handle_irq+0x147/0x15c
  do_IRQ+0x66/0xf2
  common_interrupt+0xf/0xf
 RIP: 0010:__do_softirq+0x8c/0x2df

The same for usb devices that use ehci-pci:

 BUG: spinlock bad magic on CPU#0, swapper/0/1
  lock: 0x88844f402008, .magic: , .owner: /-1, .owner_cpu: 0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
 Call Trace:
  dump_stack+0x61/0x7e
  spin_bug+0x9d/0xa3
  do_raw_spin_lock+0x22/0x8e
  _raw_spin_lock_irqsave+0x32/0x3a
  queue_iova+0x77/0x145
  intel_unmap+0x107/0x113
  intel_unmap_page+0xe/0x10
  usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
  usb_hcd_unmap_urb_for_dma+0x17/0x100
  unmap_urb_for_dma+0x22/0x24
  __usb_hcd_giveback_urb+0x51/0xc3
  usb_giveback_urb_bh+0x97/0xde
  tasklet_action_common.isra.4+0x5f/0xa1
  tasklet_action+0x2d/0x30
  __do_softirq+0x138/0x2df
  irq_exit+0x7d/0x8b
  smp_apic_timer_interrupt+0x10f/0x151
  apic_timer_interrupt+0xf/0x20
 RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39

Cc: David Woodhouse
Cc: Joerg Roedel
Cc: Lu Baolu
Cc: iommu@lists.linux-foundation.org
Cc: # 4.14+
Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Dmitry Safonov
---
 drivers/iommu/intel-iommu.c |  3 ++-
 drivers/iommu/iova.c        | 18 ++++++++++++++----
 include/linux/iova.h        |  6 ++++++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ac4172c02244..6d1510284d21 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3564,7 +3564,8 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
 
 	freelist = domain_unmap(domain, start_pfn, last_pfn);
 
-	if (intel_iommu_strict || (pdev && pdev->untrusted)) {
+	if (intel_iommu_strict || (pdev && pdev->untrusted) ||
+	    !has_iova_flush_queue(&domain->iovad)) {
 		iommu_flush_iotlb_psi(iommu, domain, start_pfn,
 				nrpages, !freelist, 0);
 		/* free iova */
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index d499b2621239..8413ae54904a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -54,9 +54,14 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
+bool has_iova_flush_queue(struct iova_domain *iovad)
+{
+	return !!iovad->fq;
+}
+
 static void free_iova_flush_queue(struct iova_domain *iovad)
 {
-	if (!iovad->fq)
+	if (!has_iova_flush_queue(iovad))
 		return;
 
 	if (timer_pending(&iovad->fq_timer))
@@ -74,13 +79,14 @@ static void free_iova_flush_queue(struct iova_domain *iovad)
 
 int init_iova_flush_queue(struct iova_domain *iovad,
 		iova_flush_cb flush_cb, iova_entry_dtor entry_dtor)
 {
+	struct iova_fq __percpu *queue;
 	int cpu;
 
 	atomic64_set(&iovad->fq_flush_start_cnt, 0);
 	atomic64_set(&iovad->fq_flush_finish_cnt, 0);
 
-	iovad->fq = alloc_percpu(struct iova_fq);
-	if (!iovad->fq)
+	queue = alloc_percpu(struct iova_fq);
+	if (!queue)
 		return -ENOMEM;
 
 	iovad->flush_cb = flush_cb;
@@ -89,13 +95,17 @@ int init_iova_flush_queue(struct iova_domain *iovad,
 	for_each_possible_cpu(cpu) {
 		struct iova_fq *fq;
 
-		fq = per_cpu_ptr(iovad->fq, cpu);
+		fq = per_cpu_ptr(queue, cpu);
 		fq->head = 0;
 		fq->tail = 0;
Re: [RFC v1 3/4] vfio/type1: VFIO_IOMMU_PASID_REQUEST(alloc/free)
Hi Yi,

On 7/5/19 1:06 PM, Liu, Yi L wrote:
> From: Liu Yi L
>
> This patch adds VFIO_IOMMU_PASID_REQUEST ioctl which aims
> to passdown PASID allocation/free request from the virtual
> iommu. This is required to get PASID managed in system-wide.
>
> Cc: Kevin Tian
> Signed-off-by: Liu Yi L
> Signed-off-by: Yi Sun
> Signed-off-by: Jacob Pan
> ---
>  drivers/vfio/vfio_iommu_type1.c | 125 ++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       |  25 ++++++++
>  2 files changed, 150 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 6fda4fb..d5e0c01 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1832,6 +1832,94 @@ static int vfio_cache_inv_fn(struct device *dev, void *data)
>  	return iommu_cache_invalidate(dc->domain, dev, &ustruct->info);
>  }
>
> +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
> +					int min_pasid,
> +					int max_pasid)
> +{
> +	int ret;
> +	ioasid_t pasid;
> +	struct mm_struct *mm = NULL;
> +
> +	mutex_lock(&iommu->lock);
> +	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {

Is this check really mandated and do you really need to hold the iommu
lock?

> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +	mm = get_task_mm(current);
> +	/* Jacob: track ioasid allocation owner by mm */
> +	pasid = ioasid_alloc((struct ioasid_set *)mm, min_pasid,
> +			     max_pasid, NULL);

Shouldn't we have a PASID number limit per mm to prevent a guest from
consuming all PASIDs and induce DoS?

> +	if (pasid == INVALID_IOASID) {
> +		ret = -ENOSPC;
> +		goto out_unlock;
> +	}
> +	ret = pasid;
> +out_unlock:
> +	mutex_unlock(&iommu->lock);
> +	if (mm)
> +		mmput(mm);
> +	return ret;
> +}
> +
> +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, int pasid)
> +{
> +	struct mm_struct *mm = NULL;
> +	void *pdata;
> +	int ret = 0;
> +
> +	mutex_lock(&iommu->lock);
> +	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {

same here

> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +	pr_debug("%s: pasid: %d\n", __func__, pasid);
> +
> +	/**
> +	 * TODO:
> +	 * a) for pasid free, needs to return error if free failed
> +	 * b) Sanity check: check if the pasid is allocated to the
> +	 *    current process such check may be in
> +	 *    vendor specific pasid_free callback or
> +	 *    in generic layer
> +	 * c) clean up device list and free p_alloc structure
> +	 *
> +	 * Jacob:
> +	 * There are two cases free could fail:
> +	 * 1. free pasid by non-owner, we can use ioasid_set to track mm, if
> +	 *    the set does not match, caller is not permitted to free.
> +	 * 2. free before unbind all devices, we can check if ioasid private
> +	 *    data, if data != NULL, then fail to free.
> +	 */

who is going to do the garbage collection of PASIDs used by the guest in
general as we cannot rely on the userspace to do that in general?

> +
> +	mm = get_task_mm(current);
> +	pdata = ioasid_find((struct ioasid_set *)mm, pasid, NULL);
> +	if (IS_ERR(pdata)) {
> +		if (pdata == ERR_PTR(-ENOENT))
> +			pr_debug("pasid %d is not allocated\n", pasid);
> +		else if (pdata == ERR_PTR(-EACCES))
> +			pr_debug("Not owner of pasid %d,"
> +				 "no pasid free allowed\n", pasid);
> +		else
> +			pr_debug("error happened during searching"
> +				 " pasid: %d\n", pasid);
> +		ret = -EPERM;

return actual pdata error?

> +		goto out_unlock;
> +	}
> +	if (pdata) {
> +		pr_debug("Cannot free pasid %d with private data\n", pasid);
> +		/* Expect PASID has no private data if not bond */
> +		ret = -EBUSY;
> +		goto out_unlock;
> +	}
> +	ioasid_free(pasid);
> +
> +out_unlock:
> +	if (mm)
> +		mmput(mm);
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1936,6 +2024,43 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  					  &ustruct);
>  		mutex_unlock(&iommu->lock);
>  		return ret;
> +
> +	} else if (cmd == VFIO_IOMMU_PASID_REQUEST) {
> +		struct vfio_iommu_type1_pasid_request req;
> +		int min_pasid, max_pasid, pasid;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_pasid_request,
> +				    flag);
> +
> +		if (
Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
Hi Eric

Jacob is on sabbatical, so I'll give it my best shot :-)
Yi/Kevin can jump in...

On Tue, Jul 16, 2019 at 06:45:51PM +0200, Auger Eric wrote:
> Hi Jacob,
>
> On 6/9/19 3:44 PM, Jacob Pan wrote:
> > When supporting guest SVA with emulated IOMMU, the guest PASID
> > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > will result in PASID cache flush which will be passed down to
> > the host as bind guest PASID calls.
> >
> > For the SL page tables, it will be harvested from device's
> > default domain (request w/o PASID), or aux domain in case of
> > mediated device.
> >
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process CR3, FL only|
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'                       |
> >     |             |                       V
> >     |             |                CR3 in GPA
> >     '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >       v        v                          v
> > Host
> >     .-------------.  .----------------------.
> >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >     |             |  '----------------------'
> >     .----------------/  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------\.------------------------------.
> >     |             |   |SL for GPA-HPA, default domain|
> >     |             |   '------------------------------'
> >     '-------------'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> >
> > Signed-off-by: Jacob Pan
> > Signed-off-by: Liu, Yi L
> > ---
> >  drivers/iommu/intel-iommu.c |   4 +
> >  drivers/iommu/intel-svm.c   | 187
> >  include/linux/intel-iommu.h |  13 ++-
> >  include/linux/intel-svm.h   |  17
> >  4 files changed, 219 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 7cfa0eb..3b4d712 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
> >  	.dev_enable_feat	= intel_iommu_dev_enable_feat,
> >  	.dev_disable_feat	= intel_iommu_dev_disable_feat,
> >  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> > +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> > +#endif
> >  };
> >
> >  static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 66d98e1..f06a82f 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
> >  	list_for_each_entry(sdev, &svm->devs, list)	\
> >  		if (dev == sdev->dev)			\
> >
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +			  struct device *dev,
> > +			  struct gpasid_bind_data *data)
> > +{
> > +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +	struct intel_svm_dev *sdev;
> > +	struct intel_svm *svm = NULL;
> not requested
> > +	struct dmar_domain *ddomain;
> > +	int ret = 0;
> > +
> > +	if (WARN_ON(!iommu) || !data)
> > +		return -EINVAL;
> > +
> > +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > +	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > +		return -EINVAL;
> > +
> > +	if (dev_is_pci(dev)) {
> > +		/* VT-d supports devices with full 20 bit PASIDs only */
> > +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > +			return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * We only check host PASID range, we have no knowledge to check
> > +	 * guest PASID range nor do we use the guest PASID.
> guest pasid is set below in svm->gpasid.
> So I am confused, do you handle gpasid or not? If you don't use gpasid
> at the moment, then you may return -EINVAL if data->flags &
> IOMMU_SVA_GPASID_VAL.
>
> I confess I don't really understand gpasid as I thought you use
> enlightened PASID allocation to use a system wide PASID allocator on
> host so in which case guest and host PASID do differ?

Correct, from within the guest, we only use the enlightened allocator.
Guest PASID can be managed via Qemu, and it will also associate guest
pasid's with appropriate host pasids. Sort of how guest bdf and host
bdf are managed for page-request etc.
Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
>
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
>
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
>
> Signed-off-by: Jacob Pan
> Signed-off-by: Liu, Yi L
> ---
>  drivers/iommu/intel-iommu.c |   4 +
>  drivers/iommu/intel-svm.c   | 187
>  include/linux/intel-iommu.h |  13 ++-
>  include/linux/intel-svm.h   |  17
>  4 files changed, 219 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 7cfa0eb..3b4d712 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
>  	.dev_enable_feat	= intel_iommu_dev_enable_feat,
>  	.dev_disable_feat	= intel_iommu_dev_disable_feat,
>  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> +#endif
>  };
>
>  static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 66d98e1..f06a82f 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
>  	list_for_each_entry(sdev, &svm->devs, list)	\
>  		if (dev == sdev->dev)			\
>
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +			  struct device *dev,
> +			  struct gpasid_bind_data *data)
> +{
> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct intel_svm_dev *sdev;
> +	struct intel_svm *svm = NULL;

not requested

> +	struct dmar_domain *ddomain;
> +	int ret = 0;
> +
> +	if (WARN_ON(!iommu) || !data)
> +		return -EINVAL;
> +
> +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> +	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> +		return -EINVAL;
> +
> +	if (dev_is_pci(dev)) {
> +		/* VT-d supports devices with full 20 bit PASIDs only */
> +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> +			return -EINVAL;
> +	}
> +
> +	/*
> +	 * We only check host PASID range, we have no knowledge to check
> +	 * guest PASID range nor do we use the guest PASID.

guest pasid is set below in svm->gpasid.
So I am confused, do you handle gpasid or not? If you don't use gpasid
at the moment, then you may return -EINVAL if data->flags &
IOMMU_SVA_GPASID_VAL.

I confess I don't really understand gpasid as I thought you use
enlightened PASID allocation to use a system wide PASID allocator on
host so in which case guest and host PASID do differ?

> +	 */
> +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> +		return -EINVAL;
> +
> +	ddomain = to_dmar_domain(domain);
> +	/* REVISIT:
> +	 * Sanity check adddress width and paging mode support
> +	 * width matching in two dimensions:
> +	 * 1. paging mode CPU <= IOMMU
> +	 * 2. address width Guest <= Host.
> +	 */
> +	mutex_lock(&pasid_mutex);
> +	svm = ioasid_find(NULL, data->hpasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +	if (svm) {
> +		/*
> +		 * If we found svm for the PASID, there must be at
> +		 * least one device bond, otherwise svm should be freed.
> +		 */
> +		BUG_ON(list_empty(&svm->devs));
> +
> +		for_each_svm_dev() {
> +
Re: [PATCH v4 11/22] iommu: Introduce guest PASID bind function
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> Guest shared virtual address (SVA) may require host to shadow guest
> PASID tables. Guest PASID can also be allocated from the host via
> enlightened interfaces. In this case, guest needs to bind the guest
> mm, i.e. cr3 in guest physical address to the actual PASID table in
> the host IOMMU. Nesting will be turned on such that guest virtual
> address can go through a two level translation:
> - 1st level translates GVA to GPA
> - 2nd level translates GPA to HPA
> This patch introduces APIs to bind guest PASID data to the assigned
> device entry in the physical IOMMU. See the diagram below for usage
> explanation.
>
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process mm, FL only |
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                      GP
>     '-------------'
> Guest
> ------| Shadow |--- GP->HP* ---------
>       v        v                    |
> Host                                v
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.-----------------.
>     |             |   |Set SL to GPA-HPA|
>     |             |   '-----------------'
>     '-------------'
>
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
>  - GP = Guest PASID
>  - HP = Host PASID
> * Conversion needed if non-identity GP-HP mapping option is chosen.
>
> Signed-off-by: Jacob Pan
> Signed-off-by: Liu Yi L
> ---
>  drivers/iommu/iommu.c      | 20
>  include/linux/iommu.h      | 21 +
>  include/uapi/linux/iommu.h | 58 ++
>  3 files changed, 99 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 1758b57..d0416f60 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1648,6 +1648,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
>
> +int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> +		struct device *dev, struct gpasid_bind_data *data)
> +{
> +	if (unlikely(!domain->ops->sva_bind_gpasid))
> +		return -ENODEV;
> +
> +	return domain->ops->sva_bind_gpasid(domain, dev, data);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
> +
> +int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +		ioasid_t pasid)
> +{
> +	if (unlikely(!domain->ops->sva_unbind_gpasid))
> +		return -ENODEV;
> +
> +	return domain->ops->sva_unbind_gpasid(dev, pasid);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 8d766a8..560c8c8 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  #define IOMMU_READ	(1 << 0)
> @@ -267,6 +268,8 @@ struct page_response_msg {
>   * @detach_pasid_table: detach the pasid table
>   * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @sva_bind_gpasid: bind guest pasid and mm
> + * @sva_unbind_gpasid: unbind guest pasid and mm

comment vs field ordering as pointed out by Jonathan on other patches

>   */
>  struct iommu_ops {
>  	bool (*capable)(enum iommu_cap);
> @@ -332,6 +335,10 @@ struct iommu_ops {
>  	int (*page_response)(struct device *dev, struct page_response_msg *msg);
>  	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
>  				struct iommu_cache_invalidate_info *inv_info);
> +	int (*sva_bind_gpasid)(struct iommu_domain *domain,
> +			struct device *dev, struct gpasid_bind_data *data);
> +
> +	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
>
>  	unsigned long pgsize_bitmap;
>  };
> @@ -447,6 +454,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>  extern int iommu_cache_invalidate(struct iommu_domain *domain,
>  				  struct device *dev,
>  				  struct iommu_cache_invalidate_info *inv_info);
> +extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> +		struct device *dev, struct gpasid_bind_data *data);
> +extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> +
Re: cma_remap when using dma_alloc_attr :- DMA_ATTR_NO_KERNEL_MAPPING
On Tue, Jul 16, 2019 at 01:02:19PM +0100, Robin Murphy wrote:
>> Lets say 4k video allocation required 300MB cma memory but not required
>> virtual mapping for all the 300MB, its require only 20MB virtually mapped
>> at some specific use case/point of video, and unmap virtual mapping after
>> uses, at that time this functions will be useful, it works like ioremap()
>> for cma_alloc() using dma apis.
>
> Hmm, is there any significant reason that this case couldn't be handled
> with just get_vm_area() plus dma_mmap_attrs()?  I know it's only
> *intended* for userspace mappings, but since the basic machinery is
> there...

Because the dma helpers really are a black-box abstraction.

That being said, DMA_ATTR_NO_KERNEL_MAPPING and DMA_ATTR_NON_CONSISTENT
have been a constant pain in the b**t.  I've been toying with replacing
them with a dma_alloc_pages or similar abstraction that just returns a
struct page that is guaranteed to be dma addressable by the passed-in
device.  Then the driver can call dma_map_page / dma_unmap_page /
dma_sync_* on it at will.

This would replace DMA_ATTR_NON_CONSISTENT with a sensible API, and also
DMA_ATTR_NO_KERNEL_MAPPING when called with PageHighMem, while providing
an easy to understand API and something that can easily be fed into the
various page based APIs in the kernel.

That being said, until we get arm moved over to the common dma-direct
and dma-iommu code, and x86 fully moved over to dma-iommu, it just seems
way too much work to even get it into the various architectures that
matter, never mind all the fringe IOMMUs.  So for now I've just been
trying to contain DMA_ATTR_NON_CONSISTENT and
DMA_ATTR_NO_KERNEL_MAPPING to fewer places while also killing bogus or
pointless users of these APIs.
Re: cma_remap when using dma_alloc_attr :- DMA_ATTR_NO_KERNEL_MAPPING
On 12/07/2019 19:30, Pankaj Suryawanshi wrote:
> Hello,
>
> When we allocate cma memory using dma_alloc_attr using
> DMA_ATTR_NO_KERNEL_MAPPING attribute. It will return physical address
> without virtual mapping and thats the use case of this attribute. but
> lets say some vpu/gpu drivers required virtual mapping of some part of
> the allocation. then we dont have anything to remap that allocated
> memory to virtual memory. and in 32-bit system it difficult for devices
> like android to work all the time with virtual mapping, it degrade the
> performance.
>
> For Example : Lets say 4k video allocation required 300MB cma memory
> but not required virtual mapping for all the 300MB, its require only
> 20MB virtually mapped at some specific use case/point of video, and
> unmap virtual mapping after uses, at that time this functions will be
> useful, it works like ioremap() for cma_alloc() using dma apis.

Hmm, is there any significant reason that this case couldn't be handled
with just get_vm_area() plus dma_mmap_attrs()?  I know it's only
*intended* for userspace mappings, but since the basic machinery is
there...

Robin.

> /*
>  * function call(s) to create virtual map of given physical memory
>  * range [base, base+size) of CMA memory.
>  */
> void *cma_remap(__u32 base, __u32 size)
> {
> 	struct page *page = phys_to_page(base);
> 	void *virt;
>
> 	pr_debug("cma: request to map 0x%08x for size 0x%08x\n", base, size);
>
> 	size = PAGE_ALIGN(size);
>
> 	pgprot_t prot = get_dma_pgprot(DMA_ATTR, PAGE_KERNEL);
>
> 	if (PageHighMem(page)) {
> 		virt = dma_alloc_remap(page, size, GFP_KERNEL, prot,
> 				       __builtin_return_address(0));
> 	} else {
> 		dma_remap(page, size, prot);
> 		virt = page_address(page);
> 	}
>
> 	if (!virt)
> 		pr_err("\x1b[31m" " cma: failed to map 0x%08x" "\x1b[0m\n",
> 		       base);
> 	else
> 		pr_debug("cma: 0x%08x is virtually mapped to 0x%08x\n",
> 			 base, (__u32) virt);
>
> 	return virt;
> }
>
> /*
>  * function call(s) to remove virtual map of given virtual memory
>  * range [virt, virt+size) of CMA memory.
>  */
> void cma_unmap(void *virt, __u32 size)
> {
> 	unsigned long pfn = virt_to_pfn(virt);
> 	struct page *page = pfn_to_page(pfn);
>
> 	size = PAGE_ALIGN(size);
>
> 	if (PageHighMem(page))
> 		dma_free_remap(virt, size);
> 	else
> 		dma_remap(page, size, PAGE_KERNEL);
>
> 	pr_debug(" cma: virtual address 0x%08x is unmapped\n", (__u32) virt);
> }
>
> This functions should be added in arch/arm/mm/dma-mapping.c file.
>
> Please let me know if i am missing anything.
>
> Regards,
> Pankaj
Re: [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> After each setup for PASID entry, related translation caches must be flushed.
> We can combine duplicated code into one function which is less error prone.
>
> Signed-off-by: Jacob Pan
> ---
>  drivers/iommu/intel-pasid.c | 48 +
>  1 file changed, 18 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 1e25539..1ff2ecc 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -522,6 +522,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>  	devtlb_invalidation_with_pasid(iommu, dev, pasid);
>  }
>
> +static inline void pasid_flush_caches(struct intel_iommu *iommu,
> +				      struct pasid_entry *pte,
> +				      int pasid, u16 did)
> +{
> +	if (!ecap_coherent(iommu->ecap))
> +		clflush_cache_range(pte, sizeof(*pte));
> +
> +	if (cap_caching_mode(iommu->cap)) {
> +		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> +		iotlb_invalidation_with_pasid(iommu, did, pasid);
> +	} else
> +		iommu_flush_write_buffer(iommu);

Besides the style issue reported by Jonathan and the fact this function may
not deserve the inline (chapter 15 of
https://www.kernel.org/doc/html/v5.1/process/coding-style.html),

Reviewed-by: Eric Auger

Thanks

Eric

> +
> +}
> +
>  /*
>   * Set up the scalable mode pasid table entry for first level only
>   * translation type.
> @@ -567,16 +582,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
>  	/* Setup Present and PASID Granular Transfer Type: */
>  	pasid_set_translation_type(pte, 1);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>
>  	return 0;
>  }
> @@ -640,16 +646,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>  	 */
>  	pasid_set_sre(pte);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>
>  	return 0;
>  }
> @@ -683,16 +680,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>  	 */
>  	pasid_set_sre(pte);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>
>  	return 0;
>  }
>
Re: [PATCH v4 16/22] iommu/vt-d: Move domain helper to header
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> Move the domain helper to the header to be used by SVA code.
>
> Signed-off-by: Jacob Pan
> ---
>  drivers/iommu/intel-iommu.c | 6 --
>  include/linux/intel-iommu.h | 6 ++
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 39b63fe..7cfa0eb 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -427,12 +427,6 @@ static void init_translation_status(struct intel_iommu *iommu)
>  	iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
>  }
>
> -/* Convert generic 'struct iommu_domain to private struct dmar_domain */
> -static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
> -{
> -	return container_of(dom, struct dmar_domain, domain);
> -}
> -
>  static int __init intel_iommu_setup(char *str)
>  {
>  	if (!str)
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 8605c74..b75f17d 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -597,6 +597,12 @@ static inline void __iommu_flush_cache(
>  	clflush_cache_range(addr, size);
>  }
>
> +/* Convert generic 'struct iommu_domain to private struct dmar_domain */

fix the single '?

> +static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
> +{
> +	return container_of(dom, struct dmar_domain, domain);
> +}
> +
>  /*
>   * 0: readable
>   * 1: writable
>

Reviewed-by: Eric Auger

Thanks

Eric
Re: [PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> When the VT-d driver runs in the guest, PASID allocation must be
> performed via the virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray based allocator. The resulting IOASID allocation will always
> come from the host. This ensures that the PASID namespace is system-
> wide.
>
> Signed-off-by: Lu Baolu
> Signed-off-by: Liu, Yi L
> Signed-off-by: Jacob Pan
> ---
>  drivers/iommu/Kconfig       |  1 +
>  drivers/iommu/intel-iommu.c | 60 +
>  include/linux/intel-iommu.h |  2 ++
>  3 files changed, 63 insertions(+)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index c40c4b5..5d1bc2a 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -213,6 +213,7 @@ config INTEL_IOMMU_SVM
>  	bool "Support for Shared Virtual Memory with Intel IOMMU"
>  	depends on INTEL_IOMMU && X86
>  	select PCI_PASID
> +	select IOASID
>  	select MMU_NOTIFIER
>  	select IOASID
>  	help
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 09b8ff0..5b84994 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1711,6 +1711,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>  		if (ecap_prs(iommu->ecap))
>  			intel_svm_finish_prq(iommu);
>  	}
> +	ioasid_unregister_allocator(&iommu->pasid_allocator);
> +
>  #endif
>  }
>
> @@ -4820,6 +4822,46 @@ static int __init platform_optin_force_iommu(void)
>  	return 1;
>  }
>
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> +{
> +	struct intel_iommu *iommu = data;
> +	ioasid_t ioasid;
> +
> +	/*
> +	 * VT-d virtual command interface always uses the full 20 bit
> +	 * PASID range. Host can partition guest PASID range based on
> +	 * policies but it is out of guest's control.
> +	 */
> +	if (min < PASID_MIN || max > PASID_MAX)
> +		return -EINVAL;

ioasid_alloc() does not handle that error value, use INVALID_IOASID?

> +
> +	if (vcmd_alloc_pasid(iommu, &ioasid))
> +		return INVALID_IOASID;
> +
> +	return ioasid;
> +}
> +
> +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> +	struct iommu_pasid_alloc_info *svm;
> +	struct intel_iommu *iommu = data;
> +
> +	if (!iommu)
> +		return;
> +	/*
> +	 * Sanity check of the ioasid owner is done at the upper layer, e.g. VFIO.
> +	 * We can only free the PASID when all the devices are unbound.
> +	 */
> +	svm = ioasid_find(NULL, ioasid, NULL);
> +	if (!svm) {
> +		pr_warn("Freeing unbound IOASID %d\n", ioasid);
> +		return;
> +	}
> +	vcmd_free_pasid(iommu, ioasid);
> +}
> +#endif
> +
>  int __init intel_iommu_init(void)
>  {
>  	int ret = -ENODEV;
> @@ -4924,6 +4966,24 @@ int __init intel_iommu_init(void)
>  			       "%s", iommu->name);
>  		iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
>  		iommu_device_register(&iommu->iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +		if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) {
> +			/*
> +			 * Register a custom ASID allocator if we are running
> +			 * in a guest, the purpose is to have a system wide PASID
> +			 * namespace among all PASID users.
> +			 * There can be multiple vIOMMUs in each guest but only
> +			 * one allocator is active. All vIOMMU allocators will
> +			 * eventually be calling the same host allocator.
> +			 */
> +			iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> +			iommu->pasid_allocator.free = intel_ioasid_free;
> +			iommu->pasid_allocator.pdata = (void *)iommu;
> +			ret = ioasid_register_allocator(&iommu->pasid_allocator);
> +			if (ret)
> +				pr_warn("Custom PASID allocator registration failed\n");

what if it fails, don't you want a tear down path?

Thanks

Eric

> +		}
> +#endif
>  	}
>
>  	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index bff907b..8605c74 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -31,6 +31,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -549,6 +550,7 @@ struct intel_iommu {
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  	struct page_req_dsc *prq;
>  	unsigned char prq_name[16];	/* Name for PRQ interrupt */
> +	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
>  #endif
>
Re: [PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation
Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> From: Lu Baolu
>
> If Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the
> IOMMU driver should rely on the emulation software to allocate
> and free PASID IDs. The Intel VT-d spec revision 3.0 defines a
> register set to support this. This includes a capability register,
> a virtual command register and a virtual response register. Refer
> to section 10.4.42, 10.4.43, 10.4.44 for more information.
>
> This patch adds the enlightened PASID allocation/free interfaces
> via the virtual command register.
>
> Cc: Ashok Raj
> Cc: Jacob Pan
> Cc: Kevin Tian
> Signed-off-by: Liu Yi L
> Signed-off-by: Lu Baolu
> Signed-off-by: Jacob Pan
> ---
>  drivers/iommu/intel-pasid.c | 76 +
>  drivers/iommu/intel-pasid.h | 13 +++-
>  include/linux/intel-iommu.h |  2 ++
>  3 files changed, 90 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 2fefeaf..69fddd3 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -63,6 +63,82 @@ void *intel_pasid_lookup_id(int pasid)
>  	return p;
>  }
>
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> +{
> +	u64 res;
> +	u64 cap;
> +	u8 status_code;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	if (!ecap_vcs(iommu->ecap)) {
> +		pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
> +			iommu->name);
> +		return -ENODEV;
> +	}
> +
> +	cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
> +	if (!(cap & DMA_VCS_PAS)) {
> +		pr_warn("IOMMU: %s: Emulation software doesn't support PASID allocation\n",
> +			iommu->name);
> +		return -ENODEV;
> +	}
> +
> +	raw_spin_lock_irqsave(&iommu->register_lock, flags);
> +	dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
> +	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +		      !(res & VCMD_VRSP_IP), res);
> +	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> +	status_code = VCMD_VRSP_SC(res);
> +	switch (status_code) {
> +	case VCMD_VRSP_SC_SUCCESS:
> +		*pasid = VCMD_VRSP_RESULT(res);
> +		break;
> +	case VCMD_VRSP_SC_NO_PASID_AVAIL:
> +		pr_info("IOMMU: %s: No PASID available\n", iommu->name);
> +		ret = -ENOMEM;
> +		break;
> +	default:
> +		ret = -ENODEV;
> +		pr_warn("IOMMU: %s: Unkonwn error code %d\n",

s/Unkonwn/Unexpected

> +			iommu->name, status_code);
> +	}
> +
> +	return ret;
> +}
> +
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> +{
> +	u64 res;
> +	u8 status_code;
> +	unsigned long flags;
> +
> +	if (!ecap_vcs(iommu->ecap)) {
> +		pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
> +			iommu->name);
> +		return;
> +	}

Logically shouldn't you also check DMAR_VCCAP_REG as well?

> +
> +	raw_spin_lock_irqsave(&iommu->register_lock, flags);
> +	dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE);
> +	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +		      !(res & VCMD_VRSP_IP), res);
> +	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> +	status_code = VCMD_VRSP_SC(res);
> +	switch (status_code) {
> +	case VCMD_VRSP_SC_SUCCESS:
> +		break;
> +	case VCMD_VRSP_SC_INVALID_PASID:
> +		pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> +		break;
> +	default:
> +		pr_warn("IOMMU: %s: Unkonwn error code %d\n",
> +			iommu->name, status_code);

s/Unkonwn/Unexpected

> +	}
> +}
> +
>  /*
>   * Per device pasid table management:
>   */
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 23537b3..4b26ab5 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -19,6 +19,16 @@
>  #define PASID_PDE_SHIFT		6
>  #define MAX_NR_PASID_BITS	20
>
> +/* Virtual command interface for enlightened pasid management. */
> +#define VCMD_CMD_ALLOC			0x1
> +#define VCMD_CMD_FREE			0x2
> +#define VCMD_VRSP_IP			0x1
> +#define VCMD_VRSP_SC(e)			(((e) >> 1) & 0x3)
> +#define VCMD_VRSP_SC_SUCCESS		0
> +#define VCMD_VRSP_SC_NO_PASID_AVAIL	1
> +#define VCMD_VRSP_SC_INVALID_PASID	1
> +#define VCMD_VRSP_RESULT(e)		(((e) >> 8) & 0xf)
> +
>  /*
>   * Domain ID reserved for pasid entries programmed for first-level
>   * only and pass-through transfer modes.
> @@ -69,5 +79,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>  				   struct device *dev, int pasid);
> voi