[PATCH 1/1] iommu/vt-d: Use real PASID for flush in caching mode
If caching mode is supported, the hardware will cache non-present or
erroneous translation entries. Hence, software should explicitly
invalidate the PASID cache after a PASID table entry becomes present.
We should issue such invalidation with the PASID value that we have
changed. PASID 0 is not reserved for this case.

Cc: Jacob Pan
Cc: Kevin Tian
Cc: Sankaran Rajesh
Suggested-by: Ashok Raj
Signed-off-by: Liu Yi L
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-svm.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 99bc9bd..b7b88b5 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -422,17 +422,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		iommu->pasid_table[svm->pasid].val = pasid_entry_val;
 		wmb();
 
-		/* In caching mode, we still have to flush with PASID 0 when
-		 * a PASID table entry becomes present. Not entirely clear
-		 * *why* that would be the case — surely we could just issue
-		 * a flush with the PASID value that we've changed? The PASID
-		 * is the index into the table, after all. It's not like domain
-		 * IDs in the case of the equivalent context-entry change in
-		 * caching mode. And for that matter it's not entirely clear why
-		 * a VMM would be in the business of caching the PASID table
-		 * anyway. Surely that can be left entirely to the guest? */
+		/*
+		 * Flush PASID cache when a PASID table entry becomes
+		 * present.
+		 */
 		if (cap_caching_mode(iommu->cap))
-			intel_flush_pasid_dev(svm, sdev, 0);
+			intel_flush_pasid_dev(svm, sdev, svm->pasid);
 	}
 	list_add_rcu(&sdev->list, &svm->devs);
-- 
2.7.4
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
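Under caching mode, the (typically emulated) hardware may cache even not-present PASID table entries, so software must invalidate the PASID cache after making an entry present, and the PASID that changed is a sufficient invalidation tag because the PASID is the index into the table. The following is a minimal user-space model of that behaviour, with hypothetical names; it is an illustration of the invalidation logic, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PASIDS 16

/* "Hardware" PASID table and a per-PASID cache of looked-up entries. */
static uint64_t pasid_table[NR_PASIDS];
static uint64_t pasid_cache[NR_PASIDS];
static bool     cache_valid[NR_PASIDS];

/* Model of a translation: consult the cache first, as caching-mode HW may. */
static uint64_t hw_lookup(int pasid)
{
    if (!cache_valid[pasid]) {
        /* Note: this may cache a not-present (zero) entry. */
        pasid_cache[pasid] = pasid_table[pasid];
        cache_valid[pasid] = true;
    }
    return pasid_cache[pasid];
}

/* Model of intel_flush_pasid_dev(): invalidate by the PASID that changed. */
static void flush_pasid(int pasid)
{
    cache_valid[pasid] = false;
}

/* Software makes an entry present, then flushes with the real PASID. */
static uint64_t set_entry_and_flush(int pasid, uint64_t val)
{
    (void)hw_lookup(pasid);   /* HW has already cached the not-present entry */
    pasid_table[pasid] = val; /* entry becomes present */
    flush_pasid(pasid);       /* without this, hw_lookup() would still see 0 */
    return hw_lookup(pasid);
}
```

Because the cache is keyed by the same index as the table, invalidating the changed PASID alone removes the stale not-present entry.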
Re: [RFC 2/5] dt-bindings: brcm: Add reserved-dma-region for iPROC
On 15/03/18 12:03, Jitendra Bhivare wrote: On Thu, Mar 15, 2018 at 12:15 AM, Robin Murphywrote: On 12/03/18 07:03, Jitendra Bhivare wrote: On Tue, Mar 6, 2018 at 5:12 PM, Robin Murphy wrote: On 06/03/18 04:59, Jitendra Bhivare wrote: With SoC wide DMA mask of 40-bit, the mappings for entire IOVA space can't be specified in the PAXBv2 PCIe RC of SoC. The holes in IOVA space needs to be reserved to prevent any IOVA allocations in those spaces. Can you clarify why? If this is the PCI inbound window thing again, let me say once again that "dma-ranges" is the appropriate way for DT to describe the hardware. Robin. dma-ranges = < \ 0x4300 0x00 0x 0x00 0x8000 0x00 0x8000 \ 0x4300 0x08 0x 0x08 0x 0x08 0x \ 0x4300 0x80 0x 0x80 0x 0x80 0x> Yes, its for PCI inbound windows. In our HW, they are limited by sizes specified in ARM memory maps which was done for non-IOMMU cases to catch any transfer outside the ranges. dma-ranges are already being used to program these inbound windows and SoC wide DMA mask is already specified but IOMMU code can still allocate IOVAs in the gaps for which translation will fail in PCIe RC. Right, so make iommu-dma reserve the gaps. No need to clutter up the DT with redundant information which gets handled pretty much identically anyway. Robin. There was a mistake in pasting dma-ranges mentioned. This is what we use it in IOMMU case. #define PCIE_DMA_RANGES dma-ranges = < \ 0x4300 0x00 0x 0x00 0x 0x04 0x \ 0x4300 0x08 0x 0x08 0x 0x08 0x \ 0x4300 0x80 0x 0x80 0x 0x80 0x> #define PCIE_RESERVED_DMA_REGION reserved-dma-region = < \ 0x0 0x0 0x04 0x0 0x04 0x0 \ 0x0 0x0 0x10 0x0 0x70 0x0> This is non-iommu case which is per ARM memory map. #define PCIE_DMA_RANGES dma-ranges = < \ 0x4300 0x00 0x8000 0x00 0x8000 0x00 0x8000 \ 0x4300 0x08 0x 0x08 0x 0x08 0x \ 0x4300 0x80 0x 0x80 0x 0x40 0x> In IOMMU case, we had to increase first in-bound window i.e. 
dma-ranges first entry from 2GB-4GB (in non-iommu case) to 0-16GB because out-bound window specified from 2GB-4GB implicitly gets reserved by the PCI iommu code. This allowed IOVA allocations from 0-2GB and 4-16GB mapped through in-bound windows. Basically, my point is that dma-ranges specified is being used to program our in-bound windows and defines PCI space to CPU address space supported mappings of HW. In the same way, it would be better to explicitly specify reserved-dma-region to clarify this as HW hole than implicity reserving the regions. Why? If DMA traffic can only pass through inbound windows, and you program the inbound windows based on dma-ranges, then by definition dma-ranges tells you exactly the set of addresses that are usable for DMA, and by extension, IOMMU remapping. I fail to see how it's then "better" to add a second slightly different description of the exact same information with the bonus maintenance burden of then having to ensure the two actually match. Robin. We can at least have single property 'reserved-dma-window' for MSI region and DMA regions to help describe the holes in the HW than let SW compute it. reserved-dma-region property is added to specify the ranges which should never be mapped and given to devices sitting behind. Reviewed-by: Ray Jui Reviewed-by: Vikram Prakash Reviewed-by: Scott Branden Signed-off-by: Jitendra Bhivare --- Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt index b8e48b4..3be0fe3 100644 --- a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt +++ b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt @@ -30,6 +30,9 @@ Optional properties: - dma-ranges: Some PAXB-based root complexes do not have inbound mapping done by the ASIC after power on reset. 
In this case, SW is required to configure the mapping, based on inbound memory regions specified by this property. +- reserved-dma-region: PAXBv2 with IOMMU enabled cannot provide mappings for + entire IOVA space specified by DMA mask. Hence this is used to reserve the + gaps in dma-ranges. - brcm,pcie-ob: Some iProc SoCs do not have the outbound address mapping done by the ASIC after power on reset. In this case, SW needs to configure it ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
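The reserved regions under discussion are exactly the complement of the dma-ranges inbound windows within the device's DMA-addressable span, which is why Robin argues software can derive them instead of duplicating them in DT. A sketch of that derivation over sorted, non-overlapping windows (illustrative types and names, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; }; /* inclusive bounds */

/*
 * Given sorted, non-overlapping inbound windows and the DMA-addressable
 * limit (e.g. a 40-bit mask), emit the gaps that must be reserved in
 * IOVA space. Returns the number of gaps written to 'out'.
 */
static size_t compute_reserved_gaps(const struct range *win, size_t n,
                                    uint64_t dma_limit, struct range *out)
{
    size_t nout = 0;
    uint64_t next = 0; /* first address not yet covered by a window */

    for (size_t i = 0; i < n; i++) {
        if (win[i].start > next) {
            out[nout].start = next;
            out[nout].end = win[i].start - 1;
            nout++;
        }
        next = win[i].end + 1;
    }
    if (next <= dma_limit) { /* tail gap up to the DMA mask limit */
        out[nout].start = next;
        out[nout].end = dma_limit;
        nout++;
    }
    return nout;
}
```

With, say, inbound windows covering 0-16GB and 32-64GB under a 40-bit mask, the function yields the 16-32GB hole and everything above 64GB as regions to reserve.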
Re: [RFC 3/5] dt-bindings: arm-smmu: Add reserved-msi-region
On 12/03/18 07:19, Jitendra Bhivare wrote: On Tue, Mar 6, 2018 at 5:17 PM, Robin Murphywrote: On 06/03/18 04:59, Jitendra Bhivare wrote: iPROC SoC has a special device window to treat GICv3 ITS MSI. Ugh, really? After preferably printing out 100 copies of the SBSA, binding them all together and dropping the lot onto the hardware designers from a great height, could you clarify what exactly the special behaviour is here? Robin. The special device window needs to programmed so that writes to specified address region helps for specific ordering of traffic prior to it. OK, I know of PCIe root complexes having address matching to give the MSI write the appropriate AXI memory type attributes when the SMMU is bypassed, but ordering is a new one. I guess you have some glue logic on the root complex master interface which injects an ACE barrier transaction in front of any write to the ITS doorbell address? Current code maps MSI to IOVA space. Add SMMU node property to use direct mappings for MSI region. This property is read and reserved when arm_smmu_get_resv_regions gets called. Either way, this should be a generic, not SMMU-specific, property - there are other platforms that would also make use of it to describe a similar hardware situation (which we currently only support via ACPI). The big question is where to put it: hardware-wise, the property of "MSIs won't work properly if the doorbell is remapped to a different address" belongs firmly to the root complex, not the IOMMU, while the address itself is already a property of the MSI controller. The IOMMU is just the innocent guy in the middle who has to discover and respect the constraints. I'd like to think we could just have some flag to say "you can't remap my MSIs!" 
then go and chase the msi-parent/msi-map phandle to the MSI controller to get the address from its reg property, which is more or less the equivalent of what the current ACPI workaround does - see commit 8b4282e6b8e2 ("ACPI/IORT: Add msi address regions reservation helper") in linux-next - but I can already think of ways in which that might not work. For one, there's not necessarily any guarantee that the MSI controller's programming interface and doorbell are in the same place, so we probably do need to describe the specific MSI address(es) that the root complex cares about, from the root complex end :/ Robin. Signed-off-by: Jitendra Bhivare --- Documentation/devicetree/bindings/iommu/arm,smmu.txt | 12 1 file changed, 12 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 8a6ffce..13fa2b9 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -71,6 +71,15 @@ conditions. or using stream matching with #iommu-cells = <2>, and may be ignored if present in such cases. +- reserved-msi-region: MSI region to be reserved with specific prot in IOVA + space for direct mapping. The region is specified in tuple + of (busno,prot,bus_addr,size). + +- #region-address-cells: specifies number of cells needed to encode bus_addr + +- #region-size-cells: specifies number of cells needed to encode size + + ** Deprecated properties: - mmu-masters (deprecated in favour of the generic "iommus" binding) : @@ -95,6 +104,9 @@ conditions. <0 36 4>, <0 37 4>; #iommu-cells = <1>; + #region-address-cells = <1>; + #region-size-cells = <1>; + reserved-msi-region = <0x0 0x1a 0x63c3000 0x8000>; }; /* device with two stream IDs, 0 and 7 */ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 11/14] swiotlb: remove swiotlb_set_mem_attributes
On Thu, Mar 15, 2018 at 01:51:59PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 14, 2018 at 06:52:10PM +0100, Christoph Hellwig wrote:
> > Now that set_memory_decrypted is always available we can just call it
> > directly.
> 
> Won't this break ARM build?

And of course, right after asking that question I went back to the previous
patch; I had it in my mind that it was arch/x86/include, not the global one.
So please ignore my query and instead:

Reviewed-by: Konrad Rzeszutek Wilk
Re: [PATCH 11/14] swiotlb: remove swiotlb_set_mem_attributes
On Wed, Mar 14, 2018 at 06:52:10PM +0100, Christoph Hellwig wrote:
> Now that set_memory_decrypted is always available we can just call it
> directly.

Won't this break ARM build?
Re: [PATCH 10/14] set_memory.h: provide set_memory_{en,de}crypted stubs
On Wed, Mar 14, 2018 at 06:52:09PM +0100, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig

Reviewed-by: Konrad Rzeszutek Wilk

Thank you!

> ---
>  include/linux/set_memory.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
> index e5140648f638..da5178216da5 100644
> --- a/include/linux/set_memory.h
> +++ b/include/linux/set_memory.h
> @@ -17,4 +17,16 @@ static inline int set_memory_x(unsigned long addr, int numpages) { return 0; }
>  static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
>  #endif
>  
> +#ifndef CONFIG_ARCH_HAS_MEM_ENCRYPT
> +static inline int set_memory_encrypted(unsigned long addr, int numpages)
> +{
> +	return 0;
> +}
> +
> +static inline int set_memory_decrypted(unsigned long addr, int numpages)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
> +
>  #endif /* _LINUX_SET_MEMORY_H_ */
> -- 
> 2.14.2
Re: [PATCH 05/14] x86/amd_gart: look at coherent_dma_mask instead of GFP_DMA
On Wed, Mar 14, 2018 at 06:52:04PM +0100, Christoph Hellwig wrote:
> We want to phase out looking at the magic GFP_DMA flag in the dma mapping
> routines, so switch the gart driver to use the coherent_dma_mask instead,
> which is used to select the GFP_DMA flag in the caller.

Reviewed-by: Konrad Rzeszutek Wilk

> Signed-off-by: Christoph Hellwig
> ---
>  arch/x86/kernel/amd_gart_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
> index 52e3abcf3e70..79ac6cbb 100644
> --- a/arch/x86/kernel/amd_gart_64.c
> +++ b/arch/x86/kernel/amd_gart_64.c
> @@ -484,7 +484,7 @@ gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr,
>  	unsigned long align_mask;
>  	struct page *page;
> 
> -	if (force_iommu && !(flag & GFP_DMA)) {
> +	if (force_iommu && dev->coherent_dma_mask > DMA_BIT_MASK(24)) {
>  		flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
>  		page = alloc_pages(flag | __GFP_ZERO, get_order(size));
>  		if (!page)
> -- 
> 2.14.2
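The replacement condition asks the same question as the old GFP_DMA check, but from the device's own coherent_dma_mask: a device not restricted to the 24-bit ISA DMA zone is a candidate for remapping through the GART. A hedged user-space restatement of that predicate, with DMA_BIT_MASK reimplemented here for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Model of the new gart_alloc_coherent() test: force remapping through
 * the IOMMU only when the device can address more than the 24-bit zone.
 */
static bool gart_should_force_map(bool force_iommu, uint64_t coherent_dma_mask)
{
    return force_iommu && coherent_dma_mask > DMA_BIT_MASK(24);
}
```

The point of the change is that the caller's mask, not the magic GFP_DMA allocation flag, now drives the decision.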
[PATCH v5 6/7] vfio/type1: remove duplicate retrieval of reserved regions
As we now already have the reserved regions list, just pass that into vfio_iommu_has_sw_msi() fn. Signed-off-by: Shameer Kolothum--- drivers/vfio/vfio_iommu_type1.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 90f195d..65c13e7 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1204,15 +1204,13 @@ static struct vfio_group *find_iommu_group(struct vfio_domain *domain, return NULL; } -static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base) +static bool vfio_iommu_has_sw_msi(struct list_head *group_resv_regions, + phys_addr_t *base) { - struct list_head group_resv_regions; - struct iommu_resv_region *region, *next; + struct iommu_resv_region *region; bool ret = false; - INIT_LIST_HEAD(_resv_regions); - iommu_get_group_resv_regions(group, _resv_regions); - list_for_each_entry(region, _resv_regions, list) { + list_for_each_entry(region, group_resv_regions, list) { /* * The presence of any 'real' MSI regions should take * precedence over the software-managed one if the @@ -1228,8 +1226,7 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base) ret = true; } } - list_for_each_entry_safe(region, next, _resv_regions, list) - kfree(region); + return ret; } @@ -1564,7 +1561,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, if (ret) goto out_detach; - resv_msi = vfio_iommu_has_sw_msi(iommu_group, _msi_base); + resv_msi = vfio_iommu_has_sw_msi(_resv_regions, _msi_base); INIT_LIST_HEAD(>group_list); list_add(>next, >group_list); -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 4/7] vfio/type1: check dma map request is within a valid iova range
This checks and rejects any dma map request outside valid iova range. Signed-off-by: Shameer Kolothum--- drivers/vfio/vfio_iommu_type1.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 25e6920..d59db31 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -982,6 +982,23 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma, return ret; } +/* + * Check dma map request is within a valid iova range + */ +static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu, + dma_addr_t start, dma_addr_t end) +{ + struct list_head *iova = >iova_list; + struct vfio_iova *node; + + list_for_each_entry(node, iova, list) { + if ((start >= node->start) && (end <= node->end)) + return true; + } + + return false; +} + static int vfio_dma_do_map(struct vfio_iommu *iommu, struct vfio_iommu_type1_dma_map *map) { @@ -1020,6 +1037,11 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, goto out_unlock; } + if (!vfio_iommu_iova_dma_valid(iommu, iova, iova + size - 1)) { + ret = -EINVAL; + goto out_unlock; + } + dma = kzalloc(sizeof(*dma), GFP_KERNEL); if (!dma) { ret = -ENOMEM; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
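The validity test walks the per-container iova list and accepts a request only if it fits entirely inside one valid interval; a request spanning a reserved hole is rejected even when both endpoints are individually valid. A stand-alone model of vfio_iommu_iova_dma_valid() over an array instead of a kernel list_head:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vfio_iova { uint64_t start, end; }; /* inclusive bounds, sorted */

/* Accept [start, end] only if wholly contained in one valid interval. */
static bool iova_dma_valid(const struct vfio_iova *list, size_t n,
                           uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < n; i++) {
        if (start >= list[i].start && end <= list[i].end)
            return true;
    }
    return false;
}
```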
[PATCH v5 2/7] vfio/type1: Check reserve region conflict and update iova list
This retrieves the reserved regions associated with dev group and checks for conflicts with any existing dma mappings. Also update the iova list excluding the reserved regions. Signed-off-by: Shameer Kolothum--- drivers/vfio/vfio_iommu_type1.c | 90 + 1 file changed, 90 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 1123c74..cfe2bb2 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1313,6 +1313,82 @@ static int vfio_iommu_aper_resize(struct list_head *iova, return 0; } +/* + * Check reserved region conflicts with existing dma mappings + */ +static bool vfio_iommu_resv_conflict(struct vfio_iommu *iommu, + struct list_head *resv_regions) +{ + struct iommu_resv_region *region; + + /* Check for conflict with existing dma mappings */ + list_for_each_entry(region, resv_regions, list) { + if (vfio_find_dma(iommu, region->start, region->length)) + return true; + } + + return false; +} + +/* + * Check iova region overlap with reserved regions and + * exclude them from the iommu iova range + */ +static int vfio_iommu_resv_exclude(struct list_head *iova, + struct list_head *resv_regions) +{ + struct iommu_resv_region *resv; + struct vfio_iova *n, *next; + + list_for_each_entry(resv, resv_regions, list) { + phys_addr_t start, end; + + start = resv->start; + end = resv->start + resv->length - 1; + + list_for_each_entry_safe(n, next, iova, list) { + int ret = 0; + + /* No overlap */ + if ((start > n->end) || (end < n->start)) + continue; + /* +* Insert a new node if current node overlaps with the +* reserve region to exlude that from valid iova range. +* Note that, new node is inserted before the current +* node and finally the current node is deleted keeping +* the list updated and sorted. 
+*/ + if (start > n->start) + ret = vfio_iommu_iova_insert(>list, + n->start, start - 1); + if (!ret && end < n->end) + ret = vfio_iommu_iova_insert(>list, + end + 1, n->end); + if (ret) + return ret; + + list_del(>list); + kfree(n); + } + } + + if (list_empty(iova)) + return -EINVAL; + + return 0; +} + +static void vfio_iommu_resv_free(struct list_head *resv_regions) +{ + struct iommu_resv_region *n, *next; + + list_for_each_entry_safe(n, next, resv_regions, list) { + list_del(>list); + kfree(n); + } +} + static void vfio_iommu_iova_free(struct list_head *iova) { struct vfio_iova *n, *next; @@ -1366,6 +1442,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, phys_addr_t resv_msi_base; struct iommu_domain_geometry geo; LIST_HEAD(iova_copy); + LIST_HEAD(group_resv_regions); mutex_lock(>lock); @@ -1444,6 +1521,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, goto out_detach; } + iommu_get_group_resv_regions(iommu_group, _resv_regions); + + if (vfio_iommu_resv_conflict(iommu, _resv_regions)) { + ret = -EINVAL; + goto out_detach; + } + /* Get a copy of the current iova list and work on it */ ret = vfio_iommu_iova_get_copy(iommu, _copy); if (ret) @@ -1454,6 +1538,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, if (ret) goto out_detach; + ret = vfio_iommu_resv_exclude(_copy, _resv_regions); + if (ret) + goto out_detach; + resv_msi = vfio_iommu_has_sw_msi(iommu_group, _msi_base); INIT_LIST_HEAD(>group_list); @@ -1514,6 +1602,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, /* Delete the old one and insert new iova list */ vfio_iommu_iova_insert_copy(iommu, _copy); mutex_unlock(>lock); + vfio_iommu_resv_free(_resv_regions); return 0; @@ -1522,6 +1611,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, out_domain: iommu_domain_free(domain->domain); vfio_iommu_iova_free(_copy); + vfio_iommu_resv_free(_resv_regions); out_free: kfree(domain);
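The exclusion step in the patch above is interval subtraction: each reserved region may clip the front of a valid node, clip the back, or split it in two. The in-place list surgery can be hard to follow, so here is the same algorithm over a fixed array, returning the surviving intervals (an illustrative model, not kernel code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; }; /* inclusive bounds */

/*
 * Subtract one reserved region from a sorted list of valid intervals.
 * Worst case each interval splits in two, so 'out' must hold 2 * n
 * entries. Returns the new interval count.
 */
static size_t exclude_region(const struct range *in, size_t n,
                             struct range resv, struct range *out)
{
    size_t nout = 0;

    for (size_t i = 0; i < n; i++) {
        /* No overlap: keep the interval as-is. */
        if (resv.start > in[i].end || resv.end < in[i].start) {
            out[nout++] = in[i];
            continue;
        }
        /* Keep the part below the reserved region, if any. */
        if (resv.start > in[i].start)
            out[nout++] = (struct range){ in[i].start, resv.start - 1 };
        /* Keep the part above the reserved region, if any. */
        if (resv.end < in[i].end)
            out[nout++] = (struct range){ resv.end + 1, in[i].end };
    }
    return nout;
}
```

As in the patch, an interval fully covered by a reserved region simply disappears, and a reserved region in the middle of an interval produces two smaller intervals.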
[PATCH v5 7/7] iommu/dma: Move PCI window region reservation back into dma specific path.
This pretty much reverts commit 273df9635385 ("iommu/dma: Make PCI window reservation generic") by moving the PCI window region reservation back into the dma specific path so that these regions doesn't get exposed via the IOMMU API interface. With this change, the vfio interface will report only iommu specific reserved regions to the user space. Cc: Robin MurphyCc: Joerg Roedel Signed-off-by: Shameer Kolothum --- drivers/iommu/dma-iommu.c | 54 ++- 1 file changed, 25 insertions(+), 29 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index f05f3cf..ddcbbdb 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -167,40 +167,16 @@ EXPORT_SYMBOL(iommu_put_dma_cookie); * @list: Reserved region list from iommu_get_resv_regions() * * IOMMU drivers can use this to implement their .get_resv_regions callback - * for general non-IOMMU-specific reservations. Currently, this covers host - * bridge windows for PCI devices and GICv3 ITS region reservation on ACPI - * based ARM platforms that may require HW MSI reservation. + * for general non-IOMMU-specific reservations. Currently, this covers GICv3 + * ITS region reservation on ACPI based ARM platforms that may require HW MSI + * reservation. 
*/ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) { - struct pci_host_bridge *bridge; - struct resource_entry *window; - - if (!is_of_node(dev->iommu_fwspec->iommu_fwnode) && - iort_iommu_msi_get_resv_regions(dev, list) < 0) - return; - - if (!dev_is_pci(dev)) - return; - - bridge = pci_find_host_bridge(to_pci_dev(dev)->bus); - resource_list_for_each_entry(window, >windows) { - struct iommu_resv_region *region; - phys_addr_t start; - size_t length; - - if (resource_type(window->res) != IORESOURCE_MEM) - continue; - start = window->res->start - window->offset; - length = window->res->end - window->res->start + 1; - region = iommu_alloc_resv_region(start, length, 0, - IOMMU_RESV_RESERVED); - if (!region) - return; + if (!is_of_node(dev->iommu_fwspec->iommu_fwnode)) + iort_iommu_msi_get_resv_regions(dev, list); - list_add_tail(>list, list); - } } EXPORT_SYMBOL(iommu_dma_get_resv_regions); @@ -229,6 +205,23 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie, return 0; } +static void iova_reserve_pci_windows(struct pci_dev *dev, + struct iova_domain *iovad) +{ + struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus); + struct resource_entry *window; + unsigned long lo, hi; + + resource_list_for_each_entry(window, >windows) { + if (resource_type(window->res) != IORESOURCE_MEM) + continue; + + lo = iova_pfn(iovad, window->res->start - window->offset); + hi = iova_pfn(iovad, window->res->end - window->offset); + reserve_iova(iovad, lo, hi); + } +} + static int iova_reserve_iommu_regions(struct device *dev, struct iommu_domain *domain) { @@ -238,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev, LIST_HEAD(resv_regions); int ret = 0; + if (dev_is_pci(dev)) + iova_reserve_pci_windows(to_pci_dev(dev), iovad); + iommu_get_resv_regions(dev, _regions); list_for_each_entry(region, _regions, list) { unsigned long lo, hi; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org 
[PATCH v5 5/7] vfio/type1: Add IOVA range capability support
This allows the user-space to retrieve the supported IOVA range(s), excluding any reserved regions. The implementation is based on capability chains, added to VFIO_IOMMU_GET_INFO ioctl. Signed-off-by: Shameer Kolothum--- drivers/vfio/vfio_iommu_type1.c | 96 + include/uapi/linux/vfio.h | 23 ++ 2 files changed, 119 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index d59db31..90f195d 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1929,6 +1929,68 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu) return ret; } +static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps, +struct vfio_iommu_type1_info_cap_iova_range *cap_iovas, +size_t size) +{ + struct vfio_info_cap_header *header; + struct vfio_iommu_type1_info_cap_iova_range *iova_cap; + + header = vfio_info_cap_add(caps, size, + VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE, 1); + if (IS_ERR(header)) + return PTR_ERR(header); + + iova_cap = container_of(header, + struct vfio_iommu_type1_info_cap_iova_range, header); + iova_cap->nr_iovas = cap_iovas->nr_iovas; + memcpy(iova_cap->iova_ranges, cap_iovas->iova_ranges, + cap_iovas->nr_iovas * sizeof(*cap_iovas->iova_ranges)); + return 0; +} + +static int vfio_iommu_iova_build_caps(struct vfio_iommu *iommu, + struct vfio_info_cap *caps) +{ + struct vfio_iommu_type1_info_cap_iova_range *cap_iovas; + struct vfio_iova *iova; + size_t size; + int iovas = 0, i = 0, ret; + + mutex_lock(>lock); + + list_for_each_entry(iova, >iova_list, list) + iovas++; + + if (!iovas) { + ret = -EINVAL; + goto out_unlock; + } + + size = sizeof(*cap_iovas) + (iovas * sizeof(*cap_iovas->iova_ranges)); + + cap_iovas = kzalloc(size, GFP_KERNEL); + if (!cap_iovas) { + ret = -ENOMEM; + goto out_unlock; + } + + cap_iovas->nr_iovas = iovas; + + list_for_each_entry(iova, >iova_list, list) { + cap_iovas->iova_ranges[i].start = iova->start; + cap_iovas->iova_ranges[i].end = iova->end; + i++; + } + + ret = 
vfio_iommu_iova_add_cap(caps, cap_iovas, size); + + kfree(cap_iovas); +out_unlock: + mutex_unlock(>lock); + return ret; +} + static long vfio_iommu_type1_ioctl(void *iommu_data, unsigned int cmd, unsigned long arg) { @@ -1950,19 +2012,53 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, } } else if (cmd == VFIO_IOMMU_GET_INFO) { struct vfio_iommu_type1_info info; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + unsigned long capsz; + int ret; minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes); + /* For backward compatibility, cannot require this */ + capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset); + if (copy_from_user(, (void __user *)arg, minsz)) return -EFAULT; if (info.argsz < minsz) return -EINVAL; + if (info.argsz >= capsz) { + minsz = capsz; + info.cap_offset = 0; /* output, no-recopy necessary */ + } + info.flags = VFIO_IOMMU_INFO_PGSIZES; info.iova_pgsizes = vfio_pgsize_bitmap(iommu); + ret = vfio_iommu_iova_build_caps(iommu, ); + if (ret) + return ret; + + if (caps.size) { + info.flags |= VFIO_IOMMU_INFO_CAPS; + + if (info.argsz < sizeof(info) + caps.size) { + info.argsz = sizeof(info) + caps.size; + } else { + vfio_info_cap_shift(, sizeof(info)); + if (copy_to_user((void __user *)arg + + sizeof(info), caps.buf, + caps.size)) { + kfree(caps.buf); + return -EFAULT; + } + info.cap_offset = sizeof(info); + } + + kfree(caps.buf); + } + return copy_to_user((void __user *)arg, , minsz) ? -EFAULT : 0; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index c743721..46b49e9
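The GET_INFO path above follows the usual VFIO capability-chain contract: user space passes argsz, and if the chain does not fit, the kernel reports the required size back in argsz so the caller can retry with a larger buffer. A simplified model of that size negotiation (the struct layout and chain contents are abbreviated; this is not the real uapi):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INFO_FLAG_CAPS (1u << 1)
#define CAP_CHAIN_SIZE 16 /* pretend the capability chain is 16 bytes */

struct info_hdr {
    uint32_t argsz;      /* in: buffer size; out: size needed */
    uint32_t flags;
    uint32_t cap_offset; /* offset of first capability, 0 if none */
};

/* Model of the kernel side of VFIO_IOMMU_GET_INFO with capabilities. */
static void kernel_get_info(struct info_hdr *hdr, void *buf)
{
    uint32_t needed = (uint32_t)sizeof(*hdr) + CAP_CHAIN_SIZE;

    if (hdr->argsz < needed) {
        hdr->argsz = needed; /* tell user space how much to allocate */
        hdr->cap_offset = 0;
        return;
    }
    hdr->flags |= INFO_FLAG_CAPS;
    hdr->cap_offset = (uint32_t)sizeof(*hdr);
    memset((char *)buf + hdr->cap_offset, 0xaa, CAP_CHAIN_SIZE); /* fake chain */
}
```

User space therefore calls once with a minimal argsz, reads back the required size, and calls again; older callers that never look at cap_offset keep working, which is the backward-compatibility point made in the patch.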
[PATCH v5 0/7] vfio/type1: Add support for valid iova list management
This series introduces an iova list associated with a vfio iommu. The list
is kept updated taking care of iommu apertures and reserved regions. The
series also adds checks for any conflict with existing dma mappings
whenever a new device group is attached to the domain.

User space can retrieve valid iova ranges using VFIO_IOMMU_GET_INFO ioctl
capability chains. Any dma map request outside the valid iova range will
be rejected.

v4 --> v5
 - Rebased to next-20180315.
 - Incorporated the corner case bug fix suggested by Alex to patch #5.
 - Based on suggestions by Alex and Robin, added patch #7. This moves the
   PCI window reservation back into the DMA specific path. This is to fix
   the issue reported by Eric [1].

Note: patch #7 has a dependency on [2][3].

1. https://patchwork.kernel.org/patch/10232043/
2. https://patchwork.kernel.org/patch/10216553/
3. https://patchwork.kernel.org/patch/10216555/

v3 --> v4
 - Addressed comments received for v3:
   - dma_addr_t instead of phys_addr_t
   - LIST_HEAD() usage
   - Free up iova_copy list in case of error
   - Updated logic in filling the iova caps info (patch #5)

RFCv2 --> v3
 - Removed RFC tag.
 - Addressed comments from Alex and Eric:
   - Added comments to make iova list management logic more clear.
   - Use of iova list copy so that the original is not altered in case
     of failure.

RFCv1 --> RFCv2
 - Addressed comments from Alex:
   - Introduced IOVA list management and added checks for conflicts with
     existing dma map entries during attach/detach.

Shameer Kolothum (2):
  vfio/type1: Add IOVA range capability support
  iommu/dma: Move PCI window region reservation back into dma specific
    path.

Shameerali Kolothum Thodi (5):
  vfio/type1: Introduce iova list and add iommu aperture validity check
  vfio/type1: Check reserve region conflict and update iova list
  vfio/type1: Update iova list on detach
  vfio/type1: check dma map request is within a valid iova range
  vfio/type1: remove duplicate retrieval of reserved regions

 drivers/iommu/dma-iommu.c       |  54 ++---
 drivers/vfio/vfio_iommu_type1.c | 497 +++++++++++++++++++++++++++++++-
 include/uapi/linux/vfio.h       |  23 ++
 3 files changed, 533 insertions(+), 41 deletions(-)

-- 
2.7.4
[PATCH v5 3/7] vfio/type1: Update iova list on detach
Get a copy of iova list on _group_detach and try to update the list. On success replace the current one with the copy. Leave the list as it is if update fails. Signed-off-by: Shameer Kolothum--- drivers/vfio/vfio_iommu_type1.c | 91 + 1 file changed, 91 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index cfe2bb2..25e6920 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1667,12 +1667,88 @@ static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu) WARN_ON(iommu->notifier.head); } +/* + * Called when a domain is removed in detach. It is possible that + * the removed domain decided the iova aperture window. Modify the + * iova aperture with the smallest window among existing domains. + */ +static void vfio_iommu_aper_expand(struct vfio_iommu *iommu, + struct list_head *iova_copy) +{ + struct vfio_domain *domain; + struct iommu_domain_geometry geo; + struct vfio_iova *node; + dma_addr_t start = 0; + dma_addr_t end = (dma_addr_t)~0; + + list_for_each_entry(domain, >domain_list, next) { + iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, + ); + if (geo.aperture_start > start) + start = geo.aperture_start; + if (geo.aperture_end < end) + end = geo.aperture_end; + } + + /* Modify aperture limits. The new aper is either same or bigger */ + node = list_first_entry(iova_copy, struct vfio_iova, list); + node->start = start; + node = list_last_entry(iova_copy, struct vfio_iova, list); + node->end = end; +} + +/* + * Called when a group is detached. The reserved regions for that + * group can be part of valid iova now. But since reserved regions + * may be duplicated among groups, populate the iova valid regions + * list again. 
+ */ +static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu, + struct list_head *iova_copy) +{ + struct vfio_domain *d; + struct vfio_group *g; + struct vfio_iova *node; + dma_addr_t start, end; + LIST_HEAD(resv_regions); + int ret; + + list_for_each_entry(d, >domain_list, next) { + list_for_each_entry(g, >group_list, next) + iommu_get_group_resv_regions(g->iommu_group, +_regions); + } + + if (list_empty(_regions)) + return 0; + + node = list_first_entry(iova_copy, struct vfio_iova, list); + start = node->start; + node = list_last_entry(iova_copy, struct vfio_iova, list); + end = node->end; + + /* purge the iova list and create new one */ + vfio_iommu_iova_free(iova_copy); + + ret = vfio_iommu_aper_resize(iova_copy, start, end); + if (ret) + goto done; + + /* Exclude current reserved regions from iova ranges */ + ret = vfio_iommu_resv_exclude(iova_copy, _regions); +done: + vfio_iommu_resv_free(_regions); + return ret; +} + static void vfio_iommu_type1_detach_group(void *iommu_data, struct iommu_group *iommu_group) { struct vfio_iommu *iommu = iommu_data; struct vfio_domain *domain; struct vfio_group *group; + bool iova_copy_fail; + LIST_HEAD(iova_copy); mutex_lock(>lock); @@ -1695,6 +1771,12 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, } } + /* +* Get a copy of iova list. If success, use copy to update the +* list and to replace the current one. 
+*/ + iova_copy_fail = !!vfio_iommu_iova_get_copy(iommu, _copy); + list_for_each_entry(domain, >domain_list, next) { group = find_iommu_group(domain, iommu_group); if (!group) @@ -1720,10 +1802,19 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, iommu_domain_free(domain->domain); list_del(>next); kfree(domain); + if (!iova_copy_fail && !list_empty(>domain_list)) + vfio_iommu_aper_expand(iommu, _copy); } break; } + if (!iova_copy_fail && !list_empty(>domain_list)) { + if (!vfio_iommu_resv_refresh(iommu, _copy)) + vfio_iommu_iova_insert_copy(iommu, _copy); + else + vfio_iommu_iova_free(_copy); + } + detach_group_done: mutex_unlock(>lock); } -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org
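The aperture-narrowing step in vfio_iommu_aper_expand() above is just an intersection of the remaining domains' geometry windows. A rough userspace C model of that computation (hypothetical names, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

struct geometry { uint64_t aperture_start, aperture_end; };

/*
 * Model of the loop in vfio_iommu_aper_expand(): the container
 * aperture is the intersection of all remaining domains' windows,
 * i.e. the largest start and the smallest end.
 */
static void aper_intersect(const struct geometry *doms, int n,
                           uint64_t *start, uint64_t *end)
{
    *start = 0;
    *end = UINT64_MAX;
    for (int i = 0; i < n; i++) {
        if (doms[i].aperture_start > *start)
            *start = doms[i].aperture_start;
        if (doms[i].aperture_end < *end)
            *end = doms[i].aperture_end;
    }
}
```

Since removing a domain removes one constraint from the intersection, the recomputed aperture is always the same size or bigger, which is what the comment in the patch means by "the new aper is either same or bigger".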
[PATCH v5 1/7] vfio/type1: Introduce iova list and add iommu aperture validity check
This introduces an iova list that is valid for dma mappings. Make sure the new iommu aperture window doesn't conflict with the current one or with any existing dma mappings during attach. Signed-off-by: Shameer Kolothum--- drivers/vfio/vfio_iommu_type1.c | 183 +++- 1 file changed, 180 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 45657e2..1123c74 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -60,6 +60,7 @@ MODULE_PARM_DESC(disable_hugepages, struct vfio_iommu { struct list_headdomain_list; + struct list_headiova_list; struct vfio_domain *external_domain; /* domain for external user */ struct mutexlock; struct rb_root dma_list; @@ -92,6 +93,12 @@ struct vfio_group { struct list_headnext; }; +struct vfio_iova { + struct list_headlist; + dma_addr_t start; + dma_addr_t end; +}; + /* * Guest RAM pinning working set or DMA target */ @@ -1204,6 +1211,149 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base) return ret; } +/* + * This is a helper function to insert an address range to iova list. + * The list starts with a single entry corresponding to the IOMMU + * domain geometry to which the device group is attached. The list + * aperture gets modified when a new domain is added to the container + * if the new aperture doesn't conflict with the current one or with + * any existing dma mappings. The list is also modified to exclude + * any reserved regions associated with the device group. + */ +static int vfio_iommu_iova_insert(struct list_head *head, + dma_addr_t start, dma_addr_t end) +{ + struct vfio_iova *region; + + region = kmalloc(sizeof(*region), GFP_KERNEL); + if (!region) + return -ENOMEM; + + INIT_LIST_HEAD(>list); + region->start = start; + region->end = end; + + list_add_tail(>list, head); + return 0; +} + +/* + * Check the new iommu aperture conflicts with existing aper or with any + * existing dma mappings. 
+ */ +static bool vfio_iommu_aper_conflict(struct vfio_iommu *iommu, +dma_addr_t start, dma_addr_t end) +{ + struct vfio_iova *first, *last; + struct list_head *iova = >iova_list; + + if (list_empty(iova)) + return false; + + /* Disjoint sets, return conflict */ + first = list_first_entry(iova, struct vfio_iova, list); + last = list_last_entry(iova, struct vfio_iova, list); + if ((start > last->end) || (end < first->start)) + return true; + + /* Check for any existing dma mappings outside the new start */ + if (start > first->start) { + if (vfio_find_dma(iommu, first->start, start - first->start)) + return true; + } + + /* Check for any existing dma mappings outside the new end */ + if (end < last->end) { + if (vfio_find_dma(iommu, end + 1, last->end - end)) + return true; + } + + return false; +} + +/* + * Resize iommu iova aperture window. This is called only if the new + * aperture has no conflict with existing aperture and dma mappings. + */ +static int vfio_iommu_aper_resize(struct list_head *iova, + dma_addr_t start, + dma_addr_t end) +{ + struct vfio_iova *node, *next; + + if (list_empty(iova)) + return vfio_iommu_iova_insert(iova, start, end); + + /* Adjust iova list start */ + list_for_each_entry_safe(node, next, iova, list) { + if (start < node->start) + break; + if ((start >= node->start) && (start < node->end)) { + node->start = start; + break; + } + /* Delete nodes before new start */ + list_del(>list); + kfree(node); + } + + /* Adjust iova list end */ + list_for_each_entry_safe(node, next, iova, list) { + if (end > node->end) + continue; + if ((end > node->start) && (end <= node->end)) { + node->end = end; + continue; + } + /* Delete nodes after new end */ + list_del(>list); + kfree(node); + } + + return 0; +} + +static void vfio_iommu_iova_free(struct list_head *iova) +{ + struct vfio_iova *n, *next; + + list_for_each_entry_safe(n, next, iova, list) { + list_del(>list); + kfree(n); + } +} + +static int
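vfio_iommu_aper_resize() above trims the existing list of valid iova ranges to a new aperture. The same clamping logic can be sketched in userspace C on a plain array (an illustrative stand-in for the kernel's linked list, not the actual implementation):

```c
#include <assert.h>
#include <stdint.h>

struct range { uint64_t start, end; };

/*
 * Model of vfio_iommu_aper_resize(): clamp a sorted list of valid
 * iova ranges to a new aperture [start, end]. Ranges entirely
 * outside the aperture are dropped, boundary ranges are trimmed.
 * Returns the new number of ranges.
 */
static int aper_resize(struct range *r, int n, uint64_t start, uint64_t end)
{
    int out = 0;

    for (int i = 0; i < n; i++) {
        if (r[i].end < start || r[i].start > end)
            continue;            /* node entirely outside: delete */
        if (r[i].start < start)
            r[i].start = start;  /* adjust list start */
        if (r[i].end > end)
            r[i].end = end;      /* adjust list end */
        r[out++] = r[i];
    }
    return out;
}
```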
Re: [PATCH 07/14] iommu/amd_iommu: use dma_direct_{alloc,free}
On Wed, Mar 14, 2018 at 06:52:06PM +0100, Christoph Hellwig wrote: > This cleans up the code a lot by removing duplicate logic. > > Signed-off-by: Christoph Hellwig> --- > drivers/iommu/Kconfig | 1 + > drivers/iommu/amd_iommu.c | 68 > +++ > 2 files changed, 22 insertions(+), 47 deletions(-) Tested-by: Joerg Roedel Acked-by: Joerg Roedel
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, Mar 15, 2018 at 02:42:00PM +, Dmitry Safonov wrote: > But even with loop-limit we will need ratelimit each printk() *also*. > Otherwise loop-limit will be based on time spent printing, not on > anything else.. > The patch makes sense even with loop-limit in my opinion. Looks like I mis-read your patch, somehow it looked to me as if you replace all 'ratelimited' usages with a call to __ratelimit(), but you just move 'ratelimited' into the loop, which actually makes sense. But still, this alone is no proper fix for the soft-lockups you are seeing. Joerg
Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
On 2018-03-15 16:19:17 [+0100], Joerg Roedel wrote: > Okay, if the irq-layer does the needed locking, then we don't need > another lock here. There is the modify_irte_ga() path for the > virtualized irq routing into KVM guests, but there should be no KVM > guests running when setup the ioapic routing entries. okay then. Let me rebase the queue on top whatever you have now and then if the box still boots I will post them again. > Joerg Sebastian
Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
On Thu, Mar 15, 2018 at 03:15:41PM +0100, Sebastian Andrzej Siewior wrote: > ->set_allocated() operates only on 0…31 and other could be used at the > same time. However 0…31 should be accessed by other user before they are > ready. > > irq_remapping_alloc() is that ->alloc() callback invoked via > irq_domain_alloc_irqs_hierarchy() and each caller has to hold the > _domain_mutex mutex. So we should not have those in parallel. > > Is it possible to have those entries accessed before the setup is > complete? My understanding is that this setup is performed once during > boot (especially that ioapic part) and not again. > > From looking at that hunk, it should not hurt to add that lock, just > wanted to check it is really needed. Okay, if the irq-layer does the needed locking, then we don't need another lock here. There is the modify_irte_ga() path for the virtualized irq routing into KVM guests, but there should be no KVM guests running when setup the ioapic routing entries. Joerg
Re: [PATCH] iommu/omap: Increase group ref in .device_group()
On Thu, Mar 01, 2018 at 07:22:08PM +0800, Jeffy Chen wrote: > Increase group refcounting in omap_iommu_device_group(). > > Signed-off-by: Jeffy Chen Applied, thanks.
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, 2018-03-15 at 14:34 +, Dmitry Safonov wrote: > On Thu, 2018-03-15 at 15:22 +0100, Joerg Roedel wrote: > > On Thu, Mar 15, 2018 at 02:13:03PM +, Dmitry Safonov wrote: > > > So, you suggest to remove ratelimit at all? > > > Do we really need printk flood for each happened fault? > > > Imagine, you've hundreds of mappings and then PCI link flapped.. > > > Wouldn't it be better to keep ratelimiting? > > > I don't mind, just it looks a bit strange to me. > > > > I never said you should remove the ratelimiting, after all you are > > trying to fix a soft-lockup, no? > > > > And that should not be fixed by changes to the ratelimiting, but > > with > > proper irq handling. > > Uh, I'm a bit confused then. > - Isn't it better to ratelimit each printk() instead of bunch of > printks inside irq handler? > - I can limit the number of loops, but the most of the time is spent > in > the loop on printk() (on my machine ~170msec per loop), while > everything else takes much lesser time (on my machine < 1 usec per > loop). So, if I will limit number of loops per-irq, that cycle-limit > will be based on limiting time spent on printk (e.g., how many > printks > to do in atomic context so that node will not lockup). It smells like > ratelimiting, no? > > I must be misunderstanding something, but why introducing another > limit > for number of printk() called when there is ratelimit which may be > tuned.. > So I agree, that maybe better to have another limit to the cycle *also*, because if we clean faults with the same speed as they're generated by hw, we may stuck in the loop.. By on my measures clearing fault is so fast (< 1 usec), that I'm not sure that it may happen with hw. By that reason I didn't introduce loop-limit. But even with loop-limit we will need ratelimit each printk() *also*. Otherwise loop-limit will be based on time spent printing, not on anything else.. The patch makes sense even with loop-limit in my opinion. 
-- Thanks, Dmitry
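The compromise this thread converges on, capping the number of faults cleared per interrupt *and* ratelimiting each printk, can be illustrated with a small standalone model. The constants and names below are made up for illustration; they are not the kernel's:

```c
#include <assert.h>

#define MAX_LOOP 64  /* hypothetical cap on faults cleared per interrupt */
#define BURST    10  /* hypothetical message budget per interval */

static int printed;

/*
 * Toy model of a fault handler combining both limits: at most
 * MAX_LOOP faults are cleared per invocation, so the CPU reaches a
 * scheduling point and the remaining faults are handled by the next
 * interrupt; each message is additionally ratelimited (a simple
 * burst counter stands in for __ratelimit() here), so clearing stays
 * fast once the message budget is exhausted.
 */
static int handle_faults(int *pending)
{
    int done = 0;

    while (*pending > 0 && done < MAX_LOOP) {
        if (printed < BURST)  /* would be __ratelimit(&rs) */
            printed++;        /* would be pr_err(...) */
        (*pending)--;         /* clear the fault in hardware */
        done++;
    }
    return done;
}
```

The key property Dmitry describes holds in the model: with the per-message ratelimit in place, the loop cost is dominated by the cheap fault-clearing, not by printing, so the loop cap bounds interrupt latency independently of the message rate.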
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, 2018-03-15 at 15:22 +0100, Joerg Roedel wrote: > On Thu, Mar 15, 2018 at 02:13:03PM +, Dmitry Safonov wrote: > > So, you suggest to remove ratelimit at all? > > Do we really need printk flood for each happened fault? > > Imagine, you've hundreds of mappings and then PCI link flapped.. > > Wouldn't it be better to keep ratelimiting? > > I don't mind, just it looks a bit strange to me. > > I never said you should remove the ratelimiting, after all you are > trying to fix a soft-lockup, no? > > And that should not be fixed by changes to the ratelimiting, but with > proper irq handling. Uh, I'm a bit confused then. - Isn't it better to ratelimit each printk() instead of bunch of printks inside irq handler? - I can limit the number of loops, but the most of the time is spent in the loop on printk() (on my machine ~170msec per loop), while everything else takes much lesser time (on my machine < 1 usec per loop). So, if I will limit number of loops per-irq, that cycle-limit will be based on limiting time spent on printk (e.g., how many printks to do in atomic context so that node will not lockup). It smells like ratelimiting, no? I must be misunderstanding something, but why introducing another limit for number of printk() called when there is ratelimit which may be tuned.. -- Thanks, Dmitry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, Mar 15, 2018 at 02:13:03PM +, Dmitry Safonov wrote: > So, you suggest to remove ratelimit at all? > Do we really need printk flood for each happened fault? > Imagine, you've hundreds of mappings and then PCI link flapped.. > Wouldn't it be better to keep ratelimiting? > I don't mind, just it looks a bit strange to me. I never said you should remove the ratelimiting, after all you are trying to fix a soft-lockup, no? And that should not be fixed by changes to the ratelimiting, but with proper irq handling. Joerg
Re: [PATCH 1/1] iommu/vt-d: Fix a potential memory leak
On Sat, Feb 24, 2018 at 01:42:27PM +0800, Lu Baolu wrote: > A memory block was allocated in intel_svm_bind_mm() but never freed > in a failure path. This patch fixes this by free it to avoid memory > leakage. > > Cc: Ashok Raj> Cc: Jacob Pan > Cc: # v4.4+ > Signed-off-by: Lu Baolu Applied, thanks.
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, 2018-03-15 at 14:46 +0100, Joerg Roedel wrote: > On Thu, Feb 15, 2018 at 07:17:29PM +, Dmitry Safonov wrote: > > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c > > index accf58388bdb..6c4ea32ee6a9 100644 > > --- a/drivers/iommu/dmar.c > > +++ b/drivers/iommu/dmar.c > > @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void > > *dev_id) > > int reg, fault_index; > > u32 fault_status; > > unsigned long flag; > > - bool ratelimited; > > static DEFINE_RATELIMIT_STATE(rs, > > DEFAULT_RATELIMIT_INTERVAL, > > DEFAULT_RATELIMIT_BURST); > > > > - /* Disable printing, simply clear the fault when > > ratelimited */ > > - ratelimited = !__ratelimit(); > > - > > raw_spin_lock_irqsave(>register_lock, flag); > > fault_status = readl(iommu->reg + DMAR_FSTS_REG); > > - if (fault_status && !ratelimited) > > + if (fault_status && __ratelimit()) > > pr_err("DRHD: handling fault status reg %x\n", > > fault_status); > > This looks aweful. Have you tried to limit the number of loops in > this > function and returning? You can handle the next faults by the next > interrupt. This ensures that the cpu visits a scheduling point from > time > to time so that you don't see soft-lockups. So, you suggest to remove ratelimit at all? Do we really need printk flood for each happened fault? Imagine, you've hundreds of mappings and then PCI link flapped.. Wouldn't it be better to keep ratelimiting? I don't mind, just it looks a bit strange to me. -- Thanks, Dmitry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/rockchip: Perform a reset on shutdown
On Tue, Feb 20, 2018 at 08:25:04PM +, Marc Zyngier wrote: > Trying to do a kexec whilst the iommus are still on is proving to be > a challenging exercise. It is terribly unsafe, as we're reusing the > memory allocated for the page tables, leading to a likely crash. > > Let's implement a shutdown method that will at least try to stop > DMA from going crazy behind our back. Note that we need to be > extra cautious when doing so, as the IOMMU may not be clocked > if controlled by a another master, as typical on Rockchip system. > > Suggested-by: Robin Murphy> Signed-off-by: Marc Zyngier Applied, thanks.
Re: [PATCH v3 0/5] Add debugfs info for the AMD IOMMU
On Wed, Mar 14, 2018 at 06:04:44PM -0500, Gary R Hook wrote: > Gary R Hook (5): > iommu/amd - Add debugfs support > iommu/amd - Add a 'verbose' switch for IOMMU debugfs > iommu/amd - Add a README variable for the IOMMU debugfs > iommu/amd - Expose the active IOMMU device table entries > iommu/amd - Add a debugfs entry to specify a IOMMU device table entry Same problem here as with the Intel patches, I don't think it's a good idea to reveal internal iommu data structures to user-space in this way. I've debugged iommu issues for around 10 years now and never had the need for an interface that reveals those internals. How exactly are you planning to use this information? Joerg
Re: [PATCH 04/14] x86: use generic swiotlb_ops
On Thu, 15 Mar 2018, Christoph Hellwig wrote: > On Thu, Mar 15, 2018 at 01:52:14PM +0100, Thomas Gleixner wrote: > > Yeah, I know that the standard defines it, but that doesn't mean it makes > > sense. At least not to me. > > It makes sense in that it logically is a boolean but we only want > to allocate 1 bit for it, unlike the normal ABI allocations of at > least a byte. > > Either way, tell me what you want it changed to, and I'll do it. I'd prefer either bool or a regular bitfield, but I can live with the boolean bitfield as well.
Re: [PATCHv3] iommu/intel: Ratelimit each dmar fault printing
On Thu, Feb 15, 2018 at 07:17:29PM +, Dmitry Safonov wrote: > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c > index accf58388bdb..6c4ea32ee6a9 100644 > --- a/drivers/iommu/dmar.c > +++ b/drivers/iommu/dmar.c > @@ -1618,17 +1618,13 @@ irqreturn_t dmar_fault(int irq, void *dev_id) > int reg, fault_index; > u32 fault_status; > unsigned long flag; > - bool ratelimited; > static DEFINE_RATELIMIT_STATE(rs, > DEFAULT_RATELIMIT_INTERVAL, > DEFAULT_RATELIMIT_BURST); > > - /* Disable printing, simply clear the fault when ratelimited */ > - ratelimited = !__ratelimit(&rs); > - > raw_spin_lock_irqsave(&iommu->register_lock, flag); > fault_status = readl(iommu->reg + DMAR_FSTS_REG); > - if (fault_status && !ratelimited) > + if (fault_status && __ratelimit(&rs)) > pr_err("DRHD: handling fault status reg %x\n", fault_status); This looks awful. Have you tried to limit the number of loops in this function and returning? You can handle the next faults by the next interrupt. This ensures that the cpu visits a scheduling point from time to time so that you don't see soft-lockups. Joerg
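For reference, the `__ratelimit()` helper the patch moves into the loop implements interval/burst suppression. A minimal standalone model of that behaviour, with time as an explicit parameter instead of jiffies (this is a sketch of the semantics, not the kernel implementation, which also counts and reports suppressed callbacks):

```c
#include <assert.h>

/*
 * Minimal model of the kernel's __ratelimit(): allow up to 'burst'
 * events per 'interval'; further events within the same interval
 * are suppressed until the interval rolls over.
 */
struct rl_state { long interval, burst, begin, printed; };

static int rl_ok(struct rl_state *rs, long now)
{
    if (now - rs->begin >= rs->interval) {  /* interval rolled over */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return 1;  /* caller may print */
    }
    return 0;      /* suppressed */
}
```

With the check inside the fault loop, each fault still gets cleared; only the printing is throttled, which is why moving the call into the loop "actually makes sense" in Joerg's later reply.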
Re: [PATCH 03/14] x86: use dma-direct
On Thu, 15 Mar 2018, Christoph Hellwig wrote: > On Thu, Mar 15, 2018 at 01:53:52PM +0100, Thomas Gleixner wrote: > > > > > The generic dma-direct implementation is now functionally equivalent > > > > > to > > > > > the x86 nommu dma_map implementation, so switch over to using it. > > > > > > > > Can you please convert the various drivers first and then remove the > > > > unused code? > > > > > > Which various drivers? > > > > > > > > Note that the various iommu drivers are switched from > > > > > x86_dma_supported > > > > The various iommu drivers > > It doesn't really make any sense to switch them separately. They've > inherited the ops from x86 which used to override it on an arch level, > and we've now generalized the exactly same implementation to common code. Fair enough.
Re: [PATCH 03/14] x86: use dma-direct
On Thu, Mar 15, 2018 at 01:53:52PM +0100, Thomas Gleixner wrote: > > > > The generic dma-direct implementation is now functionally equivalent to > > > > the x86 nommu dma_map implementation, so switch over to using it. > > > > > > Can you please convert the various drivers first and then remove the > > > unused code? > > > > Which various drivers? > > > > > > Note that the various iommu drivers are switched from x86_dma_supported > > The various iommu drivers It doesn't really make any sense to switch them separately. They've inherited the ops from x86 which used to override it on an arch level, and we've now generalized the exactly same implementation to common code.
Re: [PATCH 04/14] x86: use generic swiotlb_ops
On Thu, Mar 15, 2018 at 01:52:14PM +0100, Thomas Gleixner wrote: > Yeah, I know that the standard defines it, but that doesn't mean it makes > sense. At least not to me. It makes sense in that it logically is a boolean but we only want to allocate 1 bit for it, unlike the normal ABI allocations of at least a byte. Either way, tell me what you want it changed to, and I'll do it.
Re: [PATCH v7 0/5] Add Intel IOMMU debugfs support
On Thu, Feb 15, 2018 at 08:38:11AM -0800, Jacob Pan wrote: > Just wondering if your concern is on the implementation or the debugfs > idea in general. Perhaps have some common IOMMU debugfs? My concern mainly is that we add interfaces which reveal potentially security relevant information to user-space and that tools come up using it so that this also becomes kABI and we can't easily change it anymore and this whole stuff turns into a maintenance nightmare. So that is definitely not something I'd like to see enabled in the distros, and it's better to avoid it at all and search for better ways to debug upcoming issues. BPF tracers and tracing in general come to mind here... Joerg
Re: WARN_ON(irqs_disabled()) in dma_free_attrs?
On 15/03/18 07:58, Christoph Hellwig wrote: On Wed, Mar 14, 2018 at 05:43:46PM +, Robin Murphy wrote: Looking back I don't really understand why we even indirect the "classic" per-device dma_declare_coherent_memory use case through the DMA API. It certainly makes sense for devices which can exist in both shared-memory and device-local-memory configurations, so the driver doesn't have to care about the details (particularly on mobile SoCs where the 'local' memory might just be a chunk of system RAM reserved by the bootloader, and it's just a matter of different use-cases on identical hardware). Well, the "classic" case for me is memory buffers in the device. Setting some memory aside, either in a global pool as now done for arm-nommu or even per-device as on some ARM SOCs is different indeed. As far as I can tell the few devices that use 'local' memory always use that. I was thinking mostly of GPUs, where the same IP core might be available in a discrete PCIe chip with its own GDDR controller and on-board local RAM, and also in an APU/SoC-type configuration reliant on the CPU memory bus. But I guess in practice those drivers are already well beyond the generic DMA API and into complicated explicitly-managed stuff like TTM. It seems like a pretty different use case to me. In the USB case we also have the following additional twist in that it doesn't even need the mapping to be coherent. I'm pretty sure it does (in the sense that it needs to ensure the arch code makes the mapping non-cacheable), otherwise I can't see how the bouncing could work properly. I think the last bit of the comment above hcd_alloc_coherent() is a bit misleading. Well, if it isn't marked non-cacheable we'd have to do dma_cache_sync operations for it. Which would probably still be faster than non-cacheable mappings. 
Ah, I see, we crossed wires there - the *current* implementation definitely requires a coherent mapping, but in general you're right that there are other ways to solve this particular problem which wouldn't. This case really wants to be able to overload the streaming map/unmap calls to automatically bounce through device-local memory, but I'm sure that's not worth the complexity or effort in general :) So maybe for now the quick fix is to move the sleep check as suggested earlier in this thread, but in the long run we probably need to do some major rework of how dma_declare_coherent_memory and friends work. Maybe; I do think the specific hcd_alloc_coherent() case could still be fixed within the scope of the existing code, but it's not quite as clean and straightforward as I first thought, and the practical impact of tweaking the WARN should be effectively zero despite the theoretical edge cases it opens up. Do you want me to write it up as a proper patch? Yes. Including a proper comment on why the might_sleep is placed there. OK, will do. My mid-term plan was to actually remove the gfp flags argument from the dma alloc routines as is creates more confusion than it helps. I guess this means we'll at least need to introduce a DMA_ATTR_NON_BLOCK or similar flag instead then unfortunately. At least we already have DMA_ATTR_NO_WARN, which could be implemented consistently to clean up the existing __GFP_NOWARN users. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
On Thu, Mar 15, 2018 at 02:01:53PM +0100, Sebastian Andrzej Siewior wrote: > On 2018-03-15 13:53:42 [+0100], Joerg Roedel wrote: > > On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote: > > > @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain > > > *domain, unsigned int virq, > > > return ret; > > > > > > if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) { > > > - if (get_irq_table(devid, true)) > > > + struct irq_remap_table *table; > > > + struct amd_iommu *iommu; > > > + > > > + table = get_irq_table(devid); > > > + if (table) { > > > + if (!table->min_index) { > > > + /* > > > + * Keep the first 32 indexes free for IOAPIC > > > + * interrupts. > > > + */ > > > + table->min_index = 32; > > > + iommu = amd_iommu_rlookup_table[devid]; > > > + for (i = 0; i < 32; ++i) > > > + iommu->irte_ops->set_allocated(table, > > > i); > > > + } > > > > I think this needs to be protected by the table->lock. > > Which part and why? The !->min_index part plus extending (setting to 32)? In particular the set_allocated part, when get_irq_table() returns the table is visible to other users as well. I have not checked the irq-layer locking to be sure that can happen, though.
Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
On 2018-03-15 13:53:42 [+0100], Joerg Roedel wrote: > On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote: > > @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain > > *domain, unsigned int virq, > > return ret; > > > > if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) { > > - if (get_irq_table(devid, true)) > > + struct irq_remap_table *table; > > + struct amd_iommu *iommu; > > + > > + table = get_irq_table(devid); > > + if (table) { > > + if (!table->min_index) { > > + /* > > +* Keep the first 32 indexes free for IOAPIC > > +* interrupts. > > +*/ > > + table->min_index = 32; > > + iommu = amd_iommu_rlookup_table[devid]; > > + for (i = 0; i < 32; ++i) > > + iommu->irte_ops->set_allocated(table, > > i); > > + } > > I think this needs to be protected by the table->lock. Which part and why? The !->min_index part plus extending (setting to 32)? Sebastian
Re: iommu/amd: lock splitting & GFP_KERNEL allocation
Hi Sebastian, On Fri, Feb 23, 2018 at 11:27:26PM +0100, Sebastian Andrzej Siewior wrote: > I have no idea why but suddenly my A10 box complained loudly about > locking and memory allocations within the iommu code under RT. Looking > at the code it has been like this for a longer time so the iommu must > have appeared recently (well there was a bios upgrade due to other > issues so it might have enabled it). > > The goal here is to make the memory allocation in get_irq_table() not > with disabled interrupts and having as little raw_spin_lock as possible > while having them if the caller is also holding one (like desc->lock > during IRQ-affinity changes). > > The patches were boot tested on an AMD EPYC 7601. Thanks for these patches, I really like the cleanups and the improved locking. Please rebase against x86/amd in the iommu branch and address the other comment I have, then I put them into my tree. Thanks, Joerg
Re: [PATCH 08/10] iommu/amd: drop the lock while allocating new irq remap table
On Fri, Feb 23, 2018 at 11:27:34PM +0100, Sebastian Andrzej Siewior wrote: > The irq_remap_table is allocated while the iommu_table_lock is held with > interrupts disabled. While this works it makes RT scream very loudly. > > From looking at the call sites, all callers are in the early device > initialisation (apic_bsp_setup(), pci_enable_device(), > pci_enable_msi()) so make sense to drop the lock which also enables > interrupts and try to allocate that memory with GFP_KERNEL instead > GFP_ATOMIC. > > Since during the allocation the iommu_table_lock is dropped, we need to > recheck if table exists after the lock has been reacquired. I *think* > that it is impossible that the "devid" entry appears in irq_lookup_table > while the lock is dropped since the same device can only be probed once. > It is more likely that another device added an `alias' entry. However I > check for both cases, just to be sure. > > Signed-off-by: Sebastian Andrzej Siewior This looks like it conflicts with the x86/amd branch as well.
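The drop-lock/allocate/recheck pattern this changelog describes is a classic one. A standalone sketch, using a pthread mutex in place of the kernel's iommu_table_lock; all names here are illustrative, not the kernel's:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static void *lookup[16];

/*
 * Sketch of the allocate-outside-the-lock pattern: drop the lock for
 * a potentially sleeping allocation, then recheck the slot after
 * reacquiring, because another path (e.g. an alias entry) may have
 * installed an entry in the meantime.
 */
static void *get_or_alloc(int devid)
{
    void *entry, *fresh;

    pthread_mutex_lock(&table_lock);
    entry = lookup[devid];
    pthread_mutex_unlock(&table_lock);
    if (entry)
        return entry;

    fresh = malloc(64);  /* GFP_KERNEL analogue: may sleep */

    pthread_mutex_lock(&table_lock);
    entry = lookup[devid];  /* recheck: did someone beat us to it? */
    if (!entry) {
        lookup[devid] = fresh;
        entry = fresh;
        fresh = NULL;
    }
    pthread_mutex_unlock(&table_lock);

    free(fresh);  /* lost the race: discard our allocation */
    return entry;
}
```

The second lookup under the lock is the essential part: without it, two racing callers could each install their own table and one would leak or, worse, be used inconsistently.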
Re: [PATCH 04/10] iommu/amd: split irq_lookup_table out of the amd_iommu_devtable_lock
On Fri, Feb 23, 2018 at 11:27:30PM +0100, Sebastian Andrzej Siewior wrote: > The function get_irq_table() reads/writes irq_lookup_table while holding > the amd_iommu_devtable_lock. It also modifies > amd_iommu_dev_table[].data[2]. > set_dte_entry() is using amd_iommu_dev_table[].data[0|1] (under the > domain->lock) so it should be okay. The access to the iommu is > serialized with its own (iommu's) lock. > > So split out get_irq_table() out of amd_iommu_devtable_lock's lock. The > new lock is a raw_spin_lock because modify_irte_ga() is called while > desc->lock is held (which is raw). > > Signed-off-by: Sebastian Andrzej Siewior This patch likely conflicts with changes already in the iommu-tree, as it contains a patch splitting the get_irq_table() function. Can you rebase your series against the x86/amd branch? Thanks, Joerg
Re: [PATCH 03/14] x86: use dma-direct
On Thu, 15 Mar 2018, Christoph Hellwig wrote: > On Thu, Mar 15, 2018 at 09:56:13AM +0100, Thomas Gleixner wrote: > > On Wed, 14 Mar 2018, Christoph Hellwig wrote: > > > > > The generic dma-direct implementation is now functionally equivalent to > > > the x86 nommu dma_map implementation, so switch over to using it. > > > > Can you please convert the various drivers first and then remove the > > unused code? > > Which various drivers? > > > > Note that the various iommu drivers are switched from x86_dma_supported The various iommu drivers > > > to dma_direct_supported to provide identical functionality, although the > > > checks looks fairly questionable for at least some of them. > > > > Can you please elaborate? From the above it's not clear which checks you > > are referring to. If you convert these drivers seperately then explicit > > information about your concerns wants to be in the changelogs. > > This bit: > > /* Copied from i386. Doesn't make much sense, because it will > only work for pci_alloc_coherent. > The caller just has to use GFP_DMA in this case. */ > if (mask < DMA_BIT_MASK(24)) > return 0; > > in x86_dma_supported, or the equivalent bit in dma_direct_supported. > Kept for bug to bug compatibility, but I guess I should reword or > just drop the changelog bit if it causes confusion. Reword please. Thanks, tglx
Re: [PATCH 05/10] iommu/amd: remove the special case from get_irq_table()
On Fri, Feb 23, 2018 at 11:27:31PM +0100, Sebastian Andrzej Siewior wrote:
> @@ -4103,10 +4093,26 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
> 		return ret;
>
> 	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
> -		if (get_irq_table(devid, true))
> +		struct irq_remap_table *table;
> +		struct amd_iommu *iommu;
> +
> +		table = get_irq_table(devid);
> +		if (table) {
> +			if (!table->min_index) {
> +				/*
> +				 * Keep the first 32 indexes free for IOAPIC
> +				 * interrupts.
> +				 */
> +				table->min_index = 32;
> +				iommu = amd_iommu_rlookup_table[devid];
> +				for (i = 0; i < 32; ++i)
> +					iommu->irte_ops->set_allocated(table, i);
> +			}

I think this needs to be protected by the table->lock.
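Joerg's point — that the one-time min_index initialisation must not race — can be sketched in userspace. Everything below is a simplified stand-in (the struct, bitmap, `set_allocated()` helper, and pthread mutex replace the kernel's `table->lock` and irte_ops), not the actual driver code:

```c
#include <assert.h>
#include <pthread.h>

#define MAX_IRQS_PER_TABLE 256
#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Simplified stand-in for the AMD IOMMU IRQ remap table */
struct irq_remap_table {
	pthread_mutex_t lock;
	unsigned int min_index;
	unsigned long allocated[MAX_IRQS_PER_TABLE / (8 * sizeof(unsigned long))];
};

/* Stand-in for iommu->irte_ops->set_allocated() */
static void set_allocated(struct irq_remap_table *table, int index)
{
	table->allocated[index / BITS_PER_LONG] |= 1UL << (index % BITS_PER_LONG);
}

/*
 * The one-time IOAPIC reservation, done under table->lock so two
 * racing allocators cannot both observe min_index == 0 and reserve
 * the first 32 indexes twice.
 */
static void reserve_ioapic_indexes(struct irq_remap_table *table)
{
	int i;

	pthread_mutex_lock(&table->lock);
	if (!table->min_index) {
		/* Keep the first 32 indexes free for IOAPIC interrupts */
		table->min_index = 32;
		for (i = 0; i < 32; ++i)
			set_allocated(table, i);
	}
	pthread_mutex_unlock(&table->lock);
}
```

The second caller finds min_index already set and does nothing, which is the behaviour the lock is there to guarantee.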
Re: [PATCH 04/14] x86: use generic swiotlb_ops
On Thu, 15 Mar 2018, Christoph Hellwig wrote:
> On Thu, Mar 15, 2018 at 10:00:57AM +0100, Thomas Gleixner wrote:
> > On Wed, 14 Mar 2018, Christoph Hellwig wrote:
> > > #if defined(CONFIG_INTEL_IOMMU) || defined(CONFIG_AMD_IOMMU)
> > > 	void *iommu; /* hook for IOMMU specific extension */
> > > #endif
> > > +#ifdef CONFIG_STA2X11
> > > +	bool is_sta2x11 : 1;
> >
> > Huch? Please use either bool or an unsigned int based bitfield. A boolean
> > bitfield doesn't make sense.
>
> bool bitfields are perfectly valid in C99 and used in various places in
> the kernel. But if you want either bool or an unsigned bitfield let me
> know and I'll switch it over.

Yeah, I know that the standard defines it, but that doesn't mean it makes sense. At least not to me.

Thanks,

	tglx
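For reference, the two styles under debate are both one bit wide and both valid C99; the practical semantic difference is on assignment, where a nonzero value collapses to 1 for the _Bool member but is truncated modulo 2 for the unsigned one. A minimal standalone illustration with made-up struct names:

```c
#include <assert.h>
#include <stdbool.h>

/* The style in Christoph's patch: a C99 _Bool bitfield */
struct archdata_bool {
	bool is_sta2x11 : 1;	/* any nonzero store becomes 1 */
};

/* The style Thomas asks for: an unsigned int based bitfield */
struct archdata_uint {
	unsigned int is_sta2x11 : 1;	/* store keeps only the low bit */
};
```

Since the flag is only ever set to 0 or 1, the two are interchangeable here, and the disagreement is purely one of taste.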
Re: [PATCH v4] iommu/amd: Add support for fast IOTLB flushing
Hi Suravee,

On Mon, Mar 05, 2018 at 08:08:21AM +0700, Suravee Suthikulpanit wrote:
> Ping..
>
> Joerg, when you get a chance, would you please let me know if you have any
> other concerns for this v4.

Sorry for the long delay. The patch is fine, I applied it to the iommu-tree.

Thanks,

	Joerg
Re: [RFC 2/5] dt-bindings: brcm: Add reserved-dma-region for iPROC
On Thu, Mar 15, 2018 at 12:15 AM, Robin Murphy wrote:
> On 12/03/18 07:03, Jitendra Bhivare wrote:
>> On Tue, Mar 6, 2018 at 5:12 PM, Robin Murphy wrote:
>>> On 06/03/18 04:59, Jitendra Bhivare wrote:
>>>> With SoC wide DMA mask of 40-bit, the mappings for entire IOVA space
>>>> can't be specified in the PAXBv2 PCIe RC of SoC. The holes in IOVA
>>>> space need to be reserved to prevent any IOVA allocations in those
>>>> spaces.
>>>
>>> Can you clarify why? If this is the PCI inbound window thing again, let
>>> me say once again that "dma-ranges" is the appropriate way for DT to
>>> describe the hardware.
>>>
>>> Robin.
>>
>> dma-ranges = < \
>> 	0x4300 0x00 0x 0x00 0x8000 0x00 0x8000 \
>> 	0x4300 0x08 0x 0x08 0x 0x08 0x \
>> 	0x4300 0x80 0x 0x80 0x 0x80 0x>
>>
>> Yes, it's for PCI inbound windows. In our HW, they are limited by sizes
>> specified in ARM memory maps which was done for non-IOMMU cases to catch
>> any transfer outside the ranges.
>> dma-ranges are already being used to program these inbound windows and
>> SoC wide DMA mask is already specified but IOMMU code can still allocate
>> IOVAs in the gaps for which translation will fail in PCIe RC.
>
> Right, so make iommu-dma reserve the gaps. No need to clutter up the DT
> with redundant information which gets handled pretty much identically
> anyway.
>
> Robin.

There was a mistake in pasting the dma-ranges mentioned. This is what we use in the IOMMU case.

#define PCIE_DMA_RANGES dma-ranges = < \
	0x4300 0x00 0x 0x00 0x 0x04 0x \
	0x4300 0x08 0x 0x08 0x 0x08 0x \
	0x4300 0x80 0x 0x80 0x 0x80 0x>

#define PCIE_RESERVED_DMA_REGION reserved-dma-region = < \
	0x0 0x0 0x04 0x0 0x04 0x0 \
	0x0 0x0 0x10 0x0 0x70 0x0>

This is the non-IOMMU case, which is per the ARM memory map.

#define PCIE_DMA_RANGES dma-ranges = < \
	0x4300 0x00 0x8000 0x00 0x8000 0x00 0x8000 \
	0x4300 0x08 0x 0x08 0x 0x08 0x \
	0x4300 0x80 0x 0x80 0x 0x40 0x>

In the IOMMU case, we had to increase the first in-bound window i.e.
dma-ranges first entry from 2GB-4GB (in non-iommu case) to 0-16GB, because the out-bound window specified from 2GB-4GB implicitly gets reserved by the PCI iommu code. This allowed IOVA allocations from 0-2GB and 4-16GB mapped through in-bound windows.

Basically, my point is that the dma-ranges specified is being used to program our in-bound windows and defines the PCI space to CPU address space mappings supported by the HW. In the same way, it would be better to explicitly specify reserved-dma-region to clarify this as a HW hole than implicitly reserving the regions. We can at least have a single property 'reserved-dma-window' for the MSI region and DMA regions to help describe the holes in the HW than let SW compute it.

>>>> reserved-dma-region property is added to specify the ranges which
>>>> should never be mapped and given to devices sitting behind.
>>>>
>>>> Reviewed-by: Ray Jui
>>>> Reviewed-by: Vikram Prakash
>>>> Reviewed-by: Scott Branden
>>>> Signed-off-by: Jitendra Bhivare
>>>> ---
>>>> Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
>>>> index b8e48b4..3be0fe3 100644
>>>> --- a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
>>>> +++ b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
>>>> @@ -30,6 +30,9 @@ Optional properties:
>>>> - dma-ranges: Some PAXB-based root complexes do not have inbound mapping done
>>>>   by the ASIC after power on reset. In this case, SW is required to configure
>>>>   the mapping, based on inbound memory regions specified by this property.
>>>> +- reserved-dma-region: PAXBv2 with IOMMU enabled cannot provide mappings for
>>>> +  entire IOVA space specified by DMA mask. Hence this is used to reserve the
>>>> +  gaps in dma-ranges.
>>>> - brcm,pcie-ob: Some iProc SoCs do not have the outbound address mapping done
>>>>   by the ASIC after power on reset.
In this case, SW needs to configure it
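Robin's suggestion — have iommu-dma derive the reserved regions from the gaps between the inbound windows instead of listing them in DT — amounts to a simple interval walk over the dma-ranges. A standalone sketch with made-up names and hypothetical window values (not the iProc or iommu-dma code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A half-open inbound window [start, end) in bus address space */
struct window {
	uint64_t start, end;
};

/*
 * Fill 'gaps' with the holes between sorted, non-overlapping inbound
 * windows, up to 'limit' (the ceiling implied by the DMA mask). These
 * holes are exactly the IOVA regions that would need reserving so the
 * allocator never hands out an address the RC cannot translate.
 * Returns the number of gaps found.
 */
static size_t compute_gaps(const struct window *win, size_t n,
			   uint64_t limit, struct window *gaps)
{
	size_t i, g = 0;
	uint64_t cursor = 0;

	for (i = 0; i < n; i++) {
		if (win[i].start > cursor) {
			gaps[g].start = cursor;
			gaps[g].end = win[i].start;
			g++;
		}
		cursor = win[i].end;
	}
	if (cursor < limit) {
		gaps[g].start = cursor;
		gaps[g].end = limit;
		g++;
	}
	return g;
}
```

Since the gaps are fully determined by the windows already described in dma-ranges, the DT gains nothing by listing them again — which is the core of Robin's objection to a separate reserved-dma-region property.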
Re: [PATCH 04/14] x86: use generic swiotlb_ops
On Thu, Mar 15, 2018 at 10:00:57AM +0100, Thomas Gleixner wrote:
> On Wed, 14 Mar 2018, Christoph Hellwig wrote:
> > #if defined(CONFIG_INTEL_IOMMU) || defined(CONFIG_AMD_IOMMU)
> > 	void *iommu; /* hook for IOMMU specific extension */
> > #endif
> > +#ifdef CONFIG_STA2X11
> > +	bool is_sta2x11 : 1;
>
> Huch? Please use either bool or an unsigned int based bitfield. A boolean
> bitfield doesn't make sense.

bool bitfields are perfectly valid in C99 and used in various places in the kernel. But if you want either bool or an unsigned bitfield let me know and I'll switch it over.
Re: [PATCH 03/14] x86: use dma-direct
On Thu, Mar 15, 2018 at 09:56:13AM +0100, Thomas Gleixner wrote:
> On Wed, 14 Mar 2018, Christoph Hellwig wrote:
> > The generic dma-direct implementation is now functionally equivalent to
> > the x86 nommu dma_map implementation, so switch over to using it.
>
> Can you please convert the various drivers first and then remove the
> unused code?

Which various drivers?

> > Note that the various iommu drivers are switched from x86_dma_supported
> > to dma_direct_supported to provide identical functionality, although the
> > checks look fairly questionable for at least some of them.
>
> Can you please elaborate? From the above it's not clear which checks you
> are referring to. If you convert these drivers separately then explicit
> information about your concerns wants to be in the changelogs.

This bit:

	/* Copied from i386. Doesn't make much sense, because it will
	   only work for pci_alloc_coherent.
	   The caller just has to use GFP_DMA in this case. */
	if (mask < DMA_BIT_MASK(24))
		return 0;

in x86_dma_supported, or the equivalent bit in dma_direct_supported. Kept for bug to bug compatibility, but I guess I should reword or just drop the changelog bit if it causes confusion.
Re: [PATCH v9 4/5] iommu/arm-smmu: Add the device_link between masters and smmu
On 15/03/18 08:57, Vivek Gautam wrote:
> Hi Robin,
>
> On Wed, Mar 14, 2018 at 11:20 PM, Robin Murphy wrote:
>> On 13/03/18 08:55, Vivek Gautam wrote:
>>> From: Sricharan R
>>>
>>> Finally add the device link between the master device and
>>> smmu, so that the smmu gets runtime enabled/disabled only when the
>>> master needs it. This is done from add_device callback which gets
>>> called once when the master is added to the smmu.
>>>
>>> Signed-off-by: Sricharan R
>>> Signed-off-by: Vivek Gautam
>>> Reviewed-by: Tomasz Figa
>>> ---
>>> drivers/iommu/arm-smmu.c | 29 +
>>> 1 file changed, 29 insertions(+)
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 56a04ae80bf3..64953ff2281f 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -1460,10 +1460,31 @@ static int arm_smmu_add_device(struct device *dev)
>>> 	iommu_device_link(&smmu->iommu, dev);
>>> +	if (pm_runtime_enabled(smmu->dev)) {
>>> +		struct device_link *link;
>>> +
>>> +		/*
>>> +		 * Establish the link between smmu and master, so that the
>>> +		 * smmu gets runtime enabled/disabled as per the master's
>>> +		 * needs.
>>> +		 */
>>> +		link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>>> +		if (!link) {
>>
>> FWIW, given that we don't really care about link itself, I'd be quite
>> happy to simplify that lot down to:
>>
>> 	if (pm_runtime_enabled(smmu_dev) &&
>> 	    !device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME)) {
>
> Sure, will update this.
>
>>> +			dev_warn(smmu->dev,
>>> +				 "Unable to add link to the consumer %s\n",
>>> +				 dev_name(dev));
>>
>> (side note: since device_link_add() already prints a message on success,
>> maybe it could print its own failure message too?)
>
> Should we make device_link that verbose - to print failure messages at
> each step (there are at least a couple where we return link as NULL), or
> we can let the users handle printing the message?

I didn't mean to imply anything more than the idea below (although the whole function could of course be refactored further to report explicit error values).
It just seems a bit unbalanced for the core code to be noisy about success yet silent about failure, but ultimately that's an entirely separate issue which doesn't have to have any bearing on this series.

Robin.

->8-
diff --git a/drivers/base/core.c b/drivers/base/core.c
index b2261f92f2f1..895da95f5cb9 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -275,6 +275,9 @@ struct device_link *device_link_add(struct device *consumer,
 out:
 	device_pm_unlock();
 	device_links_write_unlock();
+	if (!link)
+		dev_warn(consumer, "Failed to link to supplier %s\n", dev_name(supplier));
+
 	return link;
 }
 EXPORT_SYMBOL_GPL(device_link_add);
Re: [PATCH v9 4/5] iommu/arm-smmu: Add the device_link between masters and smmu
On 15/03/18 06:18, Tomasz Figa wrote:
> On Thu, Mar 15, 2018 at 2:50 AM, Robin Murphy wrote:
>> On 13/03/18 08:55, Vivek Gautam wrote:
>>> From: Sricharan R
>>>
>>> Finally add the device link between the master device and
>>> smmu, so that the smmu gets runtime enabled/disabled only when the
>>> master needs it. This is done from add_device callback which gets
>>> called once when the master is added to the smmu.
>>>
>>> Signed-off-by: Sricharan R
>>> Signed-off-by: Vivek Gautam
>>> Reviewed-by: Tomasz Figa
>>> ---
>>> drivers/iommu/arm-smmu.c | 29 +
>>> 1 file changed, 29 insertions(+)
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 56a04ae80bf3..64953ff2281f 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -1460,10 +1460,31 @@ static int arm_smmu_add_device(struct device *dev)
>>> 	iommu_device_link(&smmu->iommu, dev);
>>> +	if (pm_runtime_enabled(smmu->dev)) {
>>> +		struct device_link *link;
>>> +
>>> +		/*
>>> +		 * Establish the link between smmu and master, so that the
>>> +		 * smmu gets runtime enabled/disabled as per the master's
>>> +		 * needs.
>>> +		 */
>>> +		link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>>> +		if (!link) {
>>
>> FWIW, given that we don't really care about link itself, I'd be quite
>> happy to simplify that lot down to:
>>
>> 	if (pm_runtime_enabled(smmu_dev) &&
>> 	    !device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME)) {
>>
>>> +			dev_warn(smmu->dev,
>>> +				 "Unable to add link to the consumer %s\n",
>>> +				 dev_name(dev));
>>
>> (side note: since device_link_add() already prints a message on success,
>> maybe it could print its own failure message too?)
>
> I think we care whether adding the link succeeded. If it fails to be
> added, we might end up with a complete system lockup on a system with
> power domains.

Well, yeah, that was implicit - the point is that we *only* care about whether it succeeded or not. Thus we may as well just check for NULL directly instead of assigning the value as if we were actually going to do anything with it.

Robin.
Re: [PATCH 04/14] x86: use generic swiotlb_ops
On Wed, 14 Mar 2018, Christoph Hellwig wrote:
> #if defined(CONFIG_INTEL_IOMMU) || defined(CONFIG_AMD_IOMMU)
> 	void *iommu; /* hook for IOMMU specific extension */
> #endif
> +#ifdef CONFIG_STA2X11
> +	bool is_sta2x11 : 1;

Huch? Please use either bool or an unsigned int based bitfield. A boolean bitfield doesn't make sense.

Thanks,

	tglx
Re: [PATCH v9 4/5] iommu/arm-smmu: Add the device_link between masters and smmu
Hi Robin,

On Wed, Mar 14, 2018 at 11:20 PM, Robin Murphy wrote:
> On 13/03/18 08:55, Vivek Gautam wrote:
>> From: Sricharan R
>>
>> Finally add the device link between the master device and
>> smmu, so that the smmu gets runtime enabled/disabled only when the
>> master needs it. This is done from add_device callback which gets
>> called once when the master is added to the smmu.
>>
>> Signed-off-by: Sricharan R
>> Signed-off-by: Vivek Gautam
>> Reviewed-by: Tomasz Figa
>> ---
>> drivers/iommu/arm-smmu.c | 29 +
>> 1 file changed, 29 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 56a04ae80bf3..64953ff2281f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1460,10 +1460,31 @@ static int arm_smmu_add_device(struct device *dev)
>> 	iommu_device_link(&smmu->iommu, dev);
>> +	if (pm_runtime_enabled(smmu->dev)) {
>> +		struct device_link *link;
>> +
>> +		/*
>> +		 * Establish the link between smmu and master, so that the
>> +		 * smmu gets runtime enabled/disabled as per the master's
>> +		 * needs.
>> +		 */
>> +		link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>> +		if (!link) {
>
> FWIW, given that we don't really care about link itself, I'd be quite
> happy to simplify that lot down to:
>
> 	if (pm_runtime_enabled(smmu_dev) &&
> 	    !device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME)) {

Sure, will update this.

>> +			dev_warn(smmu->dev,
>> +				 "Unable to add link to the consumer %s\n",
>> +				 dev_name(dev));
>
> (side note: since device_link_add() already prints a message on success,
> maybe it could print its own failure message too?)

Should we make device_link that verbose - to print failure messages at each step (there are at least a couple where we return link as NULL), or we can let the users handle printing the message?

regards
Vivek

> Robin.
>> +			ret = -ENODEV;
>> +			goto out_unlink;
>> +		}
>> +	}
>> +
>> 	arm_smmu_rpm_put(smmu);
>> 	return 0;
>> +out_unlink:
>> +	iommu_device_unlink(&smmu->iommu, dev);
>> +	arm_smmu_master_free_smes(fwspec);
>> out_rpm_put:
>> 	arm_smmu_rpm_put(smmu);
>> out_cfg_free:
>> @@ -1486,6 +1507,14 @@ static void arm_smmu_remove_device(struct device *dev)
>> 	cfg = fwspec->iommu_priv;
>> 	smmu = cfg->smmu;
>> +	if (pm_runtime_enabled(smmu->dev)) {
>> +		struct device_link *link;
>> +
>> +		link = device_link_find(dev, smmu->dev);
>> +		if (link)
>> +			device_link_del(link);
>> +	}
>> +
>> 	ret = arm_smmu_rpm_get(smmu);
>> 	if (ret < 0)
>> 		return;

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 03/14] x86: use dma-direct
On Wed, 14 Mar 2018, Christoph Hellwig wrote:
> The generic dma-direct implementation is now functionally equivalent to
> the x86 nommu dma_map implementation, so switch over to using it.

Can you please convert the various drivers first and then remove the unused code?

> Note that the various iommu drivers are switched from x86_dma_supported
> to dma_direct_supported to provide identical functionality, although the
> checks look fairly questionable for at least some of them.

Can you please elaborate? From the above it's not clear which checks you are referring to. If you convert these drivers separately then explicit information about your concerns wants to be in the changelogs.

Thanks,

	tglx
Re: WARN_ON(irqs_disabled()) in dma_free_attrs?
On Wed, Mar 14, 2018 at 05:43:46PM +, Robin Murphy wrote:
>> Looking back I don't really understand why we even indirect the "classic"
>> per-device dma_declare_coherent_memory use case through the DMA API.
>
> It certainly makes sense for devices which can exist in both shared-memory
> and device-local-memory configurations, so the driver doesn't have to care
> about the details (particularly on mobile SoCs where the 'local' memory
> might just be a chunk of system RAM reserved by the bootloader, and it's
> just a matter of different use-cases on identical hardware).

Well, the "classic" case for me is memory buffers in the device. Setting some memory aside, either in a global pool as now done for arm-nommu or even per-device as on some ARM SoCs, is different indeed. As far as I can tell the few devices that use 'local' memory always use that.

>> It seems like a pretty different use case to me. In the USB case we
>> also have the following additional twist in that it doesn't even need
>> the mapping to be coherent.
>
> I'm pretty sure it does (in the sense that it needs to ensure the arch code
> makes the mapping non-cacheable), otherwise I can't see how the bouncing
> could work properly. I think the last bit of the comment above
> hcd_alloc_coherent() is a bit misleading.

Well, if it isn't marked non-cacheable we'd have to do dma_cache_sync operations for it. Which would probably still be faster than non-cacheable mappings.

>> So maybe for now the quick fix is to move the sleep check as suggested
>> earlier in this thread, but in the long run we probably need to do some
>> major rework of how dma_declare_coherent_memory and friends work.
>
> Maybe; I do think the specific hcd_alloc_coherent() case could still be
> fixed within the scope of the existing code, but it's not quite as clean
> and straightforward as I first thought, and the practical impact of
> tweaking the WARN should be effectively zero despite the theoretical edge
> cases it opens up.
> Do you want me to write it up as a proper patch?

Yes. Including a proper comment on why the might_sleep is placed there.

My mid-term plan was to actually remove the gfp flags argument from the dma alloc routines as it creates more confusion than it helps. I guess this means we'll at least need to introduce a DMA_ATTR_NON_BLOCK or similar flag instead then, unfortunately.
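The "move the sleep check" quick fix discussed above can be sketched as: only assert sleeping context once the per-device coherent pool has declined to handle the free, since the pool path is atomic-safe. Everything below is a userspace stand-in with made-up names — a counter replaces the might_sleep() splat and a simple address-range test replaces the real pool lookup:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static int warn_count;	/* stands in for the might_sleep() warning */

static void might_sleep_check(bool in_atomic)
{
	if (in_atomic)
		warn_count++;
}

/* Stand-in for dma_release_from_dev_coherent(): address-range test */
static bool from_dev_coherent_pool(void *addr, void *pool_lo, void *pool_hi)
{
	uintptr_t a = (uintptr_t)addr;

	return a >= (uintptr_t)pool_lo && a < (uintptr_t)pool_hi;
}

/*
 * Sketch of the reordered free path: the pool fast path runs first and
 * never warns; only the page-allocator fallback asserts "may sleep".
 */
static void sketch_dma_free(void *addr, void *pool_lo, void *pool_hi,
			    bool in_atomic)
{
	if (from_dev_coherent_pool(addr, pool_lo, pool_hi))
		return;			/* pool free is atomic-safe */
	might_sleep_check(in_atomic);	/* check moved past the pool path */
	free(addr);
}
```

With this ordering, an atomic-context free that is satisfied by the coherent pool no longer trips the warning, while the allocator path still does.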
Re: [PATCH v9 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On Thu, Mar 15, 2018 at 2:46 AM, Robin Murphy wrote:
> On 13/03/18 08:55, Vivek Gautam wrote:
>> From: Sricharan R
>>
>> The smmu device probe/remove and add/remove master device callbacks
>> gets called when the smmu is not linked to its master, that is without
>> the context of the master device. So calling runtime apis in those
>> places separately.
>>
>> Signed-off-by: Sricharan R
>> [vivek: Cleanup pm runtime calls]
>> Signed-off-by: Vivek Gautam
>> Reviewed-by: Tomasz Figa
>> ---
>> drivers/iommu/arm-smmu.c | 95 
>> 1 file changed, 87 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index d5873d545024..56a04ae80bf3 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -268,6 +268,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
>> 	{ 0, NULL},
>> };
>> +static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
>> +{
>> +	if (pm_runtime_enabled(smmu->dev))
>> +		return pm_runtime_get_sync(smmu->dev);
>> +
>> +	return 0;
>> +}
>> +
>> +static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
>> +{
>> +	if (pm_runtime_enabled(smmu->dev))
>> +		pm_runtime_put(smmu->dev);
>> +}
>> +
>> static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
>> {
>> 	return container_of(dom, struct arm_smmu_domain, domain);
>> @@ -913,11 +927,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>> 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> 	struct arm_smmu_device *smmu = smmu_domain->smmu;
>> 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>> -	int irq;
>> +	int ret, irq;
>> 	if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
>> 		return;
>> +	ret = arm_smmu_rpm_get(smmu);
>> +	if (ret < 0)
>> +		return;
>> +
>> 	/*
>> 	 * Disable the context bank and free the page tables before freeing
>> 	 * it.
>> @@ -932,6 +950,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>> 	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
>> 	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
>> +
>> +	arm_smmu_rpm_put(smmu);
>> }
>> static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
>> @@ -1213,10 +1233,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>> 		return -ENODEV;
>> 	smmu = fwspec_smmu(fwspec);
>> +
>> +	ret = arm_smmu_rpm_get(smmu);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> 	/* Ensure that the domain is finalised */
>> 	ret = arm_smmu_init_domain_context(domain, smmu);
>> 	if (ret < 0)
>> -		return ret;
>> +		goto rpm_put;
>> 	/*
>> 	 * Sanity check the domain. We don't support domains across
>> @@ -1230,29 +1255,47 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>> 	}
>> 	/* Looks ok, so add the device to the domain */
>> -	return arm_smmu_domain_add_master(smmu_domain, fwspec);
>> +	ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
>> +
>> +rpm_put:
>> +	arm_smmu_rpm_put(smmu);
>> +	return ret;
>> }
>> static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> 			phys_addr_t paddr, size_t size, int prot)
>> {
>> 	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>
> Nit: please use arm_smmu_domain for ops as well (as it was before
> 523d7423e21b), or consistently elide it for smmu - the mixture of both
> methods is just a horrible mess (here and in unmap).
>> +	int ret;
>> 	if (!ops)
>> 		return -ENODEV;
>> -	return ops->map(ops, iova, paddr, size, prot);
>> +	arm_smmu_rpm_get(smmu);
>> +	ret = ops->map(ops, iova, paddr, size, prot);
>> +	arm_smmu_rpm_put(smmu);
>> +
>> +	return ret;
>> }
>> static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> 			     size_t size)
>> {
>> 	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +	size_t ret;
>> 	if (!ops)
>> 		return 0;
>> -	return ops->unmap(ops, iova, size);
>> +	arm_smmu_rpm_get(smmu);
>> +	ret = ops->unmap(ops, iova, size);
>> +
Re: [PATCH v9 4/5] iommu/arm-smmu: Add the device_link between masters and smmu
On Thu, Mar 15, 2018 at 2:50 AM, Robin Murphy wrote:
> On 13/03/18 08:55, Vivek Gautam wrote:
>> From: Sricharan R
>>
>> Finally add the device link between the master device and
>> smmu, so that the smmu gets runtime enabled/disabled only when the
>> master needs it. This is done from add_device callback which gets
>> called once when the master is added to the smmu.
>>
>> Signed-off-by: Sricharan R
>> Signed-off-by: Vivek Gautam
>> Reviewed-by: Tomasz Figa
>> ---
>> drivers/iommu/arm-smmu.c | 29 +
>> 1 file changed, 29 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 56a04ae80bf3..64953ff2281f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1460,10 +1460,31 @@ static int arm_smmu_add_device(struct device *dev)
>> 	iommu_device_link(&smmu->iommu, dev);
>> +	if (pm_runtime_enabled(smmu->dev)) {
>> +		struct device_link *link;
>> +
>> +		/*
>> +		 * Establish the link between smmu and master, so that the
>> +		 * smmu gets runtime enabled/disabled as per the master's
>> +		 * needs.
>> +		 */
>> +		link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>> +		if (!link) {
>
> FWIW, given that we don't really care about link itself, I'd be quite
> happy to simplify that lot down to:
>
> 	if (pm_runtime_enabled(smmu_dev) &&
> 	    !device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME)) {
>
>> +			dev_warn(smmu->dev,
>> +				 "Unable to add link to the consumer %s\n",
>> +				 dev_name(dev));
>
> (side note: since device_link_add() already prints a message on success,
> maybe it could print its own failure message too?)

I think we care whether adding the link succeeded. If it fails to be added, we might end up with a complete system lockup on a system with power domains.

Best regards,
Tomasz