Re: [PATCH v2 17/19] iommu: Add max num of cache and granu types
Hi Jacob, On 4/29/19 6:17 PM, Jacob Pan wrote: > On Fri, 26 Apr 2019 18:22:46 +0200 > Auger Eric wrote: > >> Hi Jacob, >> >> On 4/24/19 1:31 AM, Jacob Pan wrote: >>> To convert to/from cache types and granularities between generic and >>> VT-d specific counterparts, a 2D arrary is used. Introduce the >>> limits >> array >>> to help define the converstion array size. >> conversion >>> > will fix, thanks >>> Signed-off-by: Jacob Pan >>> --- >>> include/uapi/linux/iommu.h | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h >>> index 5c95905..2d8fac8 100644 >>> --- a/include/uapi/linux/iommu.h >>> +++ b/include/uapi/linux/iommu.h >>> @@ -197,6 +197,7 @@ struct iommu_inv_addr_info { >>> __u64 granule_size; >>> __u64 nb_granules; >>> }; >>> +#define NR_IOMMU_CACHE_INVAL_GRANU (3) >>> >>> /** >>> * First level/stage invalidation information >>> @@ -235,6 +236,7 @@ struct iommu_cache_invalidate_info { >>> struct iommu_inv_addr_info addr_info; >>> }; >>> }; >>> +#define NR_IOMMU_CACHE_TYPE(3) >>> /** >>> * struct gpasid_bind_data - Information about device and guest >>> PASID binding >>> * @gcr3: Guest CR3 value from guest mm >>> >> Is it really something that needs to be exposed in the uapi? >> > I put it in uapi since the related definitions for granularity and > cache type are in the same file. > Maybe putting them close together like this? I was thinking you can just > fold it into your next series as one patch for introducing cache > invalidation. 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > index 2d8fac8..4ff6929 100644 > --- a/include/uapi/linux/iommu.h > +++ b/include/uapi/linux/iommu.h > @@ -164,6 +164,7 @@ enum iommu_inv_granularity { > IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */ > IOMMU_INV_GRANU_PASID, /* pasid-selective invalidation */ > IOMMU_INV_GRANU_ADDR, /* page-selective invalidation */ > + NR_IOMMU_INVAL_GRANU, /* number of invalidation granularities > */ }; > > /** > @@ -228,6 +229,7 @@ struct iommu_cache_invalidate_info { > #define IOMMU_CACHE_INV_TYPE_IOTLB (1 << 0) /* IOMMU IOTLB */ > #define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */ > #define IOMMU_CACHE_INV_TYPE_PASID (1 << 2) /* PASID cache */ > +#define NR_IOMMU_CACHE_TYPE(3) OK I will add this. Thanks Eric > __u8cache; > __u8granularity; > >> Thanks >> >> Eric > > [Jacob Pan] > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 18/19] iommu/vt-d: Support flushing more translation cache types
Hi Jacob, On 4/29/19 11:29 PM, Jacob Pan wrote: > On Sat, 27 Apr 2019 11:04:04 +0200 > Auger Eric wrote: > >> Hi Jacob, >> >> On 4/24/19 1:31 AM, Jacob Pan wrote: >>> When Shared Virtual Memory is exposed to a guest via vIOMMU, >>> extended IOTLB invalidation may be passed down from outside IOMMU >>> subsystems. This patch adds invalidation functions that can be used >>> for additional translation cache types. >>> >>> Signed-off-by: Jacob Pan >>> --- >>> drivers/iommu/dmar.c| 48 >>> + >>> include/linux/intel-iommu.h | 21 2 files >>> changed, 65 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c >>> index 9c49300..680894e 100644 >>> --- a/drivers/iommu/dmar.c >>> +++ b/drivers/iommu/dmar.c >>> @@ -1357,6 +1357,20 @@ void qi_flush_iotlb(struct intel_iommu >>> *iommu, u16 did, u64 addr, qi_submit_sync(, iommu); >>> } >>> >> /* PASID-based IOTLB Invalidate */ >>> +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, >>> u32 pasid, >>> + unsigned int size_order, u64 granu) >>> +{ >>> + struct qi_desc desc; >>> + >>> + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) | >>> + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE; >>> + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(0) | >>> + QI_EIOTLB_AM(size_order); >> I see IH it hardcoded to 0. Don't you envision to cascade the IH. On >> ARM this was needed for perf sake. > Right, we should cascade IH based on IOMMU_INV_ADDR_FLAGS_LEAF. Just > curious how do you deduce the IH information on ARM? I guess you need > to get the non-leaf page directory info? > I will add an argument for IH. On ARM we have the "Leaf" field in the stage1 TLB invalidation command. "When Leaf==1, only cached entries for the last level of translation table walk are required to be invalidated". 
Thanks Eric >>> + desc.qw2 = 0; >>> + desc.qw3 = 0; >>> + qi_submit_sync(, iommu); >>> +} >>> + >>> void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 >>> pfsid, u16 qdep, u64 addr, unsigned mask) >>> { >>> @@ -1380,6 +1394,40 @@ void qi_flush_dev_iotlb(struct intel_iommu >>> *iommu, u16 sid, u16 pfsid, qi_submit_sync(, iommu); >>> } >>> >> /* Pasid-based Device-TLB Invalidation */ > yes, better to explain piotlb :), same for iotlb. >>> +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 >>> pfsid, >>> + u32 pasid, u16 qdep, u64 addr, unsigned size, u64 >>> granu) +{ >>> + struct qi_desc desc; >>> + >>> + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | >>> QI_DEV_EIOTLB_SID(sid) | >>> + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE | >>> + QI_DEV_IOTLB_PFSID(pfsid); >>> + desc.qw1 |= QI_DEV_EIOTLB_GLOB(granu); > should be desc.qw1 = >>> + >>> + /* If S bit is 0, we only flush a single page. If S bit is >>> set, >>> +* The least significant zero bit indicates the size. VT-d >>> spec >>> +* 6.5.2.6 >>> +*/ >>> + if (!size) >>> + desc.qw0 = QI_DEV_EIOTLB_ADDR(addr) & >>> ~QI_DEV_EIOTLB_SIZE; >> desc.q1 |= ? > Right, I also missed previous qw1 assignment. >>> + else { >>> + unsigned long mask = 1UL << (VTD_PAGE_SHIFT + >>> size); + >>> + desc.qw1 = QI_DEV_EIOTLB_ADDR(addr & ~mask) | >>> QI_DEV_EIOTLB_SIZE; >> desc.q1 |= > right, thanks >>> + } >>> + qi_submit_sync(, iommu); >>> +} >>> + >> /* PASID-cache invalidation */ >>> +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 >>> granu, int pasid) +{ >>> + struct qi_desc desc; >>> + >>> + desc.qw0 = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu) >>> | QI_PC_PASID(pasid); >>> + desc.qw1 = 0; >>> + desc.qw2 = 0; >>> + desc.qw3 = 0; >>> + qi_submit_sync(, iommu); >>> +} >>> /* >>> * Disable Queued Invalidation interface. 
>>> */ >>> diff --git a/include/linux/intel-iommu.h >>> b/include/linux/intel-iommu.h index 5d67d0d4..38e5efb 100644 >>> --- a/include/linux/intel-iommu.h >>> +++ b/include/linux/intel-iommu.h >>> @@ -339,7 +339,7 @@ enum { >>> #define QI_IOTLB_GRAN(gran)(((u64)gran) >> >>> (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr) >>> (((u64)addr) & VTD_PAGE_MASK) #define >>> QI_IOTLB_IH(ih) (((u64)ih) << 6) -#define >>> QI_IOTLB_AM(am) (((u8)am)) +#define >>> QI_IOTLB_AM(am) (((u8)am) & 0x3f) >>> #define QI_CC_FM(fm) (((u64)fm) << 48) >>> #define QI_CC_SID(sid) (((u64)sid) << 32) >>> @@ -357,17 +357,22 @@ enum { >>> #define QI_PC_DID(did) (((u64)did) << 16) >>> #define QI_PC_GRAN(gran) (((u64)gran) << 4) >>> >>> -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0)) >>> -#define QI_PC_PASID_SEL(QI_PC_TYPE | QI_PC_GRAN(1)) >>> +/* PASID cache invalidation granu */ >>> +#define QI_PC_ALL_PASIDS 0 >>> +#define QI_PC_PASID_SEL1 >>> >>> #define QI_EIOTLB_ADDR(addr) ((u64)(addr) &
Re: [PATCH v3 5/8] iommu/vt-d: Implement def_domain_type iommu ops entry
Hi Christoph, On 4/30/19 4:03 AM, Christoph Hellwig wrote: @@ -3631,35 +3607,30 @@ static int iommu_no_mapping(struct device *dev) if (iommu_dummy(dev)) return 1; - if (!iommu_identity_mapping) - return 0; - FYI, iommu_no_mapping has been refactored in for-next: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=x86/vt-d=48b2c937ea37a3bece0094b46450ed5267525289 Oh, yes! Thanks for letting me know this. Will rebase the code. found = identity_mapping(dev); if (found) { + /* +* If the device's dma_mask is less than the system's memory +* size then this is not a candidate for identity mapping. +*/ + u64 dma_mask = *dev->dma_mask; + + if (dev->coherent_dma_mask && + dev->coherent_dma_mask < dma_mask) + dma_mask = dev->coherent_dma_mask; + + if (dma_mask < dma_get_required_mask(dev)) { I know this is mostly existing code moved around, but it really needs some fixing. For one dma_get_required_mask is supposed to return the required to not bounce mask for the given device. E.g. for a device behind an iommu it should always just return 32-bit. If you really want to check vs system memory please call dma_direct_get_required_mask without the dma_ops indirection. Second I don't even think we need to check the coherent_dma_mask, dma_direct is pretty good at always finding memory even without an iommu. Third this doesn't take take the bus_dma_mask into account. This probably should just be: if (min(*dev->dma_mask, dev->bus_dma_mask) < dma_direct_get_required_mask(dev)) { Agreed and will add this in the next version. Best regards, Lu Baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free
Hi Robin, On 4/29/19 7:06 PM, Robin Murphy wrote: On 29/04/2019 06:10, Lu Baolu wrote: Hi Christoph, On 4/26/19 11:04 PM, Christoph Hellwig wrote: On Thu, Apr 25, 2019 at 10:07:19AM +0800, Lu Baolu wrote: This is not VT-d specific. It's just how generic IOMMU works. Normally, IOMMU works in paging mode. So if a driver issues DMA with IOVA 0x0123, IOMMU can remap it with a physical address 0x0123. But we should never expect IOMMU to remap 0x0123 with physical address of 0x. That's the reason why I said that IOMMU will not work there. Well, with the iommu it doesn't happen. With swiotlb it obviosuly can happen, so drivers are fine with it. Why would that suddenly become an issue when swiotlb is called from the iommu code? I would say IOMMU is DMA remapping, not DMA engine. :-) I'm not sure I really follow the issue here - if we're copying the buffer to the bounce page(s) there's no conceptual difference from copying it to SWIOTLB slot(s), so there should be no need to worry about the original in-page offset. From the reply up-thread I guess you're trying to include an optimisation to only copy the head and tail of the buffer if it spans multiple pages, and directly map the ones in the middle, but AFAICS that's going to tie you to also using strict mode for TLB maintenance, which may not be a win overall depending on the balance between invalidation bandwidth vs. memcpy bandwidth. At least if we use standard SWIOTLB logic to always copy the whole thing, we should be able to release the bounce pages via the flush queue to allow 'safe' lazy unmaps. With respect, even we use the standard SWIOTLB logic, we need to use the strict mode for TLB maintenance. Say, some swiotbl slots are used by untrusted device for bounce page purpose. When the device driver unmaps the IOVA, the slots are freed but the mapping is still cached in IOTLB, hence the untrusted device is still able to access the slots. Then the slots are allocated to other devices. 
This makes it possible for the untrusted device to access the data buffers of other devices. Best regards, Lu Baolu > Either way I think it would be worth just implementing the straightforward version first, then coming back to consider optimisations later.
[RFC/RFT PATCH 0/2] Optimize dma_*_from_contiguous calls
This series of patches tries to optimize dma_*_from_contiguous calls:

PATCH-1 does some abstraction and cleanup.
PATCH-2 saves single pages and reduces fragmentation of the CMA area.

Both patches may affect the source of pages (CMA or normal) depending on the use case, so they are tagged "RFC/RFT". Please check their commit messages for details.

Nicolin Chen (2):
  dma-contiguous: Simplify dma_*_from_contiguous() function calls
  dma-contiguous: Use fallback alloc_pages for single pages

 arch/arm/mm/dma-mapping.c      | 14 +++-
 arch/arm64/mm/dma-mapping.c    | 11 +++---
 arch/xtensa/kernel/pci-dma.c   | 19 +++---
 drivers/iommu/amd_iommu.c      | 20 +++
 drivers/iommu/intel-iommu.c    | 20 ++-
 include/linux/dma-contiguous.h | 15 +++-
 kernel/dma/contiguous.c        | 64 ++
 kernel/dma/direct.c            | 24 +++--
 kernel/dma/remap.c             | 11 ++
 9 files changed, 73 insertions(+), 125 deletions(-)

-- 
2.17.1
Re: [PATCH v3 01/10] iommu: Add helper to get minimal page size of domain
Hi Robin, On 4/29/19 6:55 PM, Robin Murphy wrote: On 21/04/2019 02:17, Lu Baolu wrote: This makes it possible for other modules to know the minimal page size supported by a domain without the knowledge of the structure details. Signed-off-by: Lu Baolu --- include/linux/iommu.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a5007d035218..46679ef19b7e 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -377,6 +377,14 @@ static inline void iommu_tlb_sync(struct iommu_domain *domain) domain->ops->iotlb_sync(domain); } +static inline unsigned long domain_minimal_pgsize(struct iommu_domain *domain) +{ + if (domain && domain->pgsize_bitmap) + return 1 << __ffs(domain->pgsize_bitmap); Nit: this would probably be more efficient on most architectures as: if (domain) return domain->pgsize_bitmap & -domain->pgsize_bitmap; It looks reasonable to me. I'd also suggest s/minimal/min/ in the name, just to stop it getting too long. Otherwise, though, I like the idea, and there's at least one other place (in iommu-dma) that can make use of it straight away. Okay, I will change the name to domain_min_pgsize(). Robin. Best regards, Lu Baolu + + return 0; +} + /* PCI device grouping function */ extern struct iommu_group *pci_device_group(struct device *dev); /* Generic device grouping function */ @@ -704,6 +712,11 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode) return NULL; } +static inline unsigned long domain_minimal_pgsize(struct iommu_domain *domain) +{ + return 0; +} + #endif /* CONFIG_IOMMU_API */ #ifdef CONFIG_IOMMU_DEBUGFS ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 3/4] iommu/dma-iommu: Use the dev->coherent_dma_mask
Use the dev->coherent_dma_mask when allocating in the dma-iommu ops api. Signed-off-by: Tom Murphy --- drivers/iommu/dma-iommu.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index c18f74ad1e8b..df03104978d7 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -436,7 +436,8 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr, } static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, - size_t size, int prot, struct iommu_domain *domain) + size_t size, int prot, struct iommu_domain *domain, + dma_addr_t dma_limit) { struct iommu_dma_cookie *cookie = domain->iova_cookie; size_t iova_off = 0; @@ -447,7 +448,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, size = iova_align(>iovad, size + iova_off); } - iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); + iova = iommu_dma_alloc_iova(domain, size, dma_limit, dev); if (!iova) return DMA_MAPPING_ERROR; @@ -490,7 +491,7 @@ static void *iommu_dma_alloc_contiguous(struct device *dev, size_t size, return NULL; *dma_handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot, - iommu_get_dma_domain(dev)); + iommu_get_dma_domain(dev), dev->coherent_dma_mask); if (*dma_handle == DMA_MAPPING_ERROR) { if (!dma_release_from_contiguous(dev, page, count)) __free_pages(page, page_order); @@ -760,7 +761,7 @@ static void *iommu_dma_alloc_pool(struct device *dev, size_t size, *dma_handle = __iommu_dma_map(dev, page_to_phys(page), size, dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs), - iommu_get_domain_for_dev(dev)); + iommu_get_domain_for_dev(dev), dev->coherent_dma_mask); if (*dma_handle == DMA_MAPPING_ERROR) { dma_free_from_pool(vaddr, PAGE_ALIGN(size)); return NULL; @@ -850,7 +851,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, dma_handle =__iommu_dma_map(dev, phys, size, dma_info_to_prot(dir, coherent, 
attrs), - iommu_get_dma_domain(dev)); + iommu_get_dma_domain(dev), dma_get_mask(dev)); if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && dma_handle != DMA_MAPPING_ERROR) arch_sync_dma_for_device(dev, phys, size, dir); @@ -1065,7 +1066,7 @@ static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys, return __iommu_dma_map(dev, phys, size, dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO, - iommu_get_dma_domain(dev)); + iommu_get_dma_domain(dev), dma_get_mask(dev)); } static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle, @@ -1250,7 +1251,8 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, if (!msi_page) return NULL; - iova = __iommu_dma_map(dev, msi_addr, size, prot, domain); + iova = __iommu_dma_map(dev, msi_addr, size, prot, domain, + dma_get_mask(dev)); if (iova == DMA_MAPPING_ERROR) goto out_free_page; -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 4/4] iommu/amd: Convert the AMD iommu driver to the dma-iommu api
Convert the AMD iommu driver to the dma-iommu api. Remove the iova handling and reserve region code from the AMD iommu driver. Signed-off-by: Tom Murphy --- drivers/iommu/Kconfig | 1 + drivers/iommu/amd_iommu.c | 680 -- 2 files changed, 70 insertions(+), 611 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 6f07f3b21816..5914ba85180b 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -136,6 +136,7 @@ config AMD_IOMMU select PCI_PASID select IOMMU_API select IOMMU_IOVA + select IOMMU_DMA depends on X86_64 && PCI && ACPI ---help--- With this option you can enable support for AMD IOMMU hardware in diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index ea3a5dc61bb0..366af2b27e7f 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -32,6 +32,7 @@ #include #include #include +#include #include #include #include @@ -101,8 +102,6 @@ const struct iommu_ops amd_iommu_ops; static ATOMIC_NOTIFIER_HEAD(ppr_notifier); int amd_iommu_max_glx_val = -1; -static const struct dma_map_ops amd_iommu_dma_ops; - /* * general struct to manage commands send to an IOMMU */ @@ -115,21 +114,6 @@ struct kmem_cache *amd_iommu_irq_cache; static void update_domain(struct protection_domain *domain); static int protection_domain_init(struct protection_domain *domain); static void detach_device(struct device *dev); -static void iova_domain_flush_tlb(struct iova_domain *iovad); - -/* - * Data container for a dma_ops specific protection domain - */ -struct dma_ops_domain { - /* generic protection domain information */ - struct protection_domain domain; - - /* IOVA RB-Tree */ - struct iova_domain iovad; -}; - -static struct iova_domain reserved_iova_ranges; -static struct lock_class_key reserved_rbtree_key; / * @@ -200,12 +184,6 @@ static struct protection_domain *to_pdomain(struct iommu_domain *dom) return container_of(dom, struct protection_domain, domain); } -static struct dma_ops_domain* to_dma_ops_domain(struct 
protection_domain *domain) -{ - BUG_ON(domain->flags != PD_DMA_OPS_MASK); - return container_of(domain, struct dma_ops_domain, domain); -} - static struct iommu_dev_data *alloc_dev_data(u16 devid) { struct iommu_dev_data *dev_data; @@ -1279,12 +1257,6 @@ static void domain_flush_pages(struct protection_domain *domain, __domain_flush_pages(domain, address, size, 0); } -/* Flush the whole IO/TLB for a given protection domain */ -static void domain_flush_tlb(struct protection_domain *domain) -{ - __domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 0); -} - /* Flush the whole IO/TLB for a given protection domain - including PDE */ static void domain_flush_tlb_pde(struct protection_domain *domain) { @@ -1686,43 +1658,6 @@ static unsigned long iommu_unmap_page(struct protection_domain *dom, return unmapped; } -/ - * - * The next functions belong to the address allocator for the dma_ops - * interface functions. - * - / - - -static unsigned long dma_ops_alloc_iova(struct device *dev, - struct dma_ops_domain *dma_dom, - unsigned int pages, u64 dma_mask) -{ - unsigned long pfn = 0; - - pages = __roundup_pow_of_two(pages); - - if (dma_mask > DMA_BIT_MASK(32)) - pfn = alloc_iova_fast(_dom->iovad, pages, - IOVA_PFN(DMA_BIT_MASK(32)), false); - - if (!pfn) - pfn = alloc_iova_fast(_dom->iovad, pages, - IOVA_PFN(dma_mask), true); - - return (pfn << PAGE_SHIFT); -} - -static void dma_ops_free_iova(struct dma_ops_domain *dma_dom, - unsigned long address, - unsigned int pages) -{ - pages = __roundup_pow_of_two(pages); - address >>= PAGE_SHIFT; - - free_iova_fast(_dom->iovad, address, pages); -} - / * * The next functions belong to the domain allocation. 
A domain is @@ -1824,40 +1759,25 @@ static void free_gcr3_table(struct protection_domain *domain) free_page((unsigned long)domain->gcr3_tbl); } -static void dma_ops_domain_flush_tlb(struct dma_ops_domain *dom) -{ - domain_flush_tlb(>domain); - domain_flush_complete(>domain); -} - -static void iova_domain_flush_tlb(struct iova_domain *iovad) -{ - struct dma_ops_domain *dom; - - dom = container_of(iovad, struct dma_ops_domain, iovad); - - dma_ops_domain_flush_tlb(dom);
[PATCH v2 1/4] iommu: Add gfp parameter to iommu_ops::map
Add a gfp_t parameter to the iommu_ops::map function. Remove the needless locking in the AMD iommu driver. The iommu_ops::map function (or the iommu_map function which calls it) was always supposed to be sleepable (according to Joerg's comment in this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so should probably have had a "might_sleep()" since it was written. However currently the dma-iommu api can call iommu_map in an atomic context, which it shouldn't do. This doesn't cause any problems because any iommu driver which uses the dma-iommu api uses gfp_atomic in it's iommu_ops::map function. But doing this wastes the memory allocators atomic pools. We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap. iommu_map doesn’t lock while mapping and so no two calls should touch the same iova range. The AMD driver already handles the page table page allocations without locks so we can safely remove the locks. Signed-off-by: Tom Murphy --- drivers/iommu/amd_iommu.c | 14 --- drivers/iommu/arm-smmu-v3.c| 2 +- drivers/iommu/arm-smmu.c | 2 +- drivers/iommu/dma-iommu.c | 6 ++--- drivers/iommu/exynos-iommu.c | 2 +- drivers/iommu/intel-iommu.c| 2 +- drivers/iommu/iommu.c | 43 +- drivers/iommu/ipmmu-vmsa.c | 2 +- drivers/iommu/msm_iommu.c | 2 +- drivers/iommu/mtk_iommu.c | 2 +- drivers/iommu/mtk_iommu_v1.c | 2 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 2 +- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 2 +- drivers/iommu/tegra-gart.c | 2 +- drivers/iommu/tegra-smmu.c | 2 +- include/linux/iommu.h | 21 - 18 files changed, 78 insertions(+), 34 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index ebd062522cf5..ea3a5dc61bb0 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3092,7 +3092,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom, } static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, -phys_addr_t paddr, size_t 
page_size, int iommu_prot) +phys_addr_t paddr, size_t page_size, int iommu_prot, +gfp_t gfp) { struct protection_domain *domain = to_pdomain(dom); int prot = 0; @@ -3106,9 +3107,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, if (iommu_prot & IOMMU_WRITE) prot |= IOMMU_PROT_IW; - mutex_lock(>api_lock); - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); - mutex_unlock(>api_lock); + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); domain_flush_np_cache(domain, iova, page_size); @@ -3119,16 +3118,11 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova, size_t page_size) { struct protection_domain *domain = to_pdomain(dom); - size_t unmap_size; if (domain->mode == PAGE_MODE_NONE) return 0; - mutex_lock(>api_lock); - unmap_size = iommu_unmap_page(domain, iova, page_size); - mutex_unlock(>api_lock); - - return unmap_size; + return iommu_unmap_page(domain, iova, page_size); } static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom, diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index d3880010c6cf..e5c48089b49f 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1777,7 +1777,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) } static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 045d93884164..2d50db55b788 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1286,7 +1286,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) } static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, 
gfp_t gfp) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index fa5713a4f6f8..7a96c2c8f56b 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -440,7 +440,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, if (!iova) return
[PATCH v2 0/4] iommu/amd: Convert the AMD iommu driver to the dma-iommu api
Convert the AMD iommu driver to the dma-iommu api. Remove the iova handling and reserve region code from the AMD iommu driver.

Change-log:
v2:
-Rebase on top of this series:
 http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops.3
-Add a gfp_t parameter to the iommu_ops::map function.
-Made use of the reserve region code inside the dma-iommu api

Tom Murphy (4):
  iommu: Add gfp parameter to iommu_ops::map
  iommu/dma-iommu: Handle deferred devices
  iommu/dma-iommu: Use the dev->coherent_dma_mask
  iommu/amd: Convert the AMD iommu driver to the dma-iommu api

 drivers/iommu/Kconfig          |   1 +
 drivers/iommu/amd_iommu.c      | 694 -
 drivers/iommu/arm-smmu-v3.c    |   2 +-
 drivers/iommu/arm-smmu.c       |   2 +-
 drivers/iommu/dma-iommu.c      |  52 ++-
 drivers/iommu/exynos-iommu.c   |   2 +-
 drivers/iommu/intel-iommu.c    |   2 +-
 drivers/iommu/iommu.c          |  43 +-
 drivers/iommu/ipmmu-vmsa.c     |   2 +-
 drivers/iommu/msm_iommu.c      |   2 +-
 drivers/iommu/mtk_iommu.c      |   2 +-
 drivers/iommu/mtk_iommu_v1.c   |   2 +-
 drivers/iommu/omap-iommu.c     |   2 +-
 drivers/iommu/qcom_iommu.c     |   2 +-
 drivers/iommu/rockchip-iommu.c |   2 +-
 drivers/iommu/s390-iommu.c     |   2 +-
 drivers/iommu/tegra-gart.c     |   2 +-
 drivers/iommu/tegra-smmu.c     |   2 +-
 include/linux/iommu.h          |  21 +-
 19 files changed, 187 insertions(+), 652 deletions(-)

-- 
2.17.1
[PATCH v2 2/4] iommu/dma-iommu: Handle deferred devices
Handle devices which defer their attach to the iommu in the dma-iommu api Signed-off-by: Tom Murphy --- drivers/iommu/dma-iommu.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 7a96c2c8f56b..c18f74ad1e8b 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -322,6 +322,17 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, return iova_reserve_iommu_regions(dev, domain); } +static int handle_deferred_device(struct device *dev) +{ + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + const struct iommu_ops *ops = domain->ops; + + if (ops->is_attach_deferred && ops->is_attach_deferred(domain, dev)) + return iommu_attach_device(domain, dev); + + return 0; +} + /** * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API *page flags. @@ -835,6 +846,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, bool coherent = dev_is_dma_coherent(dev); dma_addr_t dma_handle; + handle_deferred_device(dev); + dma_handle =__iommu_dma_map(dev, phys, size, dma_info_to_prot(dir, coherent, attrs), iommu_get_dma_domain(dev)); @@ -849,6 +862,8 @@ static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, { struct iommu_domain *domain = iommu_get_dma_domain(dev); + handle_deferred_device(dev); + if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { phys_addr_t phys = iommu_iova_to_phys(domain, dma_handle); @@ -873,6 +888,8 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents, unsigned int cur_len = 0, max_len = dma_get_max_seg_size(dev); int i, count = 0; + handle_deferred_device(dev); + for_each_sg(sg, s, nents, i) { /* Restore this segment's original unaligned fields first */ unsigned int s_iova_off = sg_dma_address(s); @@ -1022,6 +1039,8 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, struct scatterlist *tmp; int i; + 
handle_deferred_device(dev); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0) iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir); @@ -1042,6 +1061,8 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys, size_t size, enum dma_data_direction dir, unsigned long attrs) { + handle_deferred_device(dev); + return __iommu_dma_map(dev, phys, size, dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO, iommu_get_dma_domain(dev)); @@ -1050,12 +1071,15 @@ static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys, static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir, unsigned long attrs) { + handle_deferred_device(dev); + __iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size); } static void *iommu_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { + handle_deferred_device(dev); gfp |= __GFP_ZERO; #ifdef CONFIG_DMA_DIRECT_REMAP @@ -1076,6 +1100,8 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr, { struct page *page; + handle_deferred_device(dev); + /* * cpu_addr can be one of 4 things depending on how it was allocated: * @@ -1115,6 +1141,8 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma, unsigned long pfn; int ret; + handle_deferred_device(dev); + vma->vm_page_prot = arch_dma_mmap_pgprot(dev, vma->vm_page_prot, attrs); if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, )) @@ -1143,6 +1171,8 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt, struct page *page; int ret; + handle_deferred_device(dev); + #ifdef CONFIG_DMA_DIRECT_REMAP if (is_vmalloc_addr(cpu_addr)) { if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 19/19] iommu/vt-d: Add svm/sva invalidate function
On Fri, 26 Apr 2019 19:23:03 +0200 Auger Eric wrote: > Hi Jacob, > On 4/24/19 1:31 AM, Jacob Pan wrote: > > When Shared Virtual Address (SVA) is enabled for a guest OS via > > vIOMMU, we need to provide invalidation support at IOMMU API and > > driver level. This patch adds Intel VT-d specific function to > > implement iommu passdown invalidate API for shared virtual address. > > > > The use case is for supporting caching structure invalidation > > of assigned SVM capable devices. Emulated IOMMU exposes queue > > invalidation capability and passes down all descriptors from the > > guest to the physical IOMMU. > > > > The assumption is that guest to host device ID mapping should be > > resolved prior to calling IOMMU driver. Based on the device handle, > > host IOMMU driver can replace certain fields before submit to the > > invalidation queue. > > > > Signed-off-by: Jacob Pan > > Signed-off-by: Ashok Raj > > Signed-off-by: Liu, Yi L > > --- > > drivers/iommu/intel-iommu.c | 159 > > 1 file changed, 159 > > insertions(+) > > > > diff --git a/drivers/iommu/intel-iommu.c > > b/drivers/iommu/intel-iommu.c index 89989b5..54a3d22 100644 > > --- a/drivers/iommu/intel-iommu.c > > +++ b/drivers/iommu/intel-iommu.c > > @@ -5338,6 +5338,164 @@ static void > > intel_iommu_aux_detach_device(struct iommu_domain *domain, > > aux_domain_remove_dev(to_dmar_domain(domain), dev); } > > > > +/* > > + * 2D array for converting and sanitizing IOMMU generic TLB > > granularity to > > + * VT-d granularity. Invalidation is typically included in the > > unmap operation > > + * as a result of DMA or VFIO unmap. However, for assigned device > > where guest > > + * could own the first level page tables without being shadowed by > > QEMU. In > > + * this case there is no pass down unmap to the host IOMMU as a > > result of unmap > > + * in the guest. Only invalidations are trapped and passed down. 
> > + * In all cases, only first level TLB invalidation (request with > > PASID) can be > > + * passed down, therefore we do not include IOTLB granularity for > > request > > + * without PASID (second level). > > + * > > + * For an example, to find the VT-d granularity encoding for IOTLB > > + * type and page selective granularity within PASID: > > + * X: indexed by iommu cache type > > + * Y: indexed by enum iommu_inv_granularity > > + * [IOMMU_INV_TYPE_TLB][IOMMU_INV_GRANU_PAGE_PASID] > > + * > > + * Granu_map array indicates validity of the table. 1: valid, 0: > > invalid > > + * > > + */ > > +const static int > > inv_type_granu_map[NR_IOMMU_CACHE_TYPE][NR_IOMMU_CACHE_INVAL_GRANU] > > = { > The size is frozen for a given uapi version so I guess you can > hardcode the limits for a given version. I guess I could, I just felt more readable this way. > > + /* PASID based IOTLB, support PASID selective and page > > selective */ > > + {0, 1, 1}, > > + /* PASID based dev TLBs, only support all PASIDs or single > > PASID */ > > + {1, 1, 0}, > > + /* PASID cache */ > > + {1, 1, 0} > > +}; > > + > > +const static u64 > > inv_type_granu_table[NR_IOMMU_CACHE_TYPE][NR_IOMMU_CACHE_INVAL_GRANU] > > = { > > + /* PASID based IOTLB */ > > + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID}, > > + /* PASID based dev TLBs */ > > + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0}, > > + /* PASID cache */ > > + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0}, > > +}; > Can't you use a single matrix instead, ie. inv_type_granu_table > The reason i have an additional inv_type_granu_map[] matrix is that some of fields can be 0 but still valid. A single matrix would not be able to tell the difference between a valid 0 or invalid field. 
> > + > > +static inline int to_vtd_granularity(int type, int granu, u64 > > *vtd_granu) +{ > > + if (type >= NR_IOMMU_CACHE_TYPE || granu >= > > NR_IOMMU_CACHE_INVAL_GRANU || > > + !inv_type_granu_map[type][granu]) > > + return -EINVAL; > > + > > + *vtd_granu = inv_type_granu_table[type][granu]; > > + > > + return 0; > > +} > > + > > +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules) > > +{ > > + u64 nr_pages; > direct initialization? will do, thanks > > + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 > > for 2MB, etc. > > +* IOMMU cache invalidate API passes granu_size in bytes, > > and number of > > +* granu size in contiguous memory. > > +*/ > > + > > + nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT; > > + return order_base_2(nr_pages); > > +} > > + > > +static int intel_iommu_sva_invalidate(struct iommu_domain *domain, > > + struct device *dev, struct > > iommu_cache_invalidate_info *inv_info) +{ > > + struct dmar_domain *dmar_domain = to_dmar_domain(domain); > > + struct device_domain_info *info; > > + struct intel_iommu *iommu; > > + unsigned long flags; > > + int cache_type; > > + u8 bus, devfn; > > + u16 did, sid; > > + int ret = 0; > > +
Re: [PATCH v2 18/19] iommu/vt-d: Support flushing more translation cache types
On Sat, 27 Apr 2019 11:04:04 +0200 Auger Eric wrote: > Hi Jacob, > > On 4/24/19 1:31 AM, Jacob Pan wrote: > > When Shared Virtual Memory is exposed to a guest via vIOMMU, > > extended IOTLB invalidation may be passed down from outside IOMMU > > subsystems. This patch adds invalidation functions that can be used > > for additional translation cache types. > > > > Signed-off-by: Jacob Pan > > --- > > drivers/iommu/dmar.c| 48 > > + > > include/linux/intel-iommu.h | 21 2 files > > changed, 65 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c > > index 9c49300..680894e 100644 > > --- a/drivers/iommu/dmar.c > > +++ b/drivers/iommu/dmar.c > > @@ -1357,6 +1357,20 @@ void qi_flush_iotlb(struct intel_iommu > > *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu); > > } > > > /* PASID-based IOTLB Invalidate */ > > +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, > > u32 pasid, > > + unsigned int size_order, u64 granu) > > +{ > > + struct qi_desc desc; > > + > > + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) | > > + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE; > > + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(0) | > > + QI_EIOTLB_AM(size_order); > I see IH is hardcoded to 0. Don't you envision to cascade the IH. On > ARM this was needed for perf sake. Right, we should cascade IH based on IOMMU_INV_ADDR_FLAGS_LEAF. Just curious how do you deduce the IH information on ARM? I guess you need to get the non-leaf page directory info? I will add an argument for IH. > > + desc.qw2 = 0; > > + desc.qw3 = 0; > > + qi_submit_sync(&desc, iommu); > > +} > > + > > void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 > > pfsid, u16 qdep, u64 addr, unsigned mask) > > { > > @@ -1380,6 +1394,40 @@ void qi_flush_dev_iotlb(struct intel_iommu > > *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu); > > } > > > /* Pasid-based Device-TLB Invalidation */ yes, better to explain piotlb :), same for iotlb. 
> > +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 > > pfsid, > > + u32 pasid, u16 qdep, u64 addr, unsigned size, u64 > > granu) +{ > > + struct qi_desc desc; > > + > > + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | > > QI_DEV_EIOTLB_SID(sid) | > > + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE | > > + QI_DEV_IOTLB_PFSID(pfsid); > > + desc.qw1 |= QI_DEV_EIOTLB_GLOB(granu); should be desc.qw1 = > > + > > + /* If S bit is 0, we only flush a single page. If S bit is > > set, > > +* The least significant zero bit indicates the size. VT-d > > spec > > +* 6.5.2.6 > > +*/ > > + if (!size) > > + desc.qw0 = QI_DEV_EIOTLB_ADDR(addr) & > > ~QI_DEV_EIOTLB_SIZE; > desc.qw1 |= ? Right, I also missed previous qw1 assignment. > > + else { > > + unsigned long mask = 1UL << (VTD_PAGE_SHIFT + > > size); + > > + desc.qw1 = QI_DEV_EIOTLB_ADDR(addr & ~mask) | > > QI_DEV_EIOTLB_SIZE; > desc.qw1 |= right, thanks > > + } > > + qi_submit_sync(&desc, iommu); > > +} > > + > /* PASID-cache invalidation */ > > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 > > granu, int pasid) +{ > > + struct qi_desc desc; > > + > > + desc.qw0 = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu) > > | QI_PC_PASID(pasid); > > + desc.qw1 = 0; > > + desc.qw2 = 0; > > + desc.qw3 = 0; > > + qi_submit_sync(&desc, iommu); > > +} > > /* > > * Disable Queued Invalidation interface. 
> > */ > > diff --git a/include/linux/intel-iommu.h > > b/include/linux/intel-iommu.h index 5d67d0d4..38e5efb 100644 > > --- a/include/linux/intel-iommu.h > > +++ b/include/linux/intel-iommu.h > > @@ -339,7 +339,7 @@ enum { > > #define QI_IOTLB_GRAN(gran)(((u64)gran) >> > > (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr) > > (((u64)addr) & VTD_PAGE_MASK) #define > > QI_IOTLB_IH(ih) (((u64)ih) << 6) -#define > > QI_IOTLB_AM(am) (((u8)am)) +#define > > QI_IOTLB_AM(am) (((u8)am) & 0x3f) > > #define QI_CC_FM(fm) (((u64)fm) << 48) > > #define QI_CC_SID(sid) (((u64)sid) << 32) > > @@ -357,17 +357,22 @@ enum { > > #define QI_PC_DID(did) (((u64)did) << 16) > > #define QI_PC_GRAN(gran) (((u64)gran) << 4) > > > > -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0)) > > -#define QI_PC_PASID_SEL (QI_PC_TYPE | QI_PC_GRAN(1)) > > +/* PASID cache invalidation granu */ > > +#define QI_PC_ALL_PASIDS 0 > > +#define QI_PC_PASID_SEL 1 > > > > #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK) > > #define QI_EIOTLB_GL(gl) (((u64)gl) << 7) > > #define QI_EIOTLB_IH(ih) (((u64)ih) << 6) > > -#define QI_EIOTLB_AM(am) (((u64)am)) > > +#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f) > > #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32) > > #define
Re: [PATCH v2 0/9] PCI: add helper pci_dev_id
On Wed, Apr 24, 2019 at 09:10:21PM +0200, Heiner Kallweit wrote: > In several places in the kernel we find PCI_DEVID used like this: > PCI_DEVID(dev->bus->number, dev->devfn) Therefore create a helper > for it. > > v2: > - apply the change to all affected places in the kernel > > Heiner Kallweit (9): > PCI: add helper pci_dev_id > PCI: use helper pci_dev_id > r8169: use new helper pci_dev_id > powerpc/powernv/npu: use helper pci_dev_id > drm/amdkfd: use helper pci_dev_id > iommu/amd: use helper pci_dev_id > iommu/vt-d: use helper pci_dev_id > stmmac: pci: use helper pci_dev_id > platform/chrome: chromeos_laptop: use helper pci_dev_id > > arch/powerpc/platforms/powernv/npu-dma.c | 14 ++ > drivers/gpu/drm/amd/amdkfd/kfd_topology.c| 3 +-- > drivers/iommu/amd_iommu.c| 2 +- > drivers/iommu/intel-iommu.c | 2 +- > drivers/iommu/intel_irq_remapping.c | 2 +- > drivers/net/ethernet/realtek/r8169.c | 3 +-- > drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 2 +- > drivers/pci/msi.c| 6 +++--- > drivers/pci/search.c | 10 +++--- > drivers/platform/chrome/chromeos_laptop.c| 2 +- > include/linux/pci.h | 5 + > 11 files changed, 24 insertions(+), 27 deletions(-) Applied with acks/reviewed-by from Benson, Joerg, Christian, Alexey, and David to pci/misc for v5.2, thanks!
Re: [PATCH v2 0/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
On Mon, Apr 29, 2019 at 04:57:20PM +0100, Marc Zyngier wrote: > Thanks for having reworked this. I'm quite happy with the way this looks > now (modulo the couple of nits Robin and I mentioned, which I'm to > address myself). > > Joerg: are you OK with this going via the irq tree? As-is this has a trivial conflict with my "implement generic dma_map_ops for IOMMUs" series. But I can tweak it a bit to avoid that conflict.
Re: [PATCH 14/26] iommu/dma: Refactor iommu_dma_free
On Mon, Apr 29, 2019 at 09:03:48PM +0200, Christoph Hellwig wrote: > On Mon, Apr 29, 2019 at 02:59:43PM +0100, Robin Murphy wrote: > > Hmm, I do still prefer my original flow with the dma_common_free_remap() > > call right out of the way at the end rather than being a special case in > > the middle of all the page-freeing (which is the kind of existing > > complexity I was trying to eliminate). I guess you've done this to avoid > > having "if (!dma_release_from_contiguous(...))..." twice like I ended up > > with, which is fair enough I suppose - once we manage to solve the new > > dma_{alloc,free}_contiguous() interface that may tip the balance so I can > > always revisit this then. > > Ok, I'll try to accommodate that with a minor rework. Does this look reasonable? diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 5b2a2bf44078..f884d22b1388 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -921,7 +921,7 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr, { size_t alloc_size = PAGE_ALIGN(size); int count = alloc_size >> PAGE_SHIFT; - struct page *page = NULL; + struct page *page = NULL, **pages = NULL; __iommu_dma_unmap(dev, handle, size); @@ -934,19 +934,17 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr, * If the address is remapped, then it's either non-coherent * or highmem CMA, or an iommu_dma_alloc_remap() construction. 
*/ - struct page **pages = __iommu_dma_get_pages(cpu_addr); - - if (pages) - __iommu_dma_free_pages(pages, count); - else + pages = __iommu_dma_get_pages(cpu_addr); + if (!pages) page = vmalloc_to_page(cpu_addr); - dma_common_free_remap(cpu_addr, alloc_size, VM_USERMAP); } else { /* Lowmem means a coherent atomic or CMA allocation */ page = virt_to_page(cpu_addr); } + if (pages) + __iommu_dma_free_pages(pages, count); if (page && !dma_release_from_contiguous(dev, page, count)) __free_pages(page, get_order(alloc_size)); }
Re: [PATCH 11/26] iommu/dma: Factor out remapped pages lookup
On Mon, Apr 29, 2019 at 02:05:46PM +0100, Robin Murphy wrote: > On 22/04/2019 18:59, Christoph Hellwig wrote: >> From: Robin Murphy >> >> Since we duplicate the find_vm_area() logic a few times in places where >> we only care about the pages, factor out a helper to abstract it. >> >> Signed-off-by: Robin Murphy >> [hch: don't warn when not finding a region, as we'll rely on that later] > > Yeah, I did think about that and the things which it might make a little > easier, but preserved it as-is for the sake of keeping my modifications > quick and simple. TBH I'm now feeling more inclined to drop the WARNs > entirely at this point, since it's not like there's ever been any general > guarantee that freeing the wrong thing shouldn't just crash, but that's > something we can easily come back to later if need be. Ok, I've dropped the warnings.
Re: [PATCH 14/26] iommu/dma: Refactor iommu_dma_free
On Mon, Apr 29, 2019 at 02:59:43PM +0100, Robin Murphy wrote: > Hmm, I do still prefer my original flow with the dma_common_free_remap() > call right out of the way at the end rather than being a special case in > the middle of all the page-freeing (which is the kind of existing > complexity I was trying to eliminate). I guess you've done this to avoid > having "if (!dma_release_from_contiguous(...))..." twice like I ended up > with, which is fair enough I suppose - once we manage to solve the new > dma_{alloc,free}_contiguous() interface that may tip the balance so I can > always revisit this then. Ok, I'll try to accommodate that with a minor rework.
Re: [PATCH 02/26] arm64/iommu: improve mmap bounds checking
On Mon, Apr 29, 2019 at 01:35:46PM +0100, Robin Murphy wrote: > On 22/04/2019 18:59, Christoph Hellwig wrote: >> The nr_pages checks should be done for all mmap requests, not just those >> using remap_pfn_range. > > I think it probably makes sense now to just squash this with #22 one way or > the other, but if you really really still want to keep it as a separate > patch with a misleading commit message then I'm willing to keep my > complaints to myself :) Well, I split this out in response to your earlier comments, so if you prefer it squashed back in I can do that.
Re: Linux crash when using FTDI FT232R USB to serial converter on AMD boards.
On Mon, Apr 29, 2019 at 12:51:20PM +0200, starost...@gmail.com wrote: > Hello, > sorry for other questions, but I am new in this list: > Is Ubuntu server 19.04 with "kernel 5.0.9-050009-generic" good for this > test? Yes, that might do depending on what else debian put in that kernel. > Can I add attachments to this lists? Sure, it's even preferred. Just try to trim non-relevant bits, and perhaps provide a link to the full log. > And who is xhci and iommu maintainers? Are they CC in this mail? Yes, that's Mathias and Joerg, respectively, that I added on CC. > On 29.4.2019 at 11:48, Johan Hovold wrote: > > So this is a debian 4.18 kernel seemingly crashing due to a xhci or > > iommu issue. > > > > Can you reproduce this on a mainline kernel? > > > > If so, please post the corresponding logs to the lists and CC the xhci > > and iommu maintainers (added to CC). Here's an excerpt from the 4.18 log meanwhile: [0.00] Linux version 4.18.0-0.bpo.1-amd64 (debian-ker...@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP Debian 4.18.6-1~bpo9+1 (2018-09-13) ... [ 960.145603] xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state. [ 1790.293623] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xfff5 flags=0x0020] [ 1790.300905] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xfde9e000 flags=0x0020] ... [ 4789.145364] xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state. [ 4789.310916] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xff63c000 flags=0x0020] [ 4789.317023] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xff5c6000 flags=0x0020] [ 4789.702446] xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state. 
[ 4789.766842] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xfdeaf000 flags=0x0020] [ 4789.781531] AMD-Vi: Event logged [IO_PAGE_FAULT device=15:00.0 domain=0x address=0xfdeaf040 flags=0x0020] [ 4790.093644] general protection fault: [#1] SMP NOPTI [ 4790.094186] CPU: 2 PID: 24561 Comm: readua Not tainted 4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1 [ 4790.094738] Hardware name: Micro-Star International Co., Ltd MS-7B38/A320M PRO-VD PLUS (MS-7B38), BIOS 1.C0 11/02/2018 [ 4790.095333] RIP: 0010:prefetch_freepointer+0x10/0x20 [ 4790.095936] Code: 58 48 c7 c7 30 09 a5 b0 e8 4b 64 eb ff eb 90 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 f6 74 13 8b 47 20 48 01 c6 <48> 33 36 48 33 b7 38 01 00 00 0f 18 0e f3 c3 90 0f 1f 44 00 00 8b [ 4790.097314] RSP: 0018:b6e303e67ce0 EFLAGS: 00010286 [ 4790.098016] RAX: RBX: 93773f762ca13047 RCX: 77df [ 4790.098711] RDX: 77de RSI: 93773f762ca13047 RDI: 9a30de807c00 [ 4790.099413] RBP: 9a30d20cc018 R08: 9a30deca4de0 R09: [ 4790.100141] R10: 9a306f3c6638 R11: R12: 006000c0 [ 4790.100871] R13: 0008 R14: 9a30de807c00 R15: 9a30de807c00 [ 4790.101619] FS: () GS:9a30dec8(0063) knlGS:f7d6a700 [ 4790.102387] CS: 0010 DS: 002b ES: 002b CR0: 80050033 [ 4790.103157] CR2: ffc77e3c CR3: af0c CR4: 003406e0 [ 4790.103954] Call Trace: [ 4790.104753] kmem_cache_alloc_trace+0xb5/0x1c0 [ 4790.105580] ? proc_do_submiturb+0x35a/0xda0 [usbcore] [ 4790.106382] proc_do_submiturb+0x35a/0xda0 [usbcore] [ 4790.107189] ? 
futex_wake+0x94/0x170 [ 4790.108009] proc_submiturb_compat+0xb1/0xe0 [usbcore] [ 4790.108851] usbdev_do_ioctl+0x894/0x1170 [usbcore] [ 4790.109704] usbdev_compat_ioctl+0xc/0x10 [usbcore] [ 4790.110553] __ia32_compat_sys_ioctl+0xc0/0x250 [ 4790.111413] do_fast_syscall_32+0x98/0x1e0 [ 4790.112280] entry_SYSCALL_compat_after_hwframe+0x45/0x4d [ 4790.113167] Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c iptable_filter binfmt_misc nls_ascii nls_cp437 vfat fat efi_pstore edac_mce_amd ccp rng_core kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_hdmi snd_hda_intel ghash_clmulni_intel wmi_bmof snd_hda_codec efivars pcspkr snd_hda_core snd_hwdep snd_pcm k10temp snd_timer sg snd soundcore sp5100_tco evdev pcc_cpufreq acpi_cpufreq efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb sd_mod hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper amdgpu chash gpu_sched [ 4790.120417] i2c_algo_bit i2c_piix4 ttm drm_kms_helper ahci xhci_pci libahci xhci_hcd drm libata r8169 mii usbcore scsi_mod usb_common
Re: Linux crash when using FTDI FT232R USB to serial converter on AMD boards.
On Fri, Apr 26, 2019 at 03:47:15PM +0200, starost...@gmail.com wrote: > Hello all, > we are development and manufacturing company that use your FT232R serial > converter for couple of years with our software. We consume about a > hundreds pcs of FT232R per yer. We use FT232R as USB serial converter > with direct access (no virtual serial port) and as a "hardware key" > using FTDIChip-ID. We operate our software with FT232R converters on > Windows and Debian Linux operating system. > > We have used Intel motherboards with Intel processors so far. We want to > use AMD motherboards with AMD processors too. *We made a couple of tests > on AMD motherboards with AMD processors and Debian Linux 9.6, but we > have come across a big problem. > **When we open internal EEPROM of FT232R for reading, there will arise > many error messages in system log files. And then Debian Linux crash > after some time!* > > > _1) Hardware configurations:_ > - motherboards with AMD A320M chipset: > - MSI A320M PRO-VD PLUS, > https://www.msi.com/Motherboard/support/A320M-PRO-VD-PLUS > - ASUS PRIME A320M-K, https://www.asus.com/Motherboards/PRIME-A320M-K > - GIGABYTE A320M-S2H, > https://www.gigabyte.com/Motherboard/GA-A320M-S2H-rev-1x#kf > - latest bios installed, default bios configuration loaded, > - CPU AMD Athlon 200GE, AMD Ryzen 3 2200G > - 4GB RAM, SSD drive Kingston A400 120GB > > _2a) Operating system A:_ > - Debian Linux 9.6 64bit, https://www.debian.org/distrib/, > https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-9.6.0-amd64-netinst.iso > - installed from Netinst installer WITHOUT graphic dekstop, default > configuration > - tested kernels > - default kernel 4.9.0.8-amd64, > https://packages.debian.org/stretch/linux-image-4.9.0-8-amd64 > - backports kernel 4.18.0-0.bpo.1-amd64, > https://packages.debian.org/stretch-backports/linux-image-4.18.0-0.bpo.1-amd64 > > _2b) Operating system B:_ > - Ubuntu server 19.04 64bit, http://releases.ubuntu.com/19.04/, > 
http://releases.ubuntu.com/19.04/ubuntu-19.04-live-server-amd64.iso > - installed WITHOUT graphic dekstop, default configuration > - tested kernels > - default kernel 5.0.0-amd64, > https://packages.debian.org/stretch/linux-image-4.9.0-8-amd64 > - experimental kernel 5.0.9-050009-generic amd64, > https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.9/ > > _3) Drivers_ > - libftd2xx drivers version 1.4.8, https://www.ftdichip.com/Drivers/D2XX.htm > > _4) Performed tests_ > You can repeat this test: > - connect FT232R into USB2.0 port (not USB3 port!) > - use examples in directory: ...\libftd2xx-i386-1.4.8.tar\release\examples\ > - add parameter "-m32" to "CFLAGS" variable into "Rules.make" file > - compiled "\examples\EEPROM\user\read\" > - run script "test.sh" - see attached file > - *Debian Linux or Ubuntu server crashes after some minutes* - see > attached kernel logs from our system > - see "kern.log" https://paste.ee/p/xxIZ2 So this is a debian 4.18 kernel seemingly crashing due to a xhci or iommu issue. Can you reproduce this on a mainline kernel? If so, please post the corresponding logs to the lists and CC the xhci and iommu maintainers (added to CC). > _5) Very important note_ > *This problem occurs when FT232R is connected into USB2.0 port only!* > When it is connected into USB3 port, all works fine, no error messages, > no crash. > > _6) Other test that we made_ > - we made other tests on Windows 10 > - same configuration with ASUS PRIME A320M-K motherboard > - latest drivers + latest FTDI drivers > - FT232R connected to USB2.0 or USB3 - no problem > > - we made the same tests on Intel architecture (that we use now) > - motherboard MSI B250M PRO-VH, CPU Intel Pentium G4560, 4GB RAM, SSD > drive Kingston A400 120GB > - same operating system Debian Linux 9.6 64bit as descripted above > - FT232R connected to USB2.0 or USB3 - no problem Johan
Re: Linux crash when using FTDI FT232R USB to serial converter on AMD boards.
Hello, sorry for other questions, but I am new in this list: Is Ubuntu server 19.04 with "kernel 5.0.9-050009-generic" good for this test? Can I add attachments to this lists? And who is xhci and iommu maintainers? Are they CC in this mail? starosta On 29.4.2019 at 11:48, Johan Hovold wrote: So this is a debian 4.18 kernel seemingly crashing due to a xhci or iommu issue. Can you reproduce this on a mainline kernel? If so, please post the corresponding logs to the lists and CC the xhci and iommu maintainers (added to CC).
Re: Linux crash when using FTDI FT232R USB to serial converter on AMD boards.
Ok, I attach log from today test on Ubuntu 19.04 with 5.0.9-050009-generic kernel. Full kern.log: https://paste.ee/p/yF3Qi#section0 Dmesg log: https://paste.ee/p/yF3Qi#section1 Summary: - motherboard with AMD A320M chipset, CPU AMD Athlon 200GE - Ubuntu 19.04 default installation, kernel 5.0.9-050009-generic - connected "FTDI FT232R" (USB to serial converter) to USB port, latest FTDI libftd2xx drivers version 1.4.8 installed, https://www.ftdichip.com/Drivers/D2XX.htm - ftdi_sio driver is unloaded (rmmod ftdi_sio) - problem is with reading EEPROM content from FTDI FT232R chip - problem is on USB2.0 port only, on USB3 port it works fine starosta On 29.4.2019 at 13:22, Johan Hovold wrote: On Mon, Apr 29, 2019 at 12:51:20PM +0200, starost...@gmail.com wrote: Hello, sorry for other questions, but I am new in this list: Is Ubuntu server 19.04 with "kernel 5.0.9-050009-generic" good for this test? Yes, that might do depending on what else debian put in that kernel. Can I add attachments to this lists? Sure, it's even preferred. Just try to trim non-relevant bits, and perhaps provide a link to the full log. And who is xhci and iommu maintainers? Are they CC in this mail? Yes, that's Mathias and Joerg, respectively, that I added on CC. On 29.4.2019 at 11:48, Johan Hovold wrote: So this is a debian 4.18 kernel seemingly crashing due to a xhci or iommu issue. Can you reproduce this on a mainline kernel? If so, please post the corresponding logs to the lists and CC the xhci and iommu maintainers (added to CC). Here's an excerpt from the 4.18 log meanwhile: [0.00] Linux version 4.18.0-0.bpo.1-amd64 (debian-ker...@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP Debian 4.18.6-1~bpo9+1 (2018-09-13) ... [ 960.145603] xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state. 
[ 1790.293623] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xfff5 flags=0x0020] [ 1790.300905] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xfde9e000 flags=0x0020] ... [ 4789.145364] xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state. [ 4789.310916] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xff63c000 flags=0x0020] [ 4789.317023] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xff5c6000 flags=0x0020] [ 4789.702446] xhci_hcd :15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state. [ 4789.766842] xhci_hcd :15:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xfdeaf000 flags=0x0020] [ 4789.781531] AMD-Vi: Event logged [IO_PAGE_FAULT device=15:00.0 domain=0x address=0xfdeaf040 flags=0x0020] [ 4790.093644] general protection fault: [#1] SMP NOPTI [ 4790.094186] CPU: 2 PID: 24561 Comm: readua Not tainted 4.18.0-0.bpo.1-amd64 #1 Debian 4.18.6-1~bpo9+1 [ 4790.094738] Hardware name: Micro-Star International Co., Ltd MS-7B38/A320M PRO-VD PLUS (MS-7B38), BIOS 1.C0 11/02/2018 [ 4790.095333] RIP: 0010:prefetch_freepointer+0x10/0x20 [ 4790.095936] Code: 58 48 c7 c7 30 09 a5 b0 e8 4b 64 eb ff eb 90 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 f6 74 13 8b 47 20 48 01 c6 <48> 33 36 48 33 b7 38 01 00 00 0f 18 0e f3 c3 90 0f 1f 44 00 00 8b [ 4790.097314] RSP: 0018:b6e303e67ce0 EFLAGS: 00010286 [ 4790.098016] RAX: RBX: 93773f762ca13047 RCX: 77df [ 4790.098711] RDX: 77de RSI: 93773f762ca13047 RDI: 9a30de807c00 [ 4790.099413] RBP: 9a30d20cc018 R08: 9a30deca4de0 R09: [ 4790.100141] R10: 9a306f3c6638 R11: R12: 006000c0 [ 4790.100871] R13: 0008 R14: 9a30de807c00 R15: 9a30de807c00 [ 4790.101619] FS: () GS:9a30dec8(0063) knlGS:f7d6a700 [ 4790.102387] CS: 0010 DS: 002b ES: 002b CR0: 80050033 [ 4790.103157] CR2: ffc77e3c CR3: af0c CR4: 003406e0 [ 4790.103954] Call Trace: [ 4790.104753] 
kmem_cache_alloc_trace+0xb5/0x1c0 [ 4790.105580] ? proc_do_submiturb+0x35a/0xda0 [usbcore] [ 4790.106382] proc_do_submiturb+0x35a/0xda0 [usbcore] [ 4790.107189] ? futex_wake+0x94/0x170 [ 4790.108009] proc_submiturb_compat+0xb1/0xe0 [usbcore] [ 4790.108851] usbdev_do_ioctl+0x894/0x1170 [usbcore] [ 4790.109704] usbdev_compat_ioctl+0xc/0x10 [usbcore] [ 4790.110553] __ia32_compat_sys_ioctl+0xc0/0x250 [ 4790.111413] do_fast_syscall_32+0x98/0x1e0 [ 4790.112280] entry_SYSCALL_compat_after_hwframe+0x45/0x4d [ 4790.113167] Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c iptable_filter binfmt_misc nls_ascii nls_cp437 vfat fat efi_pstore
Re: [PATCH v2 17/19] iommu: Add max num of cache and granu types
On Fri, 26 Apr 2019 18:22:46 +0200 Auger Eric wrote: > Hi Jacob, > > On 4/24/19 1:31 AM, Jacob Pan wrote: > > To convert to/from cache types and granularities between generic and > > VT-d specific counterparts, a 2D arrary is used. Introduce the > > limits > array > > to help define the converstion array size. > conversion > > will fix, thanks > > Signed-off-by: Jacob Pan > > --- > > include/uapi/linux/iommu.h | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > > index 5c95905..2d8fac8 100644 > > --- a/include/uapi/linux/iommu.h > > +++ b/include/uapi/linux/iommu.h > > @@ -197,6 +197,7 @@ struct iommu_inv_addr_info { > > __u64 granule_size; > > __u64 nb_granules; > > }; > > +#define NR_IOMMU_CACHE_INVAL_GRANU (3) > > > > /** > > * First level/stage invalidation information > > @@ -235,6 +236,7 @@ struct iommu_cache_invalidate_info { > > struct iommu_inv_addr_info addr_info; > > }; > > }; > > +#define NR_IOMMU_CACHE_TYPE (3) > > /** > > * struct gpasid_bind_data - Information about device and guest > > PASID binding > > * @gcr3: Guest CR3 value from guest mm > > > Is it really something that needs to be exposed in the uapi? > I put it in uapi since the related definitions for granularity and cache type are in the same file. Maybe putting them close together like this? I was thinking you can just fold it into your next series as one patch for introducing cache invalidation. 
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index 2d8fac8..4ff6929 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -164,6 +164,7 @@ enum iommu_inv_granularity { IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */ IOMMU_INV_GRANU_PASID, /* pasid-selective invalidation */ IOMMU_INV_GRANU_ADDR, /* page-selective invalidation */ + NR_IOMMU_INVAL_GRANU, /* number of invalidation granularities */ }; /** @@ -228,6 +229,7 @@ struct iommu_cache_invalidate_info { #define IOMMU_CACHE_INV_TYPE_IOTLB (1 << 0) /* IOMMU IOTLB */ #define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */ #define IOMMU_CACHE_INV_TYPE_PASID (1 << 2) /* PASID cache */ +#define NR_IOMMU_CACHE_TYPE (3) __u8 cache; __u8 granularity; > Thanks > > Eric [Jacob Pan]
Re: [PATCH v4 2/3] iommu/dma: Reserve IOVA for PCIe inaccessible DMA address
On 12/04/2019 04:13, Srinath Mannam wrote: dma_ranges field of PCI host bridge structure has resource entries in sorted order of address range given through dma-ranges DT property. This list is the accessible DMA address range. So that this resource list will be processed and reserve IOVA address to the inaccessible address holes in the list. This method is similar to PCI IO resources address ranges reserving in IOMMU for each EP connected to host bridge. Signed-off-by: Srinath Mannam Based-on-patch-by: Oza Pawandeep Reviewed-by: Oza Pawandeep --- drivers/iommu/dma-iommu.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d19f3d6..fb42d7c 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -212,6 +212,7 @@ static void iova_reserve_pci_windows(struct pci_dev *dev, struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus); struct resource_entry *window; unsigned long lo, hi; + phys_addr_t start = 0, end; resource_list_for_each_entry(window, &bridge->windows) { if (resource_type(window->res) != IORESOURCE_MEM) @@ -221,6 +222,24 @@ static void iova_reserve_pci_windows(struct pci_dev *dev, hi = iova_pfn(iovad, window->res->end - window->offset); reserve_iova(iovad, lo, hi); } + + /* Get reserved DMA windows from host bridge */ + resource_list_for_each_entry(window, &bridge->dma_ranges) { + end = window->res->start - window->offset; +resv_iova: + if (end - start) { + lo = iova_pfn(iovad, start); + hi = iova_pfn(iovad, end); + reserve_iova(iovad, lo, hi); + } + start = window->res->end - window->offset + 1; + /* If window is last entry */ + if (window->node.next == &bridge->dma_ranges && + end != DMA_BIT_MASK(sizeof(dma_addr_t) * BITS_PER_BYTE)) { I still think that's a very silly way to write "~(dma_addr_t)0", but otherwise, Acked-by: Robin Murphy + end = DMA_BIT_MASK(sizeof(dma_addr_t) * BITS_PER_BYTE); + goto resv_iova; + } + } } static int iova_reserve_iommu_regions(struct device *dev,
Re: [PATCH v2 0/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
Hi Julien, On 29/04/2019 15:44, Julien Grall wrote: > Hi all, > > On RT, the function iommu_dma_map_msi_msg expects to be called from > preemptible > context. However, this is not always the case resulting a splat with > !CONFIG_DEBUG_ATOMIC_SLEEP: > > [ 48.875777] BUG: sleeping function called from invalid context at > kernel/locking/rtmutex.c:974 > [ 48.875779] in_atomic(): 1, irqs_disabled(): 128, pid: 2103, name: ip > [ 48.875782] INFO: lockdep is turned off. > [ 48.875784] irq event stamp: 10684 > [ 48.875786] hardirqs last enabled at (10683): [] > _raw_spin_unlock_irqrestore+0x88/0x90 > [ 48.875791] hardirqs last disabled at (10684): [] > _raw_spin_lock_irqsave+0x24/0x68 > [ 48.875796] softirqs last enabled at (0): [] > copy_process.isra.1.part.2+0x8d8/0x1970 > [ 48.875801] softirqs last disabled at (0): [<>] > (null) > [ 48.875805] Preemption disabled at: > [ 48.875805] [] __setup_irq+0xd8/0x6c0 > [ 48.875811] CPU: 2 PID: 2103 Comm: ip Not tainted > 5.0.3-rt1-7-g42ede9a0fed6 #45 > [ 48.875815] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno > Development Platform, BIOS EDK II Jan 23 2017 > [ 48.875817] Call trace: > [ 48.875818] dump_backtrace+0x0/0x140 > [ 48.875821] show_stack+0x14/0x20 > [ 48.875823] dump_stack+0xa0/0xd4 > [ 48.875827] ___might_sleep+0x16c/0x1f8 > [ 48.875831] rt_spin_lock+0x5c/0x70 > [ 48.875835] iommu_dma_map_msi_msg+0x5c/0x1d8 > [ 48.875839] gicv2m_compose_msi_msg+0x3c/0x48 > [ 48.875843] irq_chip_compose_msi_msg+0x40/0x58 > [ 48.875846] msi_domain_activate+0x38/0x98 > [ 48.875849] __irq_domain_activate_irq+0x58/0xa0 > [ 48.875852] irq_domain_activate_irq+0x34/0x58 > [ 48.875855] irq_activate+0x28/0x30 > [ 48.875858] __setup_irq+0x2b0/0x6c0 > [ 48.875861] request_threaded_irq+0xdc/0x188 > [ 48.875865] sky2_setup_irq+0x44/0xf8 > [ 48.875868] sky2_open+0x1a4/0x240 > [ 48.875871] __dev_open+0xd8/0x188 > [ 48.875874] __dev_change_flags+0x164/0x1f0 > [ 48.875877] dev_change_flags+0x20/0x60 > [ 48.875879] do_setlink+0x2a0/0xd30 
> [ 48.875882] __rtnl_newlink+0x5b4/0x6d8 > [ 48.875885] rtnl_newlink+0x50/0x78 > [ 48.875888] rtnetlink_rcv_msg+0x178/0x640 > [ 48.875891] netlink_rcv_skb+0x58/0x118 > [ 48.875893] rtnetlink_rcv+0x14/0x20 > [ 48.875896] netlink_unicast+0x188/0x200 > [ 48.875898] netlink_sendmsg+0x248/0x3d8 > [ 48.875900] sock_sendmsg+0x18/0x40 > [ 48.875904] ___sys_sendmsg+0x294/0x2d0 > [ 48.875908] __sys_sendmsg+0x68/0xb8 > [ 48.875911] __arm64_sys_sendmsg+0x20/0x28 > [ 48.875914] el0_svc_common+0x90/0x118 > [ 48.875918] el0_svc_handler+0x2c/0x80 > [ 48.875922] el0_svc+0x8/0xc > > This series is a first attempt to rework how MSI are mapped and composed > when an IOMMU is present. > > I was able to test the changes in GICv2m and GICv3 ITS. I don't have > hardware for the other interrupt controllers. > > Cheers, > > Julien Grall (7): > genirq/msi: Add a new field in msi_desc to store an IOMMU cookie > iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts > irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg() > irqchip/gic-v3-its: Don't map the MSI page in > its_irq_compose_msi_msg() > irqchip/ls-scfg-msi: Don't map the MSI page in > ls_scfg_msi_compose_msg() > irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, > s}i_msg() > iommu/dma-iommu: Remove iommu_dma_map_msi_msg() > > drivers/iommu/Kconfig | 1 + > drivers/iommu/dma-iommu.c | 49 > +++ > drivers/irqchip/irq-gic-v2m.c | 8 ++- > drivers/irqchip/irq-gic-v3-its.c | 5 +++- > drivers/irqchip/irq-gic-v3-mbi.c | 15 ++-- > drivers/irqchip/irq-ls-scfg-msi.c | 7 +- > include/linux/dma-iommu.h | 22 -- > include/linux/msi.h | 26 + > kernel/irq/Kconfig| 3 +++ > 9 files changed, 109 insertions(+), 27 deletions(-) Thanks for having reworked this. I'm quite happy with the way this looks now (modulo the couple of nits Robin and I mentioned, which I'm to address myself). Jorg: are you OK with this going via the irq tree? Thanks, M. -- Jazz is not dead. It just smells funny... 
Re: [PATCH v2 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
On 29/04/2019 15:44, Julien Grall wrote: > On RT, iommu_dma_map_msi_msg() may be called from non-preemptible > context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as > the function is using spin_lock (they can sleep on RT). > > iommu_dma_map_msi_msg() is used to map the MSI page in the IOMMU PT > and update the MSI message with the IOVA. > > Only the part to lookup for the MSI page requires to be called in > preemptible context. As the MSI page cannot change over the lifecycle > of the MSI interrupt, the lookup can be cached and re-used later on. > > iomma_dma_map_msi_msg() is now split in two functions: > - iommu_dma_prepare_msi(): This function will prepare the mapping > in the IOMMU and store the cookie in the structure msi_desc. This > function should be called in preemptible context. > - iommu_dma_compose_msi_msg(): This function will update the MSI > message with the IOVA when the device is behind an IOMMU. > > Signed-off-by: Julien Grall > > --- > Changes in v2: > - Rework the commit message to use imperative mood > - Use the MSI accessor to get/set the iommu cookie > - Don't use ternary on return > - Select CONFIG_IRQ_MSI_IOMMU > - Pass an msi_desc rather than the irq number > --- > drivers/iommu/Kconfig | 1 + > drivers/iommu/dma-iommu.c | 47 > ++- > include/linux/dma-iommu.h | 23 +++ > 3 files changed, 62 insertions(+), 9 deletions(-) > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 6f07f3b21816..eb1c8cd243f9 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -94,6 +94,7 @@ config IOMMU_DMA > bool > select IOMMU_API > select IOMMU_IOVA > + select IRQ_MSI_IOMMU > select NEED_SG_DMA_LENGTH > > config FSL_PAMU > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 77aabe637a60..2309f59cefa4 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -888,17 +888,18 @@ static struct iommu_dma_msi_page > *iommu_dma_get_msi_page(struct device *dev, > return NULL; > 
} > > -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) > +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) > { > - struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); > + struct device *dev = msi_desc_to_dev(desc); > struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > struct iommu_dma_cookie *cookie; > struct iommu_dma_msi_page *msi_page; > - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; > unsigned long flags; > > - if (!domain || !domain->iova_cookie) > - return; > + if (!domain || !domain->iova_cookie) { > + desc->iommu_cookie = NULL; nit: This could be msi_desc_set_iommu_cookie(desc, NULL); now that you have introduced the relevant accessors. > + return 0; > + } > > cookie = domain->iova_cookie; > > @@ -911,7 +912,37 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) > msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); > spin_unlock_irqrestore(&cookie->msi_lock, flags); > > - if (WARN_ON(!msi_page)) { > + msi_desc_set_iommu_cookie(desc, msi_page); > + > + if (!msi_page) > + return -ENOMEM; > + else > + return 0; > +} > + > +void iommu_dma_compose_msi_msg(struct msi_desc *desc, > +struct msi_msg *msg) > +{ > + struct device *dev = msi_desc_to_dev(desc); > + const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > + const struct iommu_dma_msi_page *msi_page; > + > + msi_page = msi_desc_get_iommu_cookie(desc); > + > + if (!domain || !domain->iova_cookie || WARN_ON(!msi_page)) > + return; > + > + msg->address_hi = upper_32_bits(msi_page->iova); > + msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; > + msg->address_lo += lower_32_bits(msi_page->iova); > +} > + > +void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) > +{ > + struct msi_desc *desc = irq_get_msi_desc(irq); > + phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; > + > + if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { > /* >* We're called from a void callback, so the best
we can do is >* 'fail' by filling the message with obviously bogus values. > @@ -922,8 +953,6 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) > msg->address_lo = ~0U; > msg->data = ~0U; > } else { > - msg->address_hi = upper_32_bits(msi_page->iova); > - msg->address_lo &= cookie_msi_granule(cookie) - 1; > - msg->address_lo += lower_32_bits(msi_page->iova); > + iommu_dma_compose_msi_msg(desc, msg); > } > } >
Re: [PATCH v2 15/19] iommu/vt-d: Add bind guest PASID support
On Fri, 26 Apr 2019 18:15:27 +0200 Auger Eric wrote: > Hi Jacob, > > On 4/24/19 1:31 AM, Jacob Pan wrote: > > When supporting guest SVA with emulated IOMMU, the guest PASID > > table is shadowed in VMM. Updates to guest vIOMMU PASID table > > will result in PASID cache flush which will be passed down to > > the host as bind guest PASID calls. > > > > For the SL page tables, it will be harvested from device's > > default domain (request w/o PASID), or aux domain in case of > > mediated device. > > > > .-. .---. > > | vIOMMU| | Guest process CR3, FL only| > > | | '---' > > ./ > > | PASID Entry |--- PASID cache flush - > > '-' | > > | | V > > | |CR3 in GPA > > '-' > > Guest > > --| Shadow |--| > > vv v > > Host > > .-. .--. > > | pIOMMU| | Bind FL for GVA-GPA | > > | | '--' > > ./ | > > | PASID Entry | V (Nested xlate) > > '\.--. > > | | |SL for GPA-HPA, default domain| > > | | '--' > > '-' > > Where: > > - FL = First level/stage one page tables > > - SL = Second level/stage two page tables > > > > Signed-off-by: Jacob Pan > > Signed-off-by: Liu, Yi L > > --- > > drivers/iommu/intel-iommu.c | 4 + > > drivers/iommu/intel-svm.c | 174 > > > > include/linux/intel-iommu.h | 10 ++- include/linux/intel-svm.h > > | 7 ++ 4 files changed, 193 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/iommu/intel-iommu.c > > b/drivers/iommu/intel-iommu.c index 77bbe1b..89989b5 100644 > > --- a/drivers/iommu/intel-iommu.c > > +++ b/drivers/iommu/intel-iommu.c > > @@ -5768,6 +5768,10 @@ const struct iommu_ops intel_iommu_ops = { > > .dev_enable_feat= intel_iommu_dev_enable_feat, > > .dev_disable_feat = intel_iommu_dev_disable_feat, > > .pgsize_bitmap = INTEL_IOMMU_PGSIZES, > > +#ifdef CONFIG_INTEL_IOMMU_SVM > > + .sva_bind_gpasid= intel_svm_bind_gpasid, > > + .sva_unbind_gpasid = intel_svm_unbind_gpasid, > > +#endif > > }; > > > > static void quirk_iommu_g4x_gfx(struct pci_dev *dev) > > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c > > index 8fff212..0a973c2 
100644 > > --- a/drivers/iommu/intel-svm.c > > +++ b/drivers/iommu/intel-svm.c > > @@ -227,6 +227,180 @@ static const struct mmu_notifier_ops > > intel_mmuops = { > > static DEFINE_MUTEX(pasid_mutex); > > static LIST_HEAD(global_svm_list); > > +#define for_each_svm_dev() \ > > + list_for_each_entry(sdev, &svm->devs, list) \ > > + if (dev == sdev->dev) \ > > + > > +int intel_svm_bind_gpasid(struct iommu_domain *domain, > > + struct device *dev, > > + struct gpasid_bind_data *data) > > +{ > > + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev); > > + struct intel_svm_dev *sdev; > > + struct intel_svm *svm = NULL; > > + struct dmar_domain *ddomain; > > + int pasid_max; > > + int ret = 0; > > + > > + if (WARN_ON(!iommu) || !data) > > + return -EINVAL; > > + > > + if (dev_is_pci(dev)) { > > + pasid_max = pci_max_pasids(to_pci_dev(dev)); > > + if (pasid_max < 0) > > + return -EINVAL; > > + } else > > + pasid_max = 1 << 20; > > + > > + if (data->pasid <= 0 || data->pasid >= pasid_max) > > + return -EINVAL; > > + > > + ddomain = to_dmar_domain(domain); > > + /* REVISIT: > > +* Sanity check address width and paging mode support > > +* width matching in two dimensions: > > +* 1. paging mode CPU <= IOMMU > > +* 2. address width Guest <= Host. > > +*/ > > + mutex_lock(&pasid_mutex); > > + svm = ioasid_find(NULL, data->pasid, NULL); > > + if (IS_ERR(svm)) { > > + ret = PTR_ERR(svm); > > + goto out; > > + } > > + if (svm) { > > + if (list_empty(&svm->devs)) { > > + dev_err(dev, "GPASID %d has no devices > > bond but SVA is allocated\n", > > + data->pasid); > > + ret = -ENODEV; /* > > + * If we found svm for the > > PASID, there must be at > > + * least one device bond, > > otherwise svm should be freed. > > + */ > comment should be put after list_empty I think. In which circumstances > can it happen, I mean, isn't it a BUG_ON case? Well, I think failing to bind guest PASID is not severe enough to the host to use
Re: [PATCH v2 7/7] iommu/dma-iommu: Remove iommu_dma_map_msi_msg()
On 29/04/2019 15:44, Julien Grall wrote: A recent change split iommu_dma_map_msi_msg() in two new functions. The function was still implemented to avoid modifying all the callers at once. Now that all the callers have been reworked, iommu_dma_map_msi_msg() can be removed. Yay! The end of my horrible bodge :) Reviewed-by: Robin Murphy Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message --- drivers/iommu/dma-iommu.c | 20 include/linux/dma-iommu.h | 5 - 2 files changed, 25 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 2309f59cefa4..12f4464828a4 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -936,23 +936,3 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc, msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; msg->address_lo += lower_32_bits(msi_page->iova); } - -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ - struct msi_desc *desc = irq_get_msi_desc(irq); - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; - - if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { - /* -* We're called from a void callback, so the best we can do is -* 'fail' by filling the message with obviously bogus values. -* Since we got this far due to an IOMMU being present, it's -* not like the existing address would have worked anyway... 
-*/ - msg->address_hi = ~0U; - msg->address_lo = ~0U; - msg->data = ~0U; - } else { - iommu_dma_compose_msi_msg(desc, msg); - } -} diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 3fc48fbd6f63..ddd217c197df 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -82,7 +82,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg); -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg); void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); #else @@ -122,10 +121,6 @@ static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, { } -static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ -} - static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) { } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
On 29/04/2019 15:44, Julien Grall wrote: On RT, iommu_dma_map_msi_msg() may be called from non-preemptible context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock (they can sleep on RT). iommu_dma_map_msi_msg() is used to map the MSI page in the IOMMU PT and update the MSI message with the IOVA. Only the part to lookup for the MSI page requires to be called in preemptible context. As the MSI page cannot change over the lifecycle of the MSI interrupt, the lookup can be cached and re-used later on. iomma_dma_map_msi_msg() is now split in two functions: - iommu_dma_prepare_msi(): This function will prepare the mapping in the IOMMU and store the cookie in the structure msi_desc. This function should be called in preemptible context. - iommu_dma_compose_msi_msg(): This function will update the MSI message with the IOVA when the device is behind an IOMMU. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood - Use the MSI accessor to get/set the iommu cookie - Don't use ternary on return - Select CONFIG_IRQ_MSI_IOMMU - Pass an msi_desc rather than the irq number --- drivers/iommu/Kconfig | 1 + drivers/iommu/dma-iommu.c | 47 ++- include/linux/dma-iommu.h | 23 +++ 3 files changed, 62 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 6f07f3b21816..eb1c8cd243f9 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,6 +94,7 @@ config IOMMU_DMA bool select IOMMU_API select IOMMU_IOVA + select IRQ_MSI_IOMMU select NEED_SG_DMA_LENGTH config FSL_PAMU diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 77aabe637a60..2309f59cefa4 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -888,17 +888,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; } -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +int iommu_dma_prepare_msi(struct msi_desc 
*desc, phys_addr_t msi_addr) { - struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); + struct device *dev = msi_desc_to_dev(desc); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie; struct iommu_dma_msi_page *msi_page; - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; unsigned long flags; - if (!domain || !domain->iova_cookie) - return; + if (!domain || !domain->iova_cookie) { + desc->iommu_cookie = NULL; + return 0; + } cookie = domain->iova_cookie; @@ -911,7 +912,37 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); spin_unlock_irqrestore(&cookie->msi_lock, flags); - if (WARN_ON(!msi_page)) { + msi_desc_set_iommu_cookie(desc, msi_page); + + if (!msi_page) + return -ENOMEM; + else Nit: the "else" isn't really necessary. + return 0; +} + +void iommu_dma_compose_msi_msg(struct msi_desc *desc, + struct msi_msg *msg) +{ + struct device *dev = msi_desc_to_dev(desc); + const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + const struct iommu_dma_msi_page *msi_page; + + msi_page = msi_desc_get_iommu_cookie(desc); + + if (!domain || !domain->iova_cookie || WARN_ON(!msi_page)) + return; + + msg->address_hi = upper_32_bits(msi_page->iova); + msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; + msg->address_lo += lower_32_bits(msi_page->iova); +} + +void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +{ + struct msi_desc *desc = irq_get_msi_desc(irq); + phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; + + if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { /* * We're called from a void callback, so the best we can do is * 'fail' by filling the message with obviously bogus values.
@@ -922,8 +953,6 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msg->address_lo = ~0U; msg->data = ~0U; } else { - msg->address_hi = upper_32_bits(msi_page->iova); - msg->address_lo &= cookie_msi_granule(cookie) - 1; - msg->address_lo += lower_32_bits(msi_page->iova); + iommu_dma_compose_msi_msg(desc, msg); } } diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index e760dc5d1fa8..3fc48fbd6f63 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -71,12 +71,24 @@ void
Re: [PATCH v2 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
On 29/04/2019 15:44, Julien Grall wrote: When an MSI doorbell is located downstream of an IOMMU, it is required to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domain. At the moment, the allocation of the mapping may be done when composing the message. However, the composing may be done in non-preemtible context while the allocation requires to be called from preemptible context. A follow-up change will split the current logic in two functions requiring to keep an IOMMU cookie per MSI. A new field is introduced in msi_desc to store an IOMMU cookie. As the cookie may not be required in some configuration, the field is protected under a new config CONFIG_IRQ_MSI_IOMMU. A pair of helpers has also been introduced to access the field. Reviewed-by: Robin Murphy Signed-off-by: Julien Grall --- Changes in v2: - Update the commit message to use imperative mood - Protect the field with a new config that will be selected by IOMMU_DMA later on - Add a set of helpers to access the new field --- include/linux/msi.h | 26 ++ kernel/irq/Kconfig | 3 +++ 2 files changed, 29 insertions(+) diff --git a/include/linux/msi.h b/include/linux/msi.h index 7e9b81c3b50d..82a308c19222 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -77,6 +77,9 @@ struct msi_desc { struct device *dev; struct msi_msg msg; struct irq_affinity_desc*affinity; +#ifdef CONFIG_IRQ_MSI_IOMMU + const void *iommu_cookie; +#endif union { /* PCI MSI/X specific data */ @@ -119,6 +122,29 @@ struct msi_desc { #define for_each_msi_entry_safe(desc, tmp, dev) \ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) +#ifdef CONFIG_IRQ_MSI_IOMMU +static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return desc->iommu_cookie; +} + +static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ + desc->iommu_cookie = iommu_cookie; +} +#else +static inline const void 
*msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return NULL; +} + +static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ +} +#endif + #ifdef CONFIG_PCI_MSI #define first_pci_msi_entry(pdev) first_msi_entry(&(pdev)->dev) #define for_each_pci_msi_entry(desc, pdev)\ diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index 5f3e2baefca9..8fee06625c37 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -91,6 +91,9 @@ config GENERIC_MSI_IRQ_DOMAIN select IRQ_DOMAIN_HIERARCHY select GENERIC_MSI_IRQ +config IRQ_MSI_IOMMU + bool + config HANDLE_DOMAIN_IRQ bool
Re: implement generic dma_map_ops for IOMMUs v3
On 22/04/2019 18:59, Christoph Hellwig wrote: Hi Robin, please take a look at this series, which implements a completely generic set of dma_map_ops for IOMMU drivers. This is done by taking the existing arm64 code, moving it to drivers/iommu and then massaging it so that it can also work for architectures with DMA remapping. This should help future ports to support IOMMUs more easily, and also allow to remove various custom IOMMU dma_map_ops implementations, like Tom was planning to for the AMD one. Modulo a few nits, I think this looks pretty much good to go, and it would certainly be good to get as much as reasonable queued soon for the sake of progress elsewhere. Catalin, Will, I think the arm64 parts are all OK but please take a look to confirm. Thanks, Robin. A git tree is also available at: git://git.infradead.org/users/hch/misc.git dma-iommu-ops.3 Gitweb: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops.3 Changes since v2: - address various review comments and include patches from Robin Changes since v1: - only include other headers in dma-iommu.h if CONFIG_DMA_IOMMU is enabled - keep using a scatterlist in iommu_dma_alloc - split out mmap/sgtable fixes and move them early in the series - updated a few commit logs
Re: [PATCH 26/26] arm64: trim includes in dma-mapping.c
On 22/04/2019 18:59, Christoph Hellwig wrote: With most of the previous functionality now elsewhere a lot of the headers included in this file are not needed. Signed-off-by: Christoph Hellwig --- arch/arm64/mm/dma-mapping.c | 11 --- 1 file changed, 11 deletions(-) diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 184ef9ccd69d..15bd768ceb7e 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -5,20 +5,9 @@ */ #include -#include -#include #include -#include -#include -#include -#include #include -#include #include -#include -#include -#include - Nit: please keep the blank line between linux/ and asm/ include blocks to match the predominant local style. With that, Reviewed-by: Robin Murphy #include pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
Re: [PATCH 20/26] iommu/dma: Refactor iommu_dma_alloc, part 2
On 22/04/2019 18:59, Christoph Hellwig wrote: From: Robin Murphy Honestly I don't think anything left of my patch here... Apart from the iommu_dma_alloc_remap() case which remains sufficiently different that it's better off being self-contained, the rest of the logic can now be consolidated into a single flow which separates the logically-distinct steps of allocating pages, getting the CPU address, and finally getting the IOMMU address. ...and it certainly doesn't do that any more. It's clear that we have fundamentally different ways of reading code, so I don't think it's productive to keep arguing personal preference - I still find the end result here a fair bit more tolerable than before, so if you update the commit message to reflect the actual change (at which point there's really nothing left of my authorship) I can live with it. Robin. Signed-off-by: Robin Murphy [hch: split the page allocation into a new helper to simplify the flow] Signed-off-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 65 +-- 1 file changed, 35 insertions(+), 30 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 9b269f0792f3..acdfe866cb29 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -955,35 +955,14 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr, __iommu_dma_free(dev, size, cpu_addr); } -static void *iommu_dma_alloc(struct device *dev, size_t size, - dma_addr_t *handle, gfp_t gfp, unsigned long attrs) +static void *iommu_dma_alloc_pages(struct device *dev, size_t size, + struct page **pagep, gfp_t gfp, unsigned long attrs) { bool coherent = dev_is_dma_coherent(dev); - int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs); size_t alloc_size = PAGE_ALIGN(size); struct page *page = NULL; void *cpu_addr; - gfp |= __GFP_ZERO; - - if (gfpflags_allow_blocking(gfp) && - !(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) - return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs); - - if
(!gfpflags_allow_blocking(gfp) && !coherent) { - cpu_addr = dma_alloc_from_pool(alloc_size, &page, gfp); - if (!cpu_addr) - return NULL; - - *handle = __iommu_dma_map(dev, page_to_phys(page), size, - ioprot); - if (*handle == DMA_MAPPING_ERROR) { - dma_free_from_pool(cpu_addr, alloc_size); - return NULL; - } - return cpu_addr; - } - if (gfpflags_allow_blocking(gfp)) page = dma_alloc_from_contiguous(dev, alloc_size >> PAGE_SHIFT, get_order(alloc_size), @@ -993,33 +972,59 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, if (!page) return NULL; - *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot); - if (*handle == DMA_MAPPING_ERROR) - goto out_free_pages; - if (!coherent || PageHighMem(page)) { pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs); cpu_addr = dma_common_contiguous_remap(page, alloc_size, VM_USERMAP, prot, __builtin_return_address(0)); if (!cpu_addr) - goto out_unmap; + goto out_free_pages; if (!coherent) arch_dma_prep_coherent(page, size); } else { cpu_addr = page_address(page); } + + *pagep = page; memset(cpu_addr, 0, alloc_size); return cpu_addr; -out_unmap: - __iommu_dma_unmap(dev, *handle, size); out_free_pages: if (!dma_release_from_contiguous(dev, page, alloc_size >> PAGE_SHIFT)) __free_pages(page, get_order(alloc_size)); return NULL; } +static void *iommu_dma_alloc(struct device *dev, size_t size, + dma_addr_t *handle, gfp_t gfp, unsigned long attrs) +{ + bool coherent = dev_is_dma_coherent(dev); + int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs); + struct page *page = NULL; + void *cpu_addr; + + gfp |= __GFP_ZERO; + + if (gfpflags_allow_blocking(gfp) && + !(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) + return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs); + + if (!gfpflags_allow_blocking(gfp) && !coherent) + cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp); + else + cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs); + if (!cpu_addr) + return NULL; + + *handle =
__iommu_dma_map(dev, page_to_phys(page), size, ioprot); + if (*handle == DMA_MAPPING_ERROR) { +
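The shape of the refactor under discussion can be reduced to a toy model: one helper deals only with obtaining the backing memory and a CPU address, and the caller performs the IOMMU mapping as the final step, so the error path never has to unmap. All names and the fake IOVA scheme below are illustrative, not the kernel API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Helper: get backing memory and a CPU address, nothing else. */
static void *toy_alloc_pages(size_t size, void **pagep)
{
    void *page = calloc(1, size);   /* stands in for the page allocator */

    *pagep = page;
    return page;                    /* CPU address == page in this model */
}

static uint64_t toy_iommu_map(void *page)
{
    return page ? ((uint64_t)(uintptr_t)page | 1) : 0;  /* fake IOVA, 0 == error */
}

/* Caller: allocate first, map last, so unwinding only ever frees. */
static void *toy_dma_alloc(size_t size, uint64_t *handle)
{
    void *page, *cpu_addr;

    cpu_addr = toy_alloc_pages(size, &page);
    if (!cpu_addr)
        return NULL;

    *handle = toy_iommu_map(page);  /* mapping is the final step */
    if (!*handle) {
        free(page);                 /* no unmap needed on this path */
        return NULL;
    }
    return cpu_addr;
}

int main(void)
{
    uint64_t handle = 0;
    void *p = toy_dma_alloc(64, &handle);

    assert(p != NULL && handle != 0);
    free(p);
    return 0;
}
```

Ordering the IOMMU mapping last is what let the patch drop the `out_unmap:` label: once mapping is the final step, every failure can be unwound by freeing alone.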
[PATCH v2 7/7] iommu/dma-iommu: Remove iommu_dma_map_msi_msg()
A recent change split iommu_dma_map_msi_msg() in two new functions. The function was still implemented to avoid modifying all the callers at once. Now that all the callers have been reworked, iommu_dma_map_msi_msg() can be removed. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message --- drivers/iommu/dma-iommu.c | 20 include/linux/dma-iommu.h | 5 - 2 files changed, 25 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 2309f59cefa4..12f4464828a4 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -936,23 +936,3 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc, msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; msg->address_lo += lower_32_bits(msi_page->iova); } - -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ - struct msi_desc *desc = irq_get_msi_desc(irq); - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; - - if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { - /* -* We're called from a void callback, so the best we can do is -* 'fail' by filling the message with obviously bogus values. -* Since we got this far due to an IOMMU being present, it's -* not like the existing address would have worked anyway... 
-*/ - msg->address_hi = ~0U; - msg->address_lo = ~0U; - msg->data = ~0U; - } else { - iommu_dma_compose_msi_msg(desc, msg); - } -} diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 3fc48fbd6f63..ddd217c197df 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -82,7 +82,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg); -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg); void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); #else @@ -122,10 +121,6 @@ static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, { } -static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) -{ -} - static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) { } -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 23/26] iommu/dma: Don't depend on CONFIG_DMA_DIRECT_REMAP
On 22/04/2019 18:59, Christoph Hellwig wrote: For entirely dma coherent architectures there is no requirement to ever remap dma coherent allocation. Move all the remap and pool code under IS_ENABLED() checks and drop the Kconfig dependency. Reviewed-by: Robin Murphy Signed-off-by: Christoph Hellwig --- drivers/iommu/Kconfig | 1 - drivers/iommu/dma-iommu.c | 16 +--- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index bdc14baf2ee5..6f07f3b21816 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -95,7 +95,6 @@ config IOMMU_DMA select IOMMU_API select IOMMU_IOVA select NEED_SG_DMA_LENGTH - depends on DMA_DIRECT_REMAP config FSL_PAMU bool "Freescale IOMMU support" diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 8fc6098c1eeb..278a9a960107 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -923,10 +923,11 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr) struct page *page = NULL; /* Non-coherent atomic allocation? Easy */ - if (dma_free_from_pool(cpu_addr, alloc_size)) + if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && + dma_free_from_pool(cpu_addr, alloc_size)) return; - if (is_vmalloc_addr(cpu_addr)) { + if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) { /* * If it the address is remapped, then it's either non-coherent * or highmem CMA, or an iommu_dma_alloc_remap() construction. 
@@ -972,7 +973,7 @@ static void *iommu_dma_alloc_pages(struct device *dev, size_t size, if (!page) return NULL; - if (!coherent || PageHighMem(page)) { + if (IS_ENABLED(CONFIG_DMA_REMAP) && (!coherent || PageHighMem(page))) { pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs); cpu_addr = dma_common_contiguous_remap(page, alloc_size, @@ -1005,11 +1006,12 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, gfp |= __GFP_ZERO; - if (gfpflags_allow_blocking(gfp) && + if (IS_ENABLED(CONFIG_DMA_REMAP) && gfpflags_allow_blocking(gfp) && !(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs); - if (!gfpflags_allow_blocking(gfp) && !coherent) + if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && + !gfpflags_allow_blocking(gfp) && !coherent) cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp); else cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs); @@ -1041,7 +1043,7 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma, if (off >= nr_pages || vma_pages(vma) > nr_pages - off) return -ENXIO; - if (is_vmalloc_addr(cpu_addr)) { + if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) { struct page **pages = __iommu_dma_get_pages(cpu_addr); if (pages) @@ -1063,7 +1065,7 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt, struct page *page; int ret; - if (is_vmalloc_addr(cpu_addr)) { + if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) { struct page **pages = __iommu_dma_get_pages(cpu_addr); if (pages) {
[PATCH v2 5/7] irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg()
ls_scfg_msi_compose_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() must be called from preemptible context. A recent patch split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The Freescale SCFG MSI driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-ls-scfg-msi.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-ls-scfg-msi.c b/drivers/irqchip/irq-ls-scfg-msi.c index c671b3212010..669d29105772 100644 --- a/drivers/irqchip/irq-ls-scfg-msi.c +++ b/drivers/irqchip/irq-ls-scfg-msi.c @@ -100,7 +100,7 @@ static void ls_scfg_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) msg->data |= cpumask_first(mask); } - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } static int ls_scfg_msi_set_affinity(struct irq_data *irq_data, @@ -141,6 +141,7 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct ls_scfg_msi *msi_data = domain->host_data; int pos, err = 0; @@ -157,6 +158,10 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain *domain, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, msi_data->msiir_addr); + if (err) + return err; + irq_domain_set_info(domain, virq, pos, &ls_scfg_msi_parent_chip, msi_data, handle_simple_irq, NULL, NULL); -- 2.11.0
[PATCH v2 4/7] irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
its_irq_compose_msi_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() must be called from preemptible context. A recent change split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv3 ITS driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v3-its.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 7577755bdcf4..12ddbcfe1b1e 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1179,7 +1179,7 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg) msg->address_hi = upper_32_bits(addr); msg->data = its_get_event_id(d); - iommu_dma_map_msi_msg(d->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg); } static int its_irq_set_irqchip_state(struct irq_data *d, @@ -2566,6 +2566,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, { msi_alloc_info_t *info = args; struct its_device *its_dev = info->scratchpad[0].ptr; + struct its_node *its = its_dev->its; irq_hw_number_t hwirq; int err; int i; @@ -2574,6 +2575,8 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, if (err) return err; + err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); + for (i = 0; i < nr_irqs; i++) { err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0
[PATCH v2 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
On RT, iommu_dma_map_msi_msg() may be called from non-preemptible context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock (spinlocks can sleep on RT). iommu_dma_map_msi_msg() is used to map the MSI page in the IOMMU PT and update the MSI message with the IOVA. Only the lookup of the MSI page needs to be done in preemptible context. As the MSI page cannot change over the lifecycle of the MSI interrupt, the lookup can be cached and re-used later on. iommu_dma_map_msi_msg() is now split in two functions: - iommu_dma_prepare_msi(): This function will prepare the mapping in the IOMMU and store the cookie in the structure msi_desc. This function should be called in preemptible context. - iommu_dma_compose_msi_msg(): This function will update the MSI message with the IOVA when the device is behind an IOMMU. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood - Use the MSI accessor to get/set the iommu cookie - Don't use ternary on return - Select CONFIG_IRQ_MSI_IOMMU - Pass an msi_desc rather than the irq number --- drivers/iommu/Kconfig | 1 + drivers/iommu/dma-iommu.c | 47 ++- include/linux/dma-iommu.h | 23 +++ 3 files changed, 62 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 6f07f3b21816..eb1c8cd243f9 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,6 +94,7 @@ config IOMMU_DMA bool select IOMMU_API select IOMMU_IOVA + select IRQ_MSI_IOMMU select NEED_SG_DMA_LENGTH config FSL_PAMU diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 77aabe637a60..2309f59cefa4 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -888,17 +888,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; } -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) { - struct
device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); + struct device *dev = msi_desc_to_dev(desc); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie; struct iommu_dma_msi_page *msi_page; - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; unsigned long flags; - if (!domain || !domain->iova_cookie) - return; + if (!domain || !domain->iova_cookie) { + desc->iommu_cookie = NULL; + return 0; + } cookie = domain->iova_cookie; @@ -911,7 +912,37 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); spin_unlock_irqrestore(&cookie->msi_lock, flags); - if (WARN_ON(!msi_page)) { + msi_desc_set_iommu_cookie(desc, msi_page); + + if (!msi_page) + return -ENOMEM; + else + return 0; +} + +void iommu_dma_compose_msi_msg(struct msi_desc *desc, + struct msi_msg *msg) +{ + struct device *dev = msi_desc_to_dev(desc); + const struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + const struct iommu_dma_msi_page *msi_page; + + msi_page = msi_desc_get_iommu_cookie(desc); + + if (!domain || !domain->iova_cookie || WARN_ON(!msi_page)) + return; + + msg->address_hi = upper_32_bits(msi_page->iova); + msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1; + msg->address_lo += lower_32_bits(msi_page->iova); +} + +void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +{ + struct msi_desc *desc = irq_get_msi_desc(irq); + phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; + + if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) { /* * We're called from a void callback, so the best we can do is * 'fail' by filling the message with obviously bogus values. 
@@ -922,8 +953,6 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) msg->address_lo = ~0U; msg->data = ~0U; } else { - msg->address_hi = upper_32_bits(msi_page->iova); - msg->address_lo &= cookie_msi_granule(cookie) - 1; - msg->address_lo += lower_32_bits(msi_page->iova); + iommu_dma_compose_msi_msg(desc, msg); } } diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index e760dc5d1fa8..3fc48fbd6f63 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -71,12 +71,24 @@ void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir,
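The shape of the prepare/compose split above can be illustrated with a self-contained userspace sketch. All names and types below are made up for illustration, not the kernel API: prepare() runs once at allocation time, where sleeping is allowed, and caches its result in the descriptor; compose() only reads that cached cookie, so it is safe with preemption disabled.

```c
#include <assert.h>
#include <stddef.h>

struct msi_cookie { unsigned long long iova; };

struct desc {
    const struct msi_cookie *cookie; /* set by prepare, read by compose */
};

/* Phase 1: may sleep (in the real code this is where the IOMMU mapping
 * is created, under a lock that can sleep on RT). */
static int prepare_msi(struct desc *d, struct msi_cookie *slot,
                       unsigned long long iova)
{
    slot->iova = iova;
    d->cookie = slot; /* cache for later lock-free reads */
    return 0;
}

/* Phase 2: no allocation, no locking - just consume the cached mapping.
 * Return obviously bogus bits when prepare was never run. */
static unsigned long long compose_msi(const struct desc *d)
{
    return d->cookie ? d->cookie->iova : ~0ULL;
}
```

Drivers would then call the prepare step from their irq-domain alloc path and the compose step from the (possibly atomic) message-composition callback, mirroring patches 3-6 of this series.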
[PATCH v2 0/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
Hi all, On RT, the function iommu_dma_map_msi_msg() expects to be called from preemptible context. However, this is not always the case, resulting in a splat when CONFIG_DEBUG_ATOMIC_SLEEP is enabled: [ 48.875777] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:974 [ 48.875779] in_atomic(): 1, irqs_disabled(): 128, pid: 2103, name: ip [ 48.875782] INFO: lockdep is turned off. [ 48.875784] irq event stamp: 10684 [ 48.875786] hardirqs last enabled at (10683): [] _raw_spin_unlock_irqrestore+0x88/0x90 [ 48.875791] hardirqs last disabled at (10684): [] _raw_spin_lock_irqsave+0x24/0x68 [ 48.875796] softirqs last enabled at (0): [] copy_process.isra.1.part.2+0x8d8/0x1970 [ 48.875801] softirqs last disabled at (0): [<>] (null) [ 48.875805] Preemption disabled at: [ 48.875805] [] __setup_irq+0xd8/0x6c0 [ 48.875811] CPU: 2 PID: 2103 Comm: ip Not tainted 5.0.3-rt1-7-g42ede9a0fed6 #45 [ 48.875815] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jan 23 2017 [ 48.875817] Call trace: [ 48.875818] dump_backtrace+0x0/0x140 [ 48.875821] show_stack+0x14/0x20 [ 48.875823] dump_stack+0xa0/0xd4 [ 48.875827] ___might_sleep+0x16c/0x1f8 [ 48.875831] rt_spin_lock+0x5c/0x70 [ 48.875835] iommu_dma_map_msi_msg+0x5c/0x1d8 [ 48.875839] gicv2m_compose_msi_msg+0x3c/0x48 [ 48.875843] irq_chip_compose_msi_msg+0x40/0x58 [ 48.875846] msi_domain_activate+0x38/0x98 [ 48.875849] __irq_domain_activate_irq+0x58/0xa0 [ 48.875852] irq_domain_activate_irq+0x34/0x58 [ 48.875855] irq_activate+0x28/0x30 [ 48.875858] __setup_irq+0x2b0/0x6c0 [ 48.875861] request_threaded_irq+0xdc/0x188 [ 48.875865] sky2_setup_irq+0x44/0xf8 [ 48.875868] sky2_open+0x1a4/0x240 [ 48.875871] __dev_open+0xd8/0x188 [ 48.875874] __dev_change_flags+0x164/0x1f0 [ 48.875877] dev_change_flags+0x20/0x60 [ 48.875879] do_setlink+0x2a0/0xd30 [ 48.875882] __rtnl_newlink+0x5b4/0x6d8 [ 48.875885] rtnl_newlink+0x50/0x78 [ 48.875888] rtnetlink_rcv_msg+0x178/0x640 [ 48.875891]
netlink_rcv_skb+0x58/0x118 [ 48.875893] rtnetlink_rcv+0x14/0x20 [ 48.875896] netlink_unicast+0x188/0x200 [ 48.875898] netlink_sendmsg+0x248/0x3d8 [ 48.875900] sock_sendmsg+0x18/0x40 [ 48.875904] ___sys_sendmsg+0x294/0x2d0 [ 48.875908] __sys_sendmsg+0x68/0xb8 [ 48.875911] __arm64_sys_sendmsg+0x20/0x28 [ 48.875914] el0_svc_common+0x90/0x118 [ 48.875918] el0_svc_handler+0x2c/0x80 [ 48.875922] el0_svc+0x8/0xc This series is a first attempt to rework how MSI are mapped and composed when an IOMMU is present. I was able to test the changes in GICv2m and GICv3 ITS. I don't have hardware for the other interrupt controllers. Cheers, Julien Grall (7): genirq/msi: Add a new field in msi_desc to store an IOMMU cookie iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg() irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg() irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg() irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg() iommu/dma-iommu: Remove iommu_dma_map_msi_msg() drivers/iommu/Kconfig | 1 + drivers/iommu/dma-iommu.c | 49 +++ drivers/irqchip/irq-gic-v2m.c | 8 ++- drivers/irqchip/irq-gic-v3-its.c | 5 +++- drivers/irqchip/irq-gic-v3-mbi.c | 15 ++-- drivers/irqchip/irq-ls-scfg-msi.c | 7 +- include/linux/dma-iommu.h | 22 -- include/linux/msi.h | 26 + kernel/irq/Kconfig| 3 +++ 9 files changed, 109 insertions(+), 27 deletions(-) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 6/7] irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg()
The functions mbi_compose_m{b, s}i_msg may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent patch split iommu_dma_map_msi_msg in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv3 MSI driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the two MSI mappings when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v3-mbi.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3-mbi.c b/drivers/irqchip/irq-gic-v3-mbi.c index fbfa7ff6deb1..d50f6cdf043c 100644 --- a/drivers/irqchip/irq-gic-v3-mbi.c +++ b/drivers/irqchip/irq-gic-v3-mbi.c @@ -84,6 +84,7 @@ static void mbi_free_msi(struct mbi_range *mbi, unsigned int hwirq, static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct mbi_range *mbi = NULL; int hwirq, offset, i, err = 0; @@ -104,6 +105,16 @@ static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = mbi->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_CLRSPI_NSR); + if (err) + return err; + + err = iommu_dma_prepare_msi(info->desc, + mbi_phys_base + GICD_SETSPI_NSR); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = mbi_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) @@ -142,7 +153,7 @@ static void mbi_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) msg[0].address_lo = lower_32_bits(mbi_phys_base + GICD_SETSPI_NSR); msg[0].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } #ifdef CONFIG_PCI_MSI @@ -202,7 +213,7 @@ 
static void mbi_compose_mbi_msg(struct irq_data *data, struct msi_msg *msg) msg[1].address_lo = lower_32_bits(mbi_phys_base + GICD_CLRSPI_NSR); msg[1].data = data->parent_data->hwirq; - iommu_dma_map_msi_msg(data->irq, &msg[1]); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), &msg[1]); } /* Platform-MSI specific irqchip */ -- 2.11.0
[PATCH v2 3/7] irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg()
gicv2m_compose_msi_msg() may be called from non-preemptible context. However, on RT, iommu_dma_map_msi_msg() requires to be called from a preemptible context. A recent change split iommu_dma_map_msi_msg() in two new functions: one that should be called in preemptible context, the other does not have any requirement. The GICv2m driver is reworked to avoid executing preemptible code in non-preemptible context. This can be achieved by preparing the MSI mapping when allocating the MSI interrupt. Signed-off-by: Julien Grall --- Changes in v2: - Rework the commit message to use imperative mood --- drivers/irqchip/irq-gic-v2m.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c index f5fe0100f9ff..4359f0583377 100644 --- a/drivers/irqchip/irq-gic-v2m.c +++ b/drivers/irqchip/irq-gic-v2m.c @@ -110,7 +110,7 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) if (v2m->flags & GICV2M_NEEDS_SPI_OFFSET) msg->data -= v2m->spi_offset; - iommu_dma_map_msi_msg(data->irq, msg); + iommu_dma_compose_msi_msg(irq_data_get_msi_desc(data), msg); } static struct irq_chip gicv2m_irq_chip = { @@ -167,6 +167,7 @@ static void gicv2m_unalloc_msi(struct v2m_data *v2m, unsigned int hwirq, static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { + msi_alloc_info_t *info = args; struct v2m_data *v2m = NULL, *tmp; int hwirq, offset, i, err = 0; @@ -186,6 +187,11 @@ static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, hwirq = v2m->spi_start + offset; + err = iommu_dma_prepare_msi(info->desc, + v2m->res.start + V2M_MSI_SETSPI_NS); + if (err) + return err; + for (i = 0; i < nr_irqs; i++) { err = gicv2m_irq_gic_domain_alloc(domain, virq + i, hwirq + i); if (err) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
When an MSI doorbell is located downstream of an IOMMU, it is required to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domains. At the moment, the allocation of the mapping may be done when composing the message. However, the composing may be done in non-preemptible context while the allocation must be done in preemptible context. A follow-up change will split the current logic in two functions, requiring an IOMMU cookie to be kept per MSI. A new field is introduced in msi_desc to store an IOMMU cookie. As the cookie may not be required in some configuration, the field is protected under a new config CONFIG_IRQ_MSI_IOMMU. A pair of helpers has also been introduced to access the field. Signed-off-by: Julien Grall --- Changes in v2: - Update the commit message to use imperative mood - Protect the field with a new config that will be selected by IOMMU_DMA later on - Add a set of helpers to access the new field --- include/linux/msi.h | 26 ++ kernel/irq/Kconfig | 3 +++ 2 files changed, 29 insertions(+) diff --git a/include/linux/msi.h b/include/linux/msi.h index 7e9b81c3b50d..82a308c19222 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -77,6 +77,9 @@ struct msi_desc { struct device *dev; struct msi_msg msg; struct irq_affinity_desc*affinity; +#ifdef CONFIG_IRQ_MSI_IOMMU + const void *iommu_cookie; +#endif union { /* PCI MSI/X specific data */ @@ -119,6 +122,29 @@ struct msi_desc { #define for_each_msi_entry_safe(desc, tmp, dev)\ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) +#ifdef CONFIG_IRQ_MSI_IOMMU +static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return desc->iommu_cookie; +} + +static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ + desc->iommu_cookie = iommu_cookie; +} +#else +static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) +{ + return NULL; +} + +static
inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, +const void *iommu_cookie) +{ +} +#endif + #ifdef CONFIG_PCI_MSI #define first_pci_msi_entry(pdev) first_msi_entry(&(pdev)->dev) #define for_each_pci_msi_entry(desc, pdev) \ diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig index 5f3e2baefca9..8fee06625c37 100644 --- a/kernel/irq/Kconfig +++ b/kernel/irq/Kconfig @@ -91,6 +91,9 @@ config GENERIC_MSI_IRQ_DOMAIN select IRQ_DOMAIN_HIERARCHY select GENERIC_MSI_IRQ +config IRQ_MSI_IOMMU + bool + config HANDLE_DOMAIN_IRQ bool -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
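The config-gated field plus stub-accessor pattern in patch 1/7 can be reproduced standalone. This is a minimal sketch with renamed identifiers (to avoid suggesting it is the kernel header itself): the field only exists when the option is set, and inline accessors give callers a single API either way.

```c
#include <assert.h>
#include <stddef.h>

#define HAVE_IOMMU_COOKIE 1 /* stands in for CONFIG_IRQ_MSI_IOMMU */

struct my_desc {
#if HAVE_IOMMU_COOKIE
    const void *iommu_cookie;
#endif
    int irq;
};

#if HAVE_IOMMU_COOKIE
static inline const void *desc_get_cookie(const struct my_desc *d)
{
    return d->iommu_cookie;
}
static inline void desc_set_cookie(struct my_desc *d, const void *cookie)
{
    d->iommu_cookie = cookie;
}
#else
/* No-op stubs: callers compile unchanged when the field is
 * configured out, and the struct stays one pointer smaller. */
static inline const void *desc_get_cookie(const struct my_desc *d)
{
    (void)d;
    return NULL;
}
static inline void desc_set_cookie(struct my_desc *d, const void *cookie)
{
    (void)d;
    (void)cookie;
}
#endif
```

Flipping HAVE_IOMMU_COOKIE to 0 exercises the stub side, which is what IOMMU-less configs get.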
Re: "iommu/amd: Set exclusion range correctly" causes smartpqi offline
On Fri, Apr 26, 2019 at 11:55:12AM -0400, Qian Cai wrote: > https://git.sr.ht/~cai/linux-debug/blob/master/dmesg Thanks, I can't see any definitions for unity ranges or exclusion ranges in the IVRS table dump, which makes it even more weird. Can you please send me the output of for f in `ls -1 /sys/kernel/iommu_groups/*/reserved_regions`; do echo "---$f"; cat $f;done to double-check? Thanks, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
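Joerg's one-liner can also be wrapped as a small POSIX-shell helper. The function name and the optional base-directory argument are additions for convenience (e.g. to run against a copied-out sysfs tree), not part of the original request:

```shell
# Dump every reserved_regions file under an iommu_groups tree, each
# prefixed with its path, matching the output asked for above.
dump_reserved_regions() {
    base="${1:-/sys/kernel/iommu_groups}"
    for f in "$base"/*/reserved_regions; do
        [ -e "$f" ] || continue   # glob matched nothing
        echo "---$f"
        cat "$f"
    done
}
```

On the affected machine, running `dump_reserved_regions` with no argument walks the live sysfs tree.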
Re: [PATCH 19/26] iommu/dma: Cleanup variable naming in iommu_dma_alloc
On 22/04/2019 18:59, Christoph Hellwig wrote: From: Robin Murphy Most importantly clear up the size / iosize confusion. Also rename addr to cpu_addr to match the surrounding code and make the intention a little more clear. Signed-off-by: Robin Murphy [hch: split from a larger patch] I can't bring myself to actually ack "my" patch, but I am perfectly happy with the split :) Robin. Signed-off-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 45 +++ 1 file changed, 22 insertions(+), 23 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 95a12e975994..9b269f0792f3 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -960,64 +960,63 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, { bool coherent = dev_is_dma_coherent(dev); int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs); - size_t iosize = size; + size_t alloc_size = PAGE_ALIGN(size); struct page *page = NULL; - void *addr; + void *cpu_addr; - size = PAGE_ALIGN(size); gfp |= __GFP_ZERO; if (gfpflags_allow_blocking(gfp) && !(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) - return iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs); + return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs); if (!gfpflags_allow_blocking(gfp) && !coherent) { - addr = dma_alloc_from_pool(size, &page, gfp); - if (!addr) + cpu_addr = dma_alloc_from_pool(alloc_size, &page, gfp); + if (!cpu_addr) return NULL; - *handle = __iommu_dma_map(dev, page_to_phys(page), iosize, + *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot); if (*handle == DMA_MAPPING_ERROR) { - dma_free_from_pool(addr, size); + dma_free_from_pool(cpu_addr, alloc_size); return NULL; } - return addr; + return cpu_addr; } if (gfpflags_allow_blocking(gfp)) - page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT, -get_order(size), + page = dma_alloc_from_contiguous(dev, alloc_size >> PAGE_SHIFT, +get_order(alloc_size), gfp & __GFP_NOWARN); if (!page) - page = alloc_pages(gfp,
get_order(size)); + page = alloc_pages(gfp, get_order(alloc_size)); if (!page) return NULL; - *handle = __iommu_dma_map(dev, page_to_phys(page), iosize, ioprot); + *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot); if (*handle == DMA_MAPPING_ERROR) goto out_free_pages; if (!coherent || PageHighMem(page)) { pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs); - addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot, - __builtin_return_address(0)); - if (!addr) + cpu_addr = dma_common_contiguous_remap(page, alloc_size, + VM_USERMAP, prot, __builtin_return_address(0)); + if (!cpu_addr) goto out_unmap; if (!coherent) - arch_dma_prep_coherent(page, iosize); + arch_dma_prep_coherent(page, size); } else { - addr = page_address(page); + cpu_addr = page_address(page); } - memset(addr, 0, size); - return addr; + memset(cpu_addr, 0, alloc_size); + return cpu_addr; out_unmap: - __iommu_dma_unmap(dev, *handle, iosize); + __iommu_dma_unmap(dev, *handle, size); out_free_pages: - if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT)) - __free_pages(page, get_order(size)); + if (!dma_release_from_contiguous(dev, page, alloc_size >> PAGE_SHIFT)) + __free_pages(page, get_order(alloc_size)); return NULL; } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 21/26] iommu/dma: Refactor iommu_dma_get_sgtable
On 22/04/2019 18:59, Christoph Hellwig wrote: Inline __iommu_dma_get_sgtable_page into the main function, and use the fact that __iommu_dma_get_pages return NULL for remapped contigous allocations to simplify the code flow a bit. Yeah, even I was a bit dubious about the readability of "if (page)... else if (pages)..." that my attempt ended up with, so I don't really have anything to complain about here. Reviewed-by: Robin Murphy Signed-off-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 45 +++ 1 file changed, 17 insertions(+), 28 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index acdfe866cb29..138b85e675c8 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1070,42 +1070,31 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma, return __iommu_dma_mmap(pages, size, vma); } -static int __iommu_dma_get_sgtable_page(struct sg_table *sgt, struct page *page, - size_t size) -{ - int ret = sg_alloc_table(sgt, 1, GFP_KERNEL); - - if (!ret) - sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0); - return ret; -} - static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { - unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT; - struct page **pages; + struct page *page; + int ret; - if (!is_vmalloc_addr(cpu_addr)) { - struct page *page = virt_to_page(cpu_addr); - return __iommu_dma_get_sgtable_page(sgt, page, size); - } + if (is_vmalloc_addr(cpu_addr)) { + struct page **pages = __iommu_dma_get_pages(cpu_addr); - if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) { - /* -* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped, -* hence in the vmalloc space. 
-*/ - struct page *page = vmalloc_to_page(cpu_addr); - return __iommu_dma_get_sgtable_page(sgt, page, size); + if (pages) { + return sg_alloc_table_from_pages(sgt, pages, + PAGE_ALIGN(size) >> PAGE_SHIFT, + 0, size, GFP_KERNEL); + } + + page = vmalloc_to_page(cpu_addr); + } else { + page = virt_to_page(cpu_addr); } - pages = __iommu_dma_get_pages(cpu_addr); - if (WARN_ON_ONCE(!pages)) - return -ENXIO; - return sg_alloc_table_from_pages(sgt, pages, count, 0, size, -GFP_KERNEL); + ret = sg_alloc_table(sgt, 1, GFP_KERNEL); + if (!ret) + sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0); + return ret; } static const struct dma_map_ops iommu_dma_ops = { ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 22/26] iommu/dma: Refactor iommu_dma_mmap
On 22/04/2019 18:59, Christoph Hellwig wrote: Inline __iommu_dma_mmap_pfn into the main function, and use the fact that __iommu_dma_get_pages return NULL for remapped contigous allocations to simplify the code flow a bit. ...and later we can squash __iommu_dma_mmap() once the dust settles on vm_map_pages() - seems good to me. Reviewed-by: Robin Murphy Signed-off-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 36 +++- 1 file changed, 11 insertions(+), 25 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 138b85e675c8..8fc6098c1eeb 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1025,21 +1025,12 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, return cpu_addr; } -static int __iommu_dma_mmap_pfn(struct vm_area_struct *vma, - unsigned long pfn, size_t size) -{ - return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff, - vma->vm_end - vma->vm_start, - vma->vm_page_prot); -} - static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT; - unsigned long off = vma->vm_pgoff; - struct page **pages; + unsigned long pfn, off = vma->vm_pgoff; int ret; vma->vm_page_prot = arch_dma_mmap_pgprot(dev, vma->vm_page_prot, attrs); @@ -1050,24 +1041,19 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma, if (off >= nr_pages || vma_pages(vma) > nr_pages - off) return -ENXIO; - if (!is_vmalloc_addr(cpu_addr)) { - unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr)); - return __iommu_dma_mmap_pfn(vma, pfn, size); - } + if (is_vmalloc_addr(cpu_addr)) { + struct page **pages = __iommu_dma_get_pages(cpu_addr); - if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) { - /* -* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped, -* hence in the vmalloc space. 
-*/ - unsigned long pfn = vmalloc_to_pfn(cpu_addr); - return __iommu_dma_mmap_pfn(vma, pfn, size); + if (pages) + return __iommu_dma_mmap(pages, size, vma); + pfn = vmalloc_to_pfn(cpu_addr); + } else { + pfn = page_to_pfn(virt_to_page(cpu_addr)); } - pages = __iommu_dma_get_pages(cpu_addr); - if (WARN_ON_ONCE(!pages)) - return -ENXIO; - return __iommu_dma_mmap(pages, size, vma); + return remap_pfn_range(vma, vma->vm_start, pfn + off, + vma->vm_end - vma->vm_start, + vma->vm_page_prot); } static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 14/26] iommu/dma: Refactor iommu_dma_free
On 22/04/2019 18:59, Christoph Hellwig wrote: From: Robin Murphy The freeing logic was made particularly horrible by part of it being opaque to the arch wrapper, which led to a lot of convoluted repetition to ensure each path did everything in the right order. Now that it's all private, we can pick apart and consolidate the logically-distinct steps of freeing the IOMMU mapping, the underlying pages, and the CPU remap (if necessary) into something much more manageable. Signed-off-by: Robin Murphy [various cosmetic changes to the code flow] Hmm, I do still prefer my original flow with the dma_common_free_remap() call right out of the way at the end rather than being a special case in the middle of all the page-freeing (which is the kind of existing complexity I was trying to eliminate). I guess you've done this to avoid having "if (!dma_release_from_contiguous(...))..." twice like I ended up with, which is fair enough I suppose - once we manage to solve the new dma_{alloc,free}_contiguous() interface that may tip the balance so I can always revisit this then. Signed-off-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 75 ++- 1 file changed, 35 insertions(+), 40 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 4632b9d301a1..9658c4cc3cfe 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -916,6 +916,41 @@ static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle, __iommu_dma_unmap(dev, handle, size); } +static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr, + dma_addr_t handle, unsigned long attrs) +{ + size_t alloc_size = PAGE_ALIGN(size); + int count = alloc_size >> PAGE_SHIFT; + struct page *page = NULL; + + __iommu_dma_unmap(dev, handle, size); + + /* Non-coherent atomic allocation? 
Easy */ + if (dma_free_from_pool(cpu_addr, alloc_size)) + return; + + if (is_vmalloc_addr(cpu_addr)) { + /* +* If it the address is remapped, then it's either non-coherent s/If it/If/ My "easy; more involved; most complex" narrative definitely doesn't translate very well to a different order :) Otherwise, Reluctantly-Acked-by: Robin Murphy +* or highmem CMA, or an iommu_dma_alloc_remap() construction. +*/ + struct page **pages = __iommu_dma_get_pages(cpu_addr); + + if (pages) + __iommu_dma_free_pages(pages, count); + else + page = vmalloc_to_page(cpu_addr); + + dma_common_free_remap(cpu_addr, alloc_size, VM_USERMAP); + } else { + /* Lowmem means a coherent atomic or CMA allocation */ + page = virt_to_page(cpu_addr); + } + + if (page && !dma_release_from_contiguous(dev, page, count)) + __free_pages(page, get_order(alloc_size)); +} + static void *iommu_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, unsigned long attrs) { @@ -985,46 +1020,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, return addr; } -static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr, - dma_addr_t handle, unsigned long attrs) -{ - size_t iosize = size; - - size = PAGE_ALIGN(size); - /* -* @cpu_addr will be one of 4 things depending on how it was allocated: -* - A remapped array of pages for contiguous allocations. -* - A remapped array of pages from iommu_dma_alloc_remap(), for all -* non-atomic allocations. -* - A non-cacheable alias from the atomic pool, for atomic -* allocations by non-coherent devices. -* - A normal lowmem address, for atomic allocations by -* coherent devices. -* Hence how dodgy the below logic looks... 
-*/ - if (dma_in_atomic_pool(cpu_addr, size)) { - __iommu_dma_unmap(dev, handle, iosize); - dma_free_from_pool(cpu_addr, size); - } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) { - struct page *page = vmalloc_to_page(cpu_addr); - - __iommu_dma_unmap(dev, handle, iosize); - dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT); - dma_common_free_remap(cpu_addr, size, VM_USERMAP); - } else if (is_vmalloc_addr(cpu_addr)){ - struct page **pages = __iommu_dma_get_pages(cpu_addr); - - if (WARN_ON(!pages)) - return; - __iommu_dma_unmap(dev, handle, iosize); - __iommu_dma_free_pages(pages, size >> PAGE_SHIFT); - dma_common_free_remap(cpu_addr, size, VM_USERMAP); - } else { - __iommu_dma_unmap(dev, handle, iosize); -
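The four allocation flavours enumerated in the old comment reduce, in the consolidated iommu_dma_free() above, to a short decision tree. A user-space sketch of that classification (the enum and the boolean stand-ins for dma_free_from_pool()/is_vmalloc_addr()/__iommu_dma_get_pages() are illustrative, not kernel API):

```c
#include <stdbool.h>

/* Editor's sketch (not from the patch): how the consolidated free path
 * decides what to release for a given cpu_addr. */
enum free_path {
	FREE_FROM_POOL,      /* non-coherent atomic allocation */
	FREE_PAGE_ARRAY,     /* iommu_dma_alloc_remap() page array */
	FREE_REMAPPED_PAGE,  /* contiguous/highmem CMA remap: vmalloc_to_page() */
	FREE_LOWMEM_PAGE     /* coherent atomic or CMA: virt_to_page() */
};

enum free_path classify_free_path(bool in_atomic_pool, bool is_vmalloc,
				  bool has_pages_array)
{
	if (in_atomic_pool)
		return FREE_FROM_POOL;
	if (is_vmalloc)
		return has_pages_array ? FREE_PAGE_ARRAY : FREE_REMAPPED_PAGE;
	return FREE_LOWMEM_PAGE;
}
```

The last two cases share the tail of the real function: a single struct page that is handed to dma_release_from_contiguous() and, failing that, __free_pages().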
Re: [PATCH 13/26] iommu/dma: Remove __iommu_dma_free
On 22/04/2019 18:59, Christoph Hellwig wrote: We only have a single caller of this function left, so open code it there. Heh, I even caught myself out for a moment thinking this looked redundant with #18 now, but no :) Reviewed-by: Robin Murphy Signed-off-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 21 ++--- 1 file changed, 2 insertions(+), 19 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index b8e46e89a60a..4632b9d301a1 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -534,24 +534,6 @@ static struct page **__iommu_dma_get_pages(void *cpu_addr) return area->pages; } -/** - * iommu_dma_free - Free a buffer allocated by iommu_dma_alloc_remap() - * @dev: Device which owns this buffer - * @pages: Array of buffer pages as returned by __iommu_dma_alloc_remap() - * @size: Size of buffer in bytes - * @handle: DMA address of buffer - * - * Frees both the pages associated with the buffer, and the array - * describing them - */ -static void __iommu_dma_free(struct device *dev, struct page **pages, - size_t size, dma_addr_t *handle) -{ - __iommu_dma_unmap(dev, *handle, size); - __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT); - *handle = DMA_MAPPING_ERROR; -} - /** * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space * @dev: Device to allocate memory for. Must be a real device @@ -1034,7 +1016,8 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr, if (WARN_ON(!pages)) return; - __iommu_dma_free(dev, pages, iosize, &handle); + __iommu_dma_unmap(dev, handle, iosize); + __iommu_dma_free_pages(pages, size >> PAGE_SHIFT); dma_common_free_remap(cpu_addr, size, VM_USERMAP); } else { __iommu_dma_unmap(dev, handle, iosize);
Re: [PATCH 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
Hi Marc, On 23/04/2019 11:54, Marc Zyngier wrote: On 18/04/2019 18:26, Julien Grall wrote: On RT, the function iommu_dma_map_msi_msg may be called from non-preemptible context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock (which can sleep on RT). The function iommu_dma_map_msi_msg is used to map the MSI page in the IOMMU PT and update the MSI message with the IOVA. Only the part that looks up the MSI page needs to be called in preemptible context. As the MSI page cannot change over the lifecycle of the MSI interrupt, the lookup can be cached and re-used later on. This patch splits the function iommu_dma_map_msi_msg into two new functions: - iommu_dma_prepare_msi: This function will prepare the mapping in the IOMMU and store the cookie in the structure msi_desc. This function should be called in preemptible context. - iommu_dma_compose_msi_msg: This function will update the MSI message with the IOVA when the device is behind an IOMMU. Signed-off-by: Julien Grall --- drivers/iommu/dma-iommu.c | 43 --- include/linux/dma-iommu.h | 21 + 2 files changed, 53 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 77aabe637a60..f5c1f1685095 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -888,17 +888,17 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; } -void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) I quite like the idea of moving from having an irq to having an msi_desc passed to the IOMMU layer...
{ - struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); + struct device *dev = msi_desc_to_dev(desc); struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie; - struct iommu_dma_msi_page *msi_page; - phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; unsigned long flags; - if (!domain || !domain->iova_cookie) - return; + if (!domain || !domain->iova_cookie) { + desc->iommu_cookie = NULL; + return 0; + } cookie = domain->iova_cookie; @@ -908,10 +908,33 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) * of an MSI from within an IPI handler. */ spin_lock_irqsave(&cookie->msi_lock, flags); - msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain); + desc->iommu_cookie = iommu_dma_get_msi_page(dev, msi_addr, domain); spin_unlock_irqrestore(&cookie->msi_lock, flags); - if (WARN_ON(!msi_page)) { + return (desc->iommu_cookie) ? 0 : -ENOMEM; +} + +void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg) ... but I'd like it even better if it was uniform. Can you please move the irq_get_msi_desc() to the callers of iommu_dma_compose_msi_msg(), and make both functions take a msi_desc? Makes sense. I will modify iommu_dma_compose_msi_msg to take a msi_desc as a parameter. Cheers, -- Julien Grall
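The shape of the split under discussion: prepare may sleep and caches the cookie in the descriptor; compose runs in atomic context and only reads it. A user-space model (all `_model` names are editor's stand-ins, not the kernel API):

```c
#include <stddef.h>

/* Editor's sketch of the prepare/compose split described in the patch. */
struct msi_page_model { unsigned long long iova; };
struct msi_desc_model { struct msi_page_model *iommu_cookie; };

/* Preemptible phase: may take sleeping locks and allocate; caches the
 * looked-up page (iommu_dma_get_msi_page() in the real code). */
int prepare_msi_model(struct msi_desc_model *desc, struct msi_page_model *page)
{
	desc->iommu_cookie = page;
	return desc->iommu_cookie ? 0 : -1;  /* -ENOMEM in the real code */
}

/* Atomic phase: only reads the cached cookie; no allocation, no sleeping. */
void compose_msi_msg_model(const struct msi_desc_model *desc,
			   unsigned int *address_hi, unsigned int *address_lo)
{
	if (!desc->iommu_cookie)
		return;  /* no IOMMU translation in the way: leave the msg alone */
	*address_hi = (unsigned int)(desc->iommu_cookie->iova >> 32);
	*address_lo = (unsigned int)(desc->iommu_cookie->iova & 0xffffffffULL);
}
```

Because the MSI page cannot change over the interrupt's lifetime, the cached cookie makes the compose step safe from non-preemptible context on RT.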
Re: [PATCH 12/26] iommu/dma: Refactor the page array remapping allocator
On 22/04/2019 18:59, Christoph Hellwig wrote: Move the call to dma_common_pages_remap into __iommu_dma_alloc and rename it to iommu_dma_alloc_remap. This creates a self-contained helper for remapped pages allocation and mapping. Reviewed-by: Robin Murphy Signed-off-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 54 +++ 1 file changed, 26 insertions(+), 28 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 8e2d9733113e..b8e46e89a60a 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -535,9 +535,9 @@ static struct page **__iommu_dma_get_pages(void *cpu_addr) } /** - * iommu_dma_free - Free a buffer allocated by __iommu_dma_alloc() + * iommu_dma_free - Free a buffer allocated by iommu_dma_alloc_remap() * @dev: Device which owns this buffer - * @pages: Array of buffer pages as returned by __iommu_dma_alloc() + * @pages: Array of buffer pages as returned by __iommu_dma_alloc_remap() * @size: Size of buffer in bytes * @handle: DMA address of buffer * @@ -553,33 +553,35 @@ static void __iommu_dma_free(struct device *dev, struct page **pages, } /** - * __iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space + * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space * @dev: Device to allocate memory for. Must be a real device * attached to an iommu_dma_domain * @size: Size of buffer in bytes + * @dma_handle: Out argument for allocated DMA handle * @gfp: Allocation flags * @attrs: DMA attributes for this allocation - * @prot: IOMMU mapping flags - * @handle: Out argument for allocated DMA handle * * If @size is less than PAGE_SIZE, then a full CPU page will be allocated, * but an IOMMU which supports smaller pages might not map the whole thing. * - * Return: Array of struct page pointers describing the buffer, - *or NULL on failure. + * Return: Mapped virtual address, or NULL on failure. 
*/ -static struct page **__iommu_dma_alloc(struct device *dev, size_t size, - gfp_t gfp, unsigned long attrs, int prot, dma_addr_t *handle) +static void *iommu_dma_alloc_remap(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { struct iommu_domain *domain = iommu_get_dma_domain(dev); struct iommu_dma_cookie *cookie = domain->iova_cookie; struct iova_domain *iovad = &cookie->iovad; + bool coherent = dev_is_dma_coherent(dev); + int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs); + pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs); + unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap; struct page **pages; struct sg_table sgt; dma_addr_t iova; - unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap; + void *vaddr; - *handle = DMA_MAPPING_ERROR; + *dma_handle = DMA_MAPPING_ERROR; min_size = alloc_sizes & -alloc_sizes; if (min_size < PAGE_SIZE) { @@ -605,7 +607,7 @@ static struct page **__iommu_dma_alloc(struct device *dev, size_t size, if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL)) goto out_free_iova; - if (!(prot & IOMMU_CACHE)) { + if (!(ioprot & IOMMU_CACHE)) { struct scatterlist *sg; int i; @@ -613,14 +615,21 @@ static struct page **__iommu_dma_alloc(struct device *dev, size_t size, arch_dma_prep_coherent(sg_page(sg), sg->length); } - if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, prot) + if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, ioprot) < size) goto out_free_sg; - *handle = iova; + vaddr = dma_common_pages_remap(pages, size, VM_USERMAP, prot, + __builtin_return_address(0)); + if (!vaddr) + goto out_unmap; + + *dma_handle = iova; sg_free_table(&sgt); - return pages; + return vaddr; +out_unmap: + __iommu_dma_unmap(dev, iova, size); out_free_sg: sg_free_table(&sgt); out_free_iova: @@ -989,18 +998,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, size >> PAGE_SHIFT); } } else { - pgprot_t prot = arch_dma_mmap_pgprot(dev,
PAGE_KERNEL, attrs); - struct page **pages; - - pages = __iommu_dma_alloc(dev, iosize, gfp, attrs, ioprot, - handle); - if (!pages) - return NULL; - - addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot, - __builtin_return_address(0)); - if (!addr) - __iommu_dma_free(dev, pages, iosize,
Re: [PATCH 11/26] iommu/dma: Factor out remapped pages lookup
On 22/04/2019 18:59, Christoph Hellwig wrote: From: Robin Murphy Since we duplicate the find_vm_area() logic a few times in places where we only care about the pages, factor out a helper to abstract it. Signed-off-by: Robin Murphy [hch: don't warn when not finding a region, as we'll rely on that later] Yeah, I did think about that and the things which it might make a little easier, but preserved it as-is for the sake of keeping my modifications quick and simple. TBH I'm now feeling more inclined to drop the WARNs entirely at this point, since it's not like there's ever been any general guarantee that freeing the wrong thing shouldn't just crash, but that's something we can easily come back to later if need be. Robin. Signed-off-by: Christoph Hellwig --- drivers/iommu/dma-iommu.c | 32 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index b52c5d6be7b4..8e2d9733113e 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -525,6 +525,15 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev, return pages; } +static struct page **__iommu_dma_get_pages(void *cpu_addr) +{ + struct vm_struct *area = find_vm_area(cpu_addr); + + if (!area || !area->pages) + return NULL; + return area->pages; +} + /** * iommu_dma_free - Free a buffer allocated by __iommu_dma_alloc() * @dev: Device which owns this buffer @@ -1023,11 +1032,11 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr, dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT); dma_common_free_remap(cpu_addr, size, VM_USERMAP); } else if (is_vmalloc_addr(cpu_addr)){ - struct vm_struct *area = find_vm_area(cpu_addr); + struct page **pages = __iommu_dma_get_pages(cpu_addr); - if (WARN_ON(!area || !area->pages)) + if (WARN_ON(!pages)) return; - __iommu_dma_free(dev, area->pages, iosize, &handle); + __iommu_dma_free(dev, pages, iosize, &handle); dma_common_free_remap(cpu_addr, size, VM_USERMAP); } else {
__iommu_dma_unmap(dev, handle, iosize); @@ -1049,7 +1058,7 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma, { unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT; unsigned long off = vma->vm_pgoff; - struct vm_struct *area; + struct page **pages; int ret; vma->vm_page_prot = arch_dma_mmap_pgprot(dev, vma->vm_page_prot, attrs); @@ -1074,11 +1083,10 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma, return __iommu_dma_mmap_pfn(vma, pfn, size); } - area = find_vm_area(cpu_addr); - if (WARN_ON(!area || !area->pages)) + pages = __iommu_dma_get_pages(cpu_addr); + if (WARN_ON_ONCE(!pages)) return -ENXIO; - - return __iommu_dma_mmap(area->pages, size, vma); + return __iommu_dma_mmap(pages, size, vma); } static int __iommu_dma_get_sgtable_page(struct sg_table *sgt, struct page *page, @@ -1096,7 +1104,7 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt, unsigned long attrs) { unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT; - struct vm_struct *area = find_vm_area(cpu_addr); + struct page **pages; if (!is_vmalloc_addr(cpu_addr)) { struct page *page = virt_to_page(cpu_addr); @@ -1112,10 +1120,10 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt, return __iommu_dma_get_sgtable_page(sgt, page, size); } - if (WARN_ON(!area || !area->pages)) + pages = __iommu_dma_get_pages(cpu_addr); + if (WARN_ON_ONCE(!pages)) return -ENXIO; - - return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size, + return sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL); }
Re: [PATCH 07/26] iommu/dma: move the arm64 wrappers to common code
On 22/04/2019 18:59, Christoph Hellwig wrote: There is nothing really arm64 specific in the iommu_dma_ops implementation, so move it to dma-iommu.c and keep a lot of symbols self-contained. Note the implementation does depend on the DMA_DIRECT_REMAP infrastructure for now, so we'll have to make the DMA_IOMMU support depend on it, but this will be relaxed soon. Nothing looks objectionable, and boot testing with this much of the series merged has my coherent and non-coherent IOMMU-backed devices appearing to still work OK, so: Acked-by: Robin Murphy Signed-off-by: Christoph Hellwig --- arch/arm64/mm/dma-mapping.c | 389 +--- drivers/iommu/Kconfig | 1 + drivers/iommu/dma-iommu.c | 388 --- include/linux/dma-iommu.h | 43 +--- 4 files changed, 369 insertions(+), 452 deletions(-) diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 636fa7c64370..d1661f78eb4d 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -58,27 +59,6 @@ void arch_dma_prep_coherent(struct page *page, size_t size) __dma_flush_area(page_address(page), size); } -#ifdef CONFIG_IOMMU_DMA -static int __swiotlb_get_sgtable_page(struct sg_table *sgt, - struct page *page, size_t size) -{ - int ret = sg_alloc_table(sgt, 1, GFP_KERNEL); - - if (!ret) - sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0); - - return ret; -} - -static int __swiotlb_mmap_pfn(struct vm_area_struct *vma, - unsigned long pfn, size_t size) -{ - return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff, - vma->vm_end - vma->vm_start, - vma->vm_page_prot); -} -#endif /* CONFIG_IOMMU_DMA */ - static int __init arm64_dma_init(void) { WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(), @@ -90,379 +70,18 @@ static int __init arm64_dma_init(void) arch_initcall(arm64_dma_init); #ifdef CONFIG_IOMMU_DMA -#include -#include -#include - -static void *__iommu_alloc_attrs(struct device *dev, size_t size, -dma_addr_t 
*handle, gfp_t gfp, -unsigned long attrs) -{ - bool coherent = dev_is_dma_coherent(dev); - int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs); - size_t iosize = size; - void *addr; - - if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n")) - return NULL; - - size = PAGE_ALIGN(size); - - /* -* Some drivers rely on this, and we probably don't want the -* possibility of stale kernel data being read by devices anyway. -*/ - gfp |= __GFP_ZERO; - - if (!gfpflags_allow_blocking(gfp)) { - struct page *page; - /* -* In atomic context we can't remap anything, so we'll only -* get the virtually contiguous buffer we need by way of a -* physically contiguous allocation. -*/ - if (coherent) { - page = alloc_pages(gfp, get_order(size)); - addr = page ? page_address(page) : NULL; - } else { - addr = dma_alloc_from_pool(size, &page, gfp); - } - if (!addr) - return NULL; - - *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot); - if (*handle == DMA_MAPPING_ERROR) { - if (coherent) - __free_pages(page, get_order(size)); - else - dma_free_from_pool(addr, size); - addr = NULL; - } - } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) { - pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs); - struct page *page; - - page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT, - get_order(size), gfp & __GFP_NOWARN); - if (!page) - return NULL; - - *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot); - if (*handle == DMA_MAPPING_ERROR) { - dma_release_from_contiguous(dev, page, - size >> PAGE_SHIFT); - return NULL; - } - addr = dma_common_contiguous_remap(page, size, VM_USERMAP, - prot, - __builtin_return_address(0)); - if (addr) { - if (!coherent) -
Re: [PATCH 02/26] arm64/iommu: improve mmap bounds checking
On 22/04/2019 18:59, Christoph Hellwig wrote: The nr_pages checks should be done for all mmap requests, not just those using remap_pfn_range. I think it probably makes sense now to just squash this with #22 one way or the other, but if you really really still want to keep it as a separate patch with a misleading commit message then I'm willing to keep my complaints to myself :) Robin. Signed-off-by: Christoph Hellwig --- arch/arm64/mm/dma-mapping.c | 21 - 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 674860e3e478..604c638b2787 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -73,19 +73,9 @@ static int __swiotlb_get_sgtable_page(struct sg_table *sgt, static int __swiotlb_mmap_pfn(struct vm_area_struct *vma, unsigned long pfn, size_t size) { - int ret = -ENXIO; - unsigned long nr_vma_pages = vma_pages(vma); - unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT; - unsigned long off = vma->vm_pgoff; - - if (off < nr_pages && nr_vma_pages <= (nr_pages - off)) { - ret = remap_pfn_range(vma, vma->vm_start, - pfn + off, - vma->vm_end - vma->vm_start, - vma->vm_page_prot); - } - - return ret; + return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff, + vma->vm_end - vma->vm_start, + vma->vm_page_prot); } #endif /* CONFIG_IOMMU_DMA */ @@ -241,6 +231,8 @@ static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { + unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT; + unsigned long off = vma->vm_pgoff; struct vm_struct *area; int ret; @@ -249,6 +241,9 @@ static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma, if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret)) return ret; + if (off >= nr_pages || vma_pages(vma) > nr_pages - off) + return -ENXIO; + if (!is_vmalloc_addr(cpu_addr)) { unsigned long pfn =
page_to_pfn(virt_to_page(cpu_addr)); return __swiotlb_mmap_pfn(vma, pfn, size);
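The bounds check hoisted into __iommu_mmap_attrs() above can be modelled in isolation: the vma may start vm_pgoff pages into the buffer, but must not run past its end. A sketch (the MODEL_PAGE_* constants are stand-ins for the kernel's PAGE_SIZE/PAGE_SHIFT):

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Editor's sketch of the mmap range check applied to all requests:
 * off pages of offset into an nr_pages buffer, mapping vma_pages pages. */
bool mmap_range_ok(size_t buf_size, unsigned long vma_pages, unsigned long off)
{
	unsigned long nr_pages =
		(buf_size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;

	/* Written to avoid overflow: compare against nr_pages - off only
	 * after establishing off < nr_pages, as the patch does. */
	return off < nr_pages && vma_pages <= nr_pages - off;
}
```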
Re: [PATCH v1] iommu/amd: flush not present cache in iommu_map_page
On Mon, Apr 29, 2019 at 01:17:44PM +0100, Tom Murphy wrote: > Yes. My patches depend on the "iommu/vt-d: Delegate DMA domain to > generic iommu" patch which is currently being reviewed. Nice!
Re: [PATCH v1] iommu/amd: flush not present cache in iommu_map_page
On Mon, Apr 29, 2019 at 12:59 PM Christoph Hellwig wrote: > > On Sat, Apr 27, 2019 at 03:20:35PM +0100, Tom Murphy wrote: > > I am working on another patch to improve the intel iotlb flushing in > > the iommu ops patch which should cover this too. > > So are you looking into converting the intel-iommu driver to use > dma-iommu as well? That would be great! Yes. My patches depend on the "iommu/vt-d: Delegate DMA domain to generic iommu" patch which is currently being reviewed.
Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers
On 29/04/2019 12:49, Christoph Hellwig wrote: On Tue, Apr 23, 2019 at 11:01:44AM +0100, Robin Murphy wrote: Wouldn't this suffice? Since we also use alloc_pages() in the coherent atomic case, the free path should already be able to deal with it. Let me take a proper look at v3 and see how it all looks in context. Any comments on v3? I've been deferring lots of other DMA work to not create conflicts, so I'd hate to miss this merge window. Ah, I did skim the commits in the branch, but I'll run through again and reply on the patches while my head's still in email mode... Robin.
Re: [PATCH v1] iommu/amd: flush not present cache in iommu_map_page
On Sat, Apr 27, 2019 at 03:20:35PM +0100, Tom Murphy wrote: > I am working on another patch to improve the intel iotlb flushing in > the iommu ops patch which should cover this too. So are you looking into converting the intel-iommu driver to use dma-iommu as well? That would be great!
Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers
On Tue, Apr 23, 2019 at 11:01:44AM +0100, Robin Murphy wrote: > Wouldn't this suffice? Since we also use alloc_pages() in the coherent > atomic case, the free path should already be able to deal with it. > > Let me take a proper look at v3 and see how it all looks in context. Any comments on v3? I've been deferring lots of other DMA work to not create conflicts, so I'd hate to miss this merge window.
Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free
On Mon, Apr 29, 2019 at 12:06:52PM +0100, Robin Murphy wrote: > > From the reply up-thread I guess you're trying to include an optimisation > to only copy the head and tail of the buffer if it spans multiple pages, > and directly map the ones in the middle, but AFAICS that's going to tie you > to also using strict mode for TLB maintenance, which may not be a win > overall depending on the balance between invalidation bandwidth vs. memcpy > bandwidth. At least if we use standard SWIOTLB logic to always copy the > whole thing, we should be able to release the bounce pages via the flush > queue to allow 'safe' lazy unmaps. Oh. The head and tail optimization is what I missed. Yes, for that we'd need the offset. > Either way I think it would be worth just implementing the straightforward > version first, then coming back to consider optimisations later. Agreed, let's start simple. Especially as large DMA mappings or allocations should usually be properly aligned anyway, and if not we should fix that for multiple reasons.
Re: [PATCH v3 02/10] swiotlb: Factor out slot allocation and free
On 29/04/2019 06:10, Lu Baolu wrote: Hi Christoph, On 4/26/19 11:04 PM, Christoph Hellwig wrote: On Thu, Apr 25, 2019 at 10:07:19AM +0800, Lu Baolu wrote: This is not VT-d specific. It's just how generic IOMMU works. Normally, IOMMU works in paging mode. So if a driver issues DMA with IOVA 0x0123, IOMMU can remap it with a physical address 0x0123. But we should never expect IOMMU to remap 0x0123 with physical address of 0x. That's the reason why I said that IOMMU will not work there. Well, with the iommu it doesn't happen. With swiotlb it obviously can happen, so drivers are fine with it. Why would that suddenly become an issue when swiotlb is called from the iommu code? I would say IOMMU is DMA remapping, not DMA engine. :-) I'm not sure I really follow the issue here - if we're copying the buffer to the bounce page(s) there's no conceptual difference from copying it to SWIOTLB slot(s), so there should be no need to worry about the original in-page offset. From the reply up-thread I guess you're trying to include an optimisation to only copy the head and tail of the buffer if it spans multiple pages, and directly map the ones in the middle, but AFAICS that's going to tie you to also using strict mode for TLB maintenance, which may not be a win overall depending on the balance between invalidation bandwidth vs. memcpy bandwidth. At least if we use standard SWIOTLB logic to always copy the whole thing, we should be able to release the bounce pages via the flush queue to allow 'safe' lazy unmaps. Either way I think it would be worth just implementing the straightforward version first, then coming back to consider optimisations later. Robin.
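For readers following the head/tail idea being debated: the optimisation would bounce only the partial head and tail pages of an unaligned buffer and map the full middle pages directly. A rough model of the extent computation (illustrative only; the thread ends up recommending plain whole-buffer SWIOTLB copying first):

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* Editor's sketch: for a buffer at [addr, addr + len), only the partial
 * head and tail pages share their page frame with unrelated data and
 * would need bouncing; fully-covered middle pages could map directly. */
void bounce_extents(unsigned long addr, size_t len,
		    size_t *head_bytes, size_t *tail_bytes)
{
	unsigned long start_off = addr & (MODEL_PAGE_SIZE - 1);
	unsigned long end_off = (addr + len) & (MODEL_PAGE_SIZE - 1);

	if (start_off + len <= MODEL_PAGE_SIZE) {
		/* Whole buffer fits in one page: a single bounce region. */
		*head_bytes = len;
		*tail_bytes = 0;
		return;
	}
	*head_bytes = start_off ? MODEL_PAGE_SIZE - start_off : 0;
	*tail_bytes = end_off;
}
```

As Robin notes, the middle pages being mapped in place is what forces strict TLB invalidation on unmap, which is the trade-off against memcpy bandwidth.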
Re: [PATCH v3 01/10] iommu: Add helper to get minimal page size of domain
On 21/04/2019 02:17, Lu Baolu wrote: This makes it possible for other modules to know the minimal page size supported by a domain without the knowledge of the structure details. Signed-off-by: Lu Baolu --- include/linux/iommu.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a5007d035218..46679ef19b7e 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -377,6 +377,14 @@ static inline void iommu_tlb_sync(struct iommu_domain *domain) domain->ops->iotlb_sync(domain); } +static inline unsigned long domain_minimal_pgsize(struct iommu_domain *domain) +{ + if (domain && domain->pgsize_bitmap) + return 1 << __ffs(domain->pgsize_bitmap); Nit: this would probably be more efficient on most architectures as: if (domain) return domain->pgsize_bitmap & -domain->pgsize_bitmap; I'd also suggest s/minimal/min/ in the name, just to stop it getting too long. Otherwise, though, I like the idea, and there's at least one other place (in iommu-dma) that can make use of it straight away. Robin. + + return 0; +} + /* PCI device grouping function */ extern struct iommu_group *pci_device_group(struct device *dev); /* Generic device grouping function */ @@ -704,6 +712,11 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode) return NULL; } +static inline unsigned long domain_minimal_pgsize(struct iommu_domain *domain) +{ + return 0; +} + #endif /* CONFIG_IOMMU_API */ #ifdef CONFIG_IOMMU_DEBUGFS
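Robin's nit, spelled out: both expressions extract the smallest page size from pgsize_bitmap, but `x & -x` isolates the lowest set bit directly, without a find-first-set plus shift. A stand-alone sketch (function names are illustrative; `__builtin_ctzl` stands in for the kernel's `__ffs`):

```c
/* Editor's sketch of the two equivalent minimum-page-size expressions. */
unsigned long min_pgsize_ffs(unsigned long bitmap)
{
	return bitmap ? 1UL << __builtin_ctzl(bitmap) : 0;
}

unsigned long min_pgsize_isolate(unsigned long bitmap)
{
	/* Two's-complement trick: x & -x keeps only the lowest set bit. */
	return bitmap & -bitmap;
}
```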
Re: [PATCH v2 11/19] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
On 27/04/2019 09:38, Auger Eric wrote: --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5153,7 +5153,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain, domain->auxd_refcnt--; if (!domain->auxd_refcnt && domain->default_pasid > 0) - intel_pasid_free_id(domain->default_pasid); + ioasid_free(domain->default_pasid); } static int aux_domain_add_dev(struct dmar_domain *domain, @@ -5171,9 +5171,8 @@ static int aux_domain_add_dev(struct dmar_domain *domain, if (domain->default_pasid <= 0) { int pasid; - pasid = intel_pasid_alloc_id(domain, PASID_MIN, - pci_max_pasids(to_pci_dev(dev)), - GFP_KERNEL); + pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1, + domain); if (pasid <= 0) { >>> ioasid_t is a uint and returns INVALID_IOASID on error. Wouldn't it be >>> simpler to make ioasid_alloc return an int? >> Well, I think we still want the full uint range - 1(INVALID_IOASID). >> Intel uses 20bit but I think SMMUs use 32 bits for streamID? I >> should just check >> if (pasid == INVALID_IOASID) { > Jean-Philippe may correct me but SMMU uses 20b SubstreamId which is a > superset of PASIDs. StreamId is 32b. Right, we use 20 bits for PASIDs (== SubstreamID really). Given the choices that vendors are making for PASIDs (a global namespace rather than per-VM), I wouldn't be surprised if they extend the size of PASIDs in a couple of years, so I added the typedef ioasid_t to ease a possible change from 32-bit to 64 in the future. Thanks, Jean
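A side note on the uint discussion above: once the allocator's return type is unsigned, an `if (pasid <= 0)` style check only ever catches 0, so failure has to be signalled with a sentinel such as INVALID_IOASID and tested with `==`. A toy model (editor's sketch, not the kernel's IOASID code):

```c
typedef unsigned int ioasid_model_t;
#define INVALID_IOASID_MODEL ((ioasid_model_t)-1)

/* Minimal bump allocator over [min, max], illustrating the sentinel-based
 * error convention for an unsigned ID space. */
static ioasid_model_t next_id;

ioasid_model_t ioasid_alloc_model(ioasid_model_t min, ioasid_model_t max)
{
	if (next_id < min)
		next_id = min;
	if (next_id > max)
		return INVALID_IOASID_MODEL;  /* ID space exhausted */
	return next_id++;
}
```

The sentinel leaves the full unsigned range minus one value usable, which matters if PASIDs ever grow past their current 20 bits.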