[PATCH v3 3/3] swiotlb: checking whether swiotlb buffer is full with io_tlb_used
This patch uses io_tlb_used to help check whether the swiotlb buffer is full.
io_tlb_used is no longer used only for debugfs; it is also used to help
optimize swiotlb_tbl_map_single().

Suggested-by: Joe Jin
Signed-off-by: Dongli Zhang
---
Changed since v2:
  * move #ifdef folding to previous patch (suggested by Konrad Rzeszutek Wilk)

 kernel/dma/swiotlb.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index bedc9f9..a01b83e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -483,6 +483,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 	 * request and allocate a buffer from that IO TLB pool.
 	 */
 	spin_lock_irqsave(&io_tlb_lock, flags);
+
+	if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
+		goto not_found;
+
 	index = ALIGN(io_tlb_index, stride);
 	if (index >= io_tlb_nslabs)
 		index = 0;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
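The point of the new hunk is a constant-time "is the pool provably full?" check before the free-list search. As a rough illustration only, the accounting can be modeled in plain userspace C; the harness below is hypothetical (io_tlb_nslabs and io_tlb_used mirror the kernel counters from the patch, but swiotlb_try_reserve is an invented stand-in for the locked section of swiotlb_tbl_map_single()):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the patch's fast "buffer full" check. */
static unsigned long io_tlb_nslabs = 8;   /* total IO TLB slots */
static unsigned long io_tlb_used;         /* slots currently mapped */

/* Returns true and reserves the slots iff enough capacity could exist.
 * The real code still searches the free list for a contiguous run; this
 * check only lets it bail out early when the pool is provably full. */
static bool swiotlb_try_reserve(unsigned long nslots)
{
	if (nslots > io_tlb_nslabs - io_tlb_used)
		return false;          /* "goto not_found" in the patch */
	io_tlb_used += nslots;         /* the "found:" path */
	return true;
}

/* Mirrors the io_tlb_used -= nslots in swiotlb_tbl_unmap_single(). */
static void swiotlb_release(unsigned long nslots)
{
	io_tlb_used -= nslots;
}
```

Note the check can still pass when no *contiguous* run of nslots exists; it is a cheap necessary condition, not a sufficient one, which is why the normal search path is unchanged.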
[PATCH v3 1/3] swiotlb: fix comment on swiotlb_bounce()
Fix the comment: swiotlb_bounce() is used to copy from the original DMA
location to the swiotlb buffer during swiotlb_tbl_map_single(), and to copy
back from the swiotlb buffer to the original DMA location during
swiotlb_tbl_unmap_single().

Signed-off-by: Dongli Zhang
---
 kernel/dma/swiotlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fb6fd6..1d8b377 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -385,7 +385,7 @@ void __init swiotlb_exit(void)
 }
 
 /*
- * Bounce: copy the swiotlb buffer back to the original dma location
+ * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
 static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 			   size_t size, enum dma_data_direction dir)
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 2/3] swiotlb: add debugfs to track swiotlb buffer usage
The device driver will not be able to do DMA operations once the swiotlb
buffer is full, either because it has too many IO TLB blocks inflight, or
because of a memory leak in the device driver. Exporting the swiotlb buffer
usage via debugfs helps the user estimate the swiotlb buffer size to
pre-allocate, or analyze a device driver memory leak.

Signed-off-by: Dongli Zhang
---
Changed since v1:
  * init debugfs with late_initcall (suggested by Robin Murphy)
  * create debugfs entries with debugfs_create_ulong (suggested by Robin Murphy)

Changed since v2:
  * some #ifdef folding (suggested by Konrad Rzeszutek Wilk)

 kernel/dma/swiotlb.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1d8b377..bedc9f9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -34,6 +34,9 @@
 #include
 #include
 #include
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+#endif
 #include
 #include
@@ -73,6 +76,11 @@
 phys_addr_t io_tlb_start, io_tlb_end;
 static unsigned long io_tlb_nslabs;
 
 /*
+ * The number of used IO TLB block
+ */
+static unsigned long io_tlb_used;
+
+/*
  * This is a free list describing the number of free entries available from
  * each index
  */
@@ -524,6 +532,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 	dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
 	return DMA_MAPPING_ERROR;
 found:
+	io_tlb_used += nslots;
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
 
 	/*
@@ -584,6 +593,8 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 		 */
 		for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) &&
 		     io_tlb_list[i]; i--)
 			io_tlb_list[i] = ++count;
+
+		io_tlb_used -= nslots;
 	}
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
 }
@@ -662,3 +673,36 @@
 swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
 	return __phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
 }
+
+#ifdef CONFIG_DEBUG_FS
+
+static int __init swiotlb_create_debugfs(void)
+{
+	static struct dentry *d_swiotlb_usage;
+	struct dentry *ent;
+
+	d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
+
+	if (!d_swiotlb_usage)
+		return -ENOMEM;
+
+	ent = debugfs_create_ulong("io_tlb_nslabs", 0400,
+				   d_swiotlb_usage, &io_tlb_nslabs);
+	if (!ent)
+		goto fail;
+
+	ent = debugfs_create_ulong("io_tlb_used", 0400,
+				   d_swiotlb_usage, &io_tlb_used);
+	if (!ent)
+		goto fail;
+
+	return 0;
+
+fail:
+	debugfs_remove_recursive(d_swiotlb_usage);
+	return -ENOMEM;
+}
+
+late_initcall(swiotlb_create_debugfs);
+
+#endif
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
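The error handling pattern in swiotlb_create_debugfs() above (bail out with one goto label, and let a single recursive remove tear down whatever was created) can be sketched in userspace C. Everything below is a fake: the stub names mimic the debugfs API but take invented signatures, purely to show the unwind shape:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int removed;                 /* counts recursive teardowns */
static bool fail_second_entry;      /* fault-injection knob */

/* Fake debugfs stubs (signatures invented for this sketch). */
static void *debugfs_create_dir(void) { return &removed; }
static void *debugfs_create_ulong(int which)
{
	if (which == 2 && fail_second_entry)
		return NULL;
	return &removed;
}
static void debugfs_remove_recursive(void *d) { (void)d; removed++; }

static int swiotlb_create_debugfs(void)
{
	void *dir = debugfs_create_dir();

	if (!dir)
		return -1;               /* -ENOMEM in the patch */
	if (!debugfs_create_ulong(1))
		goto fail;               /* io_tlb_nslabs entry */
	if (!debugfs_create_ulong(2))
		goto fail;               /* io_tlb_used entry */
	return 0;

fail:
	debugfs_remove_recursive(dir);   /* one call cleans up everything */
	return -1;
}
```

The appeal of the pattern is that partial failures need no per-entry bookkeeping: removing the directory recursively reclaims any entries that did get created.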
[PATCH 1/3] PCI: iproc: Add feature to set order mode
Order mode in RX header of incoming pcie packets can be override to strict or loose order based on requirement. Sysfs entry is provided to set dynamic and default order modes of upstream traffic. To improve performance in few endpoints we required to modify the ordering attributes. Using this feature we can override order modes of RX packets at IPROC RC. Ex: In PCIe based NVMe SSD endpoints data read/writes from disk is using Queue pairs (submit/completion). After completion of block read/write, EP writes completion command to completion queue to notify RC. So that all data transfers before completion command write are not required to strict order except completion command. It means previous all packets before completion command write to RC should be written to memory and acknowledged. Signed-off-by: Srinath Mannam Reviewed-by: Ray Jui --- drivers/pci/controller/pcie-iproc.c | 128 drivers/pci/controller/pcie-iproc.h | 16 + 2 files changed, 144 insertions(+) diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c index c20fd6b..13ce80f 100644 --- a/drivers/pci/controller/pcie-iproc.c +++ b/drivers/pci/controller/pcie-iproc.c @@ -57,6 +57,9 @@ #define PCIE_DL_ACTIVE_SHIFT 2 #define PCIE_DL_ACTIVE BIT(PCIE_DL_ACTIVE_SHIFT) +#define MPS_MRRS_CFG_MPS_SHIFT 0 +#define MPS_MRRS_CFG_MRRS_SHIFT16 + #define APB_ERR_EN_SHIFT 0 #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT) @@ -91,6 +94,14 @@ #define IPROC_PCIE_REG_INVALID 0x +#define RO_FIELD(window) BIT((window) << 1) +#define RO_VALUE(window) BIT(((window) << 1) + 1) +/* All Windows are allowed */ +#define RO_ALL_WINDOW 0x +/* Wait on All Windows */ +#define RO_FIELD_ALL_WINDOW0x +#define DYNAMIC_ORDER_MODE 0x5 + /** * iProc PCIe outbound mapping controller specific parameters * @@ -295,6 +306,15 @@ enum iproc_pcie_reg { /* enable APB error for unsupported requests */ IPROC_PCIE_APB_ERR_EN, + /* Ordering Mode configuration registers */ + IPROC_PCIE_ORDERING_CFG, + IPROC_PCIE_MPS_MRRS_CFG, + 
IPROC_PCIE_IMAP0_RO_CONTROL, + IPROC_PCIE_IMAP1_RO_CONTROL, + IPROC_PCIE_IMAP2_RO_CONTROL, + IPROC_PCIE_IMAP3_RO_CONTROL, + IPROC_PCIE_IMAP4_RO_CONTROL, + /* total number of core registers */ IPROC_PCIE_MAX_NUM_REG, }; @@ -352,6 +372,13 @@ static const u16 iproc_pcie_reg_paxb_v2[] = { [IPROC_PCIE_IMAP4] = 0xe70, [IPROC_PCIE_LINK_STATUS]= 0xf0c, [IPROC_PCIE_APB_ERR_EN] = 0xf40, + [IPROC_PCIE_ORDERING_CFG] = 0x2000, + [IPROC_PCIE_MPS_MRRS_CFG] = 0x2008, + [IPROC_PCIE_IMAP0_RO_CONTROL] = 0x201c, + [IPROC_PCIE_IMAP1_RO_CONTROL] = 0x2020, + [IPROC_PCIE_IMAP2_RO_CONTROL] = 0x2024, + [IPROC_PCIE_IMAP3_RO_CONTROL] = 0x2028, + [IPROC_PCIE_IMAP4_RO_CONTROL] = 0x202c, }; /* iProc PCIe PAXC v1 registers */ @@ -1401,6 +1428,97 @@ static int iproc_pcie_rev_init(struct iproc_pcie *pcie) return 0; } +static +ssize_t order_mode_show(struct device *dev, + struct device_attribute *attr, + char *buff) +{ + struct pci_host_bridge *host = to_pci_host_bridge(dev); + struct iproc_pcie *pcie = pci_host_bridge_priv(host); + + return sprintf(buff, "Current PAXB order configuration %d\n", + pcie->order_cfg); +} + +static void pcie_iproc_set_dynamic_order(struct iproc_pcie *pcie) +{ + u32 val = 0, mps; + + /* Set all IMAPs to relaxed order in dynamic order mode */ + iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG, +DYNAMIC_ORDER_MODE); + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL, +RO_ALL_WINDOW); + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP1_RO_CONTROL, +RO_ALL_WINDOW); + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL, +RO_ALL_WINDOW); + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP3_RO_CONTROL, +RO_ALL_WINDOW); + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP4_RO_CONTROL, +RO_ALL_WINDOW); + + /* PAXB MPS/MRRS settings configuration */ + iproc_pci_raw_config_read32(pcie, 0, IPROC_PCI_EXP_CAP + PCI_EXP_DEVCTL, + 2, ); + mps = (mps & PCI_EXP_DEVCTL_PAYLOAD) >> 5; + /* set MRRS to match system MPS */ + val |= (mps << MPS_MRRS_CFG_MRRS_SHIFT); + /* set MPS to 4096 bytes */ + val 
|= (0x5 << MPS_MRRS_CFG_MPS_SHIFT); + iproc_pcie_write_reg(pcie, IPROC_PCIE_MPS_MRRS_CFG, val); +} + +static +ssize_t order_mode_store(struct device *dev, +
[PATCH 2/3] PCI: iproc: CRS state check in config request
In the current implementation, a config read returning 0x0001 data is
assumed to be a CRS completion, but sometimes 0x0001 can be valid data.
The iProc PCIe RC has a register that reports the config request completion
status (SC, UR, CRS and CA), so an extra check against this register is
added to confirm the CRS state before reissuing the config request.

Signed-off-by: Srinath Mannam
Reviewed-by: Ray Jui
---
 drivers/pci/controller/pcie-iproc.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
index 13ce80f..ee89d56 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -63,6 +63,10 @@
 #define APB_ERR_EN_SHIFT	0
 #define APB_ERR_EN		BIT(APB_ERR_EN_SHIFT)
 
+#define CFG_RD_SUCCESS		0
+#define CFG_RD_UR		1
+#define CFG_RD_CRS		2
+#define CFG_RD_CA		3
 #define CFG_RETRY_STATUS	0x0001
 #define CFG_RETRY_STATUS_TIMEOUT_US	500000	/* 500 milliseconds */
 
@@ -300,6 +304,9 @@ enum iproc_pcie_reg {
 	IPROC_PCIE_IARR4,
 	IPROC_PCIE_IMAP4,
 
+	/* config read status */
+	IPROC_PCIE_CFG_RD_STATUS,
+
 	/* link status */
 	IPROC_PCIE_LINK_STATUS,
 
@@ -370,6 +377,7 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
 	[IPROC_PCIE_IMAP3]		= 0xe08,
 	[IPROC_PCIE_IARR4]		= 0xe68,
 	[IPROC_PCIE_IMAP4]		= 0xe70,
+	[IPROC_PCIE_CFG_RD_STATUS]	= 0xee0,
 	[IPROC_PCIE_LINK_STATUS]	= 0xf0c,
 	[IPROC_PCIE_APB_ERR_EN]		= 0xf40,
 	[IPROC_PCIE_ORDERING_CFG]	= 0x2000,
@@ -501,10 +509,12 @@ static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
 	return (pcie->base + offset);
 }
 
-static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
+static unsigned int iproc_pcie_cfg_retry(struct iproc_pcie *pcie,
+					 void __iomem *cfg_data_p)
 {
 	int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
 	unsigned int data;
+	u32 status;
 
 	/*
 	 * As per PCIe spec r3.1, sec 2.3.2, CRS Software Visibility only
@@ -525,6 +535,15 @@ static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
 	 */
 	data = readl(cfg_data_p);
 	while (data == CFG_RETRY_STATUS && timeout--) {
+		/*
+		 * CRS state is set in CFG_RD status register.
+		 * This will handle the case where CFG_RETRY_STATUS is
+		 * valid config data.
+		 */
+		status = iproc_pcie_read_reg(pcie, IPROC_PCIE_CFG_RD_STATUS);
+		if (status != CFG_RD_CRS)
+			return data;
+
 		udelay(1);
 		data = readl(cfg_data_p);
 	}
@@ -603,7 +622,7 @@ static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
 	if (!cfg_data_p)
 		return PCIBIOS_DEVICE_NOT_FOUND;
 
-	data = iproc_pcie_cfg_retry(cfg_data_p);
+	data = iproc_pcie_cfg_retry(pcie, cfg_data_p);
 
 	*val = data;
 	if (size <= 2)
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
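The disambiguation logic of the retry loop can be modeled in a few lines of userspace C. All of the "hardware" below is simulated for illustration (the register reads are stubs; the magic value stands in for the driver's CFG_RETRY_STATUS):

```c
#include <assert.h>

#define CFG_RETRY_STATUS 0xffff0001u   /* magic "retry" completion value */
#define CFG_RD_SUCCESS 0
#define CFG_RD_CRS     2

static unsigned int cfg_data;    /* simulated config-data register */
static unsigned int rd_status;   /* simulated CFG_RD status register */
static int polls;

static unsigned int read_cfg(void)    { polls++; return cfg_data; }
static unsigned int read_status(void) { return rd_status; }

/* Model of the fixed iproc_pcie_cfg_retry(): the magic value is only
 * retried while the status register confirms a genuine CRS state. */
static unsigned int cfg_retry(int timeout)
{
	unsigned int data = read_cfg();

	while (data == CFG_RETRY_STATUS && timeout--) {
		/* If status does NOT say CRS, the magic value was real
		 * config data: return it instead of spinning. */
		if (read_status() != CFG_RD_CRS)
			return data;
		data = read_cfg();
	}
	return data;
}
```

This is exactly the case the commit message describes: without the status check, a register that legitimately contains the magic value would be polled until the 500 ms timeout expires and then likely misreported.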
[PATCH 0/3] Add IPROC PCIe new features
Add changes related to IPROC PCIe RC IP new features. This patch set is based on Linux-5.0-rc2. Srinath Mannam (3): PCI: iproc: Add feature to set order mode PCI: iproc: CRS state check in config request PCI: iproc: Add PCIe 32bit outbound memory configuration drivers/pci/controller/pcie-iproc.c | 172 +++- drivers/pci/controller/pcie-iproc.h | 16 2 files changed, 184 insertions(+), 4 deletions(-) -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/3] PCI: iproc: Add PCIe 32bit outbound memory configuration
The iProc PCIe RC supports outbound memory mapping with an SoC address
inside the 4GB address boundary, i.e. a 32-bit AXI address. This patch adds
support for that.

Signed-off-by: Srinath Mannam
Signed-off-by: Abhishek Shah
Signed-off-by: Ray Jui
Reviewed-by: Scott Branden
Reviewed-by: Vikram Prakash
---
 drivers/pci/controller/pcie-iproc.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
index ee89d56..4d0e63b 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -982,8 +982,25 @@ static int iproc_pcie_setup_ob(struct iproc_pcie *pcie, u64 axi_addr,
 		resource_size_t window_size =
 			ob_map->window_sizes[size_idx] * SZ_1M;
 
-		if (size < window_size)
-			continue;
+		/*
+		 * Keep iterating until we reach the last window and
+		 * with the minimal window size at index zero. In this
+		 * case, we take a compromise by mapping it using the
+		 * minimum window size that can be supported
+		 */
+		if (size < window_size) {
+			if (size_idx > 0 || window_idx > 0)
+				continue;
+
+			/*
+			 * For the corner case of reaching the minimal
+			 * window size that can be supported on the
+			 * last window
+			 */
+			axi_addr = ALIGN_DOWN(axi_addr, window_size);
+			pci_addr = ALIGN_DOWN(pci_addr, window_size);
+			size = window_size;
+		}
 
 		if (!IS_ALIGNED(axi_addr, window_size) ||
 		    !IS_ALIGNED(pci_addr, window_size)) {
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
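The corner case the patch handles is: if the region is smaller than even the smallest supported outbound window, fall back to the minimum window by aligning the addresses down and growing the mapped size. A sketch of that selection logic, with an invented window table (the real sizes come from the controller's ob_map; only ALIGN_DOWN mirrors the kernel idiom):

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M (1024u * 1024u)
/* Kernel-style power-of-two ALIGN_DOWN. */
#define ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

/* Hypothetical window sizes in MB, largest first. */
static const unsigned int window_sizes_mb[] = { 256, 128, 64 };

/* Returns the window size used, adjusting *axi, *pci, *size in place
 * when the fallback to the minimal window is taken. */
static uint64_t pick_window(uint64_t *axi, uint64_t *pci, uint64_t *size)
{
	int n = sizeof(window_sizes_mb) / sizeof(window_sizes_mb[0]);

	for (int i = 0; i < n; i++) {
		uint64_t wsz = (uint64_t)window_sizes_mb[i] * SZ_1M;

		if (*size < wsz) {
			if (i != n - 1)
				continue;   /* try a smaller window */
			/* last (minimal) window: compromise mapping */
			*axi = ALIGN_DOWN(*axi, wsz);
			*pci = ALIGN_DOWN(*pci, wsz);
			*size = wsz;
		}
		return wsz;
	}
	return 0;
}
```

The trade-off, as in the patch, is that the fallback maps more address space than requested (the window is rounded up to the minimum size), which is acceptable when the alternative is failing the mapping outright.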
[PATCH] iommu: amd: call free_iova_fast with pfn in map_sg
In the error path of map_sg, free_iova_fast is being called with address
instead of the pfn. This results in a bad value getting into the rcache, and
can result in hitting a BUG_ON when iova_magazine_free_pfns is called.

Cc: Joerg Roedel
Cc: Suravee Suthikulpanit
Signed-off-by: Jerry Snitselaar
---
 drivers/iommu/amd_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 87ba23a75b38..418df8ff3e50 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2623,7 +2623,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 	}
 
 out_free_iova:
-	free_iova_fast(&dma_dom->iovad, address, npages);
+	free_iova_fast(&dma_dom->iovad, address >> PAGE_SHIFT, npages);
 
 out_err:
 	return 0;
-- 
2.20.1.98.gecbdaf0899

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
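The bug in a nutshell: free_iova_fast() takes a page frame number, not a DMA address, so passing the raw address puts an impossible pfn into the rcache. A tiny model, with PAGE_SHIFT = 12 as on x86-64 (the "iova cache" here is just a variable recording what was freed; names are stand-ins, not the real API):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

static uint64_t last_freed_pfn;

/* Stand-in for the kernel's free_iova_fast(): expects a pfn. */
static void free_iova_fast(uint64_t pfn, unsigned long npages)
{
	(void)npages;
	last_freed_pfn = pfn;   /* real code returns this to the rcache */
}

/* The buggy version passed `address` directly; the fix shifts the
 * DMA address down to a pfn first, matching how it was allocated. */
static void unmap_error_path(uint64_t address, unsigned long npages)
{
	free_iova_fast(address >> PAGE_SHIFT, npages);
}
```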
Re: [PATCH 3/3] videobuf2: replace a layering violation with dma_map_resource
On Mon, Jan 14, 2019 at 01:42:26PM +0100, Marek Szyprowski wrote:
> Hi Christoph,
>
> On 2019-01-11 19:17, Christoph Hellwig wrote:
> > vb2_dc_get_userptr pokes into arm direct mapping details to get the
> > resemblance of a dma address for a physical address that is not
> > backed by a page struct. Not only is this not portable to other
> > architectures with dma direct mapping offsets, but also not to uses
> > of IOMMUs of any kind. Switch to the proper dma_map_resource /
> > dma_unmap_resource interface instead.
> >
> > Signed-off-by: Christoph Hellwig
>
> There are checks for IOMMU presence in other places in vb2-dma-contig,
> so it was used only when no IOMMU is available, but I agree that the
> hacky code should be replaced by a generic code if possible.
>
> Tested-by: Marek Szyprowski
>
> V4L2 pipeline works fine on older Exynos-based boards with CMA and IOMMU
> disabled.

Do you know if these rely on the offsets? E.g. would they still work with
the patch below applied on top? That would keep the map_resource semantics
as-is and solve the issue pointed out by Robin for now.

If not, I can only think of a flag to bypass the offsetting for now, but
that would be pretty ugly. Or go for the long-term solution of discovering
the relationship between the two devices, as done by the PCIe P2P code..

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8e0359b04957..25bd19974223 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -359,7 +359,7 @@ EXPORT_SYMBOL(dma_direct_map_sg);
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	dma_addr_t dma_addr = phys_to_dma(dev, paddr);
+	dma_addr_t dma_addr = paddr;
 
 	if (unlikely(!dma_direct_possible(dev, dma_addr, size))) {
 		report_addr(dev, dma_addr, size);

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
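The question in this mail is whether dma_direct_map_resource() should apply the device's phys-to-DMA offset or hand the physical address to the device untranslated. A minimal model of the two behaviors, with a made-up offset value purely for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SoC phys-to-DMA offset (invented for this sketch). */
static const uint64_t dma_offset = 0x80000000;

/* Offset applies to RAM-backed mappings (map_page / map_sg paths). */
static uint64_t phys_to_dma(uint64_t paddr)
{
	return paddr + dma_offset;
}

/* Behaviour after the proposed hunk: an MMIO resource (not backed by
 * struct page) is returned untranslated. */
static uint64_t dma_direct_map_resource(uint64_t paddr)
{
	return paddr;            /* was: phys_to_dma(paddr) */
}
```

Which behavior is correct depends on whether the target of the resource mapping sits on the same interconnect segment as RAM, which is exactly why the mail floats discovering the device-to-device relationship (as PCIe P2P does) as the long-term answer.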
[PATCH] arm64/xen: fix xen-swiotlb cache flushing
Xen-swiotlb hooks into the arm/arm64 arch code through a copy of the DMA mapping operations stored in the struct device arch data. Switching arm64 to use the direct calls for the merged DMA direct / swiotlb code broke this scheme. Replace the indirect calls with direct-calls in xen-swiotlb as well to fix this problem. Fixes: 356da6d0cd ("dma-mapping: bypass indirect calls for dma-direct") Reported-by: Julien Grall Signed-off-by: Christoph Hellwig --- arch/arm/include/asm/xen/page-coherent.h | 94 + arch/arm64/include/asm/device.h| 3 - arch/arm64/include/asm/xen/page-coherent.h | 76 + arch/arm64/mm/dma-mapping.c| 4 +- drivers/xen/swiotlb-xen.c | 4 +- include/xen/arm/page-coherent.h| 97 +- 6 files changed, 176 insertions(+), 102 deletions(-) diff --git a/arch/arm/include/asm/xen/page-coherent.h b/arch/arm/include/asm/xen/page-coherent.h index b3ef061d8b74..2c403e7c782d 100644 --- a/arch/arm/include/asm/xen/page-coherent.h +++ b/arch/arm/include/asm/xen/page-coherent.h @@ -1 +1,95 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_ARM_XEN_PAGE_COHERENT_H +#define _ASM_ARM_XEN_PAGE_COHERENT_H + +#include +#include #include + +static inline const struct dma_map_ops *xen_get_dma_ops(struct device *dev) +{ + if (dev && dev->archdata.dev_dma_ops) + return dev->archdata.dev_dma_ops; + return get_arch_dma_ops(NULL); +} + +static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size, + dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs) +{ + return xen_get_dma_ops(hwdev)->alloc(hwdev, size, dma_handle, flags, attrs); +} + +static inline void xen_free_coherent_pages(struct device *hwdev, size_t size, + void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs) +{ + xen_get_dma_ops(hwdev)->free(hwdev, size, cpu_addr, dma_handle, attrs); +} + +static inline void xen_dma_map_page(struct device *hwdev, struct page *page, +dma_addr_t dev_addr, unsigned long offset, size_t size, +enum dma_data_direction dir, unsigned long attrs) +{ + unsigned long 
page_pfn = page_to_xen_pfn(page); + unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr); + unsigned long compound_pages = + (1unmap_page) + xen_get_dma_ops(hwdev)->unmap_page(hwdev, handle, size, dir, attrs); + } else + __xen_dma_unmap_page(hwdev, handle, size, dir, attrs); +} + +static inline void xen_dma_sync_single_for_cpu(struct device *hwdev, + dma_addr_t handle, size_t size, enum dma_data_direction dir) +{ + unsigned long pfn = PFN_DOWN(handle); + if (pfn_valid(pfn)) { + if (xen_get_dma_ops(hwdev)->sync_single_for_cpu) + xen_get_dma_ops(hwdev)->sync_single_for_cpu(hwdev, handle, size, dir); + } else + __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir); +} + +static inline void xen_dma_sync_single_for_device(struct device *hwdev, + dma_addr_t handle, size_t size, enum dma_data_direction dir) +{ + unsigned long pfn = PFN_DOWN(handle); + if (pfn_valid(pfn)) { + if (xen_get_dma_ops(hwdev)->sync_single_for_device) + xen_get_dma_ops(hwdev)->sync_single_for_device(hwdev, handle, size, dir); + } else + __xen_dma_sync_single_for_device(hwdev, handle, size, dir); +} + +#endif /* _ASM_ARM_XEN_PAGE_COHERENT_H */ diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h index 3dd3d664c5c5..4658c937e173 100644 --- a/arch/arm64/include/asm/device.h +++ b/arch/arm64/include/asm/device.h @@ -20,9 +20,6 @@ struct dev_archdata { #ifdef CONFIG_IOMMU_API void *iommu;/* private IOMMU data */ #endif -#ifdef CONFIG_XEN - const struct dma_map_ops *dev_dma_ops; -#endif }; struct pdev_archdata { diff --git a/arch/arm64/include/asm/xen/page-coherent.h b/arch/arm64/include/asm/xen/page-coherent.h
Re: [PATCH 5/5] iommu/tegra-smmu: Add resv_regions ops
On 16/01/2019 20:50, Navneet Kumar wrote: Add support for get/put reserved regions. For any particular reason? If you're aiming to get VFIO-passthrough-with-MSIs working on Tegra, this probably won't be correct anyway... Signed-off-by: Navneet Kumar --- drivers/iommu/tegra-smmu.c | 31 +++ 1 file changed, 31 insertions(+) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 4b43c63..e848a45 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -22,6 +22,9 @@ #include #include +#define MSI_IOVA_BASE 0x800 ...because this arbitrary IOVA relies on writes to the MSI doorbell undergoing address translation at the IOMMU just like any other access. Looking at Tegra 134, that seems to implement an MSI doorbell internal to the PCIe root complex, far upstream of translation, therefore at the very least you'd need to reserve whatever IOVA region is shadowed by that doorbell in PCI memory space... +#define MSI_IOVA_LENGTH0x10 + struct tegra_smmu_group { struct list_head list; const struct tegra_smmu_group_soc *soc; @@ -882,6 +885,31 @@ static int tegra_smmu_of_xlate(struct device *dev, return iommu_fwspec_add_ids(dev, , 1); } +static void tegra_smmu_get_resv_regions(struct device *dev, + struct list_head *head) +{ + struct iommu_resv_region *region; + int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; + + region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, + prot, IOMMU_RESV_SW_MSI); ...and as an untranslatable hardware MSI region, not a software-managed one. Robin. 
+ if (!region) + return; + + list_add_tail(>list, head); + + iommu_dma_get_resv_regions(dev, head); +} + +static void tegra_smmu_put_resv_regions(struct device *dev, + struct list_head *head) +{ + struct iommu_resv_region *entry, *next; + + list_for_each_entry_safe(entry, next, head, list) + kfree(entry); +} + static const struct iommu_ops tegra_smmu_ops = { .capable = tegra_smmu_capable, .domain_alloc = tegra_smmu_domain_alloc, @@ -896,6 +924,9 @@ static const struct iommu_ops tegra_smmu_ops = { .iova_to_phys = tegra_smmu_iova_to_phys, .of_xlate = tegra_smmu_of_xlate, .pgsize_bitmap = SZ_4K, + + .get_resv_regions = tegra_smmu_get_resv_regions, + .put_resv_regions = tegra_smmu_put_resv_regions, }; static void tegra_smmu_ahb_enable(void) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5] iommu/tegra-smmu: Fix domain_alloc
On 17/01/2019 15:13, Dmitry Osipenko wrote: 16.01.2019 23:50, Navneet Kumar пишет: * Allocate dma iova cookie for a domain while adding dma iommu devices. * Perform a stricter check for domain type parameter. Commit message should tell what exactly is getting "fixed". Apparently you're trying to support T132 ARM64 here. Signed-off-by: Navneet Kumar --- drivers/iommu/tegra-smmu.c | 43 +++ 1 file changed, 27 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 543f7c9..ee4d8a8 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -271,8 +272,10 @@ static bool tegra_smmu_capable(enum iommu_cap cap) static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type) { struct tegra_smmu_as *as; + int ret; - if (type != IOMMU_DOMAIN_UNMANAGED) + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA && + type != IOMMU_DOMAIN_IDENTITY) return NULL; Should be better to write these lines like this for the sake of readability: if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_IDENTITY) return NULL; Furthermore, AFAICS there's no special handling being added for identity domains - don't pretend to support them by giving back a regular translation domain, that's just misleading and broken. as = kzalloc(sizeof(*as), GFP_KERNEL); @@ -281,26 +284,22 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type) as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE; + ret = (type == IOMMU_DOMAIN_DMA) ? iommu_get_dma_cookie(>domain) : + -ENODEV; This makes to fail allocation of domain that has type other than IOMMU_DOMAIN_DMA. 
Should be: if (type == IOMMU_DOMAIN_DMA) { err = iommu_get_dma_cookie(>domain); if (err) goto free_as; } + if (ret) + goto free_as; + as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO); - if (!as->pd) { - kfree(as); - return NULL; - } + if (!as->pd) + goto put_dma_cookie; as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL); - if (!as->count) { - __free_page(as->pd); - kfree(as); - return NULL; - } + if (!as->count) + goto free_pd_range; as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL); - if (!as->pts) { - kfree(as->count); - __free_page(as->pd); - kfree(as); - return NULL; - } + if (!as->pts) + goto free_pts; /* setup aperture */ as->domain.geometry.aperture_start = 0; @@ -308,6 +307,18 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type) as->domain.geometry.force_aperture = true; return >domain; + +free_pts: + kfree(as->pts); +free_pd_range: + __free_page(as->pd); +put_dma_cookie: + if (type == IOMMU_DOMAIN_DMA) FWIW you don't strictly need that check - since domain->iova_cookie won't be set for other domain types anyway, iommu_put_dma_cookie() will simply ignore them. + iommu_put_dma_cookie(>domain); +free_as: + kfree(as); + + return NULL; } static void tegra_smmu_domain_free(struct iommu_domain *domain) How about not leaking the cookie when you free a DMA ops domain? Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
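Both review points (get the cookie only for DMA-type domains, and release it on every exit path, including domain_free) can be condensed into a toy model. All names below are stand-ins for the kernel API, not the real signatures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int cookies_live;   /* outstanding iova cookies */

static int  get_dma_cookie(void) { cookies_live++; return 0; }
static void put_dma_cookie(void) { if (cookies_live) cookies_live--; }

struct as { bool is_dma; int *pd; };

/* Model of tegra_smmu_domain_alloc() with the reviewed fixes applied:
 * cookie taken only for DMA domains, and unwound on any later failure. */
static struct as *domain_alloc(bool is_dma, bool fail_pd)
{
	struct as *as = calloc(1, sizeof(*as));

	if (!as)
		return NULL;
	as->is_dma = is_dma;
	if (is_dma && get_dma_cookie())
		goto free_as;
	as->pd = fail_pd ? NULL : malloc(sizeof(int));
	if (!as->pd)
		goto put_cookie;
	return as;

put_cookie:
	if (as->is_dma)
		put_dma_cookie();   /* don't leak on the error path */
free_as:
	free(as);
	return NULL;
}

/* Robin's last point: domain_free must release the cookie too. */
static void domain_free(struct as *as)
{
	if (as->is_dma)
		put_dma_cookie();
	free(as->pd);
	free(as);
}
```

As Robin notes, the `is_dma` guard before put is not strictly required when the put is a no-op without a cookie; it is kept here only to mirror the reviewed code.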
Re: [PATCH v2 2/2] swiotlb: checking whether swiotlb buffer is full with io_tlb_used
On Mon, Dec 10, 2018 at 08:37:58AM +0800, Dongli Zhang wrote: > This patch uses io_tlb_used to help check whether swiotlb buffer is full. > io_tlb_used is no longer used for only debugfs. It is also used to help > optimize swiotlb_tbl_map_single(). Please split this up. That is have the 'if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))' as a seperate patch. And the #ifdef folding in the previous patch. Also rebase on top of latest Linus please. > > Suggested-by: Joe Jin > Signed-off-by: Dongli Zhang > --- > kernel/dma/swiotlb.c | 10 -- > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > index 3979c2c..9300341 100644 > --- a/kernel/dma/swiotlb.c > +++ b/kernel/dma/swiotlb.c > @@ -76,12 +76,10 @@ static phys_addr_t io_tlb_start, io_tlb_end; > */ > static unsigned long io_tlb_nslabs; > > -#ifdef CONFIG_DEBUG_FS > /* > * The number of used IO TLB block > */ > static unsigned long io_tlb_used; > -#endif > > /* > * This is a free list describing the number of free entries available from > @@ -489,6 +487,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, >* request and allocate a buffer from that IO TLB pool. 
>*/ > spin_lock_irqsave(_tlb_lock, flags); > + > + if (unlikely(nslots > io_tlb_nslabs - io_tlb_used)) > + goto not_found; > + > index = ALIGN(io_tlb_index, stride); > if (index >= io_tlb_nslabs) > index = 0; > @@ -538,9 +540,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, > dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", > size); > return SWIOTLB_MAP_ERROR; > found: > -#ifdef CONFIG_DEBUG_FS > io_tlb_used += nslots; > -#endif > spin_unlock_irqrestore(_tlb_lock, flags); > > /* > @@ -602,9 +602,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, > phys_addr_t tlb_addr, > for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != > IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--) > io_tlb_list[i] = ++count; > > -#ifdef CONFIG_DEBUG_FS > io_tlb_used -= nslots; > -#endif > } > spin_unlock_irqrestore(_tlb_lock, flags); > } > -- > 2.7.4 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/5] iommu/tegra-smmu: Use non-secure register for flushing
16.01.2019 23:50, Navneet Kumar пишет: > Use PTB_ASID instead of SMMU_CONFIG to flush smmu. > PTB_ASID can be accessed from non-secure mode, SMMU_CONFIG cannot be. > Using SMMU_CONFIG could pose a problem when kernel doesn't have secure > mode access enabled from boot. > > Signed-off-by: Navneet Kumar > --- > drivers/iommu/tegra-smmu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c > index ee4d8a8..fa175d9 100644 > --- a/drivers/iommu/tegra-smmu.c > +++ b/drivers/iommu/tegra-smmu.c > @@ -235,7 +235,7 @@ static inline void smmu_flush_tlb_group(struct tegra_smmu > *smmu, > > static inline void smmu_flush(struct tegra_smmu *smmu) > { > - smmu_readl(smmu, SMMU_CONFIG); > + smmu_readl(smmu, SMMU_PTB_ASID); > } > > static int tegra_smmu_alloc_asid(struct tegra_smmu *smmu, unsigned int *idp) > Driver writes to SMMU_CONFIG during of the probing. Looks like it's better to drop this patch for now and make it part of a complete solution that will add proper support for a stricter insecure-mode platforms. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5] iommu/tegra-smmu: Fix domain_alloc
16.01.2019 23:50, Navneet Kumar пишет: > * Allocate dma iova cookie for a domain while adding dma iommu > devices. > * Perform a stricter check for domain type parameter. > Commit message should tell what exactly is getting "fixed". Apparently you're trying to support T132 ARM64 here. > Signed-off-by: Navneet Kumar > --- > drivers/iommu/tegra-smmu.c | 43 +++ > 1 file changed, 27 insertions(+), 16 deletions(-) > > diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c > index 543f7c9..ee4d8a8 100644 > --- a/drivers/iommu/tegra-smmu.c > +++ b/drivers/iommu/tegra-smmu.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -271,8 +272,10 @@ static bool tegra_smmu_capable(enum iommu_cap cap) > static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type) > { > struct tegra_smmu_as *as; > + int ret; > > - if (type != IOMMU_DOMAIN_UNMANAGED) > + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA && > + type != IOMMU_DOMAIN_IDENTITY) > return NULL; Should be better to write these lines like this for the sake of readability: if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_IDENTITY) return NULL; > > as = kzalloc(sizeof(*as), GFP_KERNEL); > @@ -281,26 +284,22 @@ static struct iommu_domain > *tegra_smmu_domain_alloc(unsigned type) > > as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE; > > + ret = (type == IOMMU_DOMAIN_DMA) ? iommu_get_dma_cookie(>domain) : > + -ENODEV; This makes to fail allocation of domain that has type other than IOMMU_DOMAIN_DMA. 
Should be: if (type == IOMMU_DOMAIN_DMA) { err = iommu_get_dma_cookie(>domain); if (err) goto free_as; } > + if (ret) > + goto free_as; > + > as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO); > - if (!as->pd) { > - kfree(as); > - return NULL; > - } > + if (!as->pd) > + goto put_dma_cookie; > > as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL); > - if (!as->count) { > - __free_page(as->pd); > - kfree(as); > - return NULL; > - } > + if (!as->count) > + goto free_pd_range; > > as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL); > - if (!as->pts) { > - kfree(as->count); > - __free_page(as->pd); > - kfree(as); > - return NULL; > - } > + if (!as->pts) > + goto free_pts; > > /* setup aperture */ > as->domain.geometry.aperture_start = 0; > @@ -308,6 +307,18 @@ static struct iommu_domain > *tegra_smmu_domain_alloc(unsigned type) > as->domain.geometry.force_aperture = true; > > return >domain; > + > +free_pts: > + kfree(as->pts); > +free_pd_range: > + __free_page(as->pd); > +put_dma_cookie: > + if (type == IOMMU_DOMAIN_DMA) > + iommu_put_dma_cookie(>domain); > +free_as: > + kfree(as); > + > + return NULL; > } > > static void tegra_smmu_domain_free(struct iommu_domain *domain) > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Fail to boot Dom0 on Xen Arm64 after "dma-mapping: bypass indirect calls for dma-direct"
On Thu, Jan 17, 2019 at 11:43:49AM +, Julien Grall wrote: > Looking at the change for arm64, you will always call dma-direct API. In > previous Linux version, xen-swiotlb will call dev->archdata.dev_dma_ops (a > copy of dev->dma_ops before setting Xen DMA ops) if not NULL. Does it mean > we expect dev->dma_ops to always be NULL and hence using dma-direct API? The way I understood the code from inspecting it and asking the maintainers a few questions is that for DOM0 we always use xen-swiotlb as the actual dma_map_ops, but then use the functions in page-coherent.h only to deal with cache maintenance, so it should be safe. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions
On 15/01/2019 17:37, Pierre Morel wrote: The s390 iommu can only allow DMA transactions between the zPCI device entries start_dma and end_dma. Let's declare the regions before start_dma and after end_dma as reserved regions using the appropriate callback in iommu_ops. The reserved region may later be retrieved from sysfs or from the vfio iommu internal interface. For this particular case, I think the best solution is to give VFIO the ability to directly interrogate the domain geometry (which s390 appears to set correctly already). The idea of reserved regions was really for 'unexpected' holes inside the usable address space - using them to also describe places that are entirely outside that address space rather confuses things IMO. Furthermore, even if we *did* end up going down the route of actively reserving addresses beyond the usable aperture, it doesn't seem sensible for individual drivers to do it themselves when the core API already describes the relevant information generically. Robin. This seems to me related with the work Shameer has started on vfio_iommu_type1 so I add Alex and Shameer to the CC list. Pierre Morel (1): iommu/s390: Declare s390 iommu reserved regions drivers/iommu/s390-iommu.c | 29 + 1 file changed, 29 insertions(+) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions
On 17/01/2019 10:23, Shameerali Kolothum Thodi wrote: Hi Pierre, -Original Message- From: Pierre Morel [mailto:pmo...@linux.ibm.com] Sent: 15 January 2019 17:37 To: gerald.schae...@de.ibm.com Cc: j...@8bytes.org; linux-s...@vger.kernel.org; iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; alex.william...@redhat.com; Shameerali Kolothum Thodi ; wall...@linux.ibm.com Subject: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions The s390 iommu can only allow DMA transactions between the zPCI device entries start_dma and end_dma. Let's declare the regions before start_dma and after end_dma as reserved regions using the appropriate callback in iommu_ops. The reserved region may later be retrieved from sysfs or from the vfio iommu internal interface. Just in case you are planning to use the sysfs interface to retrieve the valid regions, and intend to use that in Qemu vfio path, please see the discussion here [1] (If you haven't seen this already) Thanks, Shameer [1] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03651.html This seems to me related to the work Shameer has started on vfio_iommu_type1, so I add Alex and Shameer to the CC list. Pierre Morel (1): iommu/s390: Declare s390 iommu reserved regions drivers/iommu/s390-iommu.c | 29 + 1 file changed, 29 insertions(+) -- 2.7.4 Thanks Shameer, Interesting discussion indeed. AFAIK the patch series you are working on will provide a way to retrieve the reserved regions through the VFIO IOMMU interface, using capabilities in the VFIO_IOMMU_GET_INFO. Before this patch, the iommu_type1 was not able to retrieve the forbidden region from the s390_iommu. This patch is a contribution, so that these regions will appear in the reserved list once VFIO_IOMMU_GET_INFO is able to report the information. I am expecting to be able to retrieve this information from the VFIO_IOMMU_GET_INFO syscall as soon as it is available.
Regards, Pierre -- Pierre Morel Linux/KVM/QEMU in Böblingen - Germany ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Joerg, On 1/17/19 3:44 PM, Suravee Suthikulpanit wrote: > Joerg, > > On 1/17/19 12:08 AM, j...@8bytes.org wrote: >> On Wed, Jan 16, 2019 at 02:08:55PM +, Suthikulpanit, Suravee wrote: >>> Actually, I am not sure how we would be missing the flush on the last >>> device. >>> In my test, I am seeing the flush command being issued correctly during >>> vfio_unmap_unpin(), which is after all devices are detached. >>> Although, I might be missing your point here. Could you please elaborate? >> >> Okay, you are right, the patch effectivly adds an unconditional flush of >> the domain on all iommus when the last device is detached. So it is >> correct in that regard. But that code path is also used in the >> iommu_unmap() path. >> >> The problem now is, that with your change we send flush commands to all >> IOMMUs in the unmap path when no device is attached to the domain. >> This will hurt performance there, no? >> >> Regards, >> >> Joerg >> > > Sounds like we need a way to track state of each IOMMU for a domain. > What if we define the following: > > enum IOMMU_DOMAIN_STATES { > DOMAIN_FREE = -1, > DOMAIN_DETACHED = 0, > DOMAIN_ATTACHED >= 1 > } > > We should be able to use the dev_iommu[] to help track the state. > - In amd_iommu_domain_alloc, we initialize the array to DOMAIN_FREE > - In do_attach(), we change to DOMAIN_ATTACH or we can increment the > count > if it is already in DOMAIN_ATTACH state. > - In do_detach(). we change to DOMAIN_DETACH. > > Then, in __domain_flush_pages, we issue command when the dev_iommu[] >= 0. > This should preserve previous behavior, and only add flushing condition to > the specific IOMMU in detached state. Please let me know what you think. > > Regards, > Suravee By the way, I just sent V2 of this patch since it might be more clear. Regards, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
From: Suravee Suthikulpanit When a VM is terminated, the VFIO driver detaches all pass-through devices from the VFIO domain by clearing the domain id and page table root pointer from each device table entry (DTE), and then invalidates the DTE. Then, the VFIO driver unmaps pages and invalidates IOMMU pages. Currently, the IOMMU driver keeps track of which IOMMU and how many devices are attached to the domain. When invalidating IOMMU pages, the driver checks if the IOMMU is still attached to the domain before issuing the invalidate page command. However, since VFIO has already detached all devices from the domain, the subsequent INVALIDATE_IOMMU_PAGES commands are being skipped as there is no IOMMU attached to the domain. This results in data corruption and could cause the PCI device to end up in an indeterministic state. Fix this by introducing IOMMU domain states attached, detached, and free. Then only issue the IOMMU page-invalidate command when the domain is in the attached or detached state. Cc: Boris Ostrovsky Signed-off-by: Brijesh Singh Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu.c | 25 - drivers/iommu/amd_iommu_types.h | 8 +++- 2 files changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 525659b88ade..1c64a26c50fb 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1248,7 +1248,14 @@ static void __domain_flush_pages(struct protection_domain *domain, build_inv_iommu_pages(&cmd, address, size, domain->id, pde); for (i = 0; i < amd_iommu_get_num_iommus(); ++i) { - if (!domain->dev_iommu[i]) + /* +* We issue the command only when the domain is +* in ATTACHED and DETACHED state. This is required +* since VFIO detaches all devices from the group +* before invalidating IOMMU pages. Please see +* vfio_iommu_type1_detach_group().
+*/ + if (domain->dev_iommu[i] == IOMMU_DOMAIN_STATE_FREE) continue; /* @@ -1292,7 +1299,8 @@ static void domain_flush_complete(struct protection_domain *domain) int i; for (i = 0; i < amd_iommu_get_num_iommus(); ++i) { - if (domain && !domain->dev_iommu[i]) + if (domain && + domain->dev_iommu[i] < IOMMU_DOMAIN_STATE_ATTACHED) continue; /* @@ -1978,8 +1986,11 @@ static void do_attach(struct iommu_dev_data *dev_data, list_add(&dev_data->list, &domain->dev_list); /* Do reference counting */ - domain->dev_iommu[iommu->index] += 1; - domain->dev_cnt += 1; + if (domain->dev_iommu[iommu->index] < IOMMU_DOMAIN_STATE_ATTACHED) + domain->dev_iommu[iommu->index] = IOMMU_DOMAIN_STATE_ATTACHED; + else + domain->dev_iommu[iommu->index] += 1; + domain->dev_cnt += 1; /* Update device table */ set_dte_entry(dev_data->devid, domain, ats, dev_data->iommu_v2); @@ -2904,6 +2915,7 @@ static int protection_domain_init(struct protection_domain *domain) static struct protection_domain *protection_domain_alloc(void) { + int i; struct protection_domain *domain; domain = kzalloc(sizeof(*domain), GFP_KERNEL); @@ -2915,6 +2927,9 @@ static void protection_domain_alloc(void) add_domain_to_list(domain); + for (i = 0; i < MAX_IOMMUS; i++) + domain->dev_iommu[i] = IOMMU_DOMAIN_STATE_FREE; + return domain; out_err: @@ -3364,7 +3379,7 @@ static int __flush_pasid(struct protection_domain *domain, int pasid, * prevent device TLB refill from IOMMU TLB */ for (i = 0; i < amd_iommu_get_num_iommus(); ++i) { - if (domain->dev_iommu[i] == 0) + if (domain->dev_iommu[i] < IOMMU_DOMAIN_STATE_ATTACHED) continue; ret = iommu_queue_command(amd_iommus[i], &cmd); diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index eae0741f72dc..6d17cdcfa306 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -461,6 +461,12 @@ struct amd_irte_ops; #define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED (1 << 0) +enum IOMMU_DOMAIN_STATES { + IOMMU_DOMAIN_STATE_FREE = -1, +
IOMMU_DOMAIN_STATE_DETTACHED, + IOMMU_DOMAIN_STATE_ATTACHED +}; + /* * This structure contains generic data for IOMMU protection domains * independent of their use. @@ -480,7 +486,7 @@ struct protection_domain { unsigned long flags;/* flags to find out type of domain */ bool updated; /* complete domain flush required */ unsigned dev_cnt; /* devices assigned to this domain */ - unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
Re: Fail to boot Dom0 on Xen Arm64 after "dma-mapping: bypass indirect calls for dma-direct"
(+ Leo Yan who reported a similar issue on xen-devel) Hi Christoph, On 16/01/2019 18:19, Christoph Hellwig wrote: Does this fix your problem? Thank you for the patch. This allows me to boot Dom0 on Juno r2. My knowledge of DMA is quite limited, so I may have missed something. Looking at the change for arm64, you will always call the dma-direct API. In previous Linux versions, xen-swiotlb would call dev->archdata.dev_dma_ops (a copy of dev->dma_ops before setting Xen DMA ops) if not NULL. Does it mean we expect dev->dma_ops to always be NULL and hence use the dma-direct API? Cheers, -- Julien Grall ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/9] Use vm_insert_range and vm_insert_range_buggy
On Fri, Jan 11, 2019 at 8:31 PM Souptick Joarder wrote: > > Previously drivers have their own way of mapping a range of > kernel pages/memory into a user vma and this was done by > invoking vm_insert_page() within a loop. > > As this pattern is common across different drivers, it can > be generalized by creating new functions and using them across > the drivers. > > vm_insert_range() is the API which could be used to map > kernel memory/pages in drivers which have considered vm_pgoff > > vm_insert_range_buggy() is the API which could be used to map a > range of kernel memory/pages in drivers which have not considered > vm_pgoff. vm_pgoff is passed as 0 by default for those drivers. > > We _could_ then at a later point "fix" these drivers which are using > vm_insert_range_buggy() to behave according to the normal vm_pgoff > offsetting simply by removing the _buggy suffix on the function > name and if that causes regressions, it gives us an easy way to revert. > > There is an existing bug in [7/9], where the user-passed length is not > verified against object_count. For any value of length > object_count > it will overrun the page array, which could lead to a potential bug. > This is fixed as part of this conversion. > > Souptick Joarder (9): > mm: Introduce new vm_insert_range and vm_insert_range_buggy API > arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range > drivers/firewire/core-iso.c: Convert to use vm_insert_range_buggy > drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range > drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range > iommu/dma-iommu.c: Convert to use vm_insert_range > videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range_buggy > xen/gntdev.c: Convert to use vm_insert_range > xen/privcmd-buf.c: Convert to use vm_insert_range_buggy Any further comments on these patches?
> > arch/arm/mm/dma-mapping.c | 22 ++ > drivers/firewire/core-iso.c | 15 + > drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 17 + > drivers/gpu/drm/xen/xen_drm_front_gem.c | 18 ++--- > drivers/iommu/dma-iommu.c | 12 +--- > drivers/media/common/videobuf2/videobuf2-dma-sg.c | 22 ++ > drivers/xen/gntdev.c | 16 ++--- > drivers/xen/privcmd-buf.c | 8 +-- > include/linux/mm.h| 4 ++ > mm/memory.c | 81 > +++ > mm/nommu.c| 14 > 11 files changed, 129 insertions(+), 100 deletions(-) > > -- > 1.9.1 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] i2c: imx: dma map the i2c data i/o register
Hi Robin, On 16.01.2019 19:55, Robin Murphy wrote: > On 16/01/2019 16:17, Laurentiu Tudor wrote: >> This is an attempt to fix an iommu exception when doing dma to the >> i2c controller with EDMA. Without these mappings the smmu raises a >> context fault [1] exactly with the address of the i2c data i/o reg. >> This was seen on an NXP LS1043A chip while working on enabling SMMU. > > Rather than gradually adding much the same code to potentially every > possible client driver, can it not be implemented once in the edma > driver as was done for pl330 and rcar-dmac? That also sidesteps any of > the nastiness of smuggling a dma_addr_t via a phys_addr_t variable. Thanks for the pointer. I was actually unsure where this should be tackled: either i2c or dma side. Plus I somehow managed to completely miss the support added in the dma drivers you mention. I'll start looking into stealing some of your code [1]. :-) [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d6d74e22096543cb3b35e717cf1b9aea3655f37 --- Best Regards, Laurentiu > >> [1] arm-smmu 900.iommu: Unhandled context fault: fsr=0x402, >> iova=0x02180004, fsynr=0x150021, cb=7 >> >> Signed-off-by: Laurentiu Tudor >> --- >> drivers/i2c/busses/i2c-imx.c | 57 +--- >> 1 file changed, 47 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c >> index 4e34b1572756..07cc8f4b45b9 100644 >> --- a/drivers/i2c/busses/i2c-imx.c >> +++ b/drivers/i2c/busses/i2c-imx.c >> @@ -202,6 +202,9 @@ struct imx_i2c_struct { >> struct pinctrl_state *pinctrl_pins_gpio; >> struct imx_i2c_dma *dma; >> + >> + dma_addr_t dma_tx_addr; >> + dma_addr_t dma_rx_addr; >> }; >> static const struct imx_i2c_hwdata imx1_i2c_hwdata = { >> @@ -274,17 +277,20 @@ static inline unsigned char >> imx_i2c_read_reg(struct imx_i2c_struct *i2c_imx, >> /* Functions for DMA support */ >> static int i2c_imx_dma_request(struct imx_i2c_struct *i2c_imx, >> - dma_addr_t phy_addr) >> + 
phys_addr_t phy_addr) >> { >> struct imx_i2c_dma *dma; >> struct dma_slave_config dma_sconfig; >> struct device *dev = &i2c_imx->adapter.dev; >> int ret; >> + phys_addr_t i2dr_pa; >> dma = devm_kzalloc(dev, sizeof(*dma), GFP_KERNEL); >> if (!dma) >> return -ENOMEM; >> + i2dr_pa = phy_addr + (IMX_I2C_I2DR << i2c_imx->hwdata->regshift); >> + >> dma->chan_tx = dma_request_chan(dev, "tx"); >> if (IS_ERR(dma->chan_tx)) { >> ret = PTR_ERR(dma->chan_tx); >> @@ -293,15 +299,25 @@ static int i2c_imx_dma_request(struct >> imx_i2c_struct *i2c_imx, >> goto fail_al; >> } >> - dma_sconfig.dst_addr = phy_addr + >> - (IMX_I2C_I2DR << i2c_imx->hwdata->regshift); >> + i2c_imx->dma_tx_addr = dma_map_resource(dma->chan_tx->device->dev, >> + i2dr_pa, >> + DMA_SLAVE_BUSWIDTH_1_BYTE, >> + DMA_MEM_TO_DEV, 0); >> + ret = dma_mapping_error(dma->chan_tx->device->dev, >> + i2c_imx->dma_tx_addr); >> + if (ret) { >> + dev_err(dev, "can't dma map tx destination (%d)\n", ret); >> + goto fail_tx; >> + } >> + >> + dma_sconfig.dst_addr = i2c_imx->dma_tx_addr; >> dma_sconfig.dst_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE; >> dma_sconfig.dst_maxburst = 1; >> dma_sconfig.direction = DMA_MEM_TO_DEV; >> ret = dmaengine_slave_config(dma->chan_tx, &dma_sconfig); >> if (ret < 0) { >> dev_err(dev, "can't configure tx channel (%d)\n", ret); >> - goto fail_tx; >> + goto fail_tx_dma; >> } >> dma->chan_rx = dma_request_chan(dev, "rx"); >> @@ -309,18 +325,28 @@ static int i2c_imx_dma_request(struct >> imx_i2c_struct *i2c_imx, >> ret = PTR_ERR(dma->chan_rx); >> if (ret != -ENODEV && ret != -EPROBE_DEFER) >> dev_err(dev, "can't request DMA rx channel (%d)\n", ret); >> - goto fail_tx; >> + goto fail_tx_dma; >> } >> - dma_sconfig.src_addr = phy_addr + >> - (IMX_I2C_I2DR << i2c_imx->hwdata->regshift); >> + i2c_imx->dma_rx_addr = dma_map_resource(dma->chan_rx->device->dev, >> + i2dr_pa, >> + DMA_SLAVE_BUSWIDTH_1_BYTE, >> + DMA_DEV_TO_MEM, 0); >> + ret = dma_mapping_error(dma->chan_rx->device->dev, >> + i2c_imx->dma_rx_addr); >> +
if (ret) { >> + dev_err(dev, "can't dma map rx source (%d)\n", ret); >> + goto fail_rx; >> + } >> + >> + dma_sconfig.src_addr = i2c_imx->dma_rx_addr; >> dma_sconfig.src_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE; >> dma_sconfig.src_maxburst = 1; >> dma_sconfig.direction = DMA_DEV_TO_MEM; >> ret = dmaengine_slave_config(dma->chan_rx, &dma_sconfig); >> if (ret < 0)
Re: use generic DMA mapping code in powerpc V4
On Thu, Jan 17, 2019 at 10:21:11AM +0100, Christian Zigotzky wrote: > The X1000 boots and the PASEMI onboard ethernet works! > > Bad news for the X5000 (P5020 board). U-Boot loads the kernel and the dtb > file. Then the kernel starts but it doesn't find any hard disks > (partitions). Thanks for bisecting so far, and let's stop here until I manage to resolve the problem. Can you send me the .config and the dtb file for this board? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings
Adding a device tree option for arm smmu to enable non-cacheable memory for page tables. We already enable an smmu feature for coherent walk based on whether the smmu device is dma-coherent or not. Add an option to enable non-cacheable page table memory to force-set it for particular smmu devices. Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index af18a7e7f917..7ebbcf1b2eb3 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -188,6 +188,7 @@ struct arm_smmu_device { u32 features; #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0) +#define ARM_SMMU_OPT_PGTBL_NON_COHERENT (1 << 1) u32 options; enum arm_smmu_arch_version version; enum arm_smmu_implementation model; @@ -273,6 +274,7 @@ static bool using_legacy_binding, using_generic_binding; static struct arm_smmu_option_prop arm_smmu_options[] = { { ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" }, + { ARM_SMMU_OPT_PGTBL_NON_COHERENT, "arm,smmu-pgtable-non-coherent" }, { 0, NULL}, }; @@ -902,6 +904,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, if (smmu_domain->non_strict) pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; + /* Non coherent page table mappings only for Stage-1 */ + if (smmu->options & ARM_SMMU_OPT_PGTBL_NON_COHERENT && + smmu_domain->stage == ARM_SMMU_DOMAIN_S1) + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_COHERENT; + smmu_domain->smmu = smmu; pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); if (!pgtbl_ops) { -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/2] iommu/arm: Add support for non-coherent page tables
As discussed in the Qcom system cache support thread [1], it is imperative that we enable support for non-cacheable page tables on SMMU implementations where removing snoop latency on walks, by making the mappings non-cacheable, outweighs the cost of cache maintenance on PTE updates. This series adds a new SMMU device tree option to let a particular SMMU configuration set up cacheable or non-cacheable mappings for page tables out of the box. We set a new quirk for i/o page tables - IO_PGTABLE_QUIRK_NON_COHERENT - that lets us set different TCR configurations. This quirk enables non-cacheable page tables for all masters sitting on the SMMU. Should this control be available per smmu_domain, as each master may have a different perf requirement? Enabling this for the entire SMMU may not be desirable for all masters. [1] https://lore.kernel.org/patchwork/patch/1020906/ Vivek Gautam (2): iommu/io-pgtable-arm: Add support for non-coherent page tables iommu/arm-smmu: Add support for non-coherent page table mappings drivers/iommu/arm-smmu.c | 7 +++ drivers/iommu/io-pgtable-arm.c | 17 - drivers/iommu/io-pgtable.h | 6 ++ 3 files changed, 25 insertions(+), 5 deletions(-) -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables
From Robin's comment [1] about touching TCR configurations - "TBH if we're going to touch the TCR attributes at all then we should probably correct that sloppiness first - there's an occasional argument for using non-cacheable pagetables even on a coherent SMMU if reducing snoop traffic/latency on walks outweighs the cost of cache maintenance on PTE updates, but anyone thinking they can get that by overriding dma-coherent silently gets the worst of both worlds thanks to this current TCR value." We have the IO_PGTABLE_QUIRK_NO_DMA quirk present, but we don't force anybody _not_ using a dma-coherent smmu to have non-cacheable page table mappings. Having another quirk flag can help in having non-cacheable memory for page tables once and for all. [1] https://lore.kernel.org/patchwork/patch/1020906/ Signed-off-by: Vivek Gautam --- drivers/iommu/io-pgtable-arm.c | 17 - drivers/iommu/io-pgtable.h | 6 ++ 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 237cacd4a62b..c76919c30f1a 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) struct arm_lpae_io_pgtable *data; if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA | - IO_PGTABLE_QUIRK_NON_STRICT)) + IO_PGTABLE_QUIRK_NON_STRICT | + IO_PGTABLE_QUIRK_NON_COHERENT)) return NULL; data = arm_lpae_alloc_pgtable(cfg); @@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) return NULL; /* TCR */ - reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) | - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) | - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT); + reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT; + + if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT) + reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT | + ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT; + else +
reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT | + ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT; switch (ARM_LPAE_GRANULE(data)) { case SZ_4K: @@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie) /* The NS quirk doesn't apply at stage 2 */ if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA | - IO_PGTABLE_QUIRK_NON_STRICT)) + IO_PGTABLE_QUIRK_NON_STRICT | + IO_PGTABLE_QUIRK_NON_COHERENT)) return NULL; data = arm_lpae_alloc_pgtable(cfg); diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h index 47d5ae559329..46604cf7b017 100644 --- a/drivers/iommu/io-pgtable.h +++ b/drivers/iommu/io-pgtable.h @@ -75,6 +75,11 @@ struct io_pgtable_cfg { * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs * on unmap, for DMA domains using the flush queue mechanism for * delayed invalidation. +* +* IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for +* pagetables even on a coherent SMMU for cases where reducing +* snoop traffic/latency on walks outweighs the cost of cache +* maintenance on PTE updates. */ #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) @@ -82,6 +87,7 @@ struct io_pgtable_cfg { #define IO_PGTABLE_QUIRK_ARM_MTK_4GB BIT(3) #define IO_PGTABLE_QUIRK_NO_DMA BIT(4) #define IO_PGTABLE_QUIRK_NON_STRICT BIT(5) + #define IO_PGTABLE_QUIRK_NON_COHERENT BIT(6) unsigned long quirks; unsigned long pgsize_bitmap; unsigned int ias; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions
Hi Pierre, > -Original Message- > From: Pierre Morel [mailto:pmo...@linux.ibm.com] > Sent: 15 January 2019 17:37 > To: gerald.schae...@de.ibm.com > Cc: j...@8bytes.org; linux-s...@vger.kernel.org; > iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > alex.william...@redhat.com; Shameerali Kolothum Thodi > ; wall...@linux.ibm.com > Subject: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions > > The s390 iommu can only allow DMA transactions between the zPCI device > entries start_dma and end_dma. > > Let's declare the regions before start_dma and after end_dma as > reserved regions using the appropriate callback in iommu_ops. > > The reserved region may later be retrieved from sysfs or from > the vfio iommu internal interface. Just in case you are planning to use the sysfs interface to retrieve the valid regions, and intend to use that in Qemu vfio path, please see the discussion here [1] (If you haven't seen this already) Thanks, Shameer [1] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03651.html > This seems to me related with the work Shameer has started on > vfio_iommu_type1 so I add Alex and Shameer to the CC list. > > Pierre Morel (1): > iommu/s390: Declare s390 iommu reserved regions > > drivers/iommu/s390-iommu.c | 29 + > 1 file changed, 29 insertions(+) > > -- > 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: use generic DMA mapping code in powerpc V4
Hi All, I compiled the fixed '257002094bc5935dd63207a380d9698ab81f0775' (powerpc/dma: use the dma-direct allocator for coherent platforms) today. git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a git checkout 257002094bc5935dd63207a380d9698ab81f0775 Link to the Git: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6 env LANG=C make CROSS_COMPILE=powerpc-linux-gnu- ARCH=powerpc zImage env LANG=C make CROSS_COMPILE=powerpc-linux-gnu- ARCH=powerpc uImage The X1000 boots and the PASEMI onboard ethernet works! Bad news for the X5000 (P5020 board). U-Boot loads the kernel and the dtb file. Then the kernel starts but it doesn't find any hard disks (partitions). Cheers, Christian On 15 January 2019 at 4:17PM, Christoph Hellwig wrote: So 257002094bc5935dd63207a380d9698ab81f0775 above is the fixed version for the commit - this switched the ifdef in dma.c around that I had inverted. Can you try that one instead? And then move on with the commits after it in the updated powerpc-dma.6 branch - they are identical to the original branch except for carrying this fix forward. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain
Joerg, On 1/17/19 12:08 AM, j...@8bytes.org wrote: > On Wed, Jan 16, 2019 at 02:08:55PM +, Suthikulpanit, Suravee wrote: >> Actually, I am not sure how we would be missing the flush on the last device. >> In my test, I am seeing the flush command being issued correctly during >> vfio_unmap_unpin(), which is after all devices are detached. >> Although, I might be missing your point here. Could you please elaborate? > > Okay, you are right, the patch effectively adds an unconditional flush of > the domain on all iommus when the last device is detached. So it is > correct in that regard. But that code path is also used in the > iommu_unmap() path. > > The problem now is that, with your change, we send flush commands to all > IOMMUs in the unmap path when no device is attached to the domain. > This will hurt performance there, no? > > Regards, > > Joerg > Sounds like we need a way to track the state of each IOMMU for a domain. What if we define the following: enum IOMMU_DOMAIN_STATES { DOMAIN_FREE = -1, DOMAIN_DETACHED = 0, DOMAIN_ATTACHED >= 1 } We should be able to use the dev_iommu[] to help track the state. - In amd_iommu_domain_alloc, we initialize the array to DOMAIN_FREE - In do_attach(), we change to DOMAIN_ATTACH or we can increment the count if it is already in DOMAIN_ATTACH state. - In do_detach(), we change to DOMAIN_DETACH. Then, in __domain_flush_pages, we issue the command when the dev_iommu[] >= 0. This should preserve previous behavior, and only add a flushing condition to the specific IOMMU in detached state. Please let me know what you think. Regards, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu