Re: [PATCH v3 1/4] iommu: Add gfp parameter to iommu_ops::map
On Mon, May 06, 2019 at 07:52:03PM +0100, Tom Murphy via iommu wrote:
> We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap.
> iommu_map doesn't lock while mapping and so no two calls should touch
> the same iova range. The AMD driver already handles the page table page
> allocations without locks so we can safely remove the locks.

Btw, this really should be a separate patch.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 1/4] iommu: Add gfp parameter to iommu_ops::map
On Tue, Jun 4, 2019 at 11:11 AM Robin Murphy wrote:
>
> On 06/05/2019 19:52, Tom Murphy wrote:
> > Add a gfp_t parameter to the iommu_ops::map function.
> > Remove the needless locking in the AMD iommu driver.
> >
> > The iommu_ops::map function (or the iommu_map function which calls it)
> > was always supposed to be sleepable (according to Joerg's comment in
> > this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
> > should probably have had a "might_sleep()" since it was written. However
> > currently the dma-iommu api can call iommu_map in an atomic context,
> > which it shouldn't do. This doesn't cause any problems because any iommu
> > driver which uses the dma-iommu api uses gfp_atomic in its
> > iommu_ops::map function. But doing this wastes the memory allocator's
> > atomic pools.
>
> Hmm, in some cases iommu_ops::unmap may need to allocate as well,
> primarily if it needs to split a hugepage mapping. Should we pass flags
> through there as well, or are we prepared to assume that that case will
> happen rarely enough that it's fair to just assume GFP_ATOMIC? It won't
> happen for DMA ops, but it's conceivable that other users such as GPU
> drivers might make partial unmaps, and I doubt we could totally rule out
> the wackiest ones doing so from non-sleeping contexts.
>

jfyi, today we (well, drm/msm) only unmap entire buffers, so assuming
there isn't any coalescence of adjacent buffers that happen to form a
hugepage on ::map(), we are probably ok on ::unmap().. we do always
only call ::map or ::unmap under sleepable conditions.

btw, would it be simpler to just make gfp_t a domain attribute? That
seems like it would be less churn, but maybe I'm overlooking something.

BR,
-R
Re: [PATCH v3 1/4] iommu: Add gfp parameter to iommu_ops::map
On Tue, Jun 4, 2019 at 7:11 PM Robin Murphy wrote:
>
> On 06/05/2019 19:52, Tom Murphy wrote:
> > Add a gfp_t parameter to the iommu_ops::map function.
> > Remove the needless locking in the AMD iommu driver.
> >
> > The iommu_ops::map function (or the iommu_map function which calls it)
> > was always supposed to be sleepable (according to Joerg's comment in
> > this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
> > should probably have had a "might_sleep()" since it was written. However
> > currently the dma-iommu api can call iommu_map in an atomic context,
> > which it shouldn't do. This doesn't cause any problems because any iommu
> > driver which uses the dma-iommu api uses gfp_atomic in its
> > iommu_ops::map function. But doing this wastes the memory allocator's
> > atomic pools.
>
> Hmm, in some cases iommu_ops::unmap may need to allocate as well,
> primarily if it needs to split a hugepage mapping. Should we pass flags

Are you sure that's the case? I don't see that in the amd or intel
iommu_ops::unmap code, and from looking at the intel code it seems like
we actually unmap the whole large page:

	/* Cope with horrid API which requires us to unmap more than the
	   size argument if it happens to be a large-page mapping. */
	BUG_ON(!pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level));

	if (size < VTD_PAGE_SIZE << level_to_offset_bits(level))
		size = VTD_PAGE_SIZE << level_to_offset_bits(level);

> through there as well, or are we prepared to assume that that case will
> happen rarely enough that it's fair to just assume GFP_ATOMIC? It won't
> happen for DMA ops, but it's conceivable that other users such as GPU
> drivers might make partial unmaps, and I doubt we could totally rule out
> the wackiest ones doing so from non-sleeping contexts.
>
> Robin.
>
> > We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap.
> > iommu_map doesn't lock while mapping and so no two calls should touch
> > the same iova range. The AMD driver already handles the page table page
> > allocations without locks so we can safely remove the locks.
> >
> > Signed-off-by: Tom Murphy
> > ---
> >  drivers/iommu/amd_iommu.c      | 14 ---
> >  drivers/iommu/arm-smmu-v3.c    |  2 +-
> >  drivers/iommu/arm-smmu.c       |  2 +-
> >  drivers/iommu/dma-iommu.c      |  6 ++---
> >  drivers/iommu/exynos-iommu.c   |  2 +-
> >  drivers/iommu/intel-iommu.c    |  2 +-
> >  drivers/iommu/iommu.c          | 43 +-
> >  drivers/iommu/ipmmu-vmsa.c     |  2 +-
> >  drivers/iommu/msm_iommu.c      |  2 +-
> >  drivers/iommu/mtk_iommu.c      |  2 +-
> >  drivers/iommu/mtk_iommu_v1.c   |  2 +-
> >  drivers/iommu/omap-iommu.c     |  2 +-
> >  drivers/iommu/qcom_iommu.c     |  2 +-
> >  drivers/iommu/rockchip-iommu.c |  2 +-
> >  drivers/iommu/s390-iommu.c     |  2 +-
> >  drivers/iommu/tegra-gart.c     |  2 +-
> >  drivers/iommu/tegra-smmu.c     |  2 +-
> >  include/linux/iommu.h          | 21 -
> >  18 files changed, 78 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index ebd062522cf5..ea3a5dc61bb0 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -3092,7 +3092,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
> >  }
> >
> >  static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
> > -			 phys_addr_t paddr, size_t page_size, int iommu_prot)
> > +			 phys_addr_t paddr, size_t page_size, int iommu_prot,
> > +			 gfp_t gfp)
> >  {
> >  	struct protection_domain *domain = to_pdomain(dom);
> >  	int prot = 0;
> > @@ -3106,9 +3107,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
> >  	if (iommu_prot & IOMMU_WRITE)
> >  		prot |= IOMMU_PROT_IW;
> >
> > -	mutex_lock(&domain->api_lock);
> > -	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
> > -	mutex_unlock(&domain->api_lock);
> > +	ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
> >
> >  	domain_flush_np_cache(domain, iova, page_size);
> >
> > @@ -3119,16 +3118,11 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
> >  			      size_t page_size)
> >  {
> >  	struct protection_domain *domain = to_pdomain(dom);
> > -	size_t unmap_size;
> >
> >  	if (domain->mode == PAGE_MODE_NONE)
> >  		return 0;
> >
> > -	mutex_lock(&domain->api_lock);
> > -	unmap_size = iommu_unmap_page(domain, iova, page_size);
> > -	mutex_unlock(&domain->api_lock);
> > -
> > -	return unmap_size;
> > +	return iommu_unmap_page(domain, iova, page_size);
> >  }
> >
> >  static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index d3880010c6cf..e5c48089b49f 100644
> > ---
Re: [PATCH v3 1/4] iommu: Add gfp parameter to iommu_ops::map
On 06/05/2019 19:52, Tom Murphy wrote:
> Add a gfp_t parameter to the iommu_ops::map function.
> Remove the needless locking in the AMD iommu driver.
>
> The iommu_ops::map function (or the iommu_map function which calls it)
> was always supposed to be sleepable (according to Joerg's comment in
> this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
> should probably have had a "might_sleep()" since it was written. However
> currently the dma-iommu api can call iommu_map in an atomic context,
> which it shouldn't do. This doesn't cause any problems because any iommu
> driver which uses the dma-iommu api uses gfp_atomic in its
> iommu_ops::map function. But doing this wastes the memory allocator's
> atomic pools.

Hmm, in some cases iommu_ops::unmap may need to allocate as well,
primarily if it needs to split a hugepage mapping. Should we pass flags
through there as well, or are we prepared to assume that that case will
happen rarely enough that it's fair to just assume GFP_ATOMIC? It won't
happen for DMA ops, but it's conceivable that other users such as GPU
drivers might make partial unmaps, and I doubt we could totally rule out
the wackiest ones doing so from non-sleeping contexts.

Robin.

> We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap.
> iommu_map doesn't lock while mapping and so no two calls should touch
> the same iova range. The AMD driver already handles the page table page
> allocations without locks so we can safely remove the locks.
> Signed-off-by: Tom Murphy
> ---
>  drivers/iommu/amd_iommu.c      | 14 ---
>  drivers/iommu/arm-smmu-v3.c    |  2 +-
>  drivers/iommu/arm-smmu.c       |  2 +-
>  drivers/iommu/dma-iommu.c      |  6 ++---
>  drivers/iommu/exynos-iommu.c   |  2 +-
>  drivers/iommu/intel-iommu.c    |  2 +-
>  drivers/iommu/iommu.c          | 43 +-
>  drivers/iommu/ipmmu-vmsa.c     |  2 +-
>  drivers/iommu/msm_iommu.c      |  2 +-
>  drivers/iommu/mtk_iommu.c      |  2 +-
>  drivers/iommu/mtk_iommu_v1.c   |  2 +-
>  drivers/iommu/omap-iommu.c     |  2 +-
>  drivers/iommu/qcom_iommu.c     |  2 +-
>  drivers/iommu/rockchip-iommu.c |  2 +-
>  drivers/iommu/s390-iommu.c     |  2 +-
>  drivers/iommu/tegra-gart.c     |  2 +-
>  drivers/iommu/tegra-smmu.c     |  2 +-
>  include/linux/iommu.h          | 21 -
>  18 files changed, 78 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index ebd062522cf5..ea3a5dc61bb0 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3092,7 +3092,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
>  }
>
>  static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
> -			 phys_addr_t paddr, size_t page_size, int iommu_prot)
> +			 phys_addr_t paddr, size_t page_size, int iommu_prot,
> +			 gfp_t gfp)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
>  	int prot = 0;
> @@ -3106,9 +3107,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>  	if (iommu_prot & IOMMU_WRITE)
>  		prot |= IOMMU_PROT_IW;
>
> -	mutex_lock(&domain->api_lock);
> -	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
> -	mutex_unlock(&domain->api_lock);
> +	ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
>
>  	domain_flush_np_cache(domain, iova, page_size);
>
> @@ -3119,16 +3118,11 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
>  			      size_t page_size)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
> -	size_t unmap_size;
>
>  	if (domain->mode == PAGE_MODE_NONE)
>  		return 0;
>
> -	mutex_lock(&domain->api_lock);
> -	unmap_size = iommu_unmap_page(domain, iova, page_size);
> -	mutex_unlock(&domain->api_lock);
> -
> -	return unmap_size;
> +	return iommu_unmap_page(domain, iova, page_size);
>  }
>
>  static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d3880010c6cf..e5c48089b49f 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1777,7 +1777,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  }
>
>  static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> -			phys_addr_t paddr, size_t size, int prot)
> +			phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
>  {
>  	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 045d93884164..2d50db55b788 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1286,7 +1286,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  }
>
>  static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> -