Re: [PATCH 2/2] dma-mapping: force unencrypted devices are always addressing limited
On Wed, Nov 27, 2019 at 06:22:57PM +, Thomas Hellstrom wrote:
> > bool dma_addressing_limited(struct device *dev)
> > {
> > +	if (force_dma_unencrypted(dev))
> > +		return true;
> > 	return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
> > 		dma_get_required_mask(dev);
> > }
>
> Any chance to have the case
>
> (swiotlb_force == SWIOTLB_FORCE)
>
> also included? We have a hard time handling that in generic code.

Do we have any good use case for SWIOTLB_FORCE now that we have
force_dma_unencrypted?  I'd love to be able to get rid of it.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
On Wed, Nov 27, 2019 at 02:11:28PM -0800, David Rientjes wrote:
> So we're left with making dma_pool_alloc(GFP_ATOMIC) actually be atomic
> even when the DMA needs to be unencrypted for SEV.  Christoph's suggestion
> was to wire up dmapool in kernel/dma/remap.c for this.  Is that necessary
> to be done for all devices that need to do dma_pool_alloc(GFP_ATOMIC) or
> can we do it within the DMA API itself so it's transparent to the driver?

It needs to be transparent to the driver.  Lots of drivers use GFP_ATOMIC
dma allocations, and all of them are broken on SEV setups currently.
[PATCH v2 3/8] iommu/vt-d: Implement second level page table ops
This adds the implementation of page table callbacks for the second level page table. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Liu Yi L Cc: Yi Sun Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 81 + 1 file changed, 81 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 7752ff299cb5..96ead4e3395a 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -413,6 +413,7 @@ int for_each_device_domain(int (*fn)(struct device_domain_info *info, } const struct iommu_ops intel_iommu_ops; +static const struct pgtable_ops second_lvl_pgtable_ops; static bool translation_pre_enabled(struct intel_iommu *iommu) { @@ -1720,6 +1721,7 @@ static struct dmar_domain *alloc_domain(int flags) domain->nid = NUMA_NO_NODE; domain->flags = flags; domain->has_iotlb_device = false; + domain->ops = _lvl_pgtable_ops; INIT_LIST_HEAD(>devices); return domain; @@ -2334,6 +2336,85 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, return 0; } +static int second_lvl_domain_map_range(struct dmar_domain *domain, + unsigned long iova, phys_addr_t paddr, + size_t size, int prot) +{ + return __domain_mapping(domain, iova >> VTD_PAGE_SHIFT, NULL, + paddr >> VTD_PAGE_SHIFT, + aligned_nrpages(paddr, size), prot); +} + +static struct page * +second_lvl_domain_unmap_range(struct dmar_domain *domain, + unsigned long iova, size_t size) +{ + unsigned long start_pfn, end_pfn, nrpages; + + start_pfn = mm_to_dma_pfn(IOVA_PFN(iova)); + nrpages = aligned_nrpages(iova, size); + end_pfn = start_pfn + nrpages - 1; + + return dma_pte_clear_level(domain, agaw_to_level(domain->agaw), + domain->pgd, 0, start_pfn, end_pfn, NULL); +} + +static phys_addr_t +second_lvl_domain_iova_to_phys(struct dmar_domain *domain, + unsigned long iova) +{ + struct dma_pte *pte; + int level = 0; + u64 phys = 0; + + pte = pfn_to_dma_pte(domain, iova >> VTD_PAGE_SHIFT, ); + if (pte) + phys = dma_pte_addr(pte); + + return phys; +} + +static 
void +second_lvl_domain_flush_tlb_range(struct dmar_domain *domain, + struct intel_iommu *iommu, + unsigned long addr, size_t size, + bool ih) +{ + unsigned long pages = aligned_nrpages(addr, size); + u16 did = domain->iommu_did[iommu->seq_id]; + unsigned int mask; + + if (pages) { + mask = ilog2(__roundup_pow_of_two(pages)); + addr &= (u64)-1 << (VTD_PAGE_SHIFT + mask); + } else { + mask = MAX_AGAW_PFN_WIDTH; + addr = 0; + } + + /* +* Fallback to domain selective flush if no PSI support or the size is +* too big. +* PSI requires page size to be 2 ^ x, and the base address is naturally +* aligned to the size +*/ + if (!pages || !cap_pgsel_inv(iommu->cap) || + mask > cap_max_amask_val(iommu->cap)) + iommu->flush.iotlb_inv(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); + else + iommu->flush.iotlb_inv(iommu, did, addr | ((int)ih << 6), + mask, DMA_TLB_PSI_FLUSH); + + iommu_flush_dev_iotlb(domain, addr, mask); +} + +static const struct pgtable_ops second_lvl_pgtable_ops = { + .map_range = second_lvl_domain_map_range, + .unmap_range= second_lvl_domain_unmap_range, + .iova_to_phys = second_lvl_domain_iova_to_phys, + .flush_tlb_range= second_lvl_domain_flush_tlb_range, +}; + static int domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, struct scatterlist *sg, unsigned long phys_pfn, unsigned long nr_pages, int prot) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 6/8] iommu/vt-d: Implement first level page table ops
This adds the implementation of page table callbacks for the first level page table. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Liu Yi L Cc: Yi Sun Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 56 + 1 file changed, 56 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a314892ee72b..695a7a5fbe8e 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -414,6 +414,7 @@ int for_each_device_domain(int (*fn)(struct device_domain_info *info, } const struct iommu_ops intel_iommu_ops; +static const struct pgtable_ops first_lvl_pgtable_ops; static const struct pgtable_ops second_lvl_pgtable_ops; static bool translation_pre_enabled(struct intel_iommu *iommu) @@ -2330,6 +2331,61 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, return 0; } +static int first_lvl_domain_map_range(struct dmar_domain *domain, + unsigned long iova, phys_addr_t paddr, + size_t size, int prot) +{ + return first_lvl_map_range(domain, PAGE_ALIGN(iova), + round_up(iova + size, PAGE_SIZE), + PAGE_ALIGN(paddr), prot); +} + +static struct page * +first_lvl_domain_unmap_range(struct dmar_domain *domain, +unsigned long iova, size_t size) +{ + return first_lvl_unmap_range(domain, PAGE_ALIGN(iova), +round_up(iova + size, PAGE_SIZE)); +} + +static phys_addr_t +first_lvl_domain_iova_to_phys(struct dmar_domain *domain, + unsigned long iova) +{ + return first_lvl_iova_to_phys(domain, iova); +} + +static void +first_lvl_domain_flush_tlb_range(struct dmar_domain *domain, +struct intel_iommu *iommu, +unsigned long iova, size_t size, bool ih) +{ + unsigned long pages = aligned_nrpages(iova, size); + u16 did = domain->iommu_did[iommu->seq_id]; + unsigned int mask; + + if (pages) { + mask = ilog2(__roundup_pow_of_two(pages)); + iova &= (u64)-1 << (VTD_PAGE_SHIFT + mask); + } else { + mask = MAX_AGAW_PFN_WIDTH; + iova = 0; + pages = -1; + } + + iommu->flush.p_iotlb_inv(iommu, did, domain->default_pasid, +iova, 
pages, ih); + + iommu_flush_dev_iotlb(domain, iova, mask); +} + +static const struct pgtable_ops first_lvl_pgtable_ops = { + .map_range = first_lvl_domain_map_range, + .unmap_range= first_lvl_domain_unmap_range, + .iova_to_phys = first_lvl_domain_iova_to_phys, + .flush_tlb_range= first_lvl_domain_flush_tlb_range, +}; + static int second_lvl_domain_map_range(struct dmar_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 8/8] iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr
This adds the Intel VT-d specific callback for setting the
DOMAIN_ATTR_NESTING domain attribute. It is necessary to let the VT-d
driver know that the domain represents a virtual machine which requires
the IOMMU hardware to support nested translation mode. Return success
if the IOMMU hardware supports nested mode, otherwise failure.

Cc: Ashok Raj
Cc: Jacob Pan
Cc: Kevin Tian
Cc: Liu Yi L
Signed-off-by: Yi Sun
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 56 +
 1 file changed, 56 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 68b2f98ecd65..ee717dcb9644 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -308,6 +308,12 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_LOSE_CHILDREN		BIT(1)

+/*
+ * Domain represents a virtual machine which demands iommu nested
+ * translation mode support.
+ */
+#define DOMAIN_FLAG_NESTING_MODE		BIT(2)
+
 #define for_each_domain_iommu(idx, domain)		\
 	for (idx = 0; idx < g_num_of_iommus; idx++)	\
 		if (domain->iommu_refcnt[idx])
@@ -5929,6 +5935,24 @@ static inline bool iommu_pasid_support(void)
 	return ret;
 }

+static inline bool nested_mode_support(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	bool ret = true;
+
+	rcu_read_lock();
+	for_each_active_iommu(iommu, drhd) {
+		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
+			ret = false;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static bool intel_iommu_capable(enum iommu_cap cap)
 {
 	if (cap == IOMMU_CAP_CACHE_COHERENCY)
@@ -6305,10 +6329,42 @@ static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
 	return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
 }

+static int
+intel_iommu_domain_set_attr(struct iommu_domain *domain,
+			    enum iommu_attr attr, void *data)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned long flags;
+	int ret = 0;
+
+	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+		return -EINVAL;
+
+	switch (attr) {
+	case DOMAIN_ATTR_NESTING:
+		spin_lock_irqsave(&device_domain_lock, flags);
+		if (nested_mode_support() &&
+		    list_empty(&dmar_domain->devices)) {
+			dmar_domain->flags |= DOMAIN_FLAG_NESTING_MODE;
+			dmar_domain->ops = &first_lvl_pgtable_ops;
+		} else {
+			ret = -ENODEV;
+		}
+		spin_unlock_irqrestore(&device_domain_lock, flags);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_free		= intel_iommu_domain_free,
+	.domain_set_attr	= intel_iommu_domain_set_attr,
 	.attach_dev		= intel_iommu_attach_device,
 	.detach_dev		= intel_iommu_detach_device,
 	.aux_attach_dev		= intel_iommu_aux_attach_device,
--
2.17.1
[PATCH v2 7/8] iommu/vt-d: Identify domains using first level page table
This checks whether a domain should use first level page table for map/unmap. And if so, we should attach the domain to the device in first level translation mode. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Liu Yi L Cc: Yi Sun Cc: Sanjay Kumar Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 63 +++-- 1 file changed, 60 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 695a7a5fbe8e..68b2f98ecd65 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -417,6 +417,11 @@ const struct iommu_ops intel_iommu_ops; static const struct pgtable_ops first_lvl_pgtable_ops; static const struct pgtable_ops second_lvl_pgtable_ops; +static inline bool domain_pgtable_first_lvl(struct dmar_domain *domain) +{ + return domain->ops == _lvl_pgtable_ops; +} + static bool translation_pre_enabled(struct intel_iommu *iommu) { return (iommu->flags & VTD_FLAG_TRANS_PRE_ENABLED); @@ -1702,6 +1707,44 @@ static bool first_lvl_5lp_support(void) return first_level_5lp_supported; } +/* + * Check and return whether first level is used by default for + * DMA translation. + */ +static bool first_level_by_default(void) +{ + struct dmar_drhd_unit *drhd; + struct intel_iommu *iommu; + static int first_level_support = -1; + + if (likely(first_level_support != -1)) + return first_level_support; + + first_level_support = 1; + + rcu_read_lock(); + for_each_active_iommu(iommu, drhd) { + if (!sm_supported(iommu) || !ecap_flts(iommu->ecap)) { + first_level_support = 0; + break; + } +#ifdef CONFIG_X86 + /* +* Currently we don't support paging mode missmatching. +* Could be turned on later if there is a case. 
+*/ + if (cpu_feature_enabled(X86_FEATURE_LA57) && + !cap_5lp_support(iommu->cap)) { + first_level_support = 0; + break; + } +#endif /* #ifdef CONFIG_X86 */ + } + rcu_read_unlock(); + + return first_level_support; +} + static struct dmar_domain *alloc_domain(int flags) { struct dmar_domain *domain; @@ -1714,7 +1757,10 @@ static struct dmar_domain *alloc_domain(int flags) domain->nid = NUMA_NO_NODE; domain->flags = flags; domain->has_iotlb_device = false; - domain->ops = _lvl_pgtable_ops; + if (first_level_by_default()) + domain->ops = _lvl_pgtable_ops; + else + domain->ops = _lvl_pgtable_ops; domain->first_lvl_5lp = first_lvl_5lp_support(); spin_lock_init(>page_table_lock); INIT_LIST_HEAD(>devices); @@ -2710,6 +2756,11 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, if (hw_pass_through && domain_type_is_si(domain)) ret = intel_pasid_setup_pass_through(iommu, domain, dev, PASID_RID2PASID); + else if (domain_pgtable_first_lvl(domain)) + ret = intel_pasid_setup_first_level(iommu, dev, + domain->pgd, PASID_RID2PASID, + domain->iommu_did[iommu->seq_id], + PASID_FLAG_SUPERVISOR_MODE); else ret = intel_pasid_setup_second_level(iommu, domain, dev, PASID_RID2PASID); @@ -5597,8 +5648,14 @@ static int aux_domain_add_dev(struct dmar_domain *domain, goto attach_failed; /* Setup the PASID entry for mediated devices: */ - ret = intel_pasid_setup_second_level(iommu, domain, dev, -domain->default_pasid); + if (domain_pgtable_first_lvl(domain)) + ret = intel_pasid_setup_first_level(iommu, dev, + domain->pgd, domain->default_pasid, + domain->iommu_did[iommu->seq_id], + PASID_FLAG_SUPERVISOR_MODE); + else + ret = intel_pasid_setup_second_level(iommu, domain, dev, +domain->default_pasid); if (ret) goto table_failed; spin_unlock(>lock); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 5/8] iommu/vt-d: Add first level page table interfaces
This adds functions to manipulate first level page tables which could be used by a scalale mode capable IOMMU unit. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Liu Yi L Cc: Yi Sun Signed-off-by: Lu Baolu --- drivers/iommu/Makefile | 2 +- drivers/iommu/intel-iommu.c| 33 +++ drivers/iommu/intel-pgtable.c | 376 + include/linux/intel-iommu.h| 33 ++- include/trace/events/intel_iommu.h | 60 + 5 files changed, 502 insertions(+), 2 deletions(-) create mode 100644 drivers/iommu/intel-pgtable.c diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 35d17094fe3b..aa04f4c3ae26 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -18,7 +18,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o -obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o +obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o intel-pgtable.o obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 66f76f6df2c2..a314892ee72b 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1670,6 +1670,37 @@ static void free_dmar_iommu(struct intel_iommu *iommu) #endif } +/* First level 5-level paging support */ +static bool first_lvl_5lp_support(void) +{ + struct dmar_drhd_unit *drhd; + struct intel_iommu *iommu; + static int first_level_5lp_supported = -1; + + if (likely(first_level_5lp_supported != -1)) + return first_level_5lp_supported; + + first_level_5lp_supported = 1; +#ifdef CONFIG_X86 + /* Match IOMMU first level and CPU paging mode */ + if (!cpu_feature_enabled(X86_FEATURE_LA57)) { + first_level_5lp_supported = 0; + return first_level_5lp_supported; + } +#endif /* #ifdef CONFIG_X86 */ + + rcu_read_lock(); + for_each_active_iommu(iommu, drhd) { + if 
(!cap_5lp_support(iommu->cap)) { + first_level_5lp_supported = 0; + break; + } + } + rcu_read_unlock(); + + return first_level_5lp_supported; +} + static struct dmar_domain *alloc_domain(int flags) { struct dmar_domain *domain; @@ -1683,6 +1714,8 @@ static struct dmar_domain *alloc_domain(int flags) domain->flags = flags; domain->has_iotlb_device = false; domain->ops = _lvl_pgtable_ops; + domain->first_lvl_5lp = first_lvl_5lp_support(); + spin_lock_init(>page_table_lock); INIT_LIST_HEAD(>devices); return domain; diff --git a/drivers/iommu/intel-pgtable.c b/drivers/iommu/intel-pgtable.c new file mode 100644 index ..4a26d08a7570 --- /dev/null +++ b/drivers/iommu/intel-pgtable.c @@ -0,0 +1,376 @@ +// SPDX-License-Identifier: GPL-2.0 +/** + * intel-pgtable.c - Intel IOMMU page table manipulation library + * + * Copyright (C) 2019 Intel Corporation + * + * Author: Lu Baolu + */ + +#define pr_fmt(fmt) "DMAR: " fmt +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * first_lvl_map: Map a range of IO virtual address to physical addresses. + */ +#ifdef CONFIG_X86 +#define pgtable_populate(domain, nm) \ +do { \ + void *__new = alloc_pgtable_page(domain->nid); \ + if (!__new) \ + return -ENOMEM; \ + smp_wmb(); \ + spin_lock(&(domain)->page_table_lock); \ + if (nm ## _present(*nm)) { \ + free_pgtable_page(__new); \ + } else {\ + set_##nm(nm, __##nm(__pa(__new) | _PAGE_TABLE));\ + domain_flush_cache(domain, nm, sizeof(nm##_t)); \ + } \ + spin_unlock(&(domain)->page_table_lock);\ +} while (0) + +static int +first_lvl_map_pte_range(struct dmar_domain *domain, pmd_t *pmd, + unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot) +{ + pte_t *pte, *first_pte; + u64 pfn; + + pfn = phys_addr >> PAGE_SHIFT; + if (unlikely(pmd_none(*pmd))) + pgtable_populate(domain, pmd); + + first_pte = pte = pte_offset_kernel(pmd, addr); + + do { +
[PATCH v2 0/8] Use 1st-level for DMA remapping
Intel VT-d in scalable mode supports two types of page tables for DMA
translation: the first level page table and the second level page table.
The first level page table uses the same format as the CPU page table,
while the second level page table keeps compatible with previous formats.
The software is able to choose either of them for DMA remapping according
to the use case.

This patchset aims to move IOVA (I/O Virtual Address) translation to the
1st-level page table in scalable mode. This will simplify vIOMMU (IOMMU
simulated by a VM hypervisor) design by using two-stage translation,
a.k.a. nested mode translation.

As the Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
support is currently implemented in a shadow page manner. The device
simulation software, like QEMU, has to figure out GIOVA->GPA mappings
and write them to a shadow page table, which will be used by the
physical IOMMU. Each time mappings are created or destroyed in the
vIOMMU, the simulation software has to intervene. Hence, the changes
on GIOVA->GPA could be shadowed to the host.

[diagram: QEMU traps vIOMMU IOTLB flushes (map/unmap) against the
 guest's GIOVA->GPA table and mirrors them through the VFIO/IOMMU API
 into a shadow GIOVA->HPA table used by the pIOMMU]

In VT-d 3.0, scalable mode is introduced, which offers two-level
translation page tables and nested translation mode. Regarding GIOVA
support, it can be simplified by 1) moving the GIOVA support over the
1st-level page table to store GIOVA->GPA mappings in the vIOMMU,
2) binding the vIOMMU 1st-level page table to the pIOMMU, 3) using the
pIOMMU second level for GPA->HPA translation, and 4) enabling nested
(a.k.a. dual-stage) translation in the host. Compared with the current
shadow GIOVA support, the new approach makes the vIOMMU design simpler
and more efficient, as we only need to flush the pIOMMU IOTLB and
possibly the device IOTLB when an IOVA mapping in the vIOMMU is torn
down.

[diagram: QEMU traps only vIOMMU IOTLB flushes (unmap) and forwards
 them over the VFIO/IOMMU cache invalidation and guest gpd bind
 interfaces; the pIOMMU uses the guest's GIOVA->GPA table as the first
 level and its own GPA->HPA table as the second level]

This patch set includes two parts. The former part implements the
per-domain page table abstraction, which makes the page table
differences transparent to the various map/unmap APIs. The latter part
applies the first level page table for IOVA translation unless the
DOMAIN_ATTR_NESTING domain attribute has been set, which indicates that
nested mode is in use.

Based-on-idea-by: Ashok Raj
Based-on-idea-by: Kevin Tian
Based-on-idea-by: Liu Yi L
Based-on-idea-by: Jacob Pan
Based-on-idea-by: Sanjay Kumar
Based-on-idea-by: Lu Baolu

Change log:
v1->v2
- The first series was posted here https://lkml.org/lkml/2019/9/23/297
- Use per domain page table ops to handle different page tables.
- Use first level for DMA remapping by default on both bare metal and
  vm guest.
- Code refine according to code review comments for v1.

Lu Baolu (8):
  iommu/vt-d: Add per domain page table ops
  iommu/vt-d: Move domain_flush_cache helper into header
  iommu/vt-d: Implement second level page table ops
  iommu/vt-d: Apply per domain second level page table ops
  iommu/vt-d: Add first level page table interfaces
  iommu/vt-d: Implement first level page table ops
  iommu/vt-d: Identify domains using first level page table
  iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr

 drivers/iommu/Makefile             |   2 +-
 drivers/iommu/intel-iommu.c        | 412 +++--
 drivers/iommu/intel-pgtable.c      | 376 ++
 include/linux/intel-iommu.h        |  64 -
 include/trace/events/intel_iommu.h |  60 +
 5 files changed, 837 insertions(+), 77 deletions(-)
 create mode 100644
[PATCH v2 2/8] iommu/vt-d: Move domain_flush_cache helper into header
So that it could be used in other source files as well.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 7 ---
 include/linux/intel-iommu.h | 7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 677e82a828f0..7752ff299cb5 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -833,13 +833,6 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 	return iommu;
 }

-static void domain_flush_cache(struct dmar_domain *domain,
-			       void *addr, int size)
-{
-	if (!domain->iommu_coherency)
-		clflush_cache_range(addr, size);
-}
-
 static int device_context_mapped(struct intel_iommu *iommu, u8 bus, u8 devfn)
 {
 	struct context_entry *context;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index e8bfe7466ebb..9b259756057b 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -681,6 +681,13 @@ static inline int first_pte_in_page(struct dma_pte *pte)
 	return !((unsigned long)pte & ~VTD_PAGE_MASK);
 }

+static inline void
+domain_flush_cache(struct dmar_domain *domain, void *addr, int size)
+{
+	if (!domain->iommu_coherency)
+		clflush_cache_range(addr, size);
+}
+
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
--
2.17.1
[PATCH v2 4/8] iommu/vt-d: Apply per domain second level page table ops
This applies per domain page table ops to various domain mapping and unmapping interfaces. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 118 1 file changed, 52 insertions(+), 66 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 96ead4e3395a..66f76f6df2c2 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -80,6 +80,7 @@ #define IOVA_START_PFN (1) #define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT) +#define PFN_ADDR(pfn) ((pfn) << PAGE_SHIFT) /* page table handling */ #define LEVEL_STRIDE (9) @@ -1153,8 +1154,8 @@ static struct page *domain_unmap(struct dmar_domain *domain, BUG_ON(start_pfn > last_pfn); /* we don't need lock here; nobody else touches the iova range */ - freelist = dma_pte_clear_level(domain, agaw_to_level(domain->agaw), - domain->pgd, 0, start_pfn, last_pfn, NULL); + freelist = domain->ops->unmap_range(domain, PFN_ADDR(start_pfn), + PFN_ADDR(last_pfn - start_pfn + 1)); /* free pgd */ if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) { @@ -1484,39 +1485,6 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain, spin_unlock_irqrestore(_domain_lock, flags); } -static void iommu_flush_iotlb_psi(struct intel_iommu *iommu, - struct dmar_domain *domain, - unsigned long pfn, unsigned int pages, - int ih, int map) -{ - unsigned int mask = ilog2(__roundup_pow_of_two(pages)); - uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT; - u16 did = domain->iommu_did[iommu->seq_id]; - - BUG_ON(pages == 0); - - if (ih) - ih = 1 << 6; - /* -* Fallback to domain selective flush if no PSI support or the size is -* too big. 
-* PSI requires page size to be 2 ^ x, and the base address is naturally -* aligned to the size -*/ - if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap)) - iommu->flush.iotlb_inv(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); - else - iommu->flush.iotlb_inv(iommu, did, addr | ih, - mask, DMA_TLB_PSI_FLUSH); - - /* -* In caching mode, changes of pages from non-present to present require -* flush. However, device IOTLB doesn't need to be flushed in this case. -*/ - if (!cap_caching_mode(iommu->cap) || !map) - iommu_flush_dev_iotlb(domain, addr, mask); -} - /* Notification for newly created mappings */ static inline void __mapping_notify_one(struct intel_iommu *iommu, struct dmar_domain *domain, @@ -1524,7 +1492,8 @@ static inline void __mapping_notify_one(struct intel_iommu *iommu, { /* It's a non-present to present mapping. Only flush if caching mode */ if (cap_caching_mode(iommu->cap)) - iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1); + domain->ops->flush_tlb_range(domain, iommu, PFN_ADDR(pfn), +PFN_ADDR(pages), 0); else iommu_flush_write_buffer(iommu); } @@ -1536,16 +1505,8 @@ static void iommu_flush_iova(struct iova_domain *iovad) domain = container_of(iovad, struct dmar_domain, iovad); - for_each_domain_iommu(idx, domain) { - struct intel_iommu *iommu = g_iommus[idx]; - u16 did = domain->iommu_did[iommu->seq_id]; - - iommu->flush.iotlb_inv(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); - - if (!cap_caching_mode(iommu->cap)) - iommu_flush_dev_iotlb(get_iommu_domain(iommu, did), - 0, MAX_AGAW_PFN_WIDTH); - } + for_each_domain_iommu(idx, domain) + domain->ops->flush_tlb_range(domain, g_iommus[idx], 0, 0, 0); } static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu) @@ -2419,13 +2380,43 @@ static int domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, struct scatterlist *sg, unsigned long phys_pfn, unsigned long nr_pages, int prot) { - int iommu_id, ret; struct intel_iommu *iommu; + int iommu_id, ret; /* Do the real mapping 
first */ - ret = __domain_mapping(domain, iov_pfn, sg, phys_pfn, nr_pages, prot); - if (ret) - return ret; + if (!sg) { + ret = domain->ops->map_range(domain, PFN_ADDR(iov_pfn), +PFN_ADDR(phys_pfn), +PFN_ADDR(nr_pages), +
[PATCH v2 1/8] iommu/vt-d: Add per domain page table ops
The Intel VT-d in scalable mode supports two types of page tables for
DMA translation: the first level page table and the second level page
table. The IOMMU driver is able to choose one of them for DMA remapping
according to the use case. The first level page table uses the same
format as the CPU page table, while the second level page table keeps
compatible with previous formats.

This abstracts the page tables used in the Intel IOMMU driver by
defining a per-domain page table ops structure which contains callbacks
for various page table operations.

Signed-off-by: Lu Baolu
---
 include/linux/intel-iommu.h | 24
 1 file changed, 24 insertions(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 326146a36dbf..e8bfe7466ebb 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -499,6 +499,28 @@ struct context_entry {
 	u64 hi;
 };

+struct dmar_domain;
+
+/*
+ * struct pgtable_ops - page table ops
+ * @map_range: map a physically contiguous memory region to iova
+ * @unmap_range: unmap a physically contiguous memory region
+ * @iova_to_phys: return the physical address mapped to @iova
+ * @flush_tlb_range: flush the tlb caches as the result of map or unmap
+ */
+struct pgtable_ops {
+	int (*map_range)(struct dmar_domain *domain,
+			 unsigned long iova, phys_addr_t paddr,
+			 size_t size, int prot);
+	struct page *(*unmap_range)(struct dmar_domain *domain,
+				    unsigned long iova, size_t size);
+	phys_addr_t (*iova_to_phys)(struct dmar_domain *domain,
+				    unsigned long iova);
+	void (*flush_tlb_range)(struct dmar_domain *domain,
+				struct intel_iommu *iommu,
+				unsigned long iova, size_t size, bool ih);
+};
+
 struct dmar_domain {
 	int	nid;			/* node id */
@@ -517,8 +539,10 @@ struct dmar_domain {
 	struct list_head auxd;		/* link to device's auxiliary list */
 	struct iova_domain iovad;	/* iova's that belong to this domain */

+	/* page table used by this domain */
 	struct dma_pte	*pgd;		/* virtual address */
 	int		gaw;		/* max guest address width */
+	const struct pgtable_ops *ops;	/* page table ops */

 	/* adjusted guest address width, 0 is level 2 30-bit */
 	int		agaw;
--
2.17.1
How to read BIOS'es memory from IOMMU
Hi,

I am interested in writing a program to verify BIOS integrity via
SHA512. Most modern BIOSes seem to be accessible through the IOMMU. I'm
wondering if there is a device driver for the IOMMU, or whether one
could be created for the purposes of accessing BIOS memory. If anyone
has any leads or advice, or can even show me how to do this in code via
example/pseudo code, I would appreciate it.

Regards,
Aaron
--
Aaron Gray
Independent Open Source Software Engineer, Computer Language Researcher,
Information Theorist, and amateur computer scientist.
Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
On Wed, 18 Sep 2019, Christoph Hellwig wrote: > On Tue, Sep 17, 2019 at 06:41:02PM +, Lendacky, Thomas wrote: > > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > > > --- a/drivers/nvme/host/pci.c > > > +++ b/drivers/nvme/host/pci.c > > > @@ -1613,7 +1613,8 @@ static int nvme_alloc_admin_tags(struct nvme_dev > > > *dev) > > > dev->admin_tagset.timeout = ADMIN_TIMEOUT; > > > dev->admin_tagset.numa_node = dev_to_node(dev->dev); > > > dev->admin_tagset.cmd_size = sizeof(struct nvme_iod); > > > - dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED; > > > + dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED | > > > + BLK_MQ_F_BLOCKING; > > > > I think you want to only set the BLK_MQ_F_BLOCKING if the DMA is required > > to be unencrypted. Unfortunately, force_dma_unencrypted() can't be called > > from a module. Is there a DMA API that could be called to get that info? > > The DMA API must support non-blocking calls, and various drivers rely > on that. So we need to provide that even for the SEV case. If the > actual blocking can't be made to work we'll need to wire up the DMA > pool in kernel/dma/remap.c for it (and probably move it to separate > file). > Resurrecting this thread from a couple months ago because it appears that this is still an issue with 5.4 guests. dma_pool_alloc(), regardless of whether mem_flags allows blocking or not, can always sleep if the device's DMA must be unencrypted and mem_encrypt_active() == true. We know this because vm_unmap_aliases() can always block. NVMe's setup of PRPs and SGLs uses dma_pool_alloc(GFP_ATOMIC) but when this is a SEV-enabled guest this allocation may block due to the possibility of allocating DMA coherent memory through dma_direct_alloc(). It seems like one solution would be to add significant latency by doing BLK_MQ_F_BLOCKING if force_dma_unencrypted() is true for the device but this comes with significant downsides. 
So we're left with making dma_pool_alloc(GFP_ATOMIC) actually be atomic even when the DMA needs to be unencrypted for SEV. Christoph's suggestion was to wire up dmapool in kernel/dma/remap.c for this. Is that necessary to be done for all devices that need to do dma_pool_alloc(GFP_ATOMIC) or can we do it within the DMA API itself so it's transparent to the driver? Thomas/Brijesh: separately, it seems the use of set_memory_encrypted() or set_memory_decrypted() must be possible without blocking; is this only an issue from the DMA API point of view or can it be done elsewhere?
Re: [PATCH 2/2] dma-mapping: force unencryped devices are always addressing limited
Hi, On Wed, 2019-11-27 at 15:40 +0100, Christoph Hellwig wrote: > Devices that are forced to DMA through unencrypted bounce buffers > need to be treated as if they are addressing limited. > > Signed-off-by: Christoph Hellwig > --- > kernel/dma/mapping.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c > index 1dbe6d725962..f6c35b53d996 100644 > --- a/kernel/dma/mapping.c > +++ b/kernel/dma/mapping.c > @@ -416,6 +416,8 @@ EXPORT_SYMBOL_GPL(dma_get_merge_boundary); > */ > bool dma_addressing_limited(struct device *dev) > { > + if (force_dma_unencrypted(dev)) > + return true; > return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) < > dma_get_required_mask(dev); > } Any chance to have the case (swiotlb_force == SWIOTLB_FORCE) also included? Otherwise for the series Reviewed-by: Thomas Hellström ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
On Wed, 2019-11-27 at 19:06 +, Robin Murphy wrote: > On 27/11/2019 6:24 pm, Nicolas Saenz Julienne wrote: > > On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote: > > > On 26/11/2019 12:51 pm, Leon Romanovsky wrote: > > > > On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote: > > > > > Some users need to make sure their rounding function accepts and > > > > > returns > > > > > 64bit long variables regardless of the architecture. Sadly > > > > > roundup/rounddown_pow_two() takes and returns unsigned longs. Create a > > > > > new generic 64bit variant of the function and cleanup rougue custom > > > > > implementations. > > > > > > > > Is it possible to create general roundup/rounddown_pow_two() which will > > > > work correctly for any type of variables, instead of creating special > > > > variant for every type? > > > > > > In fact, that is sort of the case already - roundup_pow_of_two() itself > > > wraps ilog2() such that the constant case *is* type-independent. And > > > since ilog2() handles non-constant values anyway, might it be reasonable > > > to just take the strongly-typed __roundup_pow_of_two() helper out of the > > > loop as below? > > > > > > Robin > > > > > > > That looks way better that's for sure. Some questions. > > > > > ->8- > > > diff --git a/include/linux/log2.h b/include/linux/log2.h > > > index 83a4a3ca3e8a..e825f8a6e8b5 100644 > > > --- a/include/linux/log2.h > > > +++ b/include/linux/log2.h > > > @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n) > > > */ > > >#define roundup_pow_of_two(n) \ > > >( \ > > > - __builtin_constant_p(n) ? ( \ > > > - (n == 1) ? 1 : \ > > > - (1UL << (ilog2((n) - 1) + 1)) \ > > > -) : \ > > > - __roundup_pow_of_two(n) \ > > > + (__builtin_constant_p(n) && (n == 1)) ? \ > > > + 1 : (1UL << (ilog2((n) - 1) + 1)) \ > > > > Then here you'd have to use ULL instead of UL, right? I want my 64bit value > > everywhere regardless of the CPU arch. 
The downside is that would affect > > performance to some extent (i.e. returning a 64bit value where you used to > > have > > a 32bit one)? > > True, although it's possible that 1ULL might result in the same codegen > if the compiler can see that the result is immediately truncated back to > long anyway. Or at worst, I suppose "(typeof(n))1" could suffice, > however ugly. Either way, this diff was only an illustration rather than > a concrete proposal, but it might be an interesting diversion to > investigate. > > On that note, though, you should probably be using ULL in your current > patch too. I actually meant to, the fix got lost. Thanks for pointing it out. As I see Leon also likes this, I'll try out this implementation and come back with some results. Regards, Nicolas signature.asc Description: This is a digitally signed message part ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
On Wed, Nov 27, 2019 at 07:06:12PM +, Robin Murphy wrote: > On 27/11/2019 6:24 pm, Nicolas Saenz Julienne wrote: > > On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote: > > > On 26/11/2019 12:51 pm, Leon Romanovsky wrote: > > > > On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote: > > > > > Some users need to make sure their rounding function accepts and > > > > > returns > > > > > 64bit long variables regardless of the architecture. Sadly > > > > > roundup/rounddown_pow_two() takes and returns unsigned longs. Create a > > > > > new generic 64bit variant of the function and cleanup rougue custom > > > > > implementations. > > > > > > > > Is it possible to create general roundup/rounddown_pow_two() which will > > > > work correctly for any type of variables, instead of creating special > > > > variant for every type? > > > > > > In fact, that is sort of the case already - roundup_pow_of_two() itself > > > wraps ilog2() such that the constant case *is* type-independent. And > > > since ilog2() handles non-constant values anyway, might it be reasonable > > > to just take the strongly-typed __roundup_pow_of_two() helper out of the > > > loop as below? > > > > > > Robin > > > > > > > That looks way better that's for sure. Some questions. > > > > > ->8- > > > diff --git a/include/linux/log2.h b/include/linux/log2.h > > > index 83a4a3ca3e8a..e825f8a6e8b5 100644 > > > --- a/include/linux/log2.h > > > +++ b/include/linux/log2.h > > > @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n) > > > */ > > >#define roundup_pow_of_two(n) \ > > >( \ > > > - __builtin_constant_p(n) ? ( \ > > > - (n == 1) ? 1 : \ > > > - (1UL << (ilog2((n) - 1) + 1)) \ > > > -) : \ > > > - __roundup_pow_of_two(n) \ > > > + (__builtin_constant_p(n) && (n == 1)) ? \ > > > + 1 : (1UL << (ilog2((n) - 1) + 1)) \ > > > > Then here you'd have to use ULL instead of UL, right? I want my 64bit value > > everywhere regardless of the CPU arch. 
The downside is that would affect > > performance to some extent (i.e. returning a 64bit value where you used to > > have > > a 32bit one)? > > True, although it's possible that 1ULL might result in the same codegen if > the compiler can see that the result is immediately truncated back to long > anyway. Or at worst, I suppose "(typeof(n))1" could suffice, however ugly. > Either way, this diff was only an illustration rather than a concrete > proposal, but it might be an interesting diversion to investigate. > > On that note, though, you should probably be using ULL in your current patch > too. > > > Also, what about callers to this function on platforms with 32bit 'unsigned > > longs' that happen to input a 64bit value into this. IIUC we'd have a > > change of > > behaviour. > > Indeed, although the change in such a case would be "start getting the > expected value instead of nonsense", so it might very well be welcome ;) Agree, if code overflowed with 32 bits before this change, the code was already broken. At least now, it won't overflow. > > Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
On 27/11/2019 6:24 pm, Nicolas Saenz Julienne wrote: On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote: On 26/11/2019 12:51 pm, Leon Romanovsky wrote: On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote: Some users need to make sure their rounding function accepts and returns 64bit long variables regardless of the architecture. Sadly roundup/rounddown_pow_two() takes and returns unsigned longs. Create a new generic 64bit variant of the function and cleanup rougue custom implementations. Is it possible to create general roundup/rounddown_pow_two() which will work correctly for any type of variables, instead of creating special variant for every type? In fact, that is sort of the case already - roundup_pow_of_two() itself wraps ilog2() such that the constant case *is* type-independent. And since ilog2() handles non-constant values anyway, might it be reasonable to just take the strongly-typed __roundup_pow_of_two() helper out of the loop as below? Robin That looks way better that's for sure. Some questions. ->8- diff --git a/include/linux/log2.h b/include/linux/log2.h index 83a4a3ca3e8a..e825f8a6e8b5 100644 --- a/include/linux/log2.h +++ b/include/linux/log2.h @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n) */ #define roundup_pow_of_two(n)\ (\ - __builtin_constant_p(n) ? ( \ - (n == 1) ? 1 : \ - (1UL << (ilog2((n) - 1) + 1)) \ - ) : \ - __roundup_pow_of_two(n) \ + (__builtin_constant_p(n) && (n == 1)) ? \ + 1 : (1UL << (ilog2((n) - 1) + 1)) \ Then here you'd have to use ULL instead of UL, right? I want my 64bit value everywhere regardless of the CPU arch. The downside is that would affect performance to some extent (i.e. returning a 64bit value where you used to have a 32bit one)? True, although it's possible that 1ULL might result in the same codegen if the compiler can see that the result is immediately truncated back to long anyway. Or at worst, I suppose "(typeof(n))1" could suffice, however ugly. 
Either way, this diff was only an illustration rather than a concrete proposal, but it might be an interesting diversion to investigate. On that note, though, you should probably be using ULL in your current patch too. Also, what about callers to this function on platforms with 32bit 'unsigned longs' that happen to input a 64bit value into this. IIUC we'd have a change of behaviour. Indeed, although the change in such a case would be "start getting the expected value instead of nonsense", so it might very well be welcome ;) Robin.
Re: [PATCH 1/3] iommu: match the original algorithm
> On Nov 27, 2019, at 1:01 PM, John Garry wrote: > > I haven't gone into the details, but this patch alone is giving this: > > root@(none)$ [ 123.857024] kmemleak: 8 new suspected memory leaks (see > /sys/kernel/debug/kmemleak) > > root@(none)$ cat /sys/kernel/debug/kmemleak > unreferenced object 0x002339843000 (size 2048): > comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s) > hex dump (first 32 bytes): >00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > backtrace: >[<1d2710bf>] kmem_cache_alloc+0x188/0x260 >[] init_iova_domain+0x1e8/0x2a8 >[<2646fc92>] iommu_setup_dma_ops+0x200/0x710 >[ ] arch_setup_dma_ops+0x80/0x128 >[<994e1e43>] acpi_dma_configure+0x11c/0x140 >[ ] pci_dma_configure+0xe0/0x108 >[ ] really_probe+0x210/0x548 >[<87884b1b>] driver_probe_device+0x7c/0x148 >[<10af2936>] device_driver_attach+0x94/0xa0 >[ ] __driver_attach+0xa4/0x110 >[ ] bus_for_each_dev+0xe8/0x158 >[ ] driver_attach+0x30/0x40 >[<3cf39ba8>] bus_add_driver+0x234/0x2f0 >[<43830a45>] driver_register+0xbc/0x1d0 >[ ] __pci_register_driver+0xb0/0xc8 >[ ] sas_v3_pci_driver_init+0x20/0x28 > unreferenced object 0x002339844000 (size 2048): > comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s) > > [snip] > > And I don't feel like continuing until it's resolved Thanks for taking a hit from this before me. It is frustrating that people tend not to test their patches properly with things like kmemleak.
Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote: > On 26/11/2019 12:51 pm, Leon Romanovsky wrote: > > On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote: > > > Some users need to make sure their rounding function accepts and returns > > > 64bit long variables regardless of the architecture. Sadly > > > roundup/rounddown_pow_two() takes and returns unsigned longs. Create a > > > new generic 64bit variant of the function and cleanup rougue custom > > > implementations. > > > > Is it possible to create general roundup/rounddown_pow_two() which will > > work correctly for any type of variables, instead of creating special > > variant for every type? > > In fact, that is sort of the case already - roundup_pow_of_two() itself > wraps ilog2() such that the constant case *is* type-independent. And > since ilog2() handles non-constant values anyway, might it be reasonable > to just take the strongly-typed __roundup_pow_of_two() helper out of the > loop as below? > > Robin > That looks way better that's for sure. Some questions. > ->8- > diff --git a/include/linux/log2.h b/include/linux/log2.h > index 83a4a3ca3e8a..e825f8a6e8b5 100644 > --- a/include/linux/log2.h > +++ b/include/linux/log2.h > @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n) >*/ > #define roundup_pow_of_two(n) \ > ( \ > - __builtin_constant_p(n) ? ( \ > - (n == 1) ? 1 : \ > - (1UL << (ilog2((n) - 1) + 1)) \ > -) : \ > - __roundup_pow_of_two(n) \ > + (__builtin_constant_p(n) && (n == 1)) ? \ > + 1 : (1UL << (ilog2((n) - 1) + 1)) \ Then here you'd have to use ULL instead of UL, right? I want my 64bit value everywhere regardless of the CPU arch. The downside is that would affect performance to some extent (i.e. returning a 64bit value where you used to have a 32bit one)? Also, what about callers to this function on platforms with 32bit 'unsigned longs' that happen to input a 64bit value into this. IIUC we'd have a change of behaviour. 
Regards, Nicolas
Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
On 26/11/2019 12:51 pm, Leon Romanovsky wrote: On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote: Some users need to make sure their rounding function accepts and returns 64bit long variables regardless of the architecture. Sadly roundup/rounddown_pow_two() takes and returns unsigned longs. Create a new generic 64bit variant of the function and cleanup rougue custom implementations. Is it possible to create general roundup/rounddown_pow_two() which will work correctly for any type of variables, instead of creating special variant for every type? In fact, that is sort of the case already - roundup_pow_of_two() itself wraps ilog2() such that the constant case *is* type-independent. And since ilog2() handles non-constant values anyway, might it be reasonable to just take the strongly-typed __roundup_pow_of_two() helper out of the loop as below? Robin ->8- diff --git a/include/linux/log2.h b/include/linux/log2.h index 83a4a3ca3e8a..e825f8a6e8b5 100644 --- a/include/linux/log2.h +++ b/include/linux/log2.h @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n) */ #define roundup_pow_of_two(n) \ ( \ - __builtin_constant_p(n) ? ( \ - (n == 1) ? 1 : \ - (1UL << (ilog2((n) - 1) + 1)) \ - ) : \ - __roundup_pow_of_two(n) \ + (__builtin_constant_p(n) && (n == 1)) ? \ + 1 : (1UL << (ilog2((n) - 1) + 1)) \ ) /** ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/3] iommu: match the original algorithm
On 21/11/2019 00:13, Cong Wang wrote: The IOVA cache algorithm implemented in IOMMU code does not exactly match the original algorithm described in the paper. Particularly, it doesn't need to free the loaded empty magazine when trying to put it back to global depot. This patch makes it exactly match the original algorithm. I haven't gone into the details, but this patch alone is giving this: root@(none)$ [ 123.857024] kmemleak: 8 new suspected memory leaks (see /sys/kernel/debug/kmemleak) root@(none)$ cat /sys/kernel/debug/kmemleak unreferenced object 0x002339843000 (size 2048): comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace: [<1d2710bf>] kmem_cache_alloc+0x188/0x260 [] init_iova_domain+0x1e8/0x2a8 [<2646fc92>] iommu_setup_dma_ops+0x200/0x710 [ ] arch_setup_dma_ops+0x80/0x128 [<994e1e43>] acpi_dma_configure+0x11c/0x140 [ ] pci_dma_configure+0xe0/0x108 [ ] really_probe+0x210/0x548 [<87884b1b>] driver_probe_device+0x7c/0x148 [<10af2936>] device_driver_attach+0x94/0xa0 [ ] __driver_attach+0xa4/0x110 [ ] bus_for_each_dev+0xe8/0x158 [ ] driver_attach+0x30/0x40 [<3cf39ba8>] bus_add_driver+0x234/0x2f0 [<43830a45>] driver_register+0xbc/0x1d0 [ ] __pci_register_driver+0xb0/0xc8 [ ] sas_v3_pci_driver_init+0x20/0x28 unreferenced object 0x002339844000 (size 2048): comm "swapper/0", pid 1, jiffies 4294898165 (age 122.688s) [snip] And I don't feel like continuing until it's resolved Thanks, John Cc: Joerg Roedel Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 41c605b0058f..92f72a85e62a 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -900,7 +900,7 @@ static bool __iova_rcache_insert(struct iova_domain *iovad, if (!iova_magazine_full(cpu_rcache->loaded)) { can_insert = true; - } else if 
(!iova_magazine_full(cpu_rcache->prev)) { + } else if (iova_magazine_empty(cpu_rcache->prev)) { swap(cpu_rcache->prev, cpu_rcache->loaded); can_insert = true; } else { @@ -909,8 +909,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad, if (new_mag) { spin_lock(>lock); if (rcache->depot_size < MAX_GLOBAL_MAGS) { - rcache->depot[rcache->depot_size++] = - cpu_rcache->loaded; + swap(rcache->depot[rcache->depot_size], cpu_rcache->prev); + swap(cpu_rcache->prev, cpu_rcache->loaded); + rcache->depot_size++; } else { mag_to_free = cpu_rcache->loaded; } @@ -963,14 +964,15 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache, if (!iova_magazine_empty(cpu_rcache->loaded)) { has_pfn = true; - } else if (!iova_magazine_empty(cpu_rcache->prev)) { + } else if (iova_magazine_full(cpu_rcache->prev)) { swap(cpu_rcache->prev, cpu_rcache->loaded); has_pfn = true; } else { spin_lock(>lock); if (rcache->depot_size > 0) { - iova_magazine_free(cpu_rcache->loaded); - cpu_rcache->loaded = rcache->depot[--rcache->depot_size]; + swap(rcache->depot[rcache->depot_size - 1], cpu_rcache->prev); + swap(cpu_rcache->prev, cpu_rcache->loaded); + rcache->depot_size--; has_pfn = true; } spin_unlock(>lock); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 1/2] PCI: Add parameter nr_devfns to pci_add_dma_alias
On 2019-11-27 6:27 a.m., James Sewart wrote: > * This helper encodes an 8-bit devfn as a bit number in dma_alias_mask > * which is used to program permissible bus-devfn source addresses for DMA > @@ -5873,8 +5874,12 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode, > * cannot be left as a userspace activity). DMA aliases should therefore > * be configured via quirks, such as the PCI fixup header quirk. > */ > -void pci_add_dma_alias(struct pci_dev *dev, u8 devfn) > +void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, int nr_devfns) > { > + int devfn_to = devfn_from + nr_devfns - 1; > + > + BUG_ON(nr_devfns < 1); Why not just make nr_devfns unsigned and do nothing if it's zero? It might also be worth checking that nr_devfns + devfn_from is less than U8_MAX... But I'd probably avoid the BUG_ON and just truncate it. Logan ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
On Tue, 2019-11-26 at 10:19 +0100, Nicolas Saenz Julienne wrote: > Some users need to make sure their rounding function accepts and returns > 64bit long variables regardless of the architecture. Sadly > roundup/rounddown_pow_two() takes and returns unsigned longs. Create a > new generic 64bit variant of the function and cleanup rougue custom > implementations. > > Signed-off-by: Nicolas Saenz Julienne Small nit: I corrected the patch subject for next version. linux/log2.h: Add roundup/rounddown_pow_two_u64() family of functions Note the change: "_pow_two64()" becomes "_pow_two_u64()". Regards, Nicolas
Re: [PATCH v3 0/7] Raspberry Pi 4 PCIe support
Hi Bjorn, On Tue, 2019-11-26 at 15:50 -0600, Bjorn Helgaas wrote: > On Tue, Nov 26, 2019 at 10:19:38AM +0100, Nicolas Saenz Julienne wrote: > > This series aims at providing support for Raspberry Pi 4's PCIe > > controller, which is also shared with the Broadcom STB family of > > devices. > > Jim Quinlan (3): > > dt-bindings: PCI: Add bindings for brcmstb's PCIe device > > PCI: brcmstb: add Broadcom STB PCIe host controller driver > > PCI: brcmstb: add MSI capability > > Please update these subjects to match the others, i.e., capitalize > "Add". Also, I think "Add MSI capability" really means "Add support > for MSI ..."; in PCIe terms the "MSI Capability" is a structure in > config space and it's there whether the OS supports it or not. > > No need to repost just for this. Noted, I'll update them. Regards, Nicolas
Re: [PATCH 2/2] iommu: dma: Use of_iommu_get_resv_regions()
On 27/11/2019 2:16 pm, Thierry Reding wrote: [...] Nevermind that, I figured out that I was missing the initialization of some of the S2CR variables. I've got something that I think is working now, though I don't know yet how to go about cleaning up the initial mapping and "recycling" it. I'll clean things up a bit, run some more tests and post a new patch that can serve as a basis for discussion. I'm a little puzzled by the smmu->identity domain - disregarding the fact that it's not actually used by the given diff ;) - if isolation is the reason for not simply using a bypass S2CR for the window between reset and the relevant .add_device call where the default domain proper comes in[1], then surely exposing the union of memory regions to the union of all associated devices isn't all that desirable either. Either way, I'll give you the pre-emptive warning that this is the SMMU in the way of my EFI framebuffer ;) ... arm-smmu 7fb2.iommu:1 context banks (1 stage-2 only) ... Robin. [1] the fact that it currently depends on probe order whether getting that .add_device call requires a driver probing for the device is an error as discussed elsewhere, and will get fixed separately, I promise.
Re: [PATCH] iommu/arm-smmu: support SMMU module probing from the IORT
On 25/11/2019 16:04, Lorenzo Pieralisi wrote: On Fri, Nov 22, 2019 at 06:41:25PM +0100, Ard Biesheuvel wrote: Add support for SMMU drivers built as modules to the ACPI/IORT device probing path, by deferring the probe of the master if the SMMU driver is known to exist but has not been loaded yet. Given that the IORT code registers a platform device for each SMMU that it discovers, we can easily trigger the udev based autoloading of the SMMU drivers by making the platform device identifier part of the module alias. Signed-off-by: Ard Biesheuvel --- drivers/acpi/arm64/iort.c | 4 ++-- drivers/iommu/arm-smmu-v3.c | 1 + drivers/iommu/arm-smmu.c| 1 + 3 files changed, 4 insertions(+), 2 deletions(-) I think it is best if Will picks this up and add it to the series that modularize the SMMU drivers: Acked-by: Lorenzo Pieralisi Tested-by: John Garry # only manual smmu ko loading diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 5a7551d060f2..a696457a9b11 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -850,9 +850,9 @@ static inline bool iort_iommu_driver_enabled(u8 type) { switch (type) { case ACPI_IORT_NODE_SMMU_V3: - return IS_BUILTIN(CONFIG_ARM_SMMU_V3); + return IS_ENABLED(CONFIG_ARM_SMMU_V3); case ACPI_IORT_NODE_SMMU: - return IS_BUILTIN(CONFIG_ARM_SMMU); + return IS_ENABLED(CONFIG_ARM_SMMU); default: pr_warn("IORT node type %u does not describe an SMMU\n", type); return false; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 7669beafc493..bf6a1e8eb9b0 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -3733,4 +3733,5 @@ module_platform_driver(arm_smmu_driver); MODULE_DESCRIPTION("IOMMU API for ARM architected SMMUv3 implementations"); MODULE_AUTHOR("Will Deacon "); +MODULE_ALIAS("platform:arm-smmu-v3"); MODULE_LICENSE("GPL v2"); diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d55acc48aee3..db5106b0955b 100644 --- a/drivers/iommu/arm-smmu.c +++ 
b/drivers/iommu/arm-smmu.c @@ -2292,4 +2292,5 @@ module_platform_driver(arm_smmu_driver); MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); +MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); -- 2.20.1 .
Re: Bug 205201 - Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M
On 27 November 2019 at 07:56 am, Mike Rapoport wrote: Maybe we'll simply force bottom up allocation before calling swiotlb_init()? Anyway, it's the last memblock allocation. diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 62f74b1b33bd..771e6cf7e2b9 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -286,14 +286,15 @@ void __init mem_init(void) /* * book3s is limited to 16 page sizes due to encoding this in * a 4-bit field for slices. */ BUILD_BUG_ON(MMU_PAGE_COUNT > 16); #ifdef CONFIG_SWIOTLB + memblock_set_bottom_up(true); swiotlb_init(0); #endif high_memory = (void *) __va(max_low_pfn * PAGE_SIZE); set_max_mapnr(max_pfn); memblock_free_all(); Hello Mike, I tested the latest Git kernel with your new patch today. My PCI TV card works without any problems. Thanks, Christian
make dma_addressing_limited work for memory encryption setups
Hi all, this little series fixes dma_addressing_limited to return true for systems that use bounce buffers due to memory encryption.
[PATCH 1/2] dma-mapping: move dma_addressing_limited out of line
This function isn't used in the fast path, and moving it out of line will reduce include clutter with the next change. Signed-off-by: Christoph Hellwig --- include/linux/dma-mapping.h | 14 +- kernel/dma/mapping.c| 15 +++ 2 files changed, 16 insertions(+), 13 deletions(-) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index c4d8741264bd..94ef74ecd18a 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -687,19 +687,7 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask) return dma_set_mask_and_coherent(dev, mask); } -/** - * dma_addressing_limited - return if the device is addressing limited - * @dev: device to check - * - * Return %true if the devices DMA mask is too small to address all memory in - * the system, else %false. Lack of addressing bits is the prime reason for - * bounce buffering, but might not be the only one. - */ -static inline bool dma_addressing_limited(struct device *dev) -{ - return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) < - dma_get_required_mask(dev); -} +bool dma_addressing_limited(struct device *dev); #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 12ff766ec1fa..1dbe6d725962 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -405,3 +405,18 @@ unsigned long dma_get_merge_boundary(struct device *dev) return ops->get_merge_boundary(dev); } EXPORT_SYMBOL_GPL(dma_get_merge_boundary); + +/** + * dma_addressing_limited - return if the device is addressing limited + * @dev: device to check + * + * Return %true if the devices DMA mask is too small to address all memory in + * the system, else %false. Lack of addressing bits is the prime reason for + * bounce buffering, but might not be the only one. 
+ */ +bool dma_addressing_limited(struct device *dev) +{ + return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) < + dma_get_required_mask(dev); +} +EXPORT_SYMBOL_GPL(dma_addressing_limited); -- 2.20.1
[PATCH 2/2] dma-mapping: force unencryped devices are always addressing limited
Devices that are forced to DMA through unencrypted bounce buffers need to be treated as if they are addressing limited. Signed-off-by: Christoph Hellwig --- kernel/dma/mapping.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 1dbe6d725962..f6c35b53d996 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -416,6 +416,8 @@ EXPORT_SYMBOL_GPL(dma_get_merge_boundary); */ bool dma_addressing_limited(struct device *dev) { + if (force_dma_unencrypted(dev)) + return true; return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) < dma_get_required_mask(dev); } -- 2.20.1
Re: [PATCH 2/2] iommu: dma: Use of_iommu_get_resv_regions()
On Wed, Nov 27, 2019 at 01:16:54PM +0100, Thierry Reding wrote: > On Tue, Sep 17, 2019 at 06:59:50PM +0100, Will Deacon wrote: > > On Mon, Sep 02, 2019 at 04:52:45PM +0200, Thierry Reding wrote: > > > On Mon, Sep 02, 2019 at 03:22:35PM +0100, Robin Murphy wrote: > > > > On 29/08/2019 12:14, Thierry Reding wrote: > > > > > From: Thierry Reding > > > > > > > > > > For device tree nodes, use the standard of_iommu_get_resv_regions() > > > > > implementation to obtain the reserved memory regions associated with a > > > > > device. > > > > > > > > This covers the window between iommu_probe_device() setting up a default > > > > domain and the device's driver finally probing and taking control, but > > > > iommu_probe_device() represents the point that the IOMMU driver first > > > > knows > > > > about this device - there's still a window from whenever the IOMMU > > > > driver > > > > itself probed up to here where the "unidentified" traffic may have > > > > already > > > > been disrupted. Some IOMMU drivers have no option but to make the > > > > necessary > > > > configuration during their own probe routine, at which point a struct > > > > device > > > > for the display/etc. endpoint may not even exist yet. > > > > > > Yeah, I think I'm actually running into this issue with the ARM SMMU > > > driver. The above works fine with the Tegra SMMU driver, though, because > > > it doesn't touch the SMMU configuration until a device is attached to a > > > domain. > > > > > > For anything earlier than iommu_probe_device(), I don't see a way of > > > doing this generically. I've been working on a prototype to make these > > > reserved memory regions early on for ARM SMMU but I've been failing so > > > far. I think it would possibly work if we just switched the default for > > > stream IDs to be "bypass" if they have any devices that have reserved > > > memory regions, but again, this isn't quite working (yet). 
> > > > I think we should avoid the use of "bypass" outside of the IOMMU probe() > > routine if at all possible, since it leaves the thing wide open if we don't > > subsequently probe the master. > > > > Why can't we initialise a page-table early for StreamIDs with these > > reserved regions, and then join that up later on if we see a device with > > one of those StreamIDs attaching to a DMA domain? I suppose the nasty > > case would be attaching to a domain that already has other masters > > attached to it. Can we forbid that somehow for these devices? Otherwise, > > I think we'd have to transiently switch to bypass whilst switching page > > table. > > > > What problems did you run into trying to implement this? > > I picked this up again and was trying to make this work with your > suggestion. Below is a rough draft and it seems to be working to some > degree. Here's an extract of the log when I run this on Jetson TX2: > > [3.755328] arm-smmu 1200.iommu: probing hardware > configuration... 
> [3.762187] arm-smmu 1200.iommu: SMMUv2 with: > [3.767137] arm-smmu 1200.iommu: stage 1 translation > [3.772806] arm-smmu 1200.iommu: stage 2 translation > [3.778471] arm-smmu 1200.iommu: nested translation > [3.784050] arm-smmu 1200.iommu: stream matching with > 128 register groups > [3.791651] arm-smmu 1200.iommu: 64 context banks (0 > stage-2 only) > [3.798603] arm-smmu 1200.iommu: Supported page sizes: > 0x61311000 > [3.805460] arm-smmu 1200.iommu: Stage-1: 48-bit VA -> > 48-bit IPA > [3.812310] arm-smmu 1200.iommu: Stage-2: 48-bit IPA -> > 48-bit PA > [3.819159] arm-smmu 1200.iommu: > > arm_smmu_setup_identity(smmu=0001eabcec80) > [3.827373] arm-smmu 1200.iommu: identity domain: > 0001eaf8cae8 (ops: 0x0) > [3.835614] arm-smmu 1200.iommu: np: /ethernet@249 > [3.841635] arm-smmu 1200.iommu: np: /sdhci@340 > [3.847315] arm-smmu 1200.iommu: np: /sdhci@342 > [3.852990] arm-smmu 1200.iommu: np: /sdhci@344 > [3.858683] arm-smmu 1200.iommu: np: /sdhci@346 > [3.864370] arm-smmu 1200.iommu: np: /hda@351 > [3.869897] arm-smmu 1200.iommu: np: /usb@353 > [3.875421] arm-smmu 1200.iommu: np: /pcie@10003000 > [3.881109] arm-smmu 1200.iommu: np: /host1x@13e0 > [3.887012] arm-smmu 1200.iommu: np: > /host1x@13e0/display-hub@1520/display@1520 > [3.896344] arm-smmu 1200.iommu: region: > /reserved-memory/framebuffer@9607c000 > [3.904707] arm-smmu 1200.iommu: [mem > 0x9607c000-0x9687bfff] > [3.915719] arm-smmu 1200.iommu: /iommu@1200: 1 arguments > [3.922487] arm-smmu 1200.iommu: 0: 0009 > [
[PATCH v3 2/2] PCI: Add DMA alias quirk for PLX PEX NTB
The PLX PEX NTB forwards DMA transactions using Requester ID's that don't exist as PCI devices. The devfn for a transaction is used as an index into a lookup table storing the origin of a transaction on the other side of the bridge. This patch aliases all possible devfn's to the NTB device so that any transaction coming in is governed by the mappings for the NTB. Reviewed-by: Logan Gunthorpe Signed-off-by: James Sewart --- drivers/pci/quirks.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 0f3f5afc73fd..3a67049ca630 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -5315,6 +5315,21 @@ SWITCHTEC_QUIRK(0x8574); /* PFXI 64XG3 */ SWITCHTEC_QUIRK(0x8575); /* PFXI 80XG3 */ SWITCHTEC_QUIRK(0x8576); /* PFXI 96XG3 */ +/* + * PLX NTB uses devfn proxy IDs to move TLPs between NT endpoints. These IDs + * are used to forward responses to the originator on the other side of the + * NTB. Alias all possible IDs to the NTB to permit access when the IOMMU is + * turned on. + */ +static void quirk_plx_ntb_dma_alias(struct pci_dev *pdev) +{ + pci_info(pdev, "Setting PLX NTB proxy ID aliases\n"); + /* PLX NTB may use all 256 devfns */ + pci_add_dma_alias(pdev, 0, 256); +} +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x87b0, quirk_plx_ntb_dma_alias); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x87b1, quirk_plx_ntb_dma_alias); + /* * On Lenovo Thinkpad P50 SKUs with a Nvidia Quadro M1000M, the BIOS does * not always reset the secondary Nvidia GPU between reboots if the system -- 2.24.0
[PATCH v3 1/2] PCI: Add parameter nr_devfns to pci_add_dma_alias
pci_add_dma_alias can now be used to create a dma alias for a range of devfns. Signed-off-by: James Sewart --- drivers/pci/pci.c| 21 - drivers/pci/quirks.c | 14 +++--- include/linux/pci.h | 2 +- 3 files changed, 24 insertions(+), 13 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index a97e2571a527..71dbabf5ee3d 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5857,7 +5857,8 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode, /** * pci_add_dma_alias - Add a DMA devfn alias for a device * @dev: the PCI device for which alias is added - * @devfn: alias slot and function + * @devfn_from: alias slot and function + * @nr_devfns: Number of subsequent devfns to alias * * This helper encodes an 8-bit devfn as a bit number in dma_alias_mask * which is used to program permissible bus-devfn source addresses for DMA @@ -5873,8 +5874,12 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode, * cannot be left as a userspace activity). DMA aliases should therefore * be configured via quirks, such as the PCI fixup header quirk. 
*/ -void pci_add_dma_alias(struct pci_dev *dev, u8 devfn) +void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, int nr_devfns) { + int devfn_to = devfn_from + nr_devfns - 1; + + BUG_ON(nr_devfns < 1); + if (!dev->dma_alias_mask) dev->dma_alias_mask = bitmap_zalloc(U8_MAX, GFP_KERNEL); if (!dev->dma_alias_mask) { @@ -5882,9 +5887,15 @@ void pci_add_dma_alias(struct pci_dev *dev, u8 devfn) return; } - set_bit(devfn, dev->dma_alias_mask); - pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n", -PCI_SLOT(devfn), PCI_FUNC(devfn)); + bitmap_set(dev->dma_alias_mask, devfn_from, nr_devfns); + + if (nr_devfns == 1) + pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n", + PCI_SLOT(devfn_from), PCI_FUNC(devfn_from)); + else + pci_info(dev, "Enabling fixed DMA alias for devfn range from %02x.%d to %02x.%d\n", + PCI_SLOT(devfn_from), PCI_FUNC(devfn_from), + PCI_SLOT(devfn_to), PCI_FUNC(devfn_to)); } bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 320255e5e8f8..0f3f5afc73fd 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3932,7 +3932,7 @@ int pci_dev_specific_reset(struct pci_dev *dev, int probe) static void quirk_dma_func0_alias(struct pci_dev *dev) { if (PCI_FUNC(dev->devfn) != 0) - pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 0)); + pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 0), 1); } /* @@ -3946,7 +3946,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_RICOH, 0xe476, quirk_dma_func0_alias); static void quirk_dma_func1_alias(struct pci_dev *dev) { if (PCI_FUNC(dev->devfn) != 1) - pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 1)); + pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 1), 1); } /* @@ -4031,7 +4031,7 @@ static void quirk_fixed_dma_alias(struct pci_dev *dev) id = pci_match_id(fixed_dma_alias_tbl, dev); if (id) - pci_add_dma_alias(dev, id->driver_data); + pci_add_dma_alias(dev, id->driver_data, 1); } 
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ADAPTEC2, 0x0285, quirk_fixed_dma_alias); @@ -4073,9 +4073,9 @@ DECLARE_PCI_FIXUP_HEADER(0x8086, 0x244e, quirk_use_pcie_bridge_dma_alias); */ static void quirk_mic_x200_dma_alias(struct pci_dev *pdev) { - pci_add_dma_alias(pdev, PCI_DEVFN(0x10, 0x0)); - pci_add_dma_alias(pdev, PCI_DEVFN(0x11, 0x0)); - pci_add_dma_alias(pdev, PCI_DEVFN(0x12, 0x3)); + pci_add_dma_alias(pdev, PCI_DEVFN(0x10, 0x0), 1); + pci_add_dma_alias(pdev, PCI_DEVFN(0x11, 0x0), 1); + pci_add_dma_alias(pdev, PCI_DEVFN(0x12, 0x3), 1); } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias); @@ -5273,7 +5273,7 @@ static void quirk_switchtec_ntb_dma_alias(struct pci_dev *pdev) pci_dbg(pdev, "Aliasing Partition %d Proxy ID %02x.%d\n", pp, PCI_SLOT(devfn), PCI_FUNC(devfn)); - pci_add_dma_alias(pdev, devfn); + pci_add_dma_alias(pdev, devfn, 1); } } diff --git a/include/linux/pci.h b/include/linux/pci.h index 1a6cf19eac2d..f51bdda49e9a 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2323,7 +2323,7 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) } #endif -void pci_add_dma_alias(struct pci_dev *dev, u8 devfn); +void pci_add_dma_alias(struct pci_dev
Re: [PATCH 2/2] iommu: dma: Use of_iommu_get_resv_regions()
On Tue, Sep 17, 2019 at 06:59:50PM +0100, Will Deacon wrote: > On Mon, Sep 02, 2019 at 04:52:45PM +0200, Thierry Reding wrote: > > On Mon, Sep 02, 2019 at 03:22:35PM +0100, Robin Murphy wrote: > > > On 29/08/2019 12:14, Thierry Reding wrote: > > > > From: Thierry Reding > > > > > > > > For device tree nodes, use the standard of_iommu_get_resv_regions() > > > > implementation to obtain the reserved memory regions associated with a > > > > device. > > > > > > This covers the window between iommu_probe_device() setting up a default > > > domain and the device's driver finally probing and taking control, but > > > iommu_probe_device() represents the point that the IOMMU driver first > > > knows > > > about this device - there's still a window from whenever the IOMMU driver > > > itself probed up to here where the "unidentified" traffic may have already > > > been disrupted. Some IOMMU drivers have no option but to make the > > > necessary > > > configuration during their own probe routine, at which point a struct > > > device > > > for the display/etc. endpoint may not even exist yet. > > > > Yeah, I think I'm actually running into this issue with the ARM SMMU > > driver. The above works fine with the Tegra SMMU driver, though, because > > it doesn't touch the SMMU configuration until a device is attached to a > > domain. > > > > For anything earlier than iommu_probe_device(), I don't see a way of > > doing this generically. I've been working on a prototype to make these > > reserved memory regions early on for ARM SMMU but I've been failing so > > far. I think it would possibly work if we just switched the default for > > stream IDs to be "bypass" if they have any devices that have reserved > > memory regions, but again, this isn't quite working (yet). > > I think we should avoid the use of "bypass" outside of the IOMMU probe() > routine if at all possible, since it leaves the thing wide open if we don't > subsequently probe the master. 
> > Why can't we initialise a page-table early for StreamIDs with these > reserved regions, and then join that up later on if we see a device with > one of those StreamIDs attaching to a DMA domain? I suppose the nasty > case would be attaching to a domain that already has other masters > attached to it. Can we forbid that somehow for these devices? Otherwise, > I think we'd have to transiently switch to bypass whilst switching page > table. > > What problems did you run into trying to implement this? I picked this up again and was trying to make this work with your suggestion. Below is a rough draft and it seems to be working to some degree. Here's an extract of the log when I run this on Jetson TX2: [3.755328] arm-smmu 1200.iommu: probing hardware configuration... [3.762187] arm-smmu 1200.iommu: SMMUv2 with: [3.767137] arm-smmu 1200.iommu: stage 1 translation [3.772806] arm-smmu 1200.iommu: stage 2 translation [3.778471] arm-smmu 1200.iommu: nested translation [3.784050] arm-smmu 1200.iommu: stream matching with 128 register groups [3.791651] arm-smmu 1200.iommu: 64 context banks (0 stage-2 only) [3.798603] arm-smmu 1200.iommu: Supported page sizes: 0x61311000 [3.805460] arm-smmu 1200.iommu: Stage-1: 48-bit VA -> 48-bit IPA [3.812310] arm-smmu 1200.iommu: Stage-2: 48-bit IPA -> 48-bit PA [3.819159] arm-smmu 1200.iommu: > arm_smmu_setup_identity(smmu=0001eabcec80) [3.827373] arm-smmu 1200.iommu: identity domain: 0001eaf8cae8 (ops: 0x0) [3.835614] arm-smmu 1200.iommu: np: /ethernet@249 [3.841635] arm-smmu 1200.iommu: np: /sdhci@340 [3.847315] arm-smmu 1200.iommu: np: /sdhci@342 [3.852990] arm-smmu 1200.iommu: np: /sdhci@344 [3.858683] arm-smmu 1200.iommu: np: /sdhci@346 [3.864370] arm-smmu 1200.iommu: np: /hda@351 [3.869897] arm-smmu 1200.iommu: np: /usb@353 [3.875421] arm-smmu 1200.iommu: np: /pcie@10003000 [3.881109] arm-smmu 1200.iommu: np: /host1x@13e0 [3.887012] arm-smmu 1200.iommu: np: /host1x@13e0/display-hub@1520/display@1520 [3.896344] arm-smmu 1200.iommu: 
region: /reserved-memory/framebuffer@9607c000 [3.904707] arm-smmu 1200.iommu: [mem 0x9607c000-0x9687bfff] [3.915719] arm-smmu 1200.iommu: /iommu@1200: 1 arguments [3.922487] arm-smmu 1200.iommu: 0: 0009 [3.927888] arm-smmu 1200.iommu: SID: 0009 MASK: 7f80 [3.934132] arm-smmu 1200.iommu: found index: 0 [3.939840] arm-smmu 1200.iommu: np:
Re: [PATCH v3 09/14] iommu/arm-smmu: Prevent forced unbinding of Arm SMMU drivers
On 27/11/2019 11:04, John Garry wrote: On 26/11/2019 20:27, Saravana Kannan wrote: On Tue, Nov 26, 2019 at 1:13 AM John Garry wrote: On 21/11/2019 11:49, Will Deacon wrote: Forcefully unbinding the Arm SMMU drivers is a pretty dangerous operation, since it will likely lead to catastrophic failure for any DMA devices mastering through the SMMU being unbound. When the driver then attempts to "handle" the fatal faults, it's very easy to trip over dead data structures, leading to use-after-free. On John's machine, he reports that the machine was "unusable" due to loss of the storage controller following a forced unbind of the SMMUv3 driver: | # cd ./bus/platform/drivers/arm-smmu-v3 | # echo arm-smmu-v3.0.auto > unbind | hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found! | platform arm-smmu-v3.0.auto: CMD_SYNC timeout at 0x0146 | [hwprod 0x0146, hwcons 0x] Prevent this forced unbinding of the drivers by setting "suppress_bind_attrs" to true. This seems a reasonable approach for now. BTW, I'll give this series a spin this week, which again looks to be your iommu/module branch, excluding the new IORT patch. Hi Saravana, Is this on a platform where of_devlink creates device links between the iommu device and its suppliers? I'm guessing no? Because device links should force unbinding of all the consumers before unbinding the supplier. I'm only really interested in ACPI, TBH. Looks like it'll still allow the supplier to unbind if the consumers don't allow unbinding. Is that the case here? So just unbinding the driver from a device does not delete the device nor exit the device from its IOMMU group - so we keep the reference to the SMMU ko. As such, I don't know how to realistically test unloading the SMMU ko when we have platform devices involved. Maybe someone can enlighten me... 
But I could do it on our D06 dev board, where all devices behind the SMMUs are PCI based: --- Initial state --- root@(none)$ dmesg | grep Adding [ 30.271801] pcieport :00:00.0: Adding to iommu group 0 [ 30.296088] pcieport :00:04.0: Adding to iommu group 1 [ 30.322234] pcieport :00:08.0: Adding to iommu group 2 [ 30.335641] pcieport :00:0c.0: Adding to iommu group 3 [ 30.343114] pcieport :00:10.0: Adding to iommu group 4 [ 30.355650] pcieport :00:12.0: Adding to iommu group 5 [ 30.366794] pcieport :7c:00.0: Adding to iommu group 6 [ 30.377993] hns3 :7d:00.0: Adding to iommu group 7 [ 31.861957] hns3 :7d:00.1: Adding to iommu group 8 [ 33.313967] hns3 :7d:00.2: Adding to iommu group 9 [ 33.436029] hns3 :7d:00.3: Adding to iommu group 10 [ 33.555935] hns3 :bd:00.0: Adding to iommu group 11 [ 35.143851] pcieport :74:00.0: Adding to iommu group 12 [ 35.150736] pcieport :80:00.0: Adding to iommu group 13 [ 35.158910] pcieport :80:08.0: Adding to iommu group 14 [ 35.166860] pcieport :80:0c.0: Adding to iommu group 15 [ 35.174813] pcieport :80:10.0: Adding to iommu group 16 [ 35.182854] pcieport :bc:00.0: Adding to iommu group 17 [ 35.189702] pcieport :b4:00.0: Adding to iommu group 18 [ 35.196445] hisi_sas_v3_hw :74:02.0: Adding to iommu group 19 [ 39.410693] ahci :74:03.0: Adding to iommu group 20 root@(none)$ lsmod Not tainted arm_smmu_v3 40960 21 - Live 0x88c6 --- Start removing devices --- root@(none)$ echo 1 > ./sys/devices/pci:00/:00:00.0/remove [ 55.567808] pci_bus :01: busn_res: [bus 01] is released [ 55.573514] pci :00:00.0: Removing from iommu group 0 root@(none)$ echo 1 > ./sys/devices/pci:00/:00:04.0/remove [ 61.767425] pci_bus :02: busn_res: [bus 02] is released [ 61.773132] pci :00:04.0: Removing from iommu group 1 root@(none)$ echo 1 > ./sys/devices/pci:00/:00:04.0/remove sh: ./sys/devices/pci:00/:00:04.0/remove: No such file or directory root@(none)$ echo 1 > ./sys/devices/pci:00/:00:08.0/remove [ 75.635417] pci_bus :03: busn_res: [bus 03] is released [ 
75.641124] pci :00:08.0: Removing from iommu group 2 root@(none)$ echo 1 > ./sys/devices/pci:00/:00:0c.0/remove [ 81.587419] pci_bus :04: busn_res: [bus 04] is released [ 81.593110] pci :00:0c.0: Removing from iommu group 3 root@(none)$ echo 1 > ./sys/devices/pci:00/:00:10.0/remove [ 85.607605] pci_bus :05: busn_res: [bus 05] is released [ 85.613300] pci :00:10.0: Removing from iommu group 4 root@(none)$ echo 1 > ./sys/devices/pci:00/:00:12.0/remove [ 92.731421] pci_bus :06: busn_res: [bus 06] is released [ 92.737125] pci :00:12.0: Removing from iommu group 5 root@(none)$ echo 1 > ./sys/devices/pci:7c/:7c:00.0/remove [ 102.286726] pci :7d:00.0: Removing from iommu group 7 [ 102.294157] pci :7d:00.1: Removing from iommu group 8 [ 102.301634] pci :7d:00.2: Removing from iommu group 9 [ 102.308973] pci
Re: [PATCH v3 09/14] iommu/arm-smmu: Prevent forced unbinding of Arm SMMU drivers
On 26/11/2019 20:27, Saravana Kannan wrote: On Tue, Nov 26, 2019 at 1:13 AM John Garry wrote: On 21/11/2019 11:49, Will Deacon wrote: Forcefully unbinding the Arm SMMU drivers is a pretty dangerous operation, since it will likely lead to catastrophic failure for any DMA devices mastering through the SMMU being unbound. When the driver then attempts to "handle" the fatal faults, it's very easy to trip over dead data structures, leading to use-after-free. On John's machine, he reports that the machine was "unusable" due to loss of the storage controller following a forced unbind of the SMMUv3 driver: | # cd ./bus/platform/drivers/arm-smmu-v3 | # echo arm-smmu-v3.0.auto > unbind | hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found! | platform arm-smmu-v3.0.auto: CMD_SYNC timeout at 0x0146 | [hwprod 0x0146, hwcons 0x] Prevent this forced unbinding of the drivers by setting "suppress_bind_attrs" to true. This seems a reasonable approach for now. BTW, I'll give this series a spin this week, which again looks to be your iommu/module branch, excluding the new IORT patch. Hi Saravana, Is this on a platform where of_devlink creates device links between the iommu device and its suppliers? I'm guessing no? Because device links should force unbinding of all the consumers before unbinding the supplier. I'm only really interested in ACPI, TBH. Looks like it'll still allow the supplier to unbind if the consumers don't allow unbinding. Is that the case here? So just unbinding the driver from a device does not delete the device nor exit the device from its IOMMU group - so we keep the reference to the SMMU ko. As such, I don't know how to realistically test unloading the SMMU ko when we have platform devices involved. Maybe someone can enlighten me... Thanks, John
Re: Bug 205201 - Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M
On Wed, Nov 27, 2019 at 08:56:25AM +0200, Mike Rapoport wrote: > Maybe we'll simply force bottom up allocation before calling > swiotlb_init()? Anyway, it's the last memblock allocation. That should work, but I don't think it is the proper fix. The underlying issue here is that ZONE_DMA/DMA32 sizing is something that needs to be propagated to memblock and dma-direct, as it is based around addressing limitations. But our zone initialization is such a mess that we can't just reuse a variable. Nicolas has started to clean some of this up, but we need to clean that whole zone initialization mess up a lot more.