Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility
On Wed, May 11, 2022 at 03:15:22AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Wednesday, May 11, 2022 3:00 AM
> >
> > On Tue, May 10, 2022 at 05:12:04PM +1000, David Gibson wrote:
> > > Ok... here's a revised version of my proposal which I think addresses
> > > your concerns and simplifies things.
> > >
> > > - No new operations, but IOAS_MAP gets some new flags (and IOAS_COPY
> > >   will probably need matching changes)
> > >
> > > - By default the IOVA given to IOAS_MAP is a hint only, and the IOVA
> > >   is chosen by the kernel within the aperture(s). This is closer to
> > >   how mmap() operates, and DPDK and similar shouldn't care about
> > >   having specific IOVAs, even at the individual mapping level.
> > >
> > > - IOAS_MAP gets an IOMAP_FIXED flag, analogous to mmap()'s MAP_FIXED,
> > >   for when you really do want to control the IOVA (qemu, maybe some
> > >   special userspace driver cases)
> >
> > We already did both of these, the flag is called
> > IOMMU_IOAS_MAP_FIXED_IOVA - if it is not specified then the kernel will
> > select the IOVA internally.
> >
> > > - ATTACH will fail if the new device would shrink the aperture to
> > >   exclude any already established mappings (I assume this is already
> > >   the case)
> >
> > Yes
> >
> > > - IOAS_MAP gets an IOMAP_RESERVE flag, which operates a bit like a
> > >   PROT_NONE mmap(). It reserves that IOVA space, so other (non-FIXED)
> > >   MAPs won't use it, but doesn't actually put anything into the IO
> > >   pagetables.
> > >     - Like a regular mapping, ATTACHes that are incompatible with an
> > >       IOMAP_RESERVEd region will fail
> > >     - An IOMAP_RESERVEd area can be overmapped with an IOMAP_FIXED
> > >       mapping
> >
> > Yeah, this seems OK. I'm thinking a new API might make sense because
> > you don't really want mmap replacement semantics but a permanent
> > record of what IOVA must always be valid.
> >
> > IOMMU_IOAS_REQUIRE_IOVA perhaps, similar signature to
> > IOMMUFD_CMD_IOAS_IOVA_RANGES:
> >
> > struct iommu_ioas_require_iova {
> >         __u32 size;
> >         __u32 ioas_id;
> >         __u32 num_iovas;
> >         __u32 __reserved;
> >         struct iommu_required_iovas {
> >                 __aligned_u64 start;
> >                 __aligned_u64 last;
> >         } required_iovas[];
> > };
>
> As a permanent record do we want to enforce that once the required
> range list is set all FIXED and non-FIXED allocations must be within the
> list of ranges?

No, I don't think so. In fact, the way I was envisaging this, non-FIXED
mappings will *never* go into the reserved ranges. This is for the benefit
of any use cases that need both mappings where they don't care about the
IOVA and those which do.

Essentially, reserving a region here is saying to the kernel "I want to
manage this IOVA space; make sure nothing else touches it". That means
both that the kernel must disallow any hw associated changes (like ATTACH)
which would impinge on the reserved region, and also any IOVA allocations
that would take parts away from that space.

Whether we want to restrict FIXED mappings to the reserved regions is an
interesting question. I wasn't thinking that would be necessary (just as
you can use mmap() MAP_FIXED anywhere). However, much as MAP_FIXED is
very dangerous to use if you don't previously reserve address space, I
think IOMAP_FIXED is dangerous if you haven't previously reserved space.
So maybe it would make sense to only allow FIXED mappings within reserved
regions. Strictly dividing the IOVA space into kernel-managed and
user-managed regions does make a certain amount of sense.

> If yes we can take the end of the last range as the max size of the iova
> address space to optimize the page table layout.
>
> otherwise we may need another dedicated hint for that optimization.

Right. With the revised model where reserving windows is optional, not
required, I don't think we can quite re-use this for optimization hints.
Which is a bit unfortunate. I can't immediately see a way to tweak this
which handles both more neatly, but I like the idea if we can figure out
a way.

> > > So, for DPDK the sequence would be:
> > >
> > > 1. Create IOAS
> > > 2. ATTACH devices
> > > 3. IOAS_MAP some stuff
> > > 4. Do DMA with the IOVAs that IOAS_MAP returned
> > >
> > > (Note, not even any need for QUERY in simple cases)
> >
> > Yes, this is done already
> >
> > > For (unoptimized) qemu it would be:
> > >
> > > 1. Create IOAS
> > > 2. IOAS_MAP(IOMAP_FIXED|IOMAP_RESERVE) the valid IOVA regions of the
> > >    guest platform
> > > 3. ATTACH devices (this will fail if they're not compatible with the
> > >    reserved IOVA regions)
> > > 4. Boot the guest
>
> I suppose above is only the sample flow for PPC vIOMMU. For non-PPC
> vIOMMUs regular mappings are required before booting the guest and
> reservation might be done but not mandatory (at least not what current
> Qemu vfio can afford as it simply rep
Re: [PATCH] iommu/vt-d: Try info->iommu in device_to_iommu()
On Fri, May 13, 2022 at 11:32:11AM +0800, Baolu Lu wrote:
> External email: Use caution opening links or attachments
>
> On 2022/5/13 08:32, Nicolin Chen wrote:
> > Local boot test and VFIO sanity test show that info->iommu can be
> > used in device_to_iommu() as a fast path. So this patch adds it.
> >
> > Signed-off-by: Nicolin Chen
> > ---
> >  drivers/iommu/intel/iommu.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 2990f80c5e08..412fca5ab9cd 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -777,6 +777,7 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, struct device *dev)
> >  struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
> >  {
> >  	struct dmar_drhd_unit *drhd = NULL;
> > +	struct device_domain_info *info;
> >  	struct pci_dev *pdev = NULL;
> >  	struct intel_iommu *iommu;
> >  	struct device *tmp;
> > @@ -786,6 +787,10 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
> >  	if (!dev)
> >  		return NULL;
> >
> > +	info = dev_iommu_priv_get(dev);
> > +	if (info)
> > +		return info->iommu;
>
> device_to_iommu() also returns the device source id (@bus, @devfn).
>
> Perhaps, we can make device_to_iommu() only for probe_device() where the
> per-device info data is not initialized yet. After probe_device(), iommu
> and sid are retrieved through other helpers by looking up the device
> info directly?

That should work, I think. I was just not sure when the priv could be
unset. But it seems that you have cleaned up those places other than
probe/release_device() in the recent version :)

Nic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Try info->iommu in device_to_iommu()
On 2022/5/13 08:32, Nicolin Chen wrote:
> Local boot test and VFIO sanity test show that info->iommu can be
> used in device_to_iommu() as a fast path. So this patch adds it.
>
> Signed-off-by: Nicolin Chen
> ---
>  drivers/iommu/intel/iommu.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 2990f80c5e08..412fca5ab9cd 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -777,6 +777,7 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, struct device *dev)
>  struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
>  {
>  	struct dmar_drhd_unit *drhd = NULL;
> +	struct device_domain_info *info;
>  	struct pci_dev *pdev = NULL;
>  	struct intel_iommu *iommu;
>  	struct device *tmp;
> @@ -786,6 +787,10 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
>  	if (!dev)
>  		return NULL;
>
> +	info = dev_iommu_priv_get(dev);
> +	if (info)
> +		return info->iommu;

device_to_iommu() also returns the device source id (@bus, @devfn).

Perhaps, we can make device_to_iommu() only for probe_device() where the
per-device info data is not initialized yet. After probe_device(), iommu
and sid are retrieved through other helpers by looking up the device
info directly?

Best regards,
baolu

> +
>  	if (dev_is_pci(dev)) {
>  		struct pci_dev *pf_pdev;
Re: [PATCH v3 1/2] iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit
Hi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm-perf/for-next/perf]
[also build test WARNING on linus/master v5.18-rc6]
[cannot apply to joro-iommu/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/yf-wang-mediatek-com/iommu-io-pgtable-arm-v7s-Add-a-quirk-to-allow-pgtable-PA-up-to-35bit/20220512-234603
base:   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf
config: arm-qcom_defconfig (https://download.01.org/0day-ci/archive/20220513/202205131021.3gskebg2-...@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/916a5fc41cbb8ddfe343193598f250d06b09e3fa
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review yf-wang-mediatek-com/iommu-io-pgtable-arm-v7s-Add-a-quirk-to-allow-pgtable-PA-up-to-35bit/20220512-234603
        git checkout 916a5fc41cbb8ddfe343193598f250d06b09e3fa
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/iommu/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

   In file included from include/linux/ratelimit_types.h:5,
                    from include/linux/ratelimit.h:5,
                    from include/linux/dev_printk.h:16,
                    from include/linux/device.h:15,
                    from include/linux/dma-mapping.h:7,
                    from drivers/iommu/io-pgtable-arm-v7s.c:25:
   drivers/iommu/io-pgtable-arm-v7s.c: In function 'arm_v7s_alloc_pgtable':
   include/linux/bits.h:35:29: warning: left shift count >= width of type [-Wshift-count-overflow]
      35 | (((~UL(0)) - (UL(1) << (l)) + 1) & \
   include/linux/bits.h:38:38: note: in expansion of macro '__GENMASK'
      38 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   drivers/iommu/io-pgtable-arm-v7s.c:154:46: note: in expansion of macro 'GENMASK'
     154 | ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   drivers/iommu/io-pgtable-arm-v7s.c:886:25: note: in expansion of macro 'ARM_V7S_TTBR_35BIT_PA'
     886 | ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
   include/linux/bits.h:36:18: warning: right shift count is negative [-Wshift-count-negative]
      36 | (~UL(0) >> (BITS_PER_LONG - 1 - (h
   include/linux/bits.h:38:38: note: in expansion of macro '__GENMASK'
      38 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   drivers/iommu/io-pgtable-arm-v7s.c:154:46: note: in expansion of macro 'GENMASK'
     154 | ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   drivers/iommu/io-pgtable-arm-v7s.c:886:25: note: in expansion of macro 'ARM_V7S_TTBR_35BIT_PA'
     886 | ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
>> drivers/iommu/io-pgtable-arm-v7s.c:154:63: warning: right shift count >= width of type [-Wshift-count-overflow]
     154 | ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   drivers/iommu/io-pgtable-arm-v7s.c:886:25: note: in expansion of macro 'ARM_V7S_TTBR_35BIT_PA'
     886 | ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);

vim +154 drivers/iommu/io-pgtable-arm-v7s.c

   145
   146	#define ARM_V7S_TTBR_S			BIT(1)
   147	#define ARM_V7S_TTBR_NOS		BIT(5)
   148	#define ARM_V7S_TTBR_ORGN_ATTR(attr)	(((attr) & 0x3) << 3)
   149	#define ARM_V7S_TTBR_IRGN_ATTR(attr)					\
   150		((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1))
Re: [PATCH v3 1/2] iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit
Hi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm-perf/for-next/perf]
[also build test WARNING on linus/master v5.18-rc6 next-20220512]
[cannot apply to joro-iommu/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/yf-wang-mediatek-com/iommu-io-pgtable-arm-v7s-Add-a-quirk-to-allow-pgtable-PA-up-to-35bit/20220512-234603
base:   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf
config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20220513/202205131016.ati0kpnr-...@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 9519dacab7b8afd537811fc2abaceb4d14f4e16a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/916a5fc41cbb8ddfe343193598f250d06b09e3fa
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review yf-wang-mediatek-com/iommu-io-pgtable-arm-v7s-Add-a-quirk-to-allow-pgtable-PA-up-to-35bit/20220512-234603
        git checkout 916a5fc41cbb8ddfe343193598f250d06b09e3fa
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/iommu/ drivers/rtc/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

>> drivers/iommu/io-pgtable-arm-v7s.c:886:4: warning: shift count >= width of type [-Wshift-count-overflow]
           ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
   drivers/iommu/io-pgtable-arm-v7s.c:154:39: note: expanded from macro 'ARM_V7S_TTBR_35BIT_PA'
           ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   include/linux/bits.h:38:31: note: expanded from macro 'GENMASK'
           (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   include/linux/bits.h:35:22: note: expanded from macro '__GENMASK'
           (((~UL(0)) - (UL(1) << (l)) + 1) & \
>> drivers/iommu/io-pgtable-arm-v7s.c:886:4: warning: shift count is negative [-Wshift-count-negative]
           ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
   drivers/iommu/io-pgtable-arm-v7s.c:154:39: note: expanded from macro 'ARM_V7S_TTBR_35BIT_PA'
           ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   include/linux/bits.h:38:31: note: expanded from macro 'GENMASK'
           (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   include/linux/bits.h:36:11: note: expanded from macro '__GENMASK'
           (~UL(0) >> (BITS_PER_LONG - 1 - (h
>> drivers/iommu/io-pgtable-arm-v7s.c:886:4: warning: shift count >= width of type [-Wshift-count-overflow]
           ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
   drivers/iommu/io-pgtable-arm-v7s.c:154:56: note: expanded from macro 'ARM_V7S_TTBR_35BIT_PA'
           ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))

   3 warnings generated.

vim +886 drivers/iommu/io-pgtable-arm-v7s.c

   795
   796	static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
   797						void *cookie)
   798	{
   799		slab_flags_t slab_flag = ARM_V7S_TABLE_SLAB_FLAGS;
   800		struct arm_v7s_io_pgtable *data;
   801		phys_addr_t paddr;
   802
   803		if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
   804			return NULL;
   805
   806		if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))
   807			return NULL;
   808
   809		if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
   810
Re: [PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting
On 2022/5/13 07:12, Steve Wahl wrote:
> On Thu, May 12, 2022 at 10:13:09AM -0500, Steve Wahl wrote:
>> To support up to 64 sockets with 10 DMAR units each (640), make the
>> value of DMAR_UNITS_SUPPORTED adjustable by a config variable,
>> CONFIG_DMAR_UNITS_SUPPORTED, and make its default 1024 when MAXSMP is
>> set.
>>
>> If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set
>> to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
>> allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
>> remapping doesn't support X2APIC mode x2apic disabled"; and the system
>> fails to boot properly.
>>
>> Signed-off-by: Steve Wahl
>
> I've received a report from the kernel test robot that this patch
> causes an error (shown below) when CONFIG_IOMMU_SUPPORT is not set.
>
> In my opinion, this is because include/linux/dmar.h and
> include/linux/intel-iommu.h are being #included when they are not
> really being used.
>
> I've tried placing the contents of intel-iommu.h within an #ifdef
> CONFIG_INTEL_IOMMU, and that fixes the problem.
>
> Two questions:
>
> A) Is this the desired approach to the fix?

Most of include/linux/intel-iommu.h is private to the Intel IOMMU
driver. It should be put in a header like drivers/iommu/intel/iommu.h.
Eventually, we should remove include/linux/intel-iommu.h and have device
drivers interact with the iommu subsystem through the IOMMU kAPIs.

Best regards,
baolu

> B) Should it be a separate patch, or added onto this patch as a v3?
>
> Error message:
> --
> In file included from include/linux/intel-iommu.h:21,
>                  from arch/x86/kvm/x86.c:44:
> include/linux/dmar.h:21:33: error: 'CONFIG_DMAR_UNITS_SUPPORTED' undeclared here (not in a function); did you mean 'DMAR_UNITS_SUPPORTED'?
>       21 | #define DMAR_UNITS_SUPPORTED	CONFIG_DMAR_UNITS_SUPPORTED
> include/linux/intel-iommu.h:531:35: note: in expansion of macro 'DMAR_UNITS_SUPPORTED'
>      531 | unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED];
>
> vim +21 include/linux/dmar.h
>
>     20
>   > 21	#define DMAR_UNITS_SUPPORTED	CONFIG_DMAR_UNITS_SUPPORTED
>     22
>
> Initial stab at fixing it:
> --
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 2f9891cb3d00..916fd7b5bcb5 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -10,6 +10,8 @@
>  #ifndef _INTEL_IOMMU_H_
>  #define _INTEL_IOMMU_H_
>
> +#ifdef CONFIG_INTEL_IOMMU
> +
>  #include
>  #include
>  #include
> @@ -831,4 +833,6 @@ static inline const char *decode_prq_descriptor(char *str, size_t size,
>  	return str;
>  }
>
> +#endif /* CONFIG_INTEL_IOMMU */
> +
>  #endif
>
> Thanks.
>
> --> Steve Wahl
Re: [PATCH 3/3] swiotlb: use the right nslabs-derived sizes in swiotlb_init_late
On Wed, 11 May 2022, Christoph Hellwig wrote:
> nslabs can shrink when allocations or the remap don't succeed, so make
> sure to use it for all sizing. For that remove the bytes value that
> can get stale and replace it with local calculations and a boolean to
> indicate if the originally requested size could not be allocated.
>
> Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
> Signed-off-by: Christoph Hellwig

Reviewed-by: Stefano Stabellini

> ---
>  kernel/dma/swiotlb.c | 19 +++
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 113e1e8aaca37..d6e62a6a42ceb 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -297,9 +297,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  {
>  	struct io_tlb_mem *mem = &io_tlb_default_mem;
>  	unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
> -	unsigned long bytes;
>  	unsigned char *vstart = NULL;
>  	unsigned int order;
> +	bool retried = false;
>  	int rc = 0;
>
>  	if (swiotlb_force_disable)
> @@ -308,7 +308,6 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  retry:
>  	order = get_order(nslabs << IO_TLB_SHIFT);
>  	nslabs = SLABS_PER_PAGE << order;
> -	bytes = nslabs << IO_TLB_SHIFT;
>
>  	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
>  		vstart = (void *)__get_free_pages(gfp_mask | __GFP_NOWARN,
> @@ -316,16 +315,13 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  		if (vstart)
>  			break;
>  		order--;
> +		nslabs = SLABS_PER_PAGE << order;
> +		retried = true;
>  	}
>
>  	if (!vstart)
>  		return -ENOMEM;
>
> -	if (order != get_order(bytes)) {
> -		pr_warn("only able to allocate %ld MB\n",
> -			(PAGE_SIZE << order) >> 20);
> -		nslabs = SLABS_PER_PAGE << order;
> -	}
>  	if (remap)
>  		rc = remap(vstart, nslabs);
>  	if (rc) {
> @@ -334,9 +330,15 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  		nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
>  		if (nslabs < IO_TLB_MIN_SLABS)
>  			return rc;
> +		retried = true;
>  		goto retry;
>  	}
>
> +	if (retried) {
> +		pr_warn("only able to allocate %ld MB\n",
> +			(PAGE_SIZE << order) >> 20);
> +	}
> +
>  	mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
>  		get_order(array_size(sizeof(*mem->slots), nslabs)));
>  	if (!mem->slots) {
> @@ -344,7 +346,8 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  		return -ENOMEM;
>  	}
>
> -	set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT);
> +	set_memory_decrypted((unsigned long)vstart,
> +			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
>  	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
>
>  	swiotlb_print_info();
> --
> 2.30.2
Re: [PATCH 2/3] swiotlb: use the right nslabs value in swiotlb_init_remap
On Wed, 11 May 2022, Christoph Hellwig wrote:
> default_nslabs should only be used to initialize nslabs, after that we
> need to use the local variable that can shrink when allocations or the
> remap don't succeed.
>
> Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
> Signed-off-by: Christoph Hellwig

Reviewed-by: Stefano Stabellini

> ---
>  kernel/dma/swiotlb.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 3e992a308c8a1..113e1e8aaca37 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -234,7 +234,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  {
>  	struct io_tlb_mem *mem = &io_tlb_default_mem;
>  	unsigned long nslabs = default_nslabs;
> -	size_t alloc_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), nslabs));
> +	size_t alloc_size;
>  	size_t bytes;
>  	void *tlb;
>
> @@ -249,7 +249,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  	 * memory encryption.
>  	 */
>  retry:
> -	bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
> +	bytes = PAGE_ALIGN(nslabs << IO_TLB_SHIFT);
>  	if (flags & SWIOTLB_ANY)
>  		tlb = memblock_alloc(bytes, PAGE_SIZE);
>  	else
> @@ -269,12 +269,13 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  		goto retry;
>  	}
>
> +	alloc_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), nslabs));
>  	mem->slots = memblock_alloc(alloc_size, PAGE_SIZE);
>  	if (!mem->slots)
>  		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
>  		      __func__, alloc_size, PAGE_SIZE);
>
> -	swiotlb_init_io_tlb_mem(mem, __pa(tlb), default_nslabs, false);
> +	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>  	mem->force_bounce = flags & SWIOTLB_FORCE;
>
>  	if (flags & SWIOTLB_VERBOSE)
> --
> 2.30.2
Re: [PATCH 1/3] swiotlb: don't panic when the swiotlb buffer can't be allocated
On Wed, 11 May 2022, Christoph Hellwig wrote:
> For historical reasons the swiotlb code panicked when the metadata could
> not be allocated, but just printed a warning when the actual main
> swiotlb buffer could not be allocated. Restore this somewhat unexpected
> behavior as changing it caused a boot failure on the Microchip RISC-V
> PolarFire SoC Icicle kit.
>
> Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
> Reported-by: Conor Dooley
> Signed-off-by: Christoph Hellwig
> Tested-by: Conor Dooley

Reviewed-by: Stefano Stabellini

> ---
>  kernel/dma/swiotlb.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index e2ef0864eb1e5..3e992a308c8a1 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -254,8 +254,10 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  		tlb = memblock_alloc(bytes, PAGE_SIZE);
>  	else
>  		tlb = memblock_alloc_low(bytes, PAGE_SIZE);
> -	if (!tlb)
> -		panic("%s: failed to allocate tlb structure\n", __func__);
> +	if (!tlb) {
> +		pr_warn("%s: failed to allocate tlb structure\n", __func__);
> +		return;
> +	}
>
>  	if (remap && remap(tlb, nslabs) < 0) {
>  		memblock_free(tlb, PAGE_ALIGN(bytes));
> --
> 2.30.2
[PATCH] iommu/vt-d: Try info->iommu in device_to_iommu()
Local boot test and VFIO sanity test show that info->iommu can be
used in device_to_iommu() as a fast path. So this patch adds it.

Signed-off-by: Nicolin Chen
---
 drivers/iommu/intel/iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2990f80c5e08..412fca5ab9cd 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -777,6 +777,7 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, struct device *dev)
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 {
 	struct dmar_drhd_unit *drhd = NULL;
+	struct device_domain_info *info;
 	struct pci_dev *pdev = NULL;
 	struct intel_iommu *iommu;
 	struct device *tmp;
@@ -786,6 +787,10 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 	if (!dev)
 		return NULL;

+	info = dev_iommu_priv_get(dev);
+	if (info)
+		return info->iommu;
+
 	if (dev_is_pci(dev)) {
 		struct pci_dev *pf_pdev;
--
2.17.1
Re: [PATCH v2 01/26] dma-direct: take dma-ranges/offsets into account in resource mapping
On Mon, May 09, 2022 at 08:15:52AM +0200, Christoph Hellwig wrote:
> So I think the big problem pointed out by Robin is that existing DTs
> might not take this into account.

I'd say that the biggest problem isn't in the DT part but in the drivers
using the dma_map_resource() method, since they don't expect the
non-uniform DMA-direct mapping to be available.

> So I think we need to do a little
> research and at least Cc all maintainers and lists for relevant in-tree
> DTs for drivers that use dma_map_resource to discuss this.

Right. I'll send the next patchset revision out with all possibly
interested maintainers and lists in Cc of this patch.

-Sergey
Re: [PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting
On Thu, May 12, 2022 at 10:13:09AM -0500, Steve Wahl wrote:
> To support up to 64 sockets with 10 DMAR units each (640), make the
> value of DMAR_UNITS_SUPPORTED adjustable by a config variable,
> CONFIG_DMAR_UNITS_SUPPORTED, and make its default 1024 when MAXSMP is
> set.
>
> If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set
> to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
> allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
> remapping doesn't support X2APIC mode x2apic disabled"; and the system
> fails to boot properly.
>
> Signed-off-by: Steve Wahl

I've received a report from the kernel test robot that this patch causes
an error (shown below) when CONFIG_IOMMU_SUPPORT is not set.

In my opinion, this is because include/linux/dmar.h and
include/linux/intel-iommu.h are being #included when they are not really
being used.

I've tried placing the contents of intel-iommu.h within an #ifdef
CONFIG_INTEL_IOMMU, and that fixes the problem.

Two questions:

A) Is this the desired approach to the fix?

B) Should it be a separate patch, or added onto this patch as a v3?

Error message:
--
In file included from include/linux/intel-iommu.h:21,
                 from arch/x86/kvm/x86.c:44:
>> include/linux/dmar.h:21:33: error: 'CONFIG_DMAR_UNITS_SUPPORTED' undeclared here (not in a function); did you mean 'DMAR_UNITS_SUPPORTED'?
      21 | #define DMAR_UNITS_SUPPORTED	CONFIG_DMAR_UNITS_SUPPORTED
include/linux/intel-iommu.h:531:35: note: in expansion of macro 'DMAR_UNITS_SUPPORTED'
     531 | unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED];

vim +21 include/linux/dmar.h

    20
  > 21	#define DMAR_UNITS_SUPPORTED	CONFIG_DMAR_UNITS_SUPPORTED
    22

Initial stab at fixing it:
--
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2f9891cb3d00..916fd7b5bcb5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -10,6 +10,8 @@
 #ifndef _INTEL_IOMMU_H_
 #define _INTEL_IOMMU_H_

+#ifdef CONFIG_INTEL_IOMMU
+
 #include
 #include
 #include
@@ -831,4 +833,6 @@ static inline const char *decode_prq_descriptor(char *str, size_t size,
 	return str;
 }

+#endif /* CONFIG_INTEL_IOMMU */
+
 #endif

Thanks.

--> Steve Wahl

--
Steve Wahl, Hewlett Packard Enterprise
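The config knob described in the commit message would take roughly the following Kconfig shape. This is a sketch only: the symbol name and the MAXSMP default come from the commit message, while the dependency and help text are illustrative assumptions:

```
config DMAR_UNITS_SUPPORTED
	int "Number of DMA Remapping Units supported"
	depends on DMAR_TABLE
	default 1024 if MAXSMP
	default 128
	help
	  Maximum number of DMA Remapping Units (DRHDs) the kernel can
	  handle. Systems with more than 64 sockets (10 DMAR units each)
	  need more than the historical MAX_IO_APICS-derived value of 128.
```

With this in place, `#define DMAR_UNITS_SUPPORTED CONFIG_DMAR_UNITS_SUPPORTED` in dmar.h only works in translation units where the Kconfig symbol is defined, which is why the `#ifdef CONFIG_INTEL_IOMMU` guard above becomes necessary.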
[PATCH v5 5/5] iommu/tegra-smmu: Support managed domains
From: Navneet Kumar

Allow creating identity and DMA API compatible IOMMU domains. When
creating a DMA API compatible domain, make sure to also create the
required cookie.

Signed-off-by: Navneet Kumar
Signed-off-by: Thierry Reding
---
Changes in v5:
- remove DMA cookie initialization that's now no longer needed

 drivers/iommu/tegra-smmu.c | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 93879c40056c..f8b2b863c111 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -277,7 +278,9 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 {
 	struct tegra_smmu_as *as;

-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED &&
+	    type != IOMMU_DOMAIN_DMA &&
+	    type != IOMMU_DOMAIN_IDENTITY)
 		return NULL;

 	as = kzalloc(sizeof(*as), GFP_KERNEL);
@@ -287,25 +290,16 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 	as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;

 	as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO);
-	if (!as->pd) {
-		kfree(as);
-		return NULL;
-	}
+	if (!as->pd)
+		goto free_as;

 	as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
-	if (!as->count) {
-		__free_page(as->pd);
-		kfree(as);
-		return NULL;
-	}
+	if (!as->count)
+		goto free_pd_range;

 	as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
-	if (!as->pts) {
-		kfree(as->count);
-		__free_page(as->pd);
-		kfree(as);
-		return NULL;
-	}
+	if (!as->pts)
+		goto free_pts;

 	spin_lock_init(&as->lock);

@@ -315,6 +309,15 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 	as->domain.geometry.force_aperture = true;

 	return &as->domain;
+
+free_pts:
+	kfree(as->pts);
+free_pd_range:
+	__free_page(as->pd);
+free_as:
+	kfree(as);
+
+	return NULL;
 }

 static void tegra_smmu_domain_free(struct iommu_domain *domain)
@@ -1009,7 +1012,7 @@ static const struct iommu_ops tegra_smmu_ops = {
 	.probe_device = tegra_smmu_probe_device,
 	.release_device = tegra_smmu_release_device,
 	.device_group = tegra_smmu_device_group,
-	.get_resv_regions = of_iommu_get_resv_regions,
+	.get_resv_regions = iommu_dma_get_resv_regions,
 	.put_resv_regions = generic_iommu_put_resv_regions,
 	.of_xlate = tegra_smmu_of_xlate,
 	.pgsize_bitmap = SZ_4K,
--
2.36.1
[PATCH v5 4/5] iommu/tegra-smmu: Add support for reserved regions
From: Thierry Reding

The Tegra DRM driver currently uses the IOMMU API explicitly. This means
that it has fine-grained control over when exactly the translation
through the IOMMU is enabled. This currently happens after the driver
probes, so the driver is in a DMA quiesced state when the IOMMU
translation is enabled.

During the transition of the Tegra DRM driver to use the DMA API instead
of the IOMMU API explicitly, it was observed that on certain platforms
the display controllers were still actively fetching from memory. When a
DMA IOMMU domain is created as part of the DMA/IOMMU API setup during
boot, the IOMMU translation for the display controllers can be enabled a
significant amount of time before the driver has had a chance to reset
the hardware into a sane state. This causes the SMMU to detect faults on
the addresses that the display controller is trying to fetch.

To avoid this, and as a byproduct paving the way for seamless transition
of display from the bootloader to the kernel, add support for reserved
regions in the Tegra SMMU driver. This is implemented using the standard
reserved memory device tree bindings, which let us describe regions of
memory which the kernel is forbidden from using for regular allocations.
The Tegra SMMU driver will parse the nodes associated with each device
via the "memory-region" property and return reserved regions that the
IOMMU core will then create direct mappings for prior to attaching the
IOMMU domains to the devices. This ensures that a 1:1 mapping is in
place when IOMMU translation starts and prevents the SMMU from detecting
any faults.
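The "standard reserved memory device tree bindings" referred to above take roughly the following shape. This is an illustrative sketch only: the node names, addresses, and sizes are invented, not taken from a real Tegra device tree:

```
/ {
	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* Framebuffer the bootloader left the display scanning out. */
		fb: framebuffer@92000000 {
			reg = <0x0 0x92000000 0x0 0x00800000>;
			no-map;
		};
	};

	host1x@50000000 {
		dc@54200000 {
			/* Parsed by the SMMU driver to build 1:1 mappings. */
			memory-region = <&fb>;
		};
	};
};
```

The `memory-region` phandle is what the driver walks per device; each referenced region becomes a direct (identity) mapping that the IOMMU core installs before attaching the domain.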
Signed-off-by: Thierry Reding --- drivers/iommu/tegra-smmu.c | 47 -- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 2f2b12033618..93879c40056c 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -471,6 +472,7 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu *smmu, tegra_smmu_free_asid(smmu, as->id); dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD, DMA_TO_DEVICE); + as->pd_dma = 0; as->smmu = NULL; @@ -534,6 +536,38 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova, struct tegra_smmu *smmu = as->smmu; u32 *pd = page_address(as->pd); unsigned long offset = pd_index * sizeof(*pd); + bool unmap = false; + + /* +* XXX Move this outside of this function. Perhaps add a struct +* iommu_domain parameter to ->{get,put}_resv_regions() so that +* the mapping can be done there. +* +* The problem here is that as->smmu is only known once we attach +* the domain to a device (because then we look up the right SMMU +* instance via the dev->archdata.iommu pointer). When the direct +* mappings are created for reserved regions, the domain has not +* been attached to a device yet, so we don't know. We currently +* fix that up in ->apply_resv_regions() because that is the first +* time where we have access to a struct device that will be used +* with the IOMMU domain. However, that's asymmetric and doesn't +* take care of the page directory mapping either, so we need to +* come up with something better. 
+*/ + if (WARN_ON_ONCE(as->pd_dma == 0)) { + as->pd_dma = dma_map_page(smmu->dev, as->pd, 0, SMMU_SIZE_PD, + DMA_TO_DEVICE); + if (dma_mapping_error(smmu->dev, as->pd_dma)) + return; + + if (!smmu_dma_addr_valid(smmu, as->pd_dma)) { + dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD, + DMA_TO_DEVICE); + return; + } + + unmap = true; + } /* Set the page directory entry first */ pd[pd_index] = value; @@ -546,6 +580,12 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova, smmu_flush_ptc(smmu, as->pd_dma, offset); smmu_flush_tlb_section(smmu, as->id, iova); smmu_flush(smmu); + + if (unmap) { + dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD, + DMA_TO_DEVICE); + as->pd_dma = 0; + } } static u32 *tegra_smmu_pte_offset(struct page *pt_page, unsigned long iova) @@ -846,7 +886,6 @@ static struct iommu_device *tegra_smmu_probe_device(struct device *dev) smmu = tegra_smmu_find(args.np); if (smmu) { err = tegra_smmu_configure(smmu, dev, &args); - if (err < 0) { of_node_put(args.np); return
[PATCH v5 3/5] iommu: dma: Use of_iommu_get_resv_regions()
From: Thierry Reding For device tree nodes, use the standard of_iommu_get_resv_regions() implementation to obtain the reserved memory regions associated with a device. Signed-off-by: Thierry Reding --- drivers/iommu/dma-iommu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 1ca85d37eeab..3a40ae7a450b 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -386,6 +387,8 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) if (!is_of_node(dev_iommu_fwspec_get(dev)->iommu_fwnode)) iort_iommu_msi_get_resv_regions(dev, list); + if (dev->of_node) + of_iommu_get_resv_regions(dev, list); } EXPORT_SYMBOL(iommu_dma_get_resv_regions); -- 2.36.1
[PATCH v5 2/5] iommu: Implement of_iommu_get_resv_regions()
From: Thierry Reding This is an implementation that IOMMU drivers can use to obtain reserved memory regions from a device tree node. It uses the reserved-memory DT bindings to find the regions associated with a given device. If these regions are marked accordingly, identity mappings will be created for them in the IOMMU domain that the devices will be attached to. Signed-off-by: Thierry Reding --- Changes in v5: - update for new "iommu-addresses" device tree bindings Changes in v4: - fix build failure on !CONFIG_OF_ADDRESS Changes in v3: - change "active" property to identity mapping flag that is part of the memory region specifier (as defined by #memory-region-cells) to allow per-reference flags to be used Changes in v2: - use "active" property to determine whether direct mappings are needed drivers/iommu/of_iommu.c | 90 include/linux/of_iommu.h | 8 2 files changed, 98 insertions(+) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index 5696314ae69e..9e341b5e307f 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -11,12 +11,15 @@ #include #include #include +#include #include #include #include #include #include +#include + #define NO_IOMMU 1 static int of_iommu_xlate(struct device *dev, @@ -172,3 +175,90 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, return ops; } + +/** + * of_iommu_get_resv_regions - reserved region driver helper for device tree + * @dev: device for which to get reserved regions + * @list: reserved region list + * + * IOMMU drivers can use this to implement their .get_resv_regions() callback + * for memory regions attached to a device tree node. 
See the reserved-memory + * device tree bindings on how to use these: + * + * Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt + */ +void of_iommu_get_resv_regions(struct device *dev, struct list_head *list) +{ +#if IS_ENABLED(CONFIG_OF_ADDRESS) + struct of_phandle_iterator it; + int err; + + of_for_each_phandle(&it, err, dev->of_node, "memory-region", NULL, 0) { + struct iommu_resv_region *region; + struct resource res; + const __be32 *maps; + int size; + + memset(&res, 0, sizeof(res)); + + /* +* The "reg" property is optional and can be omitted by reserved-memory regions +* that represent reservations in the IOVA space, which are regions that should +* not be mapped. +*/ + if (of_find_property(it.node, "reg", NULL)) { + err = of_address_to_resource(it.node, 0, &res); + if (err < 0) { + dev_err(dev, "failed to parse memory region %pOF: %d\n", + it.node, err); + continue; + } + } + + maps = of_get_property(it.node, "iommu-addresses", &size); + if (maps) { + const __be32 *end = maps + size / sizeof(__be32); + struct device_node *np; + unsigned int index = 0; + u32 phandle; + int na, ns; + + while (maps < end) { + phys_addr_t start, end; + size_t length; + + phandle = be32_to_cpup(maps++); + np = of_find_node_by_phandle(phandle); + na = of_n_addr_cells(np); + ns = of_n_size_cells(np); + + start = of_translate_dma_address(np, maps); + length = of_read_number(maps + na, ns); + end = start + length - 1; + + if (np == dev->of_node) { + int prot = IOMMU_READ | IOMMU_WRITE; + enum iommu_resv_type type; + + /* +* IOMMU regions without an associated physical region +* cannot be mapped and are simply reservations. +*/ + if (res.end > res.start) + type = IOMMU_RESV_DIRECT_RELAXABLE; + else + type = IOMMU_RESV_RESERVED; + + region = iommu_alloc_resv_region(start, length, prot, type); + if (region) + list_add_tail(®ion->list, list); +
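The helper's classification rule above (a reserved-memory node with a physical "reg" range becomes a direct/identity mapping, a node with only "iommu-addresses" becomes a pure IOVA reservation) can be sketched outside the kernel. This is an illustrative Python model only, not kernel code; the node dictionaries and field names ("iova", "length", "phys") are invented for the sketch:

```python
from dataclasses import dataclass

@dataclass
class ResvRegion:
    start: int    # IOVA start, taken from the "iommu-addresses" specifier
    length: int
    type: str     # models enum iommu_resv_type

def get_resv_regions(nodes):
    """Classify reserved-memory nodes the way of_iommu_get_resv_regions()
    does: a node with a physical backing ("reg") yields a direct mapping
    (IOMMU_RESV_DIRECT_RELAXABLE), one without yields a reservation
    (IOMMU_RESV_RESERVED) that is never mapped."""
    regions = []
    for node in nodes:
        rtype = ("IOMMU_RESV_DIRECT_RELAXABLE" if "phys" in node
                 else "IOMMU_RESV_RESERVED")
        regions.append(ResvRegion(node["iova"], node["length"], rtype))
    return regions
```

Using the framebuffer/ADSP shapes from the cover letter, the framebuffer node (which has a "reg") becomes a direct mapping, while the ADSP carveout (no "reg") becomes a plain reservation.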
[PATCH v5 1/5] dt-bindings: reserved-memory: Document iommu-addresses
From: Thierry Reding This adds the "iommu-addresses" property to reserved-memory nodes, which allows describing the interaction of memory regions with IOMMUs. Two use-cases are supported: 1. Static mappings can be described by pairing the "iommu-addresses" property with a "reg" property. This is mostly useful for adopting firmware-allocated buffers via identity mappings. One common use-case where this is required is if early firmware or bootloaders have set up a bootsplash framebuffer that a display controller is actively scanning out from during the operating system boot process. 2. If an "iommu-addresses" property exists without a "reg" property, the reserved-memory node describes an IOVA reservation. Such memory regions are excluded from the IOVA space available to operating system drivers and can be used for regions that must not be used to map arbitrary buffers. Each mapping or reservation is tied to a specific device via a phandle to the device's device tree node. This allows a reserved-memory region to be reused across multiple devices. Signed-off-by: Thierry Reding --- .../reserved-memory/reserved-memory.txt | 1 - .../reserved-memory/reserved-memory.yaml | 62 +++ include/dt-bindings/reserved-memory.h | 8 +++ 3 files changed, 70 insertions(+), 1 deletion(-) delete mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt create mode 100644 include/dt-bindings/reserved-memory.h diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt deleted file mode 100644 index 1810701a8509.. --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt +++ /dev/null @@ -1 +0,0 @@ -This file has been moved to reserved-memory.yaml.
diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml index 7a0744052ff6..3a769aa66e1c 100644 --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml @@ -52,6 +52,30 @@ properties: Address and Length pairs. Specifies regions of memory that are acceptable to allocate from. + iommu-addresses: +$ref: /schemas/types.yaml#/definitions/phandle-array +description: > + A list of phandle and specifier pairs that describe static IO virtual + address space mappings and carveouts associated with a given reserved + memory region. The phandle in the first cell refers to the device for + which the mapping or carveout is to be created. + + The specifier consists of an address/size pair and denotes the IO + virtual address range of the region for the given device. The exact + format depends on the values of the "#address-cells" and "#size-cells" + properties of the device referenced via the phandle. + + When used in combination with a "reg" property, an IOVA mapping is to + be established for this memory region. One example where this can be + useful is to create an identity mapping for physical memory that the + firmware has configured some hardware to access (such as a bootsplash + framebuffer). + + If no "reg" property is specified, the "iommu-addresses" property + defines carveout regions in the IOVA space for the given device. This + can be useful if a certain memory region should not be mapped through + the IOMMU. + no-map: type: boolean description: > @@ -97,4 +121,42 @@ oneOf: additionalProperties: true +examples: + - | +reserved-memory { + #address-cells = <2>; + #size-cells = <2>; + ranges; + + adsp: reservation-adsp { +/* + * Restrict IOVA mappings for ADSP buffers to the 512 MiB region + * from 0x4000 - 0x5fff. 
Anything outside is reserved by + * the ADSP for I/O memory and private memory allocations. + */ +iommu-addresses = <0x0 0x 0x00 0x4000>, + <0x0 0x6000 0xff 0xa000>; + }; + + fb: framebuffer@9000 { +reg = <0x0 0x9000 0x0 0x0080>; +iommu-addresses = <&dc0 0x0 0x9000 0x0 0x0080>; + }; +}; + +bus@0 { + #address-cells = <2>; + #size-cells = <2>; + + adsp@299 { +reg = <0x0 0x299 0x0 0x2000>; +memory-region = <&adsp>; + }; + + display@1520 { +reg = <0x0 0x1520 0x0 0x1>; +memory-region = <&fb>; + }; +}; + ... diff --git a/include/dt-bindings/reserved-memory.h b/include/dt-bindings/reserved-memory.h new file mode 100644 index ..174ca3448342 --- /dev/null +++ b/include/dt-bindings/reserved-memory.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identif
[PATCH v5 0/5] iommu: Support mappings/reservations in reserved-memory regions
From: Thierry Reding Hi, this is another attempt at solving the problem of passing IOMMU configuration via device tree. It has significantly evolved since the last attempt, based on the discussion that followed. The discussion can be found here: https://lore.kernel.org/all/20210423163234.3651547-1-thierry.red...@gmail.com/ Rather than using a memory-region specifier, this new version introduces a new "iommu-addresses" property for the reserved-memory regions themselves. These are used to describe either a static mapping or reservation that should be created for a given device. If both "reg" and "iommu-addresses" properties are given, a mapping will be created (typically this would be an identity mapping) whereas if only an "iommu-addresses" property is specified, a reservation for the specified range will be installed. An example is included in the DT bindings, but here is an extract of what I've used to test this: reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; /* * Creates an identity mapping for the framebuffer that * the firmware has setup to scan out a bootsplash from. */ fb: framebuffer@92cb2000 { reg = <0x0 0x92cb2000 0x0 0x0080>; iommu-addresses = <&dc0 0x0 0x92cb2000 0x0 0x0080>; }; /* * Creates a reservation in the IOVA space to prevent * any buffers from being mapped to that region. Note * that on Tegra the range is actually quite different * from this, but it would conflict with the display * driver that I tested this against, so this is just * a dummy region for testing. */ adsp: reservation-adsp { iommu-addresses = <&dc0 0x0 0x9000 0x0 0x0001>; }; }; host1x@5000 { dc@5420 { memory-region = <&fb>, <&adsp>; }; }; This is abbreviated a little to focus on the essentials. Note also that the ADSP reservation is not actually used on this device and the driver for this doesn't exist yet, but I wanted to include this variant for testing, because we'll want to use these bindings for the reservation use-case as well at some point. 
Adding Alyssa and Janne who have in the past tried to make these bindings work on Apple M1. Also adding Sameer from the Tegra audio team to look at the ADSP reservation and double-check that this is suitable for our needs. Thierry Navneet Kumar (1): iommu/tegra-smmu: Support managed domains Thierry Reding (4): dt-bindings: reserved-memory: Document iommu-addresses iommu: Implement of_iommu_get_resv_regions() iommu: dma: Use of_iommu_get_resv_regions() iommu/tegra-smmu: Add support for reserved regions .../reserved-memory/reserved-memory.txt | 1 - .../reserved-memory/reserved-memory.yaml | 62 + drivers/iommu/dma-iommu.c | 3 + drivers/iommu/of_iommu.c | 90 +++ drivers/iommu/tegra-smmu.c| 82 + include/dt-bindings/reserved-memory.h | 8 ++ include/linux/of_iommu.h | 8 ++ 7 files changed, 235 insertions(+), 19 deletions(-) delete mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt create mode 100644 include/dt-bindings/reserved-memory.h -- 2.36.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] vfio: Stop using iommu_present()
On Tue, 5 Apr 2022 13:11:54 +0100 Robin Murphy wrote: > IOMMU groups have been mandatory for some time now, so a device without > one is necessarily a device without any usable IOMMU, therefore the > iommu_present() check is redundant (or at best unhelpful). > > Signed-off-by: Robin Murphy > --- > drivers/vfio/vfio.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index a4555014bd1e..7b0a7b85e77e 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c > @@ -745,11 +745,11 @@ static struct vfio_group > *vfio_group_find_or_alloc(struct device *dev) > > iommu_group = iommu_group_get(dev); > #ifdef CONFIG_VFIO_NOIOMMU > - if (!iommu_group && noiommu && !iommu_present(dev->bus)) { > + if (!iommu_group && noiommu) { > /* >* With noiommu enabled, create an IOMMU group for devices that > - * don't already have one and don't have an iommu_ops on their > - * bus. Taint the kernel because we're about to give a DMA > + * don't already have one, implying no IOMMU hardware/driver > + * exists. Taint the kernel because we're about to give a DMA >* capable device to a user without IOMMU protection. >*/ > group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU); Applied to vfio next branch for v5.19. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
Hi, On 5/10/22 20:13, Jason Gunthorpe wrote: > On Tue, May 10, 2022 at 06:52:06PM +0100, Robin Murphy wrote: >> On 2022-05-10 17:55, Jason Gunthorpe via iommu wrote: >>> This control causes the ARM SMMU drivers to choose a stage 2 >>> implementation for the IO pagetable (vs the stage 1 usual default), >>> however this choice has no visible impact to the VFIO user. Further qemu >>> never implemented this and no other userspace user is known. >>> >>> The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add >>> new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide >>> SMMU translation services to the guest operating system" however the rest >>> of the API to set the guest table pointer for the stage 1 was never >>> completed, or at least never upstreamed, rendering this part useless dead >>> code. >>> >>> Since the current patches to enable nested translation, aka userspace page >>> tables, rely on iommufd and will not use the enable_nesting() >>> iommu_domain_op, remove this infrastructure. However, don't cut too deep >>> into the SMMU drivers for now expecting the iommufd work to pick it up - >>> we still need to create S2 IO page tables. >>> >>> Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the >>> enable_nesting iommu_domain_op. >>> >>> Just in-case there is some userspace using this continue to treat >>> requesting it as a NOP, but do not advertise support any more. >> I assume the nested translation/guest SVA patches that Eric and Vivek were >> working on pre-IOMMUFD made use of this, and given that they got quite far >> along, I wouldn't be too surprised if some eager cloud vendors might have >> even deployed something based on the patches off the list. thank you Robin for the heads up. > With upstream there is no way to make use of this flag, if someone is > using it they have other out of tree kernel, vfio, kvm and qemu > patches to make it all work. 
> > You can see how much is still needed in Eric's tree: > > https://github.com/eauger/linux/commits/v5.15-rc7-nested-v16 > >> I can't help feeling a little wary about removing this until IOMMUFD >> can actually offer a functional replacement - is it in the way of >> anything upcoming? > From an upstream perspective if someone has a patched kernel to > complete the feature, then they can patch this part in as well, we > should not carry dead code like this in the kernel and in the uapi. On the other hand, the code has been in the kernel for 8 years now; I think we could wait for some additional weeks/months until the iommufd nested integration arrives and gets tested. Thanks Eric > > It is not directly in the way, but this needs to get done at some > point, I'd rather just get it out of the way. > > Thanks, > Jason >
[PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting
To support up to 64 sockets with 10 DMAR units each (640), make the value of DMAR_UNITS_SUPPORTED adjustable by a config variable, CONFIG_DMAR_UNITS_SUPPORTED, and make its default 1024 when MAXSMP is set. If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ remapping doesn't support X2APIC mode x2apic disabled"; and the system fails to boot properly. Signed-off-by: Steve Wahl --- Note that we could not find a reason for connecting DMAR_UNITS_SUPPORTED to MAX_IO_APICS as was done previously. Perhaps it seemed like the two would continue to match on earlier processors. There doesn't appear to be kernel code that assumes that the value of one is related to the other. v2: Make this value a config option, rather than a fixed constant. The default values should match previous configuration except in the MAXSMP case. Keeping the value at a power of two was requested by Kevin Tian.
drivers/iommu/intel/Kconfig | 6 ++ include/linux/dmar.h | 6 +- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig index 247d0f2d5fdf..fdbda77ac21e 100644 --- a/drivers/iommu/intel/Kconfig +++ b/drivers/iommu/intel/Kconfig @@ -9,6 +9,12 @@ config DMAR_PERF config DMAR_DEBUG bool +config DMAR_UNITS_SUPPORTED + int "Number of DMA Remapping Units supported" + default 1024 if MAXSMP + default 128 if X86_64 + default 64 + config INTEL_IOMMU bool "Support for Intel IOMMU using DMA Remapping Devices" depends on PCI_MSI && ACPI && (X86 || IA64) diff --git a/include/linux/dmar.h b/include/linux/dmar.h index 45e903d84733..0c03c1845c23 100644 --- a/include/linux/dmar.h +++ b/include/linux/dmar.h @@ -18,11 +18,7 @@ struct acpi_dmar_header; -#ifdef CONFIG_X86 -# define DMAR_UNITS_SUPPORTED MAX_IO_APICS -#else -# define DMAR_UNITS_SUPPORTED 64 -#endif +#define DMAR_UNITS_SUPPORTED CONFIG_DMAR_UNITS_SUPPORTED /* DMAR Flags */ #define DMAR_INTR_REMAP 0x1 -- 2.26.2
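The Kconfig defaults above can be summarized with a small illustrative sketch (Python, not kernel code): the first matching `default` line wins, so MAXSMP takes priority over X86_64, and everything else falls back to 64.

```python
def dmar_units_supported(maxsmp: bool = False, x86_64: bool = False) -> int:
    """Model of the CONFIG_DMAR_UNITS_SUPPORTED defaults:
    1024 if MAXSMP, else 128 on X86_64, else 64."""
    if maxsmp:
        return 1024
    if x86_64:
        return 128
    return 64
```

Note the 64-sockets-times-10-units case from the commit message (640 DMAR units) fits comfortably under the MAXSMP default of 1024, while it overflows the old limit of 128.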
[PATCH v3 2/2] iommu/mediatek: Allow page table PA up to 35bit
From: Yunfei Wang Add support for the IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT quirk, so that page table PAs of up to 35 bits are allowed, not only those within ZONE_DMA32. Signed-off-by: Ning Li Signed-off-by: Yunfei Wang --- drivers/iommu/mtk_iommu.c | 29 + 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 6fd75a60abd6..1b9a876ef271 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -33,6 +33,7 @@ #define REG_MMU_PT_BASE_ADDR 0x000 #define MMU_PT_ADDR_MASK GENMASK(31, 7) +#define MMU_PT_ADDR_2_0_MASK GENMASK(2, 0) #define REG_MMU_INVALIDATE 0x020 #define F_ALL_INVLD 0x2 @@ -118,6 +119,7 @@ #define WR_THROT_EN BIT(6) #define HAS_LEGACY_IVRP_PADDR BIT(7) #define IOVA_34_EN BIT(8) +#define PGTABLE_PA_35_EN BIT(9) #define MTK_IOMMU_HAS_FLAG(pdata, _x) \ ((((pdata)->flags) & (_x)) == (_x)) @@ -401,6 +403,9 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom, .iommu_dev = data->dev, }; + if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN)) + dom->cfg.quirks |= IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT; + if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE)) dom->cfg.oas = data->enable_4GB ?
33 : 32; else @@ -450,6 +455,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain, struct mtk_iommu_domain *dom = to_mtk_domain(domain); struct device *m4udev = data->dev; int ret, domid; + u32 regval; domid = mtk_iommu_get_domain_id(dev, data->plat_data); if (domid < 0) @@ -472,8 +478,14 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain, return ret; } data->m4u_dom = dom; - writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, - data->base + REG_MMU_PT_BASE_ADDR); + + /* Bits[6:3] are invalid for mediatek platform */ + if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN)) + regval = (dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK) | +(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_2_0_MASK); + else + regval = dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK; + writel(regval, data->base + REG_MMU_PT_BASE_ADDR); pm_runtime_put(m4udev); } @@ -987,6 +999,7 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev) struct mtk_iommu_suspend_reg *reg = &data->reg; struct mtk_iommu_domain *m4u_dom = data->m4u_dom; void __iomem *base = data->base; + u32 regval; int ret; ret = clk_prepare_enable(data->bclk); @@ -1010,7 +1023,14 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev) writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL); writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR); writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG); - writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, base + REG_MMU_PT_BASE_ADDR); + + /* Bits[6:3] are invalid for mediatek platform */ + if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN)) + regval = (m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK) | +(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_2_0_MASK); + else + regval = m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK; + writel(regval, base + REG_MMU_PT_BASE_ADDR); /* * Users may allocate dma buffer before they call pm_runtime_get, @@ -1038,7 +1058,8 @@ static const struct 
mtk_iommu_plat_data mt2712_data = { static const struct mtk_iommu_plat_data mt6779_data = { .m4u_plat = M4U_MT6779, - .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN, + .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN | +PGTABLE_PA_35_EN, .inv_sel_reg = REG_MMU_INV_SEL_GEN2, .iova_region = single_domain, .iova_region_nr = ARRAY_SIZE(single_domain), -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
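The REG_MMU_PT_BASE_ADDR computation in the patch above can be modeled in a few lines of Python (illustrative only; the mask constants mirror GENMASK(31, 7) and GENMASK(2, 0)):

```python
MMU_PT_ADDR_MASK = 0xFFFFFF80      # GENMASK(31, 7)
MMU_PT_ADDR_2_0_MASK = 0x00000007  # GENMASK(2, 0)

def pt_base_regval(ttbr: int, pa_35_en: bool) -> int:
    """Model of the register value written to REG_MMU_PT_BASE_ADDR:
    with the 35-bit quirk, ttbr bits [2:0] (carrying PA bits [34:32])
    are kept while bits [6:3], invalid on MediaTek, are dropped."""
    if pa_35_en:
        return (ttbr & MMU_PT_ADDR_MASK) | (ttbr & MMU_PT_ADDR_2_0_MASK)
    return ttbr & MMU_PT_ADDR_MASK
```

The key point is that bits [6:3] are masked out in both cases; only the quirked path preserves bits [2:0].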
[PATCH v3 1/2] iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit
From: Yunfei Wang Calls to kmem_cache_alloc() for level-2 pgtable allocation may run in atomic context, and they sometimes fail when the DMA32 zone runs out of memory. Since the MediaTek IOMMU hardware supports at most 35-bit PAs in the pgtable, add a quirk to allow pgtable PAs up to bit 35. Signed-off-by: Ning Li Signed-off-by: Yunfei Wang --- drivers/iommu/io-pgtable-arm-v7s.c | 56 ++ include/linux/io-pgtable.h | 15 +--- 2 files changed, 52 insertions(+), 19 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index be066c1503d3..57455ae052ac 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -149,6 +149,10 @@ #define ARM_V7S_TTBR_IRGN_ATTR(attr) \ ((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1)) +/* Mediatek extend ttbr bits[2:0] for PA bits[34:32] */ +#define ARM_V7S_TTBR_35BIT_PA(ttbr, pa) \ + ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32)) + #ifdef CONFIG_ZONE_DMA32 #define ARM_V7S_TABLE_GFP_DMA GFP_DMA32 #define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32 @@ -182,14 +186,8 @@ static bool arm_v7s_is_mtk_enabled(struct io_pgtable_cfg *cfg) (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT); } -static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl, - struct io_pgtable_cfg *cfg) +static arm_v7s_iopte to_iopte_mtk(phys_addr_t paddr, arm_v7s_iopte pte) { - arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl); - - if (!arm_v7s_is_mtk_enabled(cfg)) - return pte; - if (paddr & BIT_ULL(32)) pte |= ARM_V7S_ATTR_MTK_PA_BIT32; if (paddr & BIT_ULL(33)) @@ -199,6 +197,17 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl, return pte; } +static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl, + struct io_pgtable_cfg *cfg) +{ + arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl); + + if (!arm_v7s_is_mtk_enabled(cfg)) + return pte; + + return to_iopte_mtk(paddr, pte); +} + static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int lvl, struct io_pgtable_cfg
*cfg) { @@ -234,6 +243,7 @@ static arm_v7s_iopte *iopte_deref(arm_v7s_iopte pte, int lvl, static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp, struct arm_v7s_io_pgtable *data) { + gfp_t gfp_l1 = __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA; struct io_pgtable_cfg *cfg = &data->iop.cfg; struct device *dev = cfg->iommu_dev; phys_addr_t phys; @@ -241,9 +251,11 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp, size_t size = ARM_V7S_TABLE_SIZE(lvl, cfg); void *table = NULL; + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT) + gfp_l1 = __GFP_ZERO; + if (lvl == 1) - table = (void *)__get_free_pages( - __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA, get_order(size)); + table = (void *)__get_free_pages(gfp_l1, get_order(size)); else if (lvl == 2) table = kmem_cache_zalloc(data->l2_tables, gfp); @@ -251,7 +263,8 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp, return NULL; phys = virt_to_phys(table); - if (phys != (arm_v7s_iopte)phys) { + if (phys != (arm_v7s_iopte)phys && + !(cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT)) { /* Doesn't fit in PTE */ dev_err(dev, "Page table does not fit in PTE: %pa", &phys); goto out_free; @@ -457,9 +470,14 @@ static arm_v7s_iopte arm_v7s_install_table(arm_v7s_iopte *table, arm_v7s_iopte curr, struct io_pgtable_cfg *cfg) { + phys_addr_t phys = virt_to_phys(table); arm_v7s_iopte old, new; - new = virt_to_phys(table) | ARM_V7S_PTE_TYPE_TABLE; + new = phys | ARM_V7S_PTE_TYPE_TABLE; + + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT) + new = to_iopte_mtk(phys, new); + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS) new |= ARM_V7S_ATTR_NS_TABLE; @@ -778,7 +796,9 @@ static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops, static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) { + slab_flags_t slab_flag = ARM_V7S_TABLE_SLAB_FLAGS; struct arm_v7s_io_pgtable *data; + phys_addr_t paddr; if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 
34 : ARM_V7S_ADDR_BITS)) return NULL; @@ -788,7 +808,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, if (cfg->q
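The ARM_V7S_TTBR_35BIT_PA macro in the patch above folds PA bits [34:32] into TTBR bits [2:0] of the 32-bit register. A small Python model of the bit manipulation (illustrative, not kernel code):

```python
def ttbr_35bit_pa(ttbr: int, pa: int) -> int:
    """Model of ARM_V7S_TTBR_35BIT_PA: clear ttbr bits [2:0]
    ((u32)(~0U << 3)), then fill them with PA bits [34:32]
    (GENMASK(34, 32) shifted down by 32)."""
    return (ttbr & 0xFFFFFFF8) | ((pa & 0x7_0000_0000) >> 32)
```

For a pgtable whose physical address has bits 33 and 32 set, the low three TTBR bits end up holding the value 3; any stale low bits in the incoming ttbr are discarded first.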
[PATCH v2 2/2] iommu/mediatek: Allow page table PA up to 35bit
From: Yunfei Wang

Add support for the quirk IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT, so that the
page table PA can be up to 35 bits, not restricted to ZONE_DMA32.

Signed-off-by: Ning Li
Signed-off-by: Yunfei Wang
---
 drivers/iommu/mtk_iommu.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6fd75a60abd6..1b9a876ef271 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -33,6 +33,7 @@
 #define REG_MMU_PT_BASE_ADDR		0x000
 #define MMU_PT_ADDR_MASK		GENMASK(31, 7)
+#define MMU_PT_ADDR_2_0_MASK		GENMASK(2, 0)
 #define REG_MMU_INVALIDATE		0x020
 #define F_ALL_INVLD			0x2
@@ -118,6 +119,7 @@
 #define WR_THROT_EN			BIT(6)
 #define HAS_LEGACY_IVRP_PADDR		BIT(7)
 #define IOVA_34_EN			BIT(8)
+#define PGTABLE_PA_35_EN		BIT(9)

 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
	((((pdata)->flags) & (_x)) == (_x))
@@ -401,6 +403,9 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
		.iommu_dev = data->dev,
	};

+	if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN))
+		dom->cfg.quirks |= IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT;
+
	if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE))
		dom->cfg.oas = data->enable_4GB ? 33 : 32;
	else
@@ -450,6 +455,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
	struct device *m4udev = data->dev;
	int ret, domid;
+	u32 regval;

	domid = mtk_iommu_get_domain_id(dev, data->plat_data);
	if (domid < 0)
@@ -472,8 +478,14 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
			return ret;
		}
		data->m4u_dom = dom;
-		writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
-		       data->base + REG_MMU_PT_BASE_ADDR);
+
+		/* Bits[6:3] are invalid for mediatek platform */
+		if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN))
+			regval = (dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK) |
+				 (dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_2_0_MASK);
+		else
+			regval = dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK;
+		writel(regval, data->base + REG_MMU_PT_BASE_ADDR);

		pm_runtime_put(m4udev);
	}
@@ -987,6 +999,7 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
	struct mtk_iommu_suspend_reg *reg = &data->reg;
	struct mtk_iommu_domain *m4u_dom = data->m4u_dom;
	void __iomem *base = data->base;
+	u32 regval;
	int ret;

	ret = clk_prepare_enable(data->bclk);
@@ -1010,7 +1023,14 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
	writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL);
	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
	writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
-	writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, base + REG_MMU_PT_BASE_ADDR);
+
+	/* Bits[6:3] are invalid for mediatek platform */
+	if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN))
+		regval = (m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK) |
+			 (m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_2_0_MASK);
+	else
+		regval = m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK;
+	writel(regval, base + REG_MMU_PT_BASE_ADDR);

	/*
	 * Users may allocate dma buffer before they call pm_runtime_get,
@@ -1038,7 +1058,8 @@ static const struct mtk_iommu_plat_data mt2712_data = {

 static const struct mtk_iommu_plat_data mt6779_data = {
	.m4u_plat     = M4U_MT6779,
-	.flags        = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN,
+	.flags        = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN |
+			PGTABLE_PA_35_EN,
	.inv_sel_reg  = REG_MMU_INV_SEL_GEN2,
	.iova_region  = single_domain,
	.iova_region_nr = ARRAY_SIZE(single_domain),
--
2.18.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 20:03, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 07:59:41PM +0800, Baolu Lu wrote: On 2022/5/12 19:48, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: On 2022/5/12 13:01, Tian, Kevin wrote: From: Baolu Lu Sent: Thursday, May 12, 2022 11:03 AM On 2022/5/11 22:53, Jason Gunthorpe wrote: Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrappering this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header,*must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. I'm still not sure that bond is necessary "bond" is the sva handle that the device drivers get through calling iommu_sva_bind(). 'bond' was required before because we didn't have a domain to wrap the page table at that time. Now we have a domain and it is 1:1 associated to bond. Probably make sense now by just returning the domain as the sva handle instead? It also includes the device information that the domain has been attached. So the sva_unbind() looks like this: /** * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device * @handle: the handle returned by iommu_sva_bind_device() * * Put reference to a bond between device and address space. The device should * not be issuing any more transaction for this PASID. All outstanding page * requests for this PASID must have been flushed to the IOMMU. */ void iommu_sva_unbind_device(struct iommu_sva *handle) It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, if we can include the device in the unbind() interface. Why would we have a special unbind for SVA? It's about SVA kAPI for device drivers. 
The existing kAPIs include:

struct iommu_sva *iommu_sva_bind_device(struct device *dev,
                                        struct mm_struct *mm,
                                        void *drvdata);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);

This is not what we agreed the API should be. We agreed:

iommu_sva_domain_alloc()
iommu_attach_device_pasid()
iommu_detach_device_pasid()

Again, SVA should not be different from normal domain stuff.

Yes, agreed. I am trying to achieve this in two steps. The first step
focuses on the internal iommu implementation and keeps the driver kAPI
untouched. Then, the second step focuses on the driver APIs.

Best regards,
baolu
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 19:51, Jason Gunthorpe wrote:
On Thu, May 12, 2022 at 11:02:39AM +0800, Baolu Lu wrote:

+	mutex_lock(&group->mutex);
+	domain = xa_load(&group->pasid_array, pasid);
+	if (domain && domain->type != type)
+		domain = NULL;
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return domain;

This is bad locking, group->pasid_array values cannot be taken outside
the lock.

It's not iommu core, but SVA (or other feature components) that manage
the life cycle of a domain. The iommu core only provides a place to
store the domain pointer. The feature components are free to fetch their
domain pointers from iommu core as long as they are sure that the domain
is alive during use.

I'm not convinced.

I'm sorry, I may not have explained it clearly. :-)

This helper is safe for uses inside the IOMMU subsystem. We could trust
ourselves that nobody will abuse this helper to obtain domains belonging
to others, as the pasid has been allocated for SVA code. No other code
should be able to set up another type of domain for this pasid. The SVA
code has its own lock mechanism to avoid use-after-free.

Please correct me if I missed anything. :-)

By the way, I can see some similar helpers in the current IOMMU core.
For example:

struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
{
	struct iommu_domain *domain;
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (!group)
		return NULL;

	domain = group->domain;
	iommu_group_put(group);

	return domain;
}
EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);

Best regards,
baolu
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
On 2022/5/12 7:59 PM, Jason Gunthorpe wrote:
On Thu, May 12, 2022 at 07:57:26PM +0800, zhangfei@foxmail.com wrote:
On 2022/5/12 7:32 PM, Jason Gunthorpe via iommu wrote:
On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:

I can't help feeling a little wary about removing this until IOMMUFD can
actually offer a functional replacement - is it in the way of anything
upcoming?

From an upstream perspective if someone has a patched kernel to complete
the feature, then they can patch this part in as well, we should not
carry dead code like this in the kernel and in the uapi.

It is not directly in the way, but this needs to get done at some point,
I'd rather just get it out of the way.

We are using this interface for nested mode.

How are you using it? It doesn't do anything. Do you have out of tree
patches as well?

Yes, there are some patches out of tree, since they have been pending for
almost a year.

By the way, I am trying to test nesting mode based on iommufd, which
still requires iommu_enable_nesting, and that is removed in this patch as
well.

This is temporary.

So can we wait for the alternative patch before removing them?

Do you see a problem with patching this along with all the other patches
you need?

Not yet. Currently I am using two workarounds, but it basically works.

1. Hard code to enable nesting mode, both in kernel and qemu. Will
consider this later.

2. Add a VFIOIOASHwpt *hwpt in VFIOIOMMUFDContainer, and save the hwpt
in vfio_device_attach_container, while currently VFIOIOMMUFDContainer
has an hwpt_list. Have asked Yi in another mail.

Thanks
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On Thu, May 12, 2022 at 07:59:41PM +0800, Baolu Lu wrote: > On 2022/5/12 19:48, Jason Gunthorpe wrote: > > On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: > > > On 2022/5/12 13:01, Tian, Kevin wrote: > > > > > From: Baolu Lu > > > > > Sent: Thursday, May 12, 2022 11:03 AM > > > > > > > > > > On 2022/5/11 22:53, Jason Gunthorpe wrote: > > > > > > > > Also, given the current arrangement it might make sense to have > > > > > > > > a > > > > > > > > struct iommu_domain_sva given that no driver is wrappering this > > > > > > > > in > > > > > > > > something else. > > > > > > > Fair enough. How about below wrapper? > > > > > > > > > > > > > > +struct iommu_sva_domain { > > > > > > > + /* > > > > > > > +* Common iommu domain header,*must* be put at the top > > > > > > > +* of the structure. > > > > > > > +*/ > > > > > > > + struct iommu_domain domain; > > > > > > > + struct mm_struct *mm; > > > > > > > + struct iommu_sva bond; > > > > > > > +} > > > > > > > > > > > > > > The refcount is wrapped in bond. > > > > > > I'm still not sure that bond is necessary > > > > > > > > > > "bond" is the sva handle that the device drivers get through calling > > > > > iommu_sva_bind(). > > > > > > > > > > > > > 'bond' was required before because we didn't have a domain to wrap > > > > the page table at that time. > > > > > > > > Now we have a domain and it is 1:1 associated to bond. Probably > > > > make sense now by just returning the domain as the sva handle > > > > instead? > > > > > > It also includes the device information that the domain has been > > > attached. So the sva_unbind() looks like this: > > > > > > /** > > > * iommu_sva_unbind_device() - Remove a bond created with > > > iommu_sva_bind_device > > > * @handle: the handle returned by iommu_sva_bind_device() > > > * > > > * Put reference to a bond between device and address space. The device > > > should > > > * not be issuing any more transaction for this PASID. 
All outstanding
> > > page
> > > * requests for this PASID must have been flushed to the IOMMU.
> > > */
> > > void iommu_sva_unbind_device(struct iommu_sva *handle)
> > >
> > > It's fine to replace the iommu_sva with iommu_sva_domain for sva handle,
> > > if we can include the device in the unbind() interface.
> >
> > Why would we have a special unbind for SVA?
>
> It's about SVA kAPI for device drivers. The existing kAPIs include:
>
> struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>                                         struct mm_struct *mm,
>                                         void *drvdata);
> void iommu_sva_unbind_device(struct iommu_sva *handle);
> u32 iommu_sva_get_pasid(struct iommu_sva *handle);

This is not what we agreed the API should be. We agreed:

iommu_sva_domain_alloc()
iommu_attach_device_pasid()
iommu_detach_device_pasid()

Again, SVA should not be different from normal domain stuff.

Jason
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 19:48, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: On 2022/5/12 13:01, Tian, Kevin wrote: From: Baolu Lu Sent: Thursday, May 12, 2022 11:03 AM On 2022/5/11 22:53, Jason Gunthorpe wrote: Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrappering this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header,*must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. I'm still not sure that bond is necessary "bond" is the sva handle that the device drivers get through calling iommu_sva_bind(). 'bond' was required before because we didn't have a domain to wrap the page table at that time. Now we have a domain and it is 1:1 associated to bond. Probably make sense now by just returning the domain as the sva handle instead? It also includes the device information that the domain has been attached. So the sva_unbind() looks like this: /** * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device * @handle: the handle returned by iommu_sva_bind_device() * * Put reference to a bond between device and address space. The device should * not be issuing any more transaction for this PASID. All outstanding page * requests for this PASID must have been flushed to the IOMMU. */ void iommu_sva_unbind_device(struct iommu_sva *handle) It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, if we can include the device in the unbind() interface. Why would we have a special unbind for SVA? It's about SVA kAPI for device drivers. 
The existing kAPIs include:

struct iommu_sva *iommu_sva_bind_device(struct device *dev,
                                        struct mm_struct *mm,
                                        void *drvdata);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);

SVA should not be different from normal domains; it should use the normal
detach flow too.

Best regards,
baolu
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
On Thu, May 12, 2022 at 07:57:26PM +0800, zhangfei@foxmail.com wrote:
>
> On 2022/5/12 7:32 PM, Jason Gunthorpe via iommu wrote:
> > On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:
> > > > I can't help feeling a little wary about removing this until IOMMUFD
> > > > can actually offer a functional replacement - is it in the way of
> > > > anything upcoming?
> > > From an upstream perspective if someone has a patched kernel to
> > > complete the feature, then they can patch this part in as well, we
> > > should not carry dead code like this in the kernel and in the uapi.
> > >
> > > It is not directly in the way, but this needs to get done at some
> > > point, I'd rather just get it out of the way.
> > We are using this interface for nested mode.
> How are you using it? It doesn't do anything. Do you have out of tree
> patches as well?
>
> Yes, there are some patches out of tree, since they have been pending
> for almost a year.
>
> By the way, I am trying to test nesting mode based on iommufd, still
> requires iommu_enable_nesting,
> which is removed in this patch as well.

This is temporary.

> So can we wait for the alternative patch before removing them?

Do you see a problem with patching this along with all the other patches
you need?

Jason
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
On 2022/5/12 7:32 PM, Jason Gunthorpe via iommu wrote:
On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:

I can't help feeling a little wary about removing this until IOMMUFD can
actually offer a functional replacement - is it in the way of anything
upcoming?

From an upstream perspective if someone has a patched kernel to complete
the feature, then they can patch this part in as well, we should not
carry dead code like this in the kernel and in the uapi.

It is not directly in the way, but this needs to get done at some point,
I'd rather just get it out of the way.

We are using this interface for nested mode.

How are you using it? It doesn't do anything. Do you have out of tree
patches as well?

Yes, there are some patches out of tree, since they have been pending for
almost a year.

By the way, I am trying to test nesting mode based on iommufd, which
still requires iommu_enable_nesting, and that is removed in this patch as
well. So can we wait for the alternative patch before removing them?

Thanks
Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid
On Thu, May 12, 2022 at 02:22:03PM +0800, Baolu Lu wrote:
> Hi Jason,
>
> On 2022/5/12 01:00, Jason Gunthorpe wrote:
> > > Consolidate pasid programming into dev_set_pasid() then called by both
> > > intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?
> > I was only suggesting that really dev_attach_pasid() op is misnamed,
> > it should be called set_dev_pasid() and act like a set, not a paired
> > attach/detach - same as the non-PASID ops.
>
> So,
>
> "set_dev_pasid(domain, device, pasid)" equals to dev_attach_pasid()
>
> and
>
> "set_dev_pasid(NULL, device, pasid)" equals to dev_detach_pasid()?
>
> do I understand it right?

blocking_domain should be passed, not null

Jason
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On Thu, May 12, 2022 at 11:02:39AM +0800, Baolu Lu wrote:
> > > +	mutex_lock(&group->mutex);
> > > +	domain = xa_load(&group->pasid_array, pasid);
> > > +	if (domain && domain->type != type)
> > > +		domain = NULL;
> > > +	mutex_unlock(&group->mutex);
> > > +	iommu_group_put(group);
> > > +
> > > +	return domain;
> > This is bad locking, group->pasid_array values cannot be taken outside
> > the lock.
>
> It's not iommu core, but SVA (or other feature components) that manage
> the life cycle of a domain. The iommu core only provides a place to
> store the domain pointer. The feature components are free to fetch their
> domain pointers from iommu core as long as they are sure that the domain
> is alive during use.

I'm not convinced.

Jason
Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops
On Thu, May 12, 2022 at 08:00:59AM +0100, Jean-Philippe Brucker wrote:
> > It is not "missing" it is just renamed to
> > blocking_domain->ops->set_dev_pasid()
> >
> > The implementation of that function would be identical to
> > detach_dev_pasid.
>
> attach(dev, pasid, sva_domain)
> detach(dev, pasid, sva_domain)
>
> versus
>
> set_dev_pasid(dev, pasid, sva_domain)
> set_dev_pasid(dev, pasid, blocking)
>
> we lose the information of the domain previously attached, and the SMMU
> driver has to retrieve it to find the ASID corresponding to the mm.

It would be easy to have the old domain be an input as well - the core
code knows it.

Jason
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: > On 2022/5/12 13:01, Tian, Kevin wrote: > > > From: Baolu Lu > > > Sent: Thursday, May 12, 2022 11:03 AM > > > > > > On 2022/5/11 22:53, Jason Gunthorpe wrote: > > > > > > Also, given the current arrangement it might make sense to have a > > > > > > struct iommu_domain_sva given that no driver is wrappering this in > > > > > > something else. > > > > > Fair enough. How about below wrapper? > > > > > > > > > > +struct iommu_sva_domain { > > > > > + /* > > > > > +* Common iommu domain header,*must* be put at the top > > > > > +* of the structure. > > > > > +*/ > > > > > + struct iommu_domain domain; > > > > > + struct mm_struct *mm; > > > > > + struct iommu_sva bond; > > > > > +} > > > > > > > > > > The refcount is wrapped in bond. > > > > I'm still not sure that bond is necessary > > > > > > "bond" is the sva handle that the device drivers get through calling > > > iommu_sva_bind(). > > > > > > > 'bond' was required before because we didn't have a domain to wrap > > the page table at that time. > > > > Now we have a domain and it is 1:1 associated to bond. Probably > > make sense now by just returning the domain as the sva handle > > instead? > > It also includes the device information that the domain has been > attached. So the sva_unbind() looks like this: > > /** > * iommu_sva_unbind_device() - Remove a bond created with > iommu_sva_bind_device > * @handle: the handle returned by iommu_sva_bind_device() > * > * Put reference to a bond between device and address space. The device > should > * not be issuing any more transaction for this PASID. All outstanding page > * requests for this PASID must have been flushed to the IOMMU. > */ > void iommu_sva_unbind_device(struct iommu_sva *handle) > > It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, > if we can include the device in the unbind() interface. Why would we have a special unbind for SVA? 
SVA should not be different from normal domains; it should use the normal
detach flow too.

Jason
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:
> > > I can't help feeling a little wary about removing this until IOMMUFD
> > > can actually offer a functional replacement - is it in the way of
> > > anything upcoming?
> > From an upstream perspective if someone has a patched kernel to
> > complete the feature, then they can patch this part in as well, we
> > should not carry dead code like this in the kernel and in the uapi.
> >
> > It is not directly in the way, but this needs to get done at some
> > point, I'd rather just get it out of the way.
>
> We are using this interface for nested mode.

How are you using it? It doesn't do anything. Do you have out of tree
patches as well?

Jason
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
Hi, Jason On 2022/5/11 上午2:13, Jason Gunthorpe via iommu wrote: On Tue, May 10, 2022 at 06:52:06PM +0100, Robin Murphy wrote: On 2022-05-10 17:55, Jason Gunthorpe via iommu wrote: This control causes the ARM SMMU drivers to choose a stage 2 implementation for the IO pagetable (vs the stage 1 usual default), however this choice has no visible impact to the VFIO user. Further qemu never implemented this and no other userspace user is known. The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide SMMU translation services to the guest operating system" however the rest of the API to set the guest table pointer for the stage 1 was never completed, or at least never upstreamed, rendering this part useless dead code. Since the current patches to enable nested translation, aka userspace page tables, rely on iommufd and will not use the enable_nesting() iommu_domain_op, remove this infrastructure. However, don't cut too deep into the SMMU drivers for now expecting the iommufd work to pick it up - we still need to create S2 IO page tables. Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the enable_nesting iommu_domain_op. Just in-case there is some userspace using this continue to treat requesting it as a NOP, but do not advertise support any more. I assume the nested translation/guest SVA patches that Eric and Vivek were working on pre-IOMMUFD made use of this, and given that they got quite far along, I wouldn't be too surprised if some eager cloud vendors might have even deployed something based on the patches off the list. With upstream there is no way to make use of this flag, if someone is using it they have other out of tree kernel, vfio, kvm and qemu patches to make it all work. 
You can see how much is still needed in Eric's tree:

https://github.com/eauger/linux/commits/v5.15-rc7-nested-v16

I can't help feeling a little wary about removing this until IOMMUFD can
actually offer a functional replacement - is it in the way of anything
upcoming?

From an upstream perspective if someone has a patched kernel to complete
the feature, then they can patch this part in as well, we should not
carry dead code like this in the kernel and in the uapi.

It is not directly in the way, but this needs to get done at some point,
I'd rather just get it out of the way.

We are using this interface for nested mode.

Is the replacement patch ready? Can we hold off removing this until the
new one is submitted?

Thanks
Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops
On Wed, May 11, 2022 at 09:02:40AM -0300, Jason Gunthorpe wrote: > On Wed, May 11, 2022 at 08:54:39AM +0100, Jean-Philippe Brucker wrote: > > > > > Then 'detach pasid' is: > > > > > > > > > > iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev, > > > > pasid); > > > > > > > > > > And we move away from the notion of 'detach' and in the direction that > > > > > everything continuously has a domain set. PASID would logically > > > > > default to blocking_domain, though we wouldn't track this anywhere. > > > > > > > > I am not sure whether we still need to keep the blocking domain concept > > > > when we are entering the new PASID world. Please allow me to wait and > > > > listen to more opinions. > > > > > > > > > > I'm with Jason on this direction. In concept after a PASID is detached > > > it's > > > essentially blocked. Implementation-wise it doesn't prevent the iommu > > > driver from marking the PASID entry as non-present as doing in this > > > series instead of actually pointing to the empty page table of the block > > > domain. But api-wise it does make the entire semantics more consistent. > > > > This is all internal to IOMMU so I don't think we should be concerned > > about API consistency. I prefer a straighforward detach() operation > > because that way IOMMU drivers don't have to keep track of which domain is > > attached to which PASID. That code can be factored into the IOMMU core. > > Why would a driver need to keep additional tracking? > > > In addition to clearing contexts, detach() also needs to invalidate TLBs, > > and for that the SMMU driver needs to know the old ASID (!= PASID) that > > was used by the context descriptor. We can certainly work around a missing > > detach() to implement this, but it will be convoluted. > > It is not "missing" it is just renamed to > blocking_domain->ops->set_dev_pasid() > > The implementation of that function would be identical to > detach_dev_pasid. 
attach(dev, pasid, sva_domain)
detach(dev, pasid, sva_domain)

versus

set_dev_pasid(dev, pasid, sva_domain)
set_dev_pasid(dev, pasid, blocking)

we lose the information of the domain previously attached, and the SMMU
driver has to retrieve it to find the ASID corresponding to the mm.

Thanks,
Jean