Re: [PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry
On 2022/4/24 13:55, Tian, Kevin wrote: From: Lu Baolu Sent: Sunday, April 24, 2022 12:38 PM On 2022/4/24 11:37, Tian, Kevin wrote: This should be rebased on top of Jason's enforce coherency series instead of blindly setting it. No matter whether it's legacy mode where we set SNP in PTE or scalable mode where we set PGSNP in PASID entry for entire page table, the trigger point should be same i.e. when someone calls enforce_cache_coherency(). With Jason's enforce coherency series merged, we even don't need to set PGSNP bit of a pasid entry for second level translation. 2nd level always supports SNP in PTEs, so set PGSNP in pasid table entry is unnecessary. Yes, this sounds correct for 2nd-level. but setting PGSNP of 1st level translation is also relevant to that change when talking about enforcing coherency in the guest. In this case PASID_FLAG_PAGE_SNOOP should be set also after enforce_cache_coherency() is called. Yes. Agreed. Currently it's always set for unmanaged domain in domain_setup_first_level(): if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED) flags |= PASID_FLAG_PAGE_SNOOP; Suppose we need a separate interface to update PGSNP after pasid entry is set up. Currently enforcing coherency is only used in VFIO. In the VFIO use case, it's safe to always set PGSNP when an UNMANAGED domain is attached on the first level pasid translation. We could add support of updating PGSNP after pasid entry setup once there's a real need. The real need is here. The iommu driver should not assume the policy of VFIO, which is already communicated via the new enforce_cache_coherency() interface. The same policy should apply no matter whether 1st or 2nd level is in-use. Okay, I think I will move this patch out of this series and put it in a separated one for VT-d improvements after Jason's enforcing snoop series gets merged. Thanks for your suggestions. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry
> From: Lu Baolu > Sent: Sunday, April 24, 2022 12:38 PM > > On 2022/4/24 11:37, Tian, Kevin wrote: > >>> This should be rebased on top of Jason's enforce coherency series > >>> instead of blindly setting it. No matter whether it's legacy mode > >>> where we set SNP in PTE or scalable mode where we set PGSNP > >>> in PASID entry for entire page table, the trigger point should be > >>> same i.e. when someone calls enforce_cache_coherency(). > >> With Jason's enforce coherency series merged, we even don't need to set > >> PGSNP bit of a pasid entry for second level translation. 2nd level > >> always supports SNP in PTEs, so set PGSNP in pasid table entry is > >> unnecessary. > >> > > Yes, this sounds correct for 2nd-level. > > > > but setting PGSNP of 1st level translation is also relevant to that > > change when talking about enforcing coherency in the guest. In > > this case PASID_FLAG_PAGE_SNOOP should be set also after > > enforce_cache_coherency() is called. > > Yes. Agreed. > > > Currently it's always set for unmanaged domain in > > domain_setup_first_level(): > > > > if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED) > > flags |= PASID_FLAG_PAGE_SNOOP; > > > > Suppose we need a separate interface to update PGSNP after pasid > > entry is set up. > > Currently enforcing coherency is only used in VFIO. In the VFIO use > case, it's safe to always set PGSNP when an UNMANAGED domain is attached > on the first level pasid translation. We could add support of updating > PGSNP after pasid entry setup once there's a real need. > The real need is here. The iommu driver should not assume the policy of VFIO, which is already communicated via the new enforce_cache_coherency() interface. The same policy should apply no matter whether 1st or 2nd level is in-use. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry
On 2022/4/24 11:37, Tian, Kevin wrote: This should be rebased on top of Jason's enforce coherency series instead of blindly setting it. No matter whether it's legacy mode where we set SNP in PTE or scalable mode where we set PGSNP in PASID entry for entire page table, the trigger point should be same i.e. when someone calls enforce_cache_coherency(). With Jason's enforce coherency series merged, we even don't need to set PGSNP bit of a pasid entry for second level translation. 2nd level always supports SNP in PTEs, so set PGSNP in pasid table entry is unnecessary. Yes, this sounds correct for 2nd-level. but setting PGSNP of 1st level translation is also relevant to that change when talking about enforcing coherency in the guest. In this case PASID_FLAG_PAGE_SNOOP should be set also after enforce_cache_coherency() is called. Yes. Agreed. Currently it's always set for unmanaged domain in domain_setup_first_level(): if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED) flags |= PASID_FLAG_PAGE_SNOOP; Suppose we need a separate interface to update PGSNP after pasid entry is set up. Currently enforcing coherency is only used in VFIO. In the VFIO use case, it's safe to always set PGSNP when an UNMANAGED domain is attached on the first level pasid translation. We could add support of updating PGSNP after pasid entry setup once there's a real need. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry
> From: Lu Baolu > Sent: Friday, April 22, 2022 9:04 PM > > On 2022/4/22 10:47, Tian, Kevin wrote: > >> From: Lu Baolu > >> Sent: Thursday, April 21, 2022 7:36 PM > >> > >> The latest VT-d specification states that the PGSNP field in the pasid > >> table entry should be treated as Reserved(0) for implementations not > >> supporting Snoop Control (SC=0 in the Extended Capability Register). > >> This adds a check before setting the field. > >> > >> Signed-off-by: Lu Baolu > >> --- > >> drivers/iommu/intel/pasid.c | 13 ++--- > >> 1 file changed, 10 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c > >> index f8d215d85695..5cb2daa2b8cb 100644 > >> --- a/drivers/iommu/intel/pasid.c > >> +++ b/drivers/iommu/intel/pasid.c > >> @@ -625,8 +625,14 @@ int intel_pasid_setup_first_level(struct > intel_iommu > >> *iommu, > >>} > >>} > >> > >> - if (flags & PASID_FLAG_PAGE_SNOOP) > >> - pasid_set_pgsnp(pte); > >> + if (flags & PASID_FLAG_PAGE_SNOOP) { > >> + if (ecap_sc_support(iommu->ecap)) { > >> + pasid_set_pgsnp(pte); > >> + } else { > >> + pasid_clear_entry(pte); > >> + return -EINVAL; > >> + } > >> + } > >> > >>pasid_set_domain_id(pte, did); > >>pasid_set_address_width(pte, iommu->agaw); > >> @@ -710,7 +716,8 @@ int intel_pasid_setup_second_level(struct > >> intel_iommu *iommu, > >>pasid_set_fault_enable(pte); > >>pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap)); > >> > >> - if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED) > >> + if (ecap_sc_support(iommu->ecap) && > >> + domain->domain.type == IOMMU_DOMAIN_UNMANAGED) > >>pasid_set_pgsnp(pte); > >> > > This should be rebased on top of Jason's enforce coherency series > > instead of blindly setting it. No matter whether it's legacy mode > > where we set SNP in PTE or scalable mode where we set PGSNP > > in PASID entry for entire page table, the trigger point should be > > same i.e. when someone calls enforce_cache_coherency(). > > With Jason's enforce coherency series merged, we even don't need to set > PGSNP bit of a pasid entry for second level translation. 2nd level > always supports SNP in PTEs, so set PGSNP in pasid table entry is > unnecessary. > Yes, this sounds correct for 2nd-level. but setting PGSNP of 1st level translation is also relevant to that change when talking about enforcing coherency in the guest. In this case PASID_FLAG_PAGE_SNOOP should be set also after enforce_cache_coherency() is called. Currently it's always set for unmanaged domain in domain_setup_first_level(): if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED) flags |= PASID_FLAG_PAGE_SNOOP; Suppose we need a separate interface to update PGSNP after pasid entry is set up. Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Sat, 23 Apr 2022 at 19:13, zhangfei@foxmail.com wrote: > > Hi, Jean > > On 2022/4/22 下午11:50, Jean-Philippe Brucker wrote: > > On Fri, Apr 22, 2022 at 09:15:01PM +0800, zhangfei@foxmail.com wrote: > >>> I'm trying to piece together what happens from the kernel point of view. > >>> > >>> * master process with mm A opens a queue fd through uacce, which calls > >>> iommu_sva_bind_device(dev, A) -> PASID 1 > >>> > >>> * master forks and exits. Child (daemon) gets mm B, inherits the queue fd. > >>> The device is still bound to mm A with PASID 1, since the queue fd is > >>> still open. > >>> We discussed this before, but I don't remember where we left off. The > >>> child can't use the queue because its mappings are not copied on fork(), > >>> and the queue is still bound to the parent mm A. The child either needs to > >>> open a new queue or take ownership of the old one with a new uacce ioctl. > >> Yes, currently nginx aligned with the case. > >> Child process (worker process) reopen uacce, > >> > >> Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID > >> 1 > >> Master process fork Child (daemon) and exit. > >> > >> Child (daemon) does not use PASID 1 any more, only fork and manage worker > >> process. > >> Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2 > >> > >> So it is expected. > > Yes, that's fine > > > >>> Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of > >>> tree? This operation should unbind from A before binding to B, no? > >>> Otherwise we leak PASID 1. > >> In 5.16 PASID 1 from master is hold until nginx service stop. > >> nginx start > >> master: > >> iommu_sva_alloc_pasid mm->pasid=1 // master process > >> > >> lynx https start: > >> iommu_sva_alloc_pasid mm->pasid=2//worker process > >> > >> nginx stop: from fops_release > >> iommu_sva_free_pasid mm->pasid=2 // worker process > >> iommu_sva_free_pasid mm->pasid=1 // master process > > That's the expected behavior (master could close its fd before forking, in > > order to free things up earlier, but it's not mandatory) > Currently we unbind in fops_release, so the ioasid allocated in master > can only be freed when nginx stop, > when all forked fd are closed. > > > > >> Have one silly question. > >> > >> kerne driver > >> fops_open > >> iommu_sva_bind_device > >> > >> fops_release > >> iommu_sva_unbind_device > >> > >> application > >> main() > >> fd = open > >> return; > >> > >> Application exit but not close(fd), is it expected fops_release will be > >> called automatically by system? > > Yes, the application doesn't have to call close() explicitly, the file > > descriptor is closed automatically on exit. Note that the fd is copied on > > fork(), so it is only released once parent and all child processes exit. > Yes, in case the application ended unexpected, like ctrl+c > > > >> On 5.17 > >> fops_release is called automatically, as well as iommu_sva_unbind_device. > >> On 5.18-rc1. > >> fops_release is not called, have to manually call close(fd) > > Right that's weird > Looks it is caused by the fix patch, via mmget, which may add refcount > of fd. > > Some experiments > 1. 5.17, everything works well. > > 2. 5.17 + patchset of "iommu/sva: Assign a PASID to mm on PASID > allocation and free it on mm exit" > > Test application, exit without close uacce fd > First time: fops_release can be called automatically. > > log: > ioasid_alloc ioasid=1 > iommu_sva_alloc_pasid pasid=1 > iommu_sva_bind_device handle=263a2ee8 > ioasid_free ioasid=1 > uacce_fops_release q=55ca3cdf > iommu_sva_unbind_device handle=263a2ee8 > > Second time: hardware reports error > > uacce_fops_open q=8e4d6f78 > ioasid_alloc ioasid=1 > iommu_sva_alloc_pasid pasid=1 > iommu_sva_bind_device handle=cfd11788 > // Haredware reports error > hisi_sec2 :b6:00.0: qm_acc_do_task_timeout [error status=0x20] found > hisi_sec2 :b6:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] > found > hisi_sec2 :b6:00.0: sec_fsm_hbeat_rint [error status=0x20] found > hisi_sec2 :b6:00.0: Controller resetting... > hisi_sec2 :b6:00.0: QM mailbox operation timeout! > hisi_sec2 :b6:00.0: Failed to dump sqc! > hisi_sec2 :b6:00.0: Failed to drain out data for stopping! > hisi_sec2 :b6:00.0: Bus lock! Please reset system. > hisi_sec2 :b6:00.0: Controller reset failed (-110) > hisi_sec2 :b6:00.0: controller reset failed (-110) > > 3. Add the fix patch of using mmget in bind. > Test application, exit without close uacce fd > > fops_release can NOT be called automatically, looks mmget adds refcount > of fd. Test application, exit without closing fd. > >> kernel driver > >> fops_open > >> iommu_sva_bind_device > >> > >> fops_release > >> iommu_sva_unbind_device 1. 5.17 kernel, no mmget & mmput wd_release_queue no close Compress bz=512000 nb=1×10, speed=139.5 MB/s (±0.0% N=1) overall=122.9 MB/s (±0.0%) [ 16.052989] do_exi
Re: [PATCH 0/2] dma-mapping, remoteproc: Fix dma_mem leak after rproc_shutdown
Sigh. In theory drivers should never declare coherent memory like this, and there has been some work to fix remoteproc in that regard. But I guess until that is merged we'll need somthing like this fix. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages
When dma_direct_alloc_pages encounters a highmem page it just gives up currently. But what we really should do is to try memory using the page allocator instead - without this platforms with a global highmem CMA pool will fail all dma_alloc_pages allocations. Fixes: efa70f2fdc84 ("dma-mapping: add a new dma_alloc_pages API") Reported-by: Mark O'Neill Signed-off-by: Christoph Hellwig --- kernel/dma/direct.c | 27 ++- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 9743c6ccce1a9..3e7f4aab740e4 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -115,7 +115,7 @@ static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size) } static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, - gfp_t gfp) + gfp_t gfp, bool allow_highmem) { int node = dev_to_node(dev); struct page *page = NULL; @@ -129,9 +129,12 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, &phys_limit); page = dma_alloc_contiguous(dev, size, gfp); - if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { - dma_free_contiguous(dev, page, size); - page = NULL; + if (page) { + if (!dma_coherent_ok(dev, page_to_phys(page), size) || + (!allow_highmem && PageHighMem(page))) { + dma_free_contiguous(dev, page, size); + page = NULL; + } } again: if (!page) @@ -189,7 +192,7 @@ static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size, { struct page *page; - page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO); + page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true); if (!page) return NULL; @@ -262,7 +265,7 @@ void *dma_direct_alloc(struct device *dev, size_t size, return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp); /* we always manually zero the memory once we are done */ - page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO); + page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true); if (!page) return NULL; @@ -370,19 +373,9 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size, if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp)) return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp); - page = __dma_direct_alloc_pages(dev, size, gfp); + page = __dma_direct_alloc_pages(dev, size, gfp, false); if (!page) return NULL; - if (PageHighMem(page)) { - /* -* Depending on the cma= arguments and per-arch setup -* dma_alloc_contiguous could return highmem pages. -* Without remapping there is no way to return them here, -* so log an error and fail. -*/ - dev_info(dev, "Rejecting highmem page from CMA.\n"); - goto out_free_pages; - } ret = page_address(page); if (dma_set_decrypted(dev, ret, size)) -- 2.30.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm
swiotlb-xen uses very different ways to allocate coherent memory on x86 vs arm. On the former it allocates memory from the page allocator, while on the later it reuses the dma-direct allocator the handles the complexities of non-coherent DMA on arm platforms. Unfortunately the complexities of trying to deal with the two cases in the swiotlb-xen.c code lead to a bug in the handling of DMA_ATTR_NO_KERNEL_MAPPING on arm. With the DMA_ATTR_NO_KERNEL_MAPPING flag the coherent memory allocator does not actually allocate coherent memory, but just a DMA handle for some memory that is DMA addressable by the device, but which does not have to have a kernel mapping. Thus dereferencing the return value will lead to kernel crashed and memory corruption. Fix this by using the dma-direct allocator directly for arm, which works perfectly fine because on arm swiotlb-xen is only used when the domain is 1:1 mapped, and then simplifying the remaining code to only cater for the x86 case with DMA coherent device. Reported-by: Rahul Singh Signed-off-by: Christoph Hellwig --- arch/arm/include/asm/xen/page-coherent.h | 2 - arch/arm/xen/mm.c | 17 arch/arm64/include/asm/xen/page-coherent.h | 2 - arch/x86/include/asm/xen/page-coherent.h | 24 - arch/x86/include/asm/xen/swiotlb-xen.h | 5 + drivers/xen/swiotlb-xen.c | 106 - include/xen/arm/page-coherent.h| 20 include/xen/xen-ops.h | 7 -- 8 files changed, 45 insertions(+), 138 deletions(-) delete mode 100644 arch/arm/include/asm/xen/page-coherent.h delete mode 100644 arch/arm64/include/asm/xen/page-coherent.h delete mode 100644 arch/x86/include/asm/xen/page-coherent.h delete mode 100644 include/xen/arm/page-coherent.h diff --git a/arch/arm/include/asm/xen/page-coherent.h b/arch/arm/include/asm/xen/page-coherent.h deleted file mode 100644 index 27e984977402b..0 --- a/arch/arm/include/asm/xen/page-coherent.h +++ /dev/null @@ -1,2 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#include diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index a7e54a087b802..6e603e5fdebb1 100644 --- a/arch/arm/xen/mm.c +++ b/arch/arm/xen/mm.c @@ -118,23 +118,6 @@ bool xen_arch_need_swiotlb(struct device *dev, !dev_is_dma_coherent(dev)); } -int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order, -unsigned int address_bits, -dma_addr_t *dma_handle) -{ - if (!xen_initial_domain()) - return -EINVAL; - - /* we assume that dom0 is mapped 1:1 for now */ - *dma_handle = pstart; - return 0; -} - -void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order) -{ - return; -} - static int __init xen_mm_init(void) { struct gnttab_cache_flush cflush; diff --git a/arch/arm64/include/asm/xen/page-coherent.h b/arch/arm64/include/asm/xen/page-coherent.h deleted file mode 100644 index 27e984977402b..0 --- a/arch/arm64/include/asm/xen/page-coherent.h +++ /dev/null @@ -1,2 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#include diff --git a/arch/x86/include/asm/xen/page-coherent.h b/arch/x86/include/asm/xen/page-coherent.h deleted file mode 100644 index 63cd41b2e17ac..0 --- a/arch/x86/include/asm/xen/page-coherent.h +++ /dev/null @@ -1,24 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _ASM_X86_XEN_PAGE_COHERENT_H -#define _ASM_X86_XEN_PAGE_COHERENT_H - -#include -#include - -static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size, - dma_addr_t *dma_handle, gfp_t flags, - unsigned long attrs) -{ - void *vstart = (void*)__get_free_pages(flags, get_order(size)); - *dma_handle = virt_to_phys(vstart); - return vstart; -} - -static inline void xen_free_coherent_pages(struct device *hwdev, size_t size, - void *cpu_addr, dma_addr_t dma_handle, - unsigned long attrs) -{ - free_pages((unsigned long) cpu_addr, get_order(size)); -} - -#endif /* _ASM_X86_XEN_PAGE_COHERENT_H */ diff --git a/arch/x86/include/asm/xen/swiotlb-xen.h b/arch/x86/include/asm/xen/swiotlb-xen.h index 66b4ddde77430..558821387808e 100644 --- a/arch/x86/include/asm/xen/swiotlb-xen.h +++ b/arch/x86/include/asm/xen/swiotlb-xen.h @@ -10,4 +10,9 @@ extern int pci_xen_swiotlb_init_late(void); static inline int pci_xen_swiotlb_init_late(void) { return -ENXIO; } #endif +int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order, + unsigned int address_bits, + dma_addr_t *dma_handle); +void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order); + #endif /* _ASM_X86_SWIOTLB_XEN_H */ diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 47aebd98f52f5..557edb9c54879 100644 --- a/drivers/xen/swiotlb-xen.
Re: fully convert arm to use dma-direct
On Sat, Apr 23, 2022 at 11:27:13AM +0100, Marc Zyngier wrote: >> I think Marc Z has a Netwinder that he can test this on. Marc? >> I have one too, just not much in my office because of parental leave. > > I'm about to travel for a week. Can this wait until I'm back? > This is one of the few boxes that isn't hooked up to the PDU, > so I can't operate it remotely. No rush at all from my side except that I;d love to get it into 5.19. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v11 4/9] ACPI/IORT: Add support to retrieve IORT RMR reserved regions
Hi Shameer, I love your patch! Perhaps something to improve: [auto build test WARNING on joro-iommu/next] [also build test WARNING on rafael-pm/linux-next arm/for-next arm64/for-next/core soc/for-next linus/master v5.18-rc3 next-20220422] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/intel-lab-lkp/linux/commits/Shameer-Kolothum/ACPI-IORT-Support-for-IORT-RMR-node/20220423-003822 base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20220423/202204232022.kmubee9l-...@intel.com/config) compiler: aarch64-linux-gcc (GCC) 11.2.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/5b73fd681a27e2ad450bac28f8a81f4b35fe4d68 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Shameer-Kolothum/ACPI-IORT-Support-for-IORT-RMR-node/20220423-003822 git checkout 5b73fd681a27e2ad450bac28f8a81f4b35fe4d68 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/acpi/arm64/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): >> drivers/acpi/arm64/iort.c:801:29: warning: no previous prototype for >> 'iort_rmr_alloc' [-Wmissing-prototypes] 801 | struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc *rmr_desc, | ^~ drivers/acpi/arm64/iort.c: In function 'iort_get_rmrs': drivers/acpi/arm64/iort.c:896:34: error: 'ACPI_IORT_RMR_REMAP_PERMITTED' undeclared (first use in this function) 896 | if (rmr->flags & ACPI_IORT_RMR_REMAP_PERMITTED) | ^ drivers/acpi/arm64/iort.c:896:34: note: each undeclared identifier is reported only once for each function it appears in drivers/acpi/arm64/iort.c:901:34: error: 'ACPI_IORT_RMR_ACCESS_PRIVILEGE' undeclared (first use in this function) 901 | if (rmr->flags & ACPI_IORT_RMR_ACCESS_PRIVILEGE) | ^~ drivers/acpi/arm64/iort.c:905:21: error: implicit declaration of function 'ACPI_IORT_RMR_ACCESS_ATTRIBUTES'; did you mean 'ACPI_IORT_MF_ATTRIBUTES'? [-Werror=implicit-function-declaration] 905 | if (ACPI_IORT_RMR_ACCESS_ATTRIBUTES(rmr->flags) <= | ^~~ | ACPI_IORT_MF_ATTRIBUTES drivers/acpi/arm64/iort.c:906:33: error: 'ACPI_IORT_RMR_ATTR_DEVICE_GRE' undeclared (first use in this function) 906 | ACPI_IORT_RMR_ATTR_DEVICE_GRE) | ^ drivers/acpi/arm64/iort.c:909:33: error: 'ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB' undeclared (first use in this function) 909 | ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB) | ^ cc1: some warnings being treated as errors vim +/iort_rmr_alloc +801 drivers/acpi/arm64/iort.c 800 > 801 struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc > *rmr_desc, 802 int prot, enum iommu_resv_type type, 803 u32 *sids, u32 num_sids) 804 { 805 struct iommu_iort_rmr_data *rmr_data; 806 struct iommu_resv_region *region; 807 u32 *sids_copy; 808 u64 addr = rmr_desc->base_address, size = rmr_desc->length; 809 810 rmr_data = kmalloc(sizeof(*rmr_data), GFP_KERNEL); 811 if (!rmr_data) 812 return NULL; 813 814 /* Create a copy of SIDs array to associate with this rmr_data */ 815 sids_copy = kmemdup(sids, num_sids * sizeof(*sids), GFP_KERNEL); 816 if (!sids_copy) { 817 kfree(rmr_data); 818 return NULL; 819 } 820 rmr_data->sids = sids_copy; 821 rmr_data->num_sids = num_sids; 822 823 if (!IS_ALIGNED(addr, SZ_64K) || !IS_ALIGNED(size, SZ_64K)) { 824 /* PAGE align base add
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Jean On 2022/4/22 下午11:50, Jean-Philippe Brucker wrote: On Fri, Apr 22, 2022 at 09:15:01PM +0800, zhangfei@foxmail.com wrote: I'm trying to piece together what happens from the kernel point of view. * master process with mm A opens a queue fd through uacce, which calls iommu_sva_bind_device(dev, A) -> PASID 1 * master forks and exits. Child (daemon) gets mm B, inherits the queue fd. The device is still bound to mm A with PASID 1, since the queue fd is still open. We discussed this before, but I don't remember where we left off. The child can't use the queue because its mappings are not copied on fork(), and the queue is still bound to the parent mm A. The child either needs to open a new queue or take ownership of the old one with a new uacce ioctl. Yes, currently nginx aligned with the case. Child process (worker process) reopen uacce, Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID 1 Master process fork Child (daemon) and exit. Child (daemon) does not use PASID 1 any more, only fork and manage worker process. Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2 So it is expected. Yes, that's fine Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of tree? This operation should unbind from A before binding to B, no? Otherwise we leak PASID 1. In 5.16 PASID 1 from master is hold until nginx service stop. nginx start master: iommu_sva_alloc_pasid mm->pasid=1 // master process lynx https start: iommu_sva_alloc_pasid mm->pasid=2 //worker process nginx stop: from fops_release iommu_sva_free_pasid mm->pasid=2 // worker process iommu_sva_free_pasid mm->pasid=1 // master process That's the expected behavior (master could close its fd before forking, in order to free things up earlier, but it's not mandatory) Currently we unbind in fops_release, so the ioasid allocated in master can only be freed when nginx stop, when all forked fd are closed. Have one silly question. kerne driver fops_open iommu_sva_bind_device fops_release iommu_sva_unbind_device application main() fd = open return; Application exit but not close(fd), is it expected fops_release will be called automatically by system? Yes, the application doesn't have to call close() explicitly, the file descriptor is closed automatically on exit. Note that the fd is copied on fork(), so it is only released once parent and all child processes exit. Yes, in case the application ended unexpected, like ctrl+c On 5.17 fops_release is called automatically, as well as iommu_sva_unbind_device. On 5.18-rc1. fops_release is not called, have to manually call close(fd) Right that's weird Looks it is caused by the fix patch, via mmget, which may add refcount of fd. Some experiments 1. 5.17, everything works well. 2. 5.17 + patchset of "iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit" Test application, exit without close uacce fd First time: fops_release can be called automatically. log: ioasid_alloc ioasid=1 iommu_sva_alloc_pasid pasid=1 iommu_sva_bind_device handle=263a2ee8 ioasid_free ioasid=1 uacce_fops_release q=55ca3cdf iommu_sva_unbind_device handle=263a2ee8 Second time: hardware reports error uacce_fops_open q=8e4d6f78 ioasid_alloc ioasid=1 iommu_sva_alloc_pasid pasid=1 iommu_sva_bind_device handle=cfd11788 // Haredware reports error hisi_sec2 :b6:00.0: qm_acc_do_task_timeout [error status=0x20] found hisi_sec2 :b6:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found hisi_sec2 :b6:00.0: sec_fsm_hbeat_rint [error status=0x20] found hisi_sec2 :b6:00.0: Controller resetting... hisi_sec2 :b6:00.0: QM mailbox operation timeout! hisi_sec2 :b6:00.0: Failed to dump sqc! hisi_sec2 :b6:00.0: Failed to drain out data for stopping! hisi_sec2 :b6:00.0: Bus lock! Please reset system. hisi_sec2 :b6:00.0: Controller reset failed (-110) hisi_sec2 :b6:00.0: controller reset failed (-110) 3. Add the fix patch of using mmget in bind. Test application, exit without close uacce fd fops_release can NOT be called automatically, looks mmget adds refcount of fd. So the fix method of using mmget blocks fops_release to be called once fd is closed without unbind. Since nginx may have a issue, it does not call close(fd) when nginx -s quit. And you're sure that none of the processes are still alive or in zombie state? Just to cover every possibility. It can also reproduced by a simple application exit without close(uacce_fd) Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: fully convert arm to use dma-direct
On 2022-04-22 22:17, Linus Walleij wrote: On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig wrote: arm is the last platform not using the dma-direct code for directly mapped DMA. With the dmaboune removal from Arnd we can easily switch arm to always use dma-direct now (it already does for LPAE configs and nommu). I'd love to merge this series through the dma-mapping tree as it gives us the opportunity for additional core dma-mapping improvements. (...) b/arch/arm/mach-footbridge/Kconfig |1 b/arch/arm/mach-footbridge/common.c | 19 b/arch/arm/mach-footbridge/include/mach/dma-direct.h |8 b/arch/arm/mach-footbridge/include/mach/memory.h |4 I think Marc Z has a Netwinder that he can test this on. Marc? I have one too, just not much in my office because of parental leave. I'm about to travel for a week. Can this wait until I'm back? This is one of the few boxes that isn't hooked up to the PDU, so I can't operate it remotely. M. -- Jazz is not dead. It just smells funny... ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v11 4/9] ACPI/IORT: Add support to retrieve IORT RMR reserved regions
Hi Shameer, I love your patch! Perhaps something to improve: [auto build test WARNING on joro-iommu/next] [also build test WARNING on rafael-pm/linux-next arm/for-next arm64/for-next/core soc/for-next linus/master v5.18-rc3 next-20220422] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/intel-lab-lkp/linux/commits/Shameer-Kolothum/ACPI-IORT-Support-for-IORT-RMR-node/20220423-003822 base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next config: arm64-randconfig-r012-20220422 (https://download.01.org/0day-ci/archive/20220423/202204231737.0jkkpxzk-...@intel.com/config) compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5bd87350a5ae429baf8f373cb226a57b62f87280) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install arm64 cross compiling tool for clang build # apt-get install binutils-aarch64-linux-gnu # https://github.com/intel-lab-lkp/linux/commit/5b73fd681a27e2ad450bac28f8a81f4b35fe4d68 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Shameer-Kolothum/ACPI-IORT-Support-for-IORT-RMR-node/20220423-003822 git checkout 5b73fd681a27e2ad450bac28f8a81f4b35fe4d68 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/acpi/arm64/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): >> drivers/acpi/arm64/iort.c:801:29: warning: no previous prototype for >> function 'iort_rmr_alloc' [-Wmissing-prototypes] struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc *rmr_desc, ^ drivers/acpi/arm64/iort.c:801:1: note: declare 'static' if the function is not intended to be used outside of this translation unit struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc *rmr_desc, ^ static drivers/acpi/arm64/iort.c:896:20: error: use of undeclared identifier 'ACPI_IORT_RMR_REMAP_PERMITTED' if (rmr->flags & ACPI_IORT_RMR_REMAP_PERMITTED) ^ drivers/acpi/arm64/iort.c:901:20: error: use of undeclared identifier 'ACPI_IORT_RMR_ACCESS_PRIVILEGE' if (rmr->flags & ACPI_IORT_RMR_ACCESS_PRIVILEGE) ^ drivers/acpi/arm64/iort.c:905:7: error: call to undeclared function 'ACPI_IORT_RMR_ACCESS_ATTRIBUTES'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] if (ACPI_IORT_RMR_ACCESS_ATTRIBUTES(rmr->flags) <= ^ drivers/acpi/arm64/iort.c:906:5: error: use of undeclared identifier 'ACPI_IORT_RMR_ATTR_DEVICE_GRE' ACPI_IORT_RMR_ATTR_DEVICE_GRE) ^ drivers/acpi/arm64/iort.c:909:5: error: use of undeclared identifier 'ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB' ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB) ^ 1 warning and 5 errors generated. vim +/iort_rmr_alloc +801 drivers/acpi/arm64/iort.c 800 > 801 struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc > *rmr_desc, 802 int prot, enum iommu_resv_type type, 803 u32 *sids, u32 num_sids) 804 { 805 struct iommu_iort_rmr_data *rmr_data; 806 struct iommu_resv_region *region; 807 u32 *sids_copy; 808 u64 addr = rmr_desc->base_address, size = rmr_desc->length; 809 810 rmr_data = kmalloc(sizeof(*rmr_data), GFP_KERNEL); 811 if (!rmr_data) 812 return NULL; 813 814 /* Create a copy of SIDs array to associate with this rmr_data */ 815 sids_copy = kmemdup(sids, num_sids * sizeof(*sids), GFP_KERNEL); 816 if (!sids_copy) { 817 kfree(rmr_data); 818 return NULL; 819 } 820 rmr_data->sids = sids_copy; 821 rmr_data->num_sids = num_sids; 822 823 if (!IS_ALIGNED(addr, SZ_64K) || !IS_ALIGNED(size, SZ_64K)) { 824 /* PAGE align base addr and size */ 825 addr &= PAGE_MASK; 826 size = PAGE_ALIGN(
Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration
On 2022/4/23 17:00, Lu Baolu wrote: On 2022/4/23 16:51, Lu Baolu wrote: On 2022/4/23 16:37, Robin Murphy wrote: On 2022-04-23 09:01, Lu Baolu wrote: Hi Robin, On 2022/4/19 15:20, Robin Murphy wrote: On 2022-04-19 00:37, Lu Baolu wrote: On 2022/4/19 6:09, Robin Murphy wrote: On 2022-04-16 01:04, Lu Baolu wrote: On 2022/4/14 20:42, Robin Murphy wrote: @@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct bus_type *bus) */ int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops) { - int err; - - if (ops == NULL) { - bus->iommu_ops = NULL; - return 0; - } - - if (bus->iommu_ops != NULL) + if (bus->iommu_ops && ops && bus->iommu_ops != ops) return -EBUSY; bus->iommu_ops = ops; Do we still need to keep above lines in bus_set_iommu()? It preserves the existing behaviour until each callsite and its associated error handling are removed later on, which seems like as good a thing to do as any. Since I'm already relaxing iommu_device_register() to a warn-but-continue behaviour while it keeps the bus ops on life-support internally, I figured not changing too much at once would make it easier to bisect any potential issues arising from this first step. Fair enough. Thank you for the explanation. Do you have a public tree that I could pull these patches and try them on an Intel hardware? Or perhaps you have done this? I like the whole idea of this series, but it's better to try it with a real hardware. I haven't bothered with separate branches since there's so many different pieces in-flight, but my complete (unstable) development branch can be found here: https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus For now I'd recommend winding the head back to "iommu: Clean up bus_set_iommu()" for testing - some of the patches above that have already been posted and picked up by their respective subsystems, but others are incomplete and barely compile-tested. I'll probably rearrange it later this week to better reflect what's happened so far. I wound the head back to "iommu: Clean up bus_set_iommu" and tested it on an Intel machine. It got stuck during boot. This test was on a remote machine and I have no means to access it physically. So I can't get any kernel debugging messages. (I have to work from home these days. :-() I guess it's due to the fact that intel_iommu_probe_device() callback only works for the pci devices. The issue occurs when probing a device other than a PCI one. Yeah, I was wondering if that would be significant, since it's the only driver that never registered itself for platform_bus_type so won't have actually seen those calls before. I'm inclined to bodge that as below for now, as long as it then works OK in terms of the rest of the changes. Thanks, Robin. ->8- diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9fa1b98186a3..6e359f92ec00 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4565,6 +4565,10 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) unsigned long flags; u8 bus, devfn; + /* ANDD platform device support needs fixing */ + if (!pdev) + return ERR_PTR(-ENODEV); + iommu = device_to_iommu(dev, &bus, &devfn); if (!iommu) return ERR_PTR(-ENODEV); I haven't seen any real ANDD platform devices, hence this works for me. Or more precisely, return NULL when @dev goes through device_to_iommu()? diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index df5c62ecf942..0d447739e076 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -797,8 +797,11 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn) pf_pdev = pci_physfn(pdev); dev = &pf_pdev->dev; segment = pci_domain_nr(pdev->bus); - } else if (has_acpi_companion(dev)) + } else if (has_acpi_companion(dev)) { dev = &ACPI_COMPANION(dev)->dev; + } else { + return NULL; + } rcu_read_lock(); for_each_iommu(iommu, drhd) { Robin, please ignore this. "has_acpi_companion(dev)" isn't equal to an ANDD device. Please use yours. Sorry for the noise. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration
On 2022/4/23 16:51, Lu Baolu wrote: On 2022/4/23 16:37, Robin Murphy wrote: On 2022-04-23 09:01, Lu Baolu wrote: Hi Robin, On 2022/4/19 15:20, Robin Murphy wrote: On 2022-04-19 00:37, Lu Baolu wrote: On 2022/4/19 6:09, Robin Murphy wrote: On 2022-04-16 01:04, Lu Baolu wrote: On 2022/4/14 20:42, Robin Murphy wrote: @@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct bus_type *bus) */ int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops) { - int err; - - if (ops == NULL) { - bus->iommu_ops = NULL; - return 0; - } - - if (bus->iommu_ops != NULL) + if (bus->iommu_ops && ops && bus->iommu_ops != ops) return -EBUSY; bus->iommu_ops = ops; Do we still need to keep above lines in bus_set_iommu()? It preserves the existing behaviour until each callsite and its associated error handling are removed later on, which seems like as good a thing to do as any. Since I'm already relaxing iommu_device_register() to a warn-but-continue behaviour while it keeps the bus ops on life-support internally, I figured not changing too much at once would make it easier to bisect any potential issues arising from this first step. Fair enough. Thank you for the explanation. Do you have a public tree that I could pull these patches and try them on an Intel hardware? Or perhaps you have done this? I like the whole idea of this series, but it's better to try it with a real hardware. I haven't bothered with separate branches since there's so many different pieces in-flight, but my complete (unstable) development branch can be found here: https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus For now I'd recommend winding the head back to "iommu: Clean up bus_set_iommu()" for testing - some of the patches above that have already been posted and picked up by their respective subsystems, but others are incomplete and barely compile-tested. I'll probably rearrange it later this week to better reflect what's happened so far. I wound the head back to "iommu: Clean up bus_set_iommu" and tested it on an Intel machine. It got stuck during boot. This test was on a remote machine and I have no means to access it physically. So I can't get any kernel debugging messages. (I have to work from home these days. :-() I guess it's due to the fact that intel_iommu_probe_device() callback only works for the pci devices. The issue occurs when probing a device other than a PCI one. Yeah, I was wondering if that would be significant, since it's the only driver that never registered itself for platform_bus_type so won't have actually seen those calls before. I'm inclined to bodge that as below for now, as long as it then works OK in terms of the rest of the changes. Thanks, Robin. ->8- diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9fa1b98186a3..6e359f92ec00 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4565,6 +4565,10 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) unsigned long flags; u8 bus, devfn; + /* ANDD platform device support needs fixing */ + if (!pdev) + return ERR_PTR(-ENODEV); + iommu = device_to_iommu(dev, &bus, &devfn); if (!iommu) return ERR_PTR(-ENODEV); I haven't seen any real ANDD platform devices, hence this works for me. Or more precisely, return NULL when @dev goes through device_to_iommu()? diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index df5c62ecf942..0d447739e076 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -797,8 +797,11 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn) pf_pdev = pci_physfn(pdev); dev = &pf_pdev->dev; segment = pci_domain_nr(pdev->bus); - } else if (has_acpi_companion(dev)) + } else if (has_acpi_companion(dev)) { dev = &ACPI_COMPANION(dev)->dev; + } else { + return NULL; + } rcu_read_lock(); for_each_iommu(iommu, drhd) { Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration
On 2022/4/23 16:37, Robin Murphy wrote: On 2022-04-23 09:01, Lu Baolu wrote: Hi Robin, On 2022/4/19 15:20, Robin Murphy wrote: On 2022-04-19 00:37, Lu Baolu wrote: On 2022/4/19 6:09, Robin Murphy wrote: On 2022-04-16 01:04, Lu Baolu wrote: On 2022/4/14 20:42, Robin Murphy wrote: @@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct bus_type *bus) */ int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops) { - int err; - - if (ops == NULL) { - bus->iommu_ops = NULL; - return 0; - } - - if (bus->iommu_ops != NULL) + if (bus->iommu_ops && ops && bus->iommu_ops != ops) return -EBUSY; bus->iommu_ops = ops; Do we still need to keep above lines in bus_set_iommu()? It preserves the existing behaviour until each callsite and its associated error handling are removed later on, which seems like as good a thing to do as any. Since I'm already relaxing iommu_device_register() to a warn-but-continue behaviour while it keeps the bus ops on life-support internally, I figured not changing too much at once would make it easier to bisect any potential issues arising from this first step. Fair enough. Thank you for the explanation. Do you have a public tree that I could pull these patches and try them on an Intel hardware? Or perhaps you have done this? I like the whole idea of this series, but it's better to try it with a real hardware. I haven't bothered with separate branches since there's so many different pieces in-flight, but my complete (unstable) development branch can be found here: https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus For now I'd recommend winding the head back to "iommu: Clean up bus_set_iommu()" for testing - some of the patches above that have already been posted and picked up by their respective subsystems, but others are incomplete and barely compile-tested. I'll probably rearrange it later this week to better reflect what's happened so far. I wound the head back to "iommu: Clean up bus_set_iommu" and tested it on an Intel machine. It got stuck during boot. This test was on a remote machine and I have no means to access it physically. So I can't get any kernel debugging messages. (I have to work from home these days. :-() I guess it's due to the fact that intel_iommu_probe_device() callback only works for the pci devices. The issue occurs when probing a device other than a PCI one. Yeah, I was wondering if that would be significant, since it's the only driver that never registered itself for platform_bus_type so won't have actually seen those calls before. I'm inclined to bodge that as below for now, as long as it then works OK in terms of the rest of the changes. Thanks, Robin. ->8- diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9fa1b98186a3..6e359f92ec00 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4565,6 +4565,10 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) unsigned long flags; u8 bus, devfn; + /* ANDD platform device support needs fixing */ + if (!pdev) + return ERR_PTR(-ENODEV); + iommu = device_to_iommu(dev, &bus, &devfn); if (!iommu) return ERR_PTR(-ENODEV); I haven't seen any real ANDD platform devices, hence this works for me. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration
On 2022-04-23 09:01, Lu Baolu wrote: Hi Robin, On 2022/4/19 15:20, Robin Murphy wrote: On 2022-04-19 00:37, Lu Baolu wrote: On 2022/4/19 6:09, Robin Murphy wrote: On 2022-04-16 01:04, Lu Baolu wrote: On 2022/4/14 20:42, Robin Murphy wrote: @@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct bus_type *bus) */ int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops) { - int err; - - if (ops == NULL) { - bus->iommu_ops = NULL; - return 0; - } - - if (bus->iommu_ops != NULL) + if (bus->iommu_ops && ops && bus->iommu_ops != ops) return -EBUSY; bus->iommu_ops = ops; Do we still need to keep above lines in bus_set_iommu()? It preserves the existing behaviour until each callsite and its associated error handling are removed later on, which seems like as good a thing to do as any. Since I'm already relaxing iommu_device_register() to a warn-but-continue behaviour while it keeps the bus ops on life-support internally, I figured not changing too much at once would make it easier to bisect any potential issues arising from this first step. Fair enough. Thank you for the explanation. Do you have a public tree that I could pull these patches and try them on an Intel hardware? Or perhaps you have done this? I like the whole idea of this series, but it's better to try it with a real hardware. I haven't bothered with separate branches since there's so many different pieces in-flight, but my complete (unstable) development branch can be found here: https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus For now I'd recommend winding the head back to "iommu: Clean up bus_set_iommu()" for testing - some of the patches above that have already been posted and picked up by their respective subsystems, but others are incomplete and barely compile-tested. I'll probably rearrange it later this week to better reflect what's happened so far. I wound the head back to "iommu: Clean up bus_set_iommu" and tested it on an Intel machine. It got stuck during boot. This test was on a remote machine and I have no means to access it physically. So I can't get any kernel debugging messages. (I have to work from home these days. :-() I guess it's due to the fact that intel_iommu_probe_device() callback only works for the pci devices. The issue occurs when probing a device other than a PCI one. Yeah, I was wondering if that would be significant, since it's the only driver that never registered itself for platform_bus_type so won't have actually seen those calls before. I'm inclined to bodge that as below for now, as long as it then works OK in terms of the rest of the changes. Thanks, Robin. ->8- diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 9fa1b98186a3..6e359f92ec00 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4565,6 +4565,10 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) unsigned long flags; u8 bus, devfn; + /* ANDD platform device support needs fixing */ + if (!pdev) + return ERR_PTR(-ENODEV); + iommu = device_to_iommu(dev, &bus, &devfn); if (!iommu) return ERR_PTR(-ENODEV); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/1] iommu/vt-d: Drop stop marker messages
The page fault handling framework in the IOMMU core explicitly states that it doesn't handle PCI PASID Stop Marker and the IOMMU drivers must discard them before reporting faults. This handles Stop Marker messages in prq_event_thread() before reporting events to the core. The VT-d driver explicitly drains the pending page requests when a CPU page table (represented by a mm struct) is unbound from a PASID according to the procedures defined in the VT-d spec. The Stop Marker messages do not need a response. Hence, it is safe to drop the Stop Marker messages silently if any of them is found in the page request queue. Fixes: d5b9e4bfe0d88 ("iommu/vt-d: Report prq to io-pgfault framework") Signed-off-by: Lu Baolu Reviewed-by: Jacob Pan Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20220421113558.3504874-1-baolu...@linux.intel.com --- drivers/iommu/intel/svm.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 23a38763c1d1..7ee37d996e15 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -757,6 +757,10 @@ static irqreturn_t prq_event_thread(int irq, void *d) goto bad_req; } + /* Drop Stop Marker message. No need for a response. */ + if (unlikely(req->lpig && !req->rd_req && !req->wr_req)) + goto prq_advance; + if (!svm || svm->pasid != req->pasid) { /* * It can't go away, because the driver is not permitted -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/1] iommu/vt-d: Fixes for v5.18-rc4
Hi Joerg, One fix is queued for v5.18. It aims to fix: - Handle PCI stop marker messages in IOMMU driver to meet the requirement of I/O page fault handling framework. Please consider it for the iommu/fix branch. Best regards, Lu Baolu Lu Baolu (1): iommu/vt-d: Drop stop marker messages drivers/iommu/intel/svm.c | 4 1 file changed, 4 insertions(+) -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration
Hi Robin, On 2022/4/19 15:20, Robin Murphy wrote: On 2022-04-19 00:37, Lu Baolu wrote: On 2022/4/19 6:09, Robin Murphy wrote: On 2022-04-16 01:04, Lu Baolu wrote: On 2022/4/14 20:42, Robin Murphy wrote: @@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct bus_type *bus) */ int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops) { - int err; - - if (ops == NULL) { - bus->iommu_ops = NULL; - return 0; - } - - if (bus->iommu_ops != NULL) + if (bus->iommu_ops && ops && bus->iommu_ops != ops) return -EBUSY; bus->iommu_ops = ops; Do we still need to keep above lines in bus_set_iommu()? It preserves the existing behaviour until each callsite and its associated error handling are removed later on, which seems like as good a thing to do as any. Since I'm already relaxing iommu_device_register() to a warn-but-continue behaviour while it keeps the bus ops on life-support internally, I figured not changing too much at once would make it easier to bisect any potential issues arising from this first step. Fair enough. Thank you for the explanation. Do you have a public tree that I could pull these patches and try them on an Intel hardware? Or perhaps you have done this? I like the whole idea of this series, but it's better to try it with a real hardware. I haven't bothered with separate branches since there's so many different pieces in-flight, but my complete (unstable) development branch can be found here: https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus For now I'd recommend winding the head back to "iommu: Clean up bus_set_iommu()" for testing - some of the patches above that have already been posted and picked up by their respective subsystems, but others are incomplete and barely compile-tested. I'll probably rearrange it later this week to better reflect what's happened so far. I wound the head back to "iommu: Clean up bus_set_iommu" and tested it on an Intel machine. It got stuck during boot. This test was on a remote machine and I have no means to access it physically. So I can't get any kernel debugging messages. (I have to work from home these days. :-() I guess it's due to the fact that intel_iommu_probe_device() callback only works for the pci devices. The issue occurs when probing a device other than a PCI one. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 3/4] iommu/vt-d: Drop stop marker messages
On 2022/4/22 11:05, Tian, Kevin wrote: From: Lu Baolu Sent: Thursday, April 21, 2022 7:36 PM The page fault handling framework in the IOMMU core explicitly states that it doesn't handle PCI PASID Stop Marker and the IOMMU drivers must discard them before reporting faults. This handles Stop Marker messages in prq_event_thread() before reporting events to the core. The VT-d driver explicitly drains the pending page requests when a CPU page table (represented by a mm struct) is unbound from a PASID according to the procedures defined in the VT-d spec. The Stop Marker messages do not need a response. Hence, it is safe to drop the Stop Marker messages silently if any of them is found in the page request queue. Fixes: d5b9e4bfe0d88 ("iommu/vt-d: Report prq to io-pgfault framework") Signed-off-by: Lu Baolu Reviewed-by: Jacob Pan Reviewed-by: Kevin Tian Thank you, Kevin. I will queue this patch to Joerg as a fix for v5.18. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu