Re: [PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry

2022-04-23 Thread Lu Baolu

On 2022/4/24 13:55, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Sunday, April 24, 2022 12:38 PM

On 2022/4/24 11:37, Tian, Kevin wrote:

This should be rebased on top of Jason's enforce coherency series
instead of blindly setting it. No matter whether it's legacy mode
where we set SNP in PTE or scalable mode where we set PGSNP
in PASID entry for entire page table, the trigger point should be
same i.e. when someone calls enforce_cache_coherency().

With Jason's enforce coherency series merged, we even don't need to set
PGSNP bit of a pasid entry for second level translation. 2nd level
always supports SNP in PTEs, so set PGSNP in pasid table entry is
unnecessary.


Yes, this sounds correct for 2nd-level.

but setting PGSNP of 1st level translation is also relevant to that
change when talking about enforcing coherency in the guest. In
this case PASID_FLAG_PAGE_SNOOP should be set also after
enforce_cache_coherency() is called.


Yes. Agreed.


Currently it's always set for unmanaged domain in
domain_setup_first_level():

if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
flags |= PASID_FLAG_PAGE_SNOOP;

Suppose we need a separate interface to update PGSNP after pasid
entry is set up.


Currently enforcing coherency is only used in VFIO. In the VFIO use
case, it's safe to always set PGSNP when an UNMANAGED domain is attached
on the first level pasid translation. We could add support of updating
PGSNP after pasid entry setup once there's a real need.



The real need is here. The iommu driver should not assume the
policy of VFIO, which is already communicated via the new
enforce_cache_coherency() interface. The same policy should
apply no matter whether 1st or 2nd level is in-use.


Okay, I think I will move this patch out of this series and put it in a
separated one for VT-d improvements after Jason's enforcing snoop series
gets merged. Thanks for your suggestions.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry

2022-04-23 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Sunday, April 24, 2022 12:38 PM
> 
> On 2022/4/24 11:37, Tian, Kevin wrote:
> >>> This should be rebased on top of Jason's enforce coherency series
> >>> instead of blindly setting it. No matter whether it's legacy mode
> >>> where we set SNP in PTE or scalable mode where we set PGSNP
> >>> in PASID entry for entire page table, the trigger point should be
> >>> same i.e. when someone calls enforce_cache_coherency().
> >> With Jason's enforce coherency series merged, we even don't need to set
> >> PGSNP bit of a pasid entry for second level translation. 2nd level
> >> always supports SNP in PTEs, so set PGSNP in pasid table entry is
> >> unnecessary.
> >>
> > Yes, this sounds correct for 2nd-level.
> >
> > but setting PGSNP of 1st level translation is also relevant to that
> > change when talking about enforcing coherency in the guest. In
> > this case PASID_FLAG_PAGE_SNOOP should be set also after
> > enforce_cache_coherency() is called.
> 
> Yes. Agreed.
> 
> > Currently it's always set for unmanaged domain in
> > domain_setup_first_level():
> >
> > if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
> > flags |= PASID_FLAG_PAGE_SNOOP;
> >
> > Suppose we need a separate interface to update PGSNP after pasid
> > entry is set up.
> 
> Currently enforcing coherency is only used in VFIO. In the VFIO use
> case, it's safe to always set PGSNP when an UNMANAGED domain is attached
> on the first level pasid translation. We could add support of updating
> PGSNP after pasid entry setup once there's a real need.
> 

The real need is here. The iommu driver should not assume the
policy of VFIO, which is already communicated via the new
enforce_cache_coherency() interface. The same policy should
apply no matter whether 1st or 2nd level is in-use.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry

2022-04-23 Thread Lu Baolu

On 2022/4/24 11:37, Tian, Kevin wrote:

This should be rebased on top of Jason's enforce coherency series
instead of blindly setting it. No matter whether it's legacy mode
where we set SNP in PTE or scalable mode where we set PGSNP
in PASID entry for entire page table, the trigger point should be
same i.e. when someone calls enforce_cache_coherency().

With Jason's enforce coherency series merged, we even don't need to set
PGSNP bit of a pasid entry for second level translation. 2nd level
always supports SNP in PTEs, so set PGSNP in pasid table entry is
unnecessary.


Yes, this sounds correct for 2nd-level.

but setting PGSNP of 1st level translation is also relevant to that
change when talking about enforcing coherency in the guest. In
this case PASID_FLAG_PAGE_SNOOP should be set also after
enforce_cache_coherency() is called.


Yes. Agreed.


Currently it's always set for unmanaged domain in
domain_setup_first_level():

if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
flags |= PASID_FLAG_PAGE_SNOOP;

Suppose we need a separate interface to update PGSNP after pasid
entry is set up.


Currently enforcing coherency is only used in VFIO. In the VFIO use
case, it's safe to always set PGSNP when an UNMANAGED domain is attached
on the first level pasid translation. We could add support of updating
PGSNP after pasid entry setup once there's a real need.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry

2022-04-23 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, April 22, 2022 9:04 PM
> 
> On 2022/4/22 10:47, Tian, Kevin wrote:
> >> From: Lu Baolu
> >> Sent: Thursday, April 21, 2022 7:36 PM
> >>
> >> The latest VT-d specification states that the PGSNP field in the pasid
> >> table entry should be treated as Reserved(0) for implementations not
> >> supporting Snoop Control (SC=0 in the Extended Capability Register).
> >> This adds a check before setting the field.
> >>
> >> Signed-off-by: Lu Baolu
> >> ---
> >>   drivers/iommu/intel/pasid.c | 13 ++---
> >>   1 file changed, 10 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> >> index f8d215d85695..5cb2daa2b8cb 100644
> >> --- a/drivers/iommu/intel/pasid.c
> >> +++ b/drivers/iommu/intel/pasid.c
> >> @@ -625,8 +625,14 @@ int intel_pasid_setup_first_level(struct
> intel_iommu
> >> *iommu,
> >>}
> >>}
> >>
> >> -  if (flags & PASID_FLAG_PAGE_SNOOP)
> >> -  pasid_set_pgsnp(pte);
> >> +  if (flags & PASID_FLAG_PAGE_SNOOP) {
> >> +  if (ecap_sc_support(iommu->ecap)) {
> >> +  pasid_set_pgsnp(pte);
> >> +  } else {
> >> +  pasid_clear_entry(pte);
> >> +  return -EINVAL;
> >> +  }
> >> +  }
> >>
> >>pasid_set_domain_id(pte, did);
> >>pasid_set_address_width(pte, iommu->agaw);
> >> @@ -710,7 +716,8 @@ int intel_pasid_setup_second_level(struct
> >> intel_iommu *iommu,
> >>pasid_set_fault_enable(pte);
> >>pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> >>
> >> -  if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
> >> +  if (ecap_sc_support(iommu->ecap) &&
> >> +  domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
> >>pasid_set_pgsnp(pte);
> >>
> > This should be rebased on top of Jason's enforce coherency series
> > instead of blindly setting it. No matter whether it's legacy mode
> > where we set SNP in PTE or scalable mode where we set PGSNP
> > in PASID entry for entire page table, the trigger point should be
> > same i.e. when someone calls enforce_cache_coherency().
> 
> With Jason's enforce coherency series merged, we even don't need to set
> PGSNP bit of a pasid entry for second level translation. 2nd level
> always supports SNP in PTEs, so set PGSNP in pasid table entry is
> unnecessary.
> 

Yes, this sounds correct for 2nd-level.

but setting PGSNP of 1st level translation is also relevant to that
change when talking about enforcing coherency in the guest. In
this case PASID_FLAG_PAGE_SNOOP should be set also after
enforce_cache_coherency() is called.

Currently it's always set for unmanaged domain in
domain_setup_first_level():

if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
flags |= PASID_FLAG_PAGE_SNOOP;

Suppose we need a separate interface to update PGSNP after pasid
entry is set up.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-23 Thread Zhangfei Gao
On Sat, 23 Apr 2022 at 19:13, zhangfei@foxmail.com
 wrote:
>
> Hi, Jean
>
> On 2022/4/22 下午11:50, Jean-Philippe Brucker wrote:
> > On Fri, Apr 22, 2022 at 09:15:01PM +0800, zhangfei@foxmail.com wrote:
> >>> I'm trying to piece together what happens from the kernel point of view.
> >>>
> >>> * master process with mm A opens a queue fd through uacce, which calls
> >>> iommu_sva_bind_device(dev, A) -> PASID 1
> >>>
> >>> * master forks and exits. Child (daemon) gets mm B, inherits the queue fd.
> >>> The device is still bound to mm A with PASID 1, since the queue fd is
> >>> still open.
> >>> We discussed this before, but I don't remember where we left off. The
> >>> child can't use the queue because its mappings are not copied on fork(),
> >>> and the queue is still bound to the parent mm A. The child either needs to
> >>> open a new queue or take ownership of the old one with a new uacce ioctl.
> >> Yes, currently nginx aligned with the case.
> >> Child process (worker process) reopen uacce,
> >>
> >> Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID
> >> 1
> >> Master process fork Child (daemon) and exit.
> >>
> >> Child (daemon)  does not use PASID 1 any more, only fork and manage worker
> >> process.
> >> Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2
> >>
> >> So it is expected.
> > Yes, that's fine
> >
> >>> Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of
> >>> tree?  This operation should unbind from A before binding to B, no?
> >>> Otherwise we leak PASID 1.
> >> In 5.16 PASID 1 from master is hold until nginx service stop.
> >> nginx start
> >> master:
> >> iommu_sva_alloc_pasid mm->pasid=1  // master process
> >>
> >> lynx https start:
> >> iommu_sva_alloc_pasid mm->pasid=2//worker process
> >>
> >> nginx stop:  from fops_release
> >> iommu_sva_free_pasid mm->pasid=2   // worker process
> >> iommu_sva_free_pasid mm->pasid=1  // master process
> > That's the expected behavior (master could close its fd before forking, in
> > order to free things up earlier, but it's not mandatory)
> Currently we unbind in fops_release, so the ioasid allocated in master
> can only be freed when nginx stop,
> when all forked fd are closed.
>
> >
> >> Have one silly question.
> >>
> >> kerne driver
> >> fops_open
> >> iommu_sva_bind_device
> >>
> >> fops_release
> >> iommu_sva_unbind_device
> >>
> >> application
> >> main()
> >> fd = open
> >> return;
> >>
> >> Application exit but not close(fd), is it expected fops_release will be
> >> called automatically by system?
> > Yes, the application doesn't have to call close() explicitly, the file
> > descriptor is closed automatically on exit. Note that the fd is copied on
> > fork(), so it is only released once parent and all child processes exit.
> Yes, in case the application ended unexpected, like ctrl+c
> >
> >> On 5.17
> >> fops_release is called automatically, as well as iommu_sva_unbind_device.
> >> On 5.18-rc1.
> >> fops_release is not called, have to manually call close(fd)
> > Right that's weird
> Looks it is caused by the fix patch, via mmget, which may add refcount
> of fd.
>
> Some experiments
> 1. 5.17, everything works well.
>
> 2. 5.17 + patchset of "iommu/sva: Assign a PASID to mm on PASID
> allocation and free it on mm exit"
>
> Test application, exit without close uacce fd
> First time:  fops_release can be called automatically.
>
> log:
> ioasid_alloc ioasid=1
> iommu_sva_alloc_pasid pasid=1
> iommu_sva_bind_device handle=263a2ee8
> ioasid_free ioasid=1
> uacce_fops_release q=55ca3cdf
> iommu_sva_unbind_device handle=263a2ee8
>
> Second time: hardware reports error
>
> uacce_fops_open q=8e4d6f78
> ioasid_alloc ioasid=1
> iommu_sva_alloc_pasid pasid=1
> iommu_sva_bind_device handle=cfd11788
> // Haredware reports error
> hisi_sec2 :b6:00.0: qm_acc_do_task_timeout [error status=0x20] found
> hisi_sec2 :b6:00.0: qm_acc_wb_not_ready_timeout [error status=0x40]
> found
> hisi_sec2 :b6:00.0: sec_fsm_hbeat_rint [error status=0x20] found
> hisi_sec2 :b6:00.0: Controller resetting...
> hisi_sec2 :b6:00.0: QM mailbox operation timeout!
> hisi_sec2 :b6:00.0: Failed to dump sqc!
> hisi_sec2 :b6:00.0: Failed to drain out data for stopping!
> hisi_sec2 :b6:00.0: Bus lock! Please reset system.
> hisi_sec2 :b6:00.0: Controller reset failed (-110)
> hisi_sec2 :b6:00.0: controller reset failed (-110)
>
> 3. Add the fix patch of using mmget in bind.
> Test application, exit without close uacce fd
>
> fops_release can NOT be called automatically, looks mmget adds refcount
> of fd.

Test application, exit without closing fd.
> >> kernel driver
> >> fops_open
> >> iommu_sva_bind_device
> >>
> >> fops_release
> >> iommu_sva_unbind_device

1.
5.17 kernel, no mmget & mmput

wd_release_queue no close
Compress bz=512000 nb=1×10, speed=139.5 MB/s (±0.0% N=1) overall=122.9
MB/s (±0.0%)
[   16.052989] do_exi

Re: [PATCH 0/2] dma-mapping, remoteproc: Fix dma_mem leak after rproc_shutdown

2022-04-23 Thread Christoph Hellwig
Sigh.  In theory drivers should never declare coherent memory like
this, and there has been some work to fix remoteproc in that regard.

But I guess until that is merged we'll need somthing like this fix.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages

2022-04-23 Thread Christoph Hellwig
When dma_direct_alloc_pages encounters a highmem page it just gives up
currently.  But what we really should do is to try memory using the
page allocator instead - without this platforms with a global highmem
CMA pool will fail all dma_alloc_pages allocations.

Fixes: efa70f2fdc84 ("dma-mapping: add a new dma_alloc_pages API")
Reported-by: Mark O'Neill 
Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 27 ++-
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 9743c6ccce1a9..3e7f4aab740e4 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -115,7 +115,7 @@ static struct page *dma_direct_alloc_swiotlb(struct device 
*dev, size_t size)
 }
 
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
-   gfp_t gfp)
+   gfp_t gfp, bool allow_highmem)
 {
int node = dev_to_node(dev);
struct page *page = NULL;
@@ -129,9 +129,12 @@ static struct page *__dma_direct_alloc_pages(struct device 
*dev, size_t size,
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   &phys_limit);
page = dma_alloc_contiguous(dev, size, gfp);
-   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
-   dma_free_contiguous(dev, page, size);
-   page = NULL;
+   if (page) {
+   if (!dma_coherent_ok(dev, page_to_phys(page), size) ||
+   (!allow_highmem && PageHighMem(page))) {
+   dma_free_contiguous(dev, page, size);
+   page = NULL;
+   }
}
 again:
if (!page)
@@ -189,7 +192,7 @@ static void *dma_direct_alloc_no_mapping(struct device 
*dev, size_t size,
 {
struct page *page;
 
-   page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
+   page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true);
if (!page)
return NULL;
 
@@ -262,7 +265,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
/* we always manually zero the memory once we are done */
-   page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
+   page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO, true);
if (!page)
return NULL;
 
@@ -370,19 +373,9 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
size_t size,
if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
-   page = __dma_direct_alloc_pages(dev, size, gfp);
+   page = __dma_direct_alloc_pages(dev, size, gfp, false);
if (!page)
return NULL;
-   if (PageHighMem(page)) {
-   /*
-* Depending on the cma= arguments and per-arch setup
-* dma_alloc_contiguous could return highmem pages.
-* Without remapping there is no way to return them here,
-* so log an error and fail.
-*/
-   dev_info(dev, "Rejecting highmem page from CMA.\n");
-   goto out_free_pages;
-   }
 
ret = page_address(page);
if (dma_set_decrypted(dev, ret, size))
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm

2022-04-23 Thread Christoph Hellwig
swiotlb-xen uses very different ways to allocate coherent memory on x86
vs arm.  On the former it allocates memory from the page allocator, while
on the later it reuses the dma-direct allocator the handles the
complexities of non-coherent DMA on arm platforms.

Unfortunately the complexities of trying to deal with the two cases in
the swiotlb-xen.c code lead to a bug in the handling of
DMA_ATTR_NO_KERNEL_MAPPING on arm.  With the DMA_ATTR_NO_KERNEL_MAPPING
flag the coherent memory allocator does not actually allocate coherent
memory, but just a DMA handle for some memory that is DMA addressable
by the device, but which does not have to have a kernel mapping.  Thus
dereferencing the return value will lead to kernel crashed and memory
corruption.

Fix this by using the dma-direct allocator directly for arm, which works
perfectly fine because on arm swiotlb-xen is only used when the domain is
1:1 mapped, and then simplifying the remaining code to only cater for the
x86 case with DMA coherent device.

Reported-by: Rahul Singh 
Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/xen/page-coherent.h   |   2 -
 arch/arm/xen/mm.c  |  17 
 arch/arm64/include/asm/xen/page-coherent.h |   2 -
 arch/x86/include/asm/xen/page-coherent.h   |  24 -
 arch/x86/include/asm/xen/swiotlb-xen.h |   5 +
 drivers/xen/swiotlb-xen.c  | 106 -
 include/xen/arm/page-coherent.h|  20 
 include/xen/xen-ops.h  |   7 --
 8 files changed, 45 insertions(+), 138 deletions(-)
 delete mode 100644 arch/arm/include/asm/xen/page-coherent.h
 delete mode 100644 arch/arm64/include/asm/xen/page-coherent.h
 delete mode 100644 arch/x86/include/asm/xen/page-coherent.h
 delete mode 100644 include/xen/arm/page-coherent.h

diff --git a/arch/arm/include/asm/xen/page-coherent.h 
b/arch/arm/include/asm/xen/page-coherent.h
deleted file mode 100644
index 27e984977402b..0
--- a/arch/arm/include/asm/xen/page-coherent.h
+++ /dev/null
@@ -1,2 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#include 
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index a7e54a087b802..6e603e5fdebb1 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -118,23 +118,6 @@ bool xen_arch_need_swiotlb(struct device *dev,
!dev_is_dma_coherent(dev));
 }
 
-int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
-unsigned int address_bits,
-dma_addr_t *dma_handle)
-{
-   if (!xen_initial_domain())
-   return -EINVAL;
-
-   /* we assume that dom0 is mapped 1:1 for now */
-   *dma_handle = pstart;
-   return 0;
-}
-
-void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
-{
-   return;
-}
-
 static int __init xen_mm_init(void)
 {
struct gnttab_cache_flush cflush;
diff --git a/arch/arm64/include/asm/xen/page-coherent.h 
b/arch/arm64/include/asm/xen/page-coherent.h
deleted file mode 100644
index 27e984977402b..0
--- a/arch/arm64/include/asm/xen/page-coherent.h
+++ /dev/null
@@ -1,2 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#include 
diff --git a/arch/x86/include/asm/xen/page-coherent.h 
b/arch/x86/include/asm/xen/page-coherent.h
deleted file mode 100644
index 63cd41b2e17ac..0
--- a/arch/x86/include/asm/xen/page-coherent.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_XEN_PAGE_COHERENT_H
-#define _ASM_X86_XEN_PAGE_COHERENT_H
-
-#include 
-#include 
-
-static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
-   dma_addr_t *dma_handle, gfp_t flags,
-   unsigned long attrs)
-{
-   void *vstart = (void*)__get_free_pages(flags, get_order(size));
-   *dma_handle = virt_to_phys(vstart);
-   return vstart;
-}
-
-static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
-   void *cpu_addr, dma_addr_t dma_handle,
-   unsigned long attrs)
-{
-   free_pages((unsigned long) cpu_addr, get_order(size));
-}
-
-#endif /* _ASM_X86_XEN_PAGE_COHERENT_H */
diff --git a/arch/x86/include/asm/xen/swiotlb-xen.h 
b/arch/x86/include/asm/xen/swiotlb-xen.h
index 66b4ddde77430..558821387808e 100644
--- a/arch/x86/include/asm/xen/swiotlb-xen.h
+++ b/arch/x86/include/asm/xen/swiotlb-xen.h
@@ -10,4 +10,9 @@ extern int pci_xen_swiotlb_init_late(void);
 static inline int pci_xen_swiotlb_init_late(void) { return -ENXIO; }
 #endif
 
+int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
+   unsigned int address_bits,
+   dma_addr_t *dma_handle);
+void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order);
+
 #endif /* _ASM_X86_SWIOTLB_XEN_H */
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 47aebd98f52f5..557edb9c54879 100644
--- a/drivers/xen/swiotlb-xen.

Re: fully convert arm to use dma-direct

2022-04-23 Thread Christoph Hellwig
On Sat, Apr 23, 2022 at 11:27:13AM +0100, Marc Zyngier wrote:
>> I think Marc Z has a Netwinder that he can test this on. Marc?
>> I have one too, just not much in my office because of parental leave.
>
> I'm about to travel for a week. Can this wait until I'm back?
> This is one of the few boxes that isn't hooked up to the PDU,
> so I can't operate it remotely.

No rush at all from my side except that I;d love to get it into
5.19.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v11 4/9] ACPI/IORT: Add support to retrieve IORT RMR reserved regions

2022-04-23 Thread kernel test robot
Hi Shameer,

I love your patch! Perhaps something to improve:

[auto build test WARNING on joro-iommu/next]
[also build test WARNING on rafael-pm/linux-next arm/for-next 
arm64/for-next/core soc/for-next linus/master v5.18-rc3 next-20220422]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/intel-lab-lkp/linux/commits/Shameer-Kolothum/ACPI-IORT-Support-for-IORT-RMR-node/20220423-003822
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: arm64-allyesconfig 
(https://download.01.org/0day-ci/archive/20220423/202204232022.kmubee9l-...@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/intel-lab-lkp/linux/commit/5b73fd681a27e2ad450bac28f8a81f4b35fe4d68
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Shameer-Kolothum/ACPI-IORT-Support-for-IORT-RMR-node/20220423-003822
git checkout 5b73fd681a27e2ad450bac28f8a81f4b35fe4d68
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross W=1 
O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/acpi/arm64/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/acpi/arm64/iort.c:801:29: warning: no previous prototype for 
>> 'iort_rmr_alloc' [-Wmissing-prototypes]
 801 | struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc 
*rmr_desc,
 | ^~
   drivers/acpi/arm64/iort.c: In function 'iort_get_rmrs':
   drivers/acpi/arm64/iort.c:896:34: error: 'ACPI_IORT_RMR_REMAP_PERMITTED' 
undeclared (first use in this function)
 896 | if (rmr->flags & ACPI_IORT_RMR_REMAP_PERMITTED)
 |  ^
   drivers/acpi/arm64/iort.c:896:34: note: each undeclared identifier is 
reported only once for each function it appears in
   drivers/acpi/arm64/iort.c:901:34: error: 'ACPI_IORT_RMR_ACCESS_PRIVILEGE' 
undeclared (first use in this function)
 901 | if (rmr->flags & ACPI_IORT_RMR_ACCESS_PRIVILEGE)
 |  ^~
   drivers/acpi/arm64/iort.c:905:21: error: implicit declaration of function 
'ACPI_IORT_RMR_ACCESS_ATTRIBUTES'; did you mean 'ACPI_IORT_MF_ATTRIBUTES'? 
[-Werror=implicit-function-declaration]
 905 | if (ACPI_IORT_RMR_ACCESS_ATTRIBUTES(rmr->flags) <=
 | ^~~
 | ACPI_IORT_MF_ATTRIBUTES
   drivers/acpi/arm64/iort.c:906:33: error: 'ACPI_IORT_RMR_ATTR_DEVICE_GRE' 
undeclared (first use in this function)
 906 | ACPI_IORT_RMR_ATTR_DEVICE_GRE)
 | ^
   drivers/acpi/arm64/iort.c:909:33: error: 'ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB' 
undeclared (first use in this function)
 909 | ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB)
 | ^
   cc1: some warnings being treated as errors


vim +/iort_rmr_alloc +801 drivers/acpi/arm64/iort.c

   800  
 > 801  struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc 
 > *rmr_desc,
   802 int prot, enum 
iommu_resv_type type,
   803 u32 *sids, u32 num_sids)
   804  {
   805  struct iommu_iort_rmr_data *rmr_data;
   806  struct iommu_resv_region *region;
   807  u32 *sids_copy;
   808  u64 addr = rmr_desc->base_address, size = rmr_desc->length;
   809  
   810  rmr_data = kmalloc(sizeof(*rmr_data), GFP_KERNEL);
   811  if (!rmr_data)
   812  return NULL;
   813  
   814  /* Create a copy of SIDs array to associate with this rmr_data 
*/
   815  sids_copy = kmemdup(sids, num_sids * sizeof(*sids), GFP_KERNEL);
   816  if (!sids_copy) {
   817  kfree(rmr_data);
   818  return NULL;
   819  }
   820  rmr_data->sids = sids_copy;
   821  rmr_data->num_sids = num_sids;
   822  
   823  if (!IS_ALIGNED(addr, SZ_64K) || !IS_ALIGNED(size, SZ_64K)) {
   824  /* PAGE align base add

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-23 Thread zhangfei....@foxmail.com

Hi, Jean

On 2022/4/22 下午11:50, Jean-Philippe Brucker wrote:

On Fri, Apr 22, 2022 at 09:15:01PM +0800, zhangfei@foxmail.com wrote:

I'm trying to piece together what happens from the kernel point of view.

* master process with mm A opens a queue fd through uacce, which calls
iommu_sva_bind_device(dev, A) -> PASID 1

* master forks and exits. Child (daemon) gets mm B, inherits the queue fd.
The device is still bound to mm A with PASID 1, since the queue fd is
still open.
We discussed this before, but I don't remember where we left off. The
child can't use the queue because its mappings are not copied on fork(),
and the queue is still bound to the parent mm A. The child either needs to
open a new queue or take ownership of the old one with a new uacce ioctl.

Yes, currently nginx aligned with the case.
Child process (worker process) reopen uacce,

Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID
1
Master process fork Child (daemon) and exit.

Child (daemon)  does not use PASID 1 any more, only fork and manage worker
process.
Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2

So it is expected.

Yes, that's fine


Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of
tree?  This operation should unbind from A before binding to B, no?
Otherwise we leak PASID 1.

In 5.16 PASID 1 from master is hold until nginx service stop.
nginx start
master:
iommu_sva_alloc_pasid mm->pasid=1      // master process

lynx https start:
iommu_sva_alloc_pasid mm->pasid=2    //worker process

nginx stop:  from fops_release
iommu_sva_free_pasid mm->pasid=2   // worker process
iommu_sva_free_pasid mm->pasid=1  // master process

That's the expected behavior (master could close its fd before forking, in
order to free things up earlier, but it's not mandatory)
Currently we unbind in fops_release, so the ioasid allocated in master 
can only be freed when nginx stop,

when all forked fd are closed.




Have one silly question.

kerne driver
fops_open
iommu_sva_bind_device

fops_release
iommu_sva_unbind_device

application
main()
fd = open
return;

Application exit but not close(fd), is it expected fops_release will be
called automatically by system?

Yes, the application doesn't have to call close() explicitly, the file
descriptor is closed automatically on exit. Note that the fd is copied on
fork(), so it is only released once parent and all child processes exit.

Yes, in case the application ended unexpected, like ctrl+c



On 5.17
fops_release is called automatically, as well as iommu_sva_unbind_device.
On 5.18-rc1.
fops_release is not called, have to manually call close(fd)

Right that's weird
Looks it is caused by the fix patch, via mmget, which may add refcount 
of fd.


Some experiments
1. 5.17, everything works well.

2. 5.17 + patchset of "iommu/sva: Assign a PASID to mm on PASID 
allocation and free it on mm exit"


Test application, exit without close uacce fd
First time:  fops_release can be called automatically.

log:
ioasid_alloc ioasid=1
iommu_sva_alloc_pasid pasid=1
iommu_sva_bind_device handle=263a2ee8
ioasid_free ioasid=1
uacce_fops_release q=55ca3cdf
iommu_sva_unbind_device handle=263a2ee8

Second time: hardware reports error

uacce_fops_open q=8e4d6f78
ioasid_alloc ioasid=1
iommu_sva_alloc_pasid pasid=1
iommu_sva_bind_device handle=cfd11788
// Haredware reports error
hisi_sec2 :b6:00.0: qm_acc_do_task_timeout [error status=0x20] found
hisi_sec2 :b6:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] 
found

hisi_sec2 :b6:00.0: sec_fsm_hbeat_rint [error status=0x20] found
hisi_sec2 :b6:00.0: Controller resetting...
hisi_sec2 :b6:00.0: QM mailbox operation timeout!
hisi_sec2 :b6:00.0: Failed to dump sqc!
hisi_sec2 :b6:00.0: Failed to drain out data for stopping!
hisi_sec2 :b6:00.0: Bus lock! Please reset system.
hisi_sec2 :b6:00.0: Controller reset failed (-110)
hisi_sec2 :b6:00.0: controller reset failed (-110)

3. Add the fix patch of using mmget in bind.
Test application, exit without close uacce fd

fops_release can NOT be called automatically, looks mmget adds refcount 
of fd.


So the fix method of using mmget blocks fops_release to be called once 
fd is closed without unbind.



Since nginx may have a issue, it does not call close(fd) when nginx -s quit.

And you're sure that none of the processes are still alive or in zombie
state?  Just to cover every possibility.

It can also reproduced by a simple application exit without close(uacce_fd)

Thanks

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: fully convert arm to use dma-direct

2022-04-23 Thread Marc Zyngier

On 2022-04-22 22:17, Linus Walleij wrote:

On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig  wrote:


arm is the last platform not using the dma-direct code for directly
mapped DMA.  With the dmaboune removal from Arnd we can easily switch
arm to always use dma-direct now (it already does for LPAE configs
and nommu).  I'd love to merge this series through the dma-mapping 
tree

as it gives us the opportunity for additional core dma-mapping
improvements.

(...)


 b/arch/arm/mach-footbridge/Kconfig   |1
 b/arch/arm/mach-footbridge/common.c  |   19
 b/arch/arm/mach-footbridge/include/mach/dma-direct.h |8
 b/arch/arm/mach-footbridge/include/mach/memory.h |4


I think Marc Z has a Netwinder that he can test this on. Marc?
I have one too, just not much in my office because of parental leave.


I'm about to travel for a week. Can this wait until I'm back?
This is one of the few boxes that isn't hooked up to the PDU,
so I can't operate it remotely.

M.
--
Jazz is not dead. It just smells funny...
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v11 4/9] ACPI/IORT: Add support to retrieve IORT RMR reserved regions

2022-04-23 Thread kernel test robot
Hi Shameer,

I love your patch! Perhaps something to improve:

[auto build test WARNING on joro-iommu/next]
[also build test WARNING on rafael-pm/linux-next arm/for-next 
arm64/for-next/core soc/for-next linus/master v5.18-rc3 next-20220422]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/intel-lab-lkp/linux/commits/Shameer-Kolothum/ACPI-IORT-Support-for-IORT-RMR-node/20220423-003822
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: arm64-randconfig-r012-20220422 
(https://download.01.org/0day-ci/archive/20220423/202204231737.0jkkpxzk-...@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 
5bd87350a5ae429baf8f373cb226a57b62f87280)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# 
https://github.com/intel-lab-lkp/linux/commit/5b73fd681a27e2ad450bac28f8a81f4b35fe4d68
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Shameer-Kolothum/ACPI-IORT-Support-for-IORT-RMR-node/20220423-003822
git checkout 5b73fd681a27e2ad450bac28f8a81f4b35fe4d68
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/acpi/arm64/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/acpi/arm64/iort.c:801:29: warning: no previous prototype for 
>> function 'iort_rmr_alloc' [-Wmissing-prototypes]
   struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc 
*rmr_desc,
   ^
   drivers/acpi/arm64/iort.c:801:1: note: declare 'static' if the function is 
not intended to be used outside of this translation unit
   struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc 
*rmr_desc,
   ^
   static 
   drivers/acpi/arm64/iort.c:896:20: error: use of undeclared identifier 
'ACPI_IORT_RMR_REMAP_PERMITTED'
   if (rmr->flags & ACPI_IORT_RMR_REMAP_PERMITTED)
^
   drivers/acpi/arm64/iort.c:901:20: error: use of undeclared identifier 
'ACPI_IORT_RMR_ACCESS_PRIVILEGE'
   if (rmr->flags & ACPI_IORT_RMR_ACCESS_PRIVILEGE)
^
   drivers/acpi/arm64/iort.c:905:7: error: call to undeclared function 
'ACPI_IORT_RMR_ACCESS_ATTRIBUTES'; ISO C99 and later do not support implicit 
function declarations [-Wimplicit-function-declaration]
   if (ACPI_IORT_RMR_ACCESS_ATTRIBUTES(rmr->flags) <=
   ^
   drivers/acpi/arm64/iort.c:906:5: error: use of undeclared identifier 
'ACPI_IORT_RMR_ATTR_DEVICE_GRE'
   ACPI_IORT_RMR_ATTR_DEVICE_GRE)
   ^
   drivers/acpi/arm64/iort.c:909:5: error: use of undeclared identifier 
'ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB'
   ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB)
   ^
   1 warning and 5 errors generated.


vim +/iort_rmr_alloc +801 drivers/acpi/arm64/iort.c

   800  
 > 801  struct iommu_iort_rmr_data *iort_rmr_alloc(struct acpi_iort_rmr_desc 
 > *rmr_desc,
   802 int prot, enum 
iommu_resv_type type,
   803 u32 *sids, u32 num_sids)
   804  {
   805  struct iommu_iort_rmr_data *rmr_data;
   806  struct iommu_resv_region *region;
   807  u32 *sids_copy;
   808  u64 addr = rmr_desc->base_address, size = rmr_desc->length;
   809  
   810  rmr_data = kmalloc(sizeof(*rmr_data), GFP_KERNEL);
   811  if (!rmr_data)
   812  return NULL;
   813  
   814  /* Create a copy of SIDs array to associate with this rmr_data 
*/
   815  sids_copy = kmemdup(sids, num_sids * sizeof(*sids), GFP_KERNEL);
   816  if (!sids_copy) {
   817  kfree(rmr_data);
   818  return NULL;
   819  }
   820  rmr_data->sids = sids_copy;
   821  rmr_data->num_sids = num_sids;
   822  
   823  if (!IS_ALIGNED(addr, SZ_64K) || !IS_ALIGNED(size, SZ_64K)) {
   824  /* PAGE align base addr and size */
   825  addr &= PAGE_MASK;
   826  size = PAGE_ALIGN(

Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration

2022-04-23 Thread Lu Baolu

On 2022/4/23 17:00, Lu Baolu wrote:

On 2022/4/23 16:51, Lu Baolu wrote:

On 2022/4/23 16:37, Robin Murphy wrote:

On 2022-04-23 09:01, Lu Baolu wrote:

Hi Robin,

On 2022/4/19 15:20, Robin Murphy wrote:

On 2022-04-19 00:37, Lu Baolu wrote:

On 2022/4/19 6:09, Robin Murphy wrote:

On 2022-04-16 01:04, Lu Baolu wrote:

On 2022/4/14 20:42, Robin Murphy wrote:
@@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct 
bus_type *bus)

   */
  int bus_set_iommu(struct bus_type *bus, const struct 
iommu_ops *ops)

  {
-    int err;
-
-    if (ops == NULL) {
-    bus->iommu_ops = NULL;
-    return 0;
-    }
-
-    if (bus->iommu_ops != NULL)
+    if (bus->iommu_ops && ops && bus->iommu_ops != ops)
  return -EBUSY;
  bus->iommu_ops = ops;


Do we still need to keep above lines in bus_set_iommu()?


It preserves the existing behaviour until each callsite and its 
associated error handling are removed later on, which seems like 
as good a thing to do as any. Since I'm already relaxing 
iommu_device_register() to a warn-but-continue behaviour while it 
keeps the bus ops on life-support internally, I figured not 
changing too much at once would make it easier to bisect any 
potential issues arising from this first step.


Fair enough. Thank you for the explanation.

Do you have a public tree that I could pull these patches and try 
them

on an Intel hardware? Or perhaps you have done this? I like the whole
idea of this series, but it's better to try it with a real hardware.


I haven't bothered with separate branches since there's so many 
different pieces in-flight, but my complete (unstable) development 
branch can be found here:


https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus

For now I'd recommend winding the head back to "iommu: Clean up 
bus_set_iommu()" for testing - some of the patches above that have 
already been posted and picked up by their respective subsystems, 
but others are incomplete and barely compile-tested. I'll probably 
rearrange it later this week to better reflect what's happened so far.


I wound the head back to "iommu: Clean up bus_set_iommu" and tested it
on an Intel machine. It got stuck during boot. This test was on a 
remote

machine and I have no means to access it physically. So I can't get any
kernel debugging messages. (I have to work from home these days. :-()

I guess it's due to the fact that intel_iommu_probe_device() callback
only works for the pci devices. The issue occurs when probing a device
other than a PCI one.


Yeah, I was wondering if that would be significant, since it's the 
only driver that never registered itself for platform_bus_type so 
won't have actually seen those calls before. I'm inclined to bodge 
that as below for now, as long as it then works OK in terms of the 
rest of the changes.


Thanks,
Robin.

->8-
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9fa1b98186a3..6e359f92ec00 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4565,6 +4565,10 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)

  unsigned long flags;
  u8 bus, devfn;

+    /* ANDD platform device support needs fixing */
+    if (!pdev)
+    return ERR_PTR(-ENODEV);
+
  iommu = device_to_iommu(dev, &bus, &devfn);
  if (!iommu)
  return ERR_PTR(-ENODEV);


I haven't seen any real ANDD platform devices, hence this works for me.


Or more precisely, return NULL when @dev goes through device_to_iommu()?

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index df5c62ecf942..0d447739e076 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -797,8 +797,11 @@ struct intel_iommu *device_to_iommu(struct device 
*dev, u8 *bus, u8 *devfn)

     pf_pdev = pci_physfn(pdev);
     dev = &pf_pdev->dev;
     segment = pci_domain_nr(pdev->bus);
-   } else if (has_acpi_companion(dev))
+   } else if (has_acpi_companion(dev)) {
     dev = &ACPI_COMPANION(dev)->dev;
+   } else {
+   return NULL;
+   }

     rcu_read_lock();
     for_each_iommu(iommu, drhd) {


Robin, please ignore this. "has_acpi_companion(dev)" isn't equal to an
ANDD device. Please use yours. Sorry for the noise.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration

2022-04-23 Thread Lu Baolu

On 2022/4/23 16:51, Lu Baolu wrote:

On 2022/4/23 16:37, Robin Murphy wrote:

On 2022-04-23 09:01, Lu Baolu wrote:

Hi Robin,

On 2022/4/19 15:20, Robin Murphy wrote:

On 2022-04-19 00:37, Lu Baolu wrote:

On 2022/4/19 6:09, Robin Murphy wrote:

On 2022-04-16 01:04, Lu Baolu wrote:

On 2022/4/14 20:42, Robin Murphy wrote:
@@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct 
bus_type *bus)

   */
  int bus_set_iommu(struct bus_type *bus, const struct iommu_ops 
*ops)

  {
-    int err;
-
-    if (ops == NULL) {
-    bus->iommu_ops = NULL;
-    return 0;
-    }
-
-    if (bus->iommu_ops != NULL)
+    if (bus->iommu_ops && ops && bus->iommu_ops != ops)
  return -EBUSY;
  bus->iommu_ops = ops;


Do we still need to keep above lines in bus_set_iommu()?


It preserves the existing behaviour until each callsite and its 
associated error handling are removed later on, which seems like 
as good a thing to do as any. Since I'm already relaxing 
iommu_device_register() to a warn-but-continue behaviour while it 
keeps the bus ops on life-support internally, I figured not 
changing too much at once would make it easier to bisect any 
potential issues arising from this first step.


Fair enough. Thank you for the explanation.

Do you have a public tree that I could pull these patches and try them
on an Intel hardware? Or perhaps you have done this? I like the whole
idea of this series, but it's better to try it with a real hardware.


I haven't bothered with separate branches since there's so many 
different pieces in-flight, but my complete (unstable) development 
branch can be found here:


https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus

For now I'd recommend winding the head back to "iommu: Clean up 
bus_set_iommu()" for testing - some of the patches above that have 
already been posted and picked up by their respective subsystems, 
but others are incomplete and barely compile-tested. I'll probably 
rearrange it later this week to better reflect what's happened so far.


I wound the head back to "iommu: Clean up bus_set_iommu" and tested it
on an Intel machine. It got stuck during boot. This test was on a remote
machine and I have no means to access it physically. So I can't get any
kernel debugging messages. (I have to work from home these days. :-()

I guess it's due to the fact that intel_iommu_probe_device() callback
only works for the pci devices. The issue occurs when probing a device
other than a PCI one.


Yeah, I was wondering if that would be significant, since it's the 
only driver that never registered itself for platform_bus_type so 
won't have actually seen those calls before. I'm inclined to bodge 
that as below for now, as long as it then works OK in terms of the 
rest of the changes.


Thanks,
Robin.

->8-
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9fa1b98186a3..6e359f92ec00 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4565,6 +4565,10 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)

  unsigned long flags;
  u8 bus, devfn;

+    /* ANDD platform device support needs fixing */
+    if (!pdev)
+    return ERR_PTR(-ENODEV);
+
  iommu = device_to_iommu(dev, &bus, &devfn);
  if (!iommu)
  return ERR_PTR(-ENODEV);


I haven't seen any real ANDD platform devices, hence this works for me.


Or more precisely, return NULL when @dev goes through device_to_iommu()?

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index df5c62ecf942..0d447739e076 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -797,8 +797,11 @@ struct intel_iommu *device_to_iommu(struct device 
*dev, u8 *bus, u8 *devfn)

pf_pdev = pci_physfn(pdev);
dev = &pf_pdev->dev;
segment = pci_domain_nr(pdev->bus);
-   } else if (has_acpi_companion(dev))
+   } else if (has_acpi_companion(dev)) {
dev = &ACPI_COMPANION(dev)->dev;
+   } else {
+   return NULL;
+   }

rcu_read_lock();
for_each_iommu(iommu, drhd) {

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration

2022-04-23 Thread Lu Baolu

On 2022/4/23 16:37, Robin Murphy wrote:

On 2022-04-23 09:01, Lu Baolu wrote:

Hi Robin,

On 2022/4/19 15:20, Robin Murphy wrote:

On 2022-04-19 00:37, Lu Baolu wrote:

On 2022/4/19 6:09, Robin Murphy wrote:

On 2022-04-16 01:04, Lu Baolu wrote:

On 2022/4/14 20:42, Robin Murphy wrote:
@@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct bus_type 
*bus)

   */
  int bus_set_iommu(struct bus_type *bus, const struct iommu_ops 
*ops)

  {
-    int err;
-
-    if (ops == NULL) {
-    bus->iommu_ops = NULL;
-    return 0;
-    }
-
-    if (bus->iommu_ops != NULL)
+    if (bus->iommu_ops && ops && bus->iommu_ops != ops)
  return -EBUSY;
  bus->iommu_ops = ops;


Do we still need to keep above lines in bus_set_iommu()?


It preserves the existing behaviour until each callsite and its 
associated error handling are removed later on, which seems like as 
good a thing to do as any. Since I'm already relaxing 
iommu_device_register() to a warn-but-continue behaviour while it 
keeps the bus ops on life-support internally, I figured not 
changing too much at once would make it easier to bisect any 
potential issues arising from this first step.


Fair enough. Thank you for the explanation.

Do you have a public tree that I could pull these patches and try them
on an Intel hardware? Or perhaps you have done this? I like the whole
idea of this series, but it's better to try it with a real hardware.


I haven't bothered with separate branches since there's so many 
different pieces in-flight, but my complete (unstable) development 
branch can be found here:


https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus

For now I'd recommend winding the head back to "iommu: Clean up 
bus_set_iommu()" for testing - some of the patches above that have 
already been posted and picked up by their respective subsystems, but 
others are incomplete and barely compile-tested. I'll probably 
rearrange it later this week to better reflect what's happened so far.


I wound the head back to "iommu: Clean up bus_set_iommu" and tested it
on an Intel machine. It got stuck during boot. This test was on a remote
machine and I have no means to access it physically. So I can't get any
kernel debugging messages. (I have to work from home these days. :-()

I guess it's due to the fact that intel_iommu_probe_device() callback
only works for the pci devices. The issue occurs when probing a device
other than a PCI one.


Yeah, I was wondering if that would be significant, since it's the only 
driver that never registered itself for platform_bus_type so won't have 
actually seen those calls before. I'm inclined to bodge that as below 
for now, as long as it then works OK in terms of the rest of the changes.


Thanks,
Robin.

->8-
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9fa1b98186a3..6e359f92ec00 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4565,6 +4565,10 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)

  unsigned long flags;
  u8 bus, devfn;

+    /* ANDD platform device support needs fixing */
+    if (!pdev)
+    return ERR_PTR(-ENODEV);
+
  iommu = device_to_iommu(dev, &bus, &devfn);
  if (!iommu)
  return ERR_PTR(-ENODEV);


I haven't seen any real ANDD platform devices, hence this works for me.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration

2022-04-23 Thread Robin Murphy

On 2022-04-23 09:01, Lu Baolu wrote:

Hi Robin,

On 2022/4/19 15:20, Robin Murphy wrote:

On 2022-04-19 00:37, Lu Baolu wrote:

On 2022/4/19 6:09, Robin Murphy wrote:

On 2022-04-16 01:04, Lu Baolu wrote:

On 2022/4/14 20:42, Robin Murphy wrote:
@@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct bus_type 
*bus)

   */
  int bus_set_iommu(struct bus_type *bus, const struct iommu_ops 
*ops)

  {
-    int err;
-
-    if (ops == NULL) {
-    bus->iommu_ops = NULL;
-    return 0;
-    }
-
-    if (bus->iommu_ops != NULL)
+    if (bus->iommu_ops && ops && bus->iommu_ops != ops)
  return -EBUSY;
  bus->iommu_ops = ops;


Do we still need to keep above lines in bus_set_iommu()?


It preserves the existing behaviour until each callsite and its 
associated error handling are removed later on, which seems like as 
good a thing to do as any. Since I'm already relaxing 
iommu_device_register() to a warn-but-continue behaviour while it 
keeps the bus ops on life-support internally, I figured not changing 
too much at once would make it easier to bisect any potential issues 
arising from this first step.


Fair enough. Thank you for the explanation.

Do you have a public tree that I could pull these patches and try them
on an Intel hardware? Or perhaps you have done this? I like the whole
idea of this series, but it's better to try it with a real hardware.


I haven't bothered with separate branches since there's so many 
different pieces in-flight, but my complete (unstable) development 
branch can be found here:


https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus

For now I'd recommend winding the head back to "iommu: Clean up 
bus_set_iommu()" for testing - some of the patches above that have 
already been posted and picked up by their respective subsystems, but 
others are incomplete and barely compile-tested. I'll probably 
rearrange it later this week to better reflect what's happened so far.


I wound the head back to "iommu: Clean up bus_set_iommu" and tested it
on an Intel machine. It got stuck during boot. This test was on a remote
machine and I have no means to access it physically. So I can't get any
kernel debugging messages. (I have to work from home these days. :-()

I guess it's due to the fact that intel_iommu_probe_device() callback
only works for the pci devices. The issue occurs when probing a device
other than a PCI one.


Yeah, I was wondering if that would be significant, since it's the only 
driver that never registered itself for platform_bus_type so won't have 
actually seen those calls before. I'm inclined to bodge that as below 
for now, as long as it then works OK in terms of the rest of the changes.


Thanks,
Robin.

->8-
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9fa1b98186a3..6e359f92ec00 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4565,6 +4565,10 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)

unsigned long flags;
u8 bus, devfn;

+   /* ANDD platform device support needs fixing */
+   if (!pdev)
+   return ERR_PTR(-ENODEV);
+
iommu = device_to_iommu(dev, &bus, &devfn);
if (!iommu)
return ERR_PTR(-ENODEV);
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 1/1] iommu/vt-d: Drop stop marker messages

2022-04-23 Thread Lu Baolu
The page fault handling framework in the IOMMU core explicitly states
that it doesn't handle PCI PASID Stop Marker and the IOMMU drivers must
discard them before reporting faults. This handles Stop Marker messages
in prq_event_thread() before reporting events to the core.

The VT-d driver explicitly drains the pending page requests when a CPU
page table (represented by a mm struct) is unbound from a PASID according
to the procedures defined in the VT-d spec. The Stop Marker messages do
not need a response. Hence, it is safe to drop the Stop Marker messages
silently if any of them is found in the page request queue.

Fixes: d5b9e4bfe0d88 ("iommu/vt-d: Report prq to io-pgfault framework")
Signed-off-by: Lu Baolu 
Reviewed-by: Jacob Pan 
Reviewed-by: Kevin Tian 
Link: 
https://lore.kernel.org/r/20220421113558.3504874-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/svm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 23a38763c1d1..7ee37d996e15 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -757,6 +757,10 @@ static irqreturn_t prq_event_thread(int irq, void *d)
goto bad_req;
}
 
+   /* Drop Stop Marker message. No need for a response. */
+   if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
+   goto prq_advance;
+
if (!svm || svm->pasid != req->pasid) {
/*
 * It can't go away, because the driver is not permitted
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/1] iommu/vt-d: Fixes for v5.18-rc4

2022-04-23 Thread Lu Baolu
Hi Joerg,

One fix is queued for v5.18. It aims to fix:

 - Handle PCI stop marker messages in IOMMU driver to meet the
   requirement of I/O page fault handling framework.

Please consider it for the iommu/fix branch.

Best regards,
Lu Baolu

Lu Baolu (1):
  iommu/vt-d: Drop stop marker messages

 drivers/iommu/intel/svm.c | 4 
 1 file changed, 4 insertions(+)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/13] iommu: Move bus setup to IOMMU device registration

2022-04-23 Thread Lu Baolu

Hi Robin,

On 2022/4/19 15:20, Robin Murphy wrote:

On 2022-04-19 00:37, Lu Baolu wrote:

On 2022/4/19 6:09, Robin Murphy wrote:

On 2022-04-16 01:04, Lu Baolu wrote:

On 2022/4/14 20:42, Robin Murphy wrote:
@@ -1883,27 +1900,12 @@ static int iommu_bus_init(struct bus_type 
*bus)

   */
  int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops)
  {
-    int err;
-
-    if (ops == NULL) {
-    bus->iommu_ops = NULL;
-    return 0;
-    }
-
-    if (bus->iommu_ops != NULL)
+    if (bus->iommu_ops && ops && bus->iommu_ops != ops)
  return -EBUSY;
  bus->iommu_ops = ops;


Do we still need to keep above lines in bus_set_iommu()?


It preserves the existing behaviour until each callsite and its 
associated error handling are removed later on, which seems like as 
good a thing to do as any. Since I'm already relaxing 
iommu_device_register() to a warn-but-continue behaviour while it 
keeps the bus ops on life-support internally, I figured not changing 
too much at once would make it easier to bisect any potential issues 
arising from this first step.


Fair enough. Thank you for the explanation.

Do you have a public tree that I could pull these patches and try them
on an Intel hardware? Or perhaps you have done this? I like the whole
idea of this series, but it's better to try it with a real hardware.


I haven't bothered with separate branches since there's so many 
different pieces in-flight, but my complete (unstable) development 
branch can be found here:


https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus

For now I'd recommend winding the head back to "iommu: Clean up 
bus_set_iommu()" for testing - some of the patches above that have 
already been posted and picked up by their respective subsystems, but 
others are incomplete and barely compile-tested. I'll probably rearrange 
it later this week to better reflect what's happened so far.


I wound the head back to "iommu: Clean up bus_set_iommu" and tested it
on an Intel machine. It got stuck during boot. This test was on a remote
machine and I have no means to access it physically. So I can't get any
kernel debugging messages. (I have to work from home these days. :-()

I guess it's due to the fact that intel_iommu_probe_device() callback
only works for the pci devices. The issue occurs when probing a device
other than a PCI one.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 3/4] iommu/vt-d: Drop stop marker messages

2022-04-23 Thread Lu Baolu

On 2022/4/22 11:05, Tian, Kevin wrote:

From: Lu Baolu
Sent: Thursday, April 21, 2022 7:36 PM

The page fault handling framework in the IOMMU core explicitly states
that it doesn't handle PCI PASID Stop Marker and the IOMMU drivers must
discard them before reporting faults. This handles Stop Marker messages
in prq_event_thread() before reporting events to the core.

The VT-d driver explicitly drains the pending page requests when a CPU
page table (represented by a mm struct) is unbound from a PASID according
to the procedures defined in the VT-d spec. The Stop Marker messages do
not need a response. Hence, it is safe to drop the Stop Marker messages
silently if any of them is found in the page request queue.

Fixes: d5b9e4bfe0d88 ("iommu/vt-d: Report prq to io-pgfault framework")
Signed-off-by: Lu Baolu
Reviewed-by: Jacob Pan

Reviewed-by: Kevin Tian



Thank you, Kevin. I will queue this patch to Joerg as a fix for v5.18.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu