Re: [PATCH v9 10/11] iommu: Per-domain I/O page fault handling

2022-06-27 Thread Baolu Lu

Hi Ethan,

On 2022/6/27 21:03, Ethan Zhao wrote:

Hi,

On 2022/6/21 22:43, Lu Baolu wrote:

Tweak the I/O page fault handling framework to route the page faults to
the domain and call the page fault handler retrieved from the domain.
This makes the I/O page fault handling framework possible to serve more
usage scenarios as long as they have an IOMMU domain and install a page
fault handler in it. Some unused functions are also removed to avoid
dead code.

The iommu_get_domain_for_dev_pasid() which retrieves attached domain
for a {device, PASID} pair is used. It will be used by the page fault
handling framework which knows {device, PASID} reported from the iommu
driver. We have a guarantee that the SVA domain doesn't go away during
IOPF handling, because unbind() waits for pending faults with
iopf_queue_flush_dev() before freeing the domain. Hence, there's no need
to synchronize life cycle of the iommu domains between the unbind() and
the interrupt threads.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
---
  drivers/iommu/io-pgfault.c | 64 +-
  1 file changed, 7 insertions(+), 57 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index aee9e033012f..4f24ec703479 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,

  return iommu_page_response(dev, &resp);
  }
-static enum iommu_page_response_code
-iopf_handle_single(struct iopf_fault *iopf)
-{
-    vm_fault_t ret;
-    struct mm_struct *mm;
-    struct vm_area_struct *vma;
-    unsigned int access_flags = 0;
-    unsigned int fault_flags = FAULT_FLAG_REMOTE;
-    struct iommu_fault_page_request *prm = &iopf->fault.prm;
-    enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
-
-    if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
-    return status;
-
-    mm = iommu_sva_find(prm->pasid);
-    if (IS_ERR_OR_NULL(mm))
-    return status;
-
-    mmap_read_lock(mm);
-
-    vma = find_extend_vma(mm, prm->addr);
-    if (!vma)
-    /* Unmapped area */
-    goto out_put_mm;
-
-    if (prm->perm & IOMMU_FAULT_PERM_READ)
-    access_flags |= VM_READ;
-
-    if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
-    access_flags |= VM_WRITE;
-    fault_flags |= FAULT_FLAG_WRITE;
-    }
-
-    if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
-    access_flags |= VM_EXEC;
-    fault_flags |= FAULT_FLAG_INSTRUCTION;
-    }
-
-    if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
-    fault_flags |= FAULT_FLAG_USER;
-
-    if (access_flags & ~vma->vm_flags)
-    /* Access fault */
-    goto out_put_mm;
-
-    ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
-    status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
-    IOMMU_PAGE_RESP_SUCCESS;
-
-out_put_mm:
-    mmap_read_unlock(mm);
-    mmput(mm);
-
-    return status;
-}
-


Once iopf_handle_single() is removed, the name of 
iopf_handle_group() looks a little weird


and confusing. Does this group mean the iommu group (domain)? It took me 
some minutes to


No. This is not the iommu group. It's the page request group defined by
the PCI SIG spec. Multiple page requests can be put in one group with the
same group id, and all page requests in a group can be responded to the
device in one shot.

Best regards,
baolu



look into the code and realize that it means a batch/list/queue of iopfs, 
and that iopf_handle_group() becomes a generic iopf_handler().

Does it make sense to revise the names of iopf_handle_group(), 
iopf_complete_group() and iopf_group in this patch set?


Thanks,

Ethan


  static void iopf_handle_group(struct work_struct *work)
  {
  struct iopf_group *group;
+    struct iommu_domain *domain;
  struct iopf_fault *iopf, *next;
  enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
  group = container_of(work, struct iopf_group, work);
+    domain = iommu_get_domain_for_dev_pasid(group->dev,
+    group->last_fault.fault.prm.pasid);
+    if (!domain || !domain->iopf_handler)
+    status = IOMMU_PAGE_RESP_INVALID;
  list_for_each_entry_safe(iopf, next, &group->faults, list) {
  /*
@@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct *work)

   * faults in the group if there is an error.
   */
  if (status == IOMMU_PAGE_RESP_SUCCESS)
-    status = iopf_handle_single(iopf);
+    status = domain->iopf_handler(&iopf->fault,
+  domain->fault_data);
  if (!(iopf->fault.prm.flags &
    IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v9 06/11] arm-smmu-v3/sva: Add SVA domain support

2022-06-27 Thread Baolu Lu

On 2022/6/27 19:50, Zhangfei Gao wrote:



On 2022/6/21 10:43 PM, Lu Baolu wrote:

Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 


Tested-by: Zhangfei Gao 
Have tested the series on aarch64.


Tony has been helping to validate this series on Intel's platform.

Tony, can I add your Tested-by to this series as well?

Best regards,
baolu

Re: [PATCH v9 07/11] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

2022-06-27 Thread Baolu Lu

On 2022/6/27 18:14, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Tuesday, June 21, 2022 10:44 PM
+struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
mm_struct *mm)
+{
+   struct iommu_domain *domain;
+   ioasid_t max_pasids;
+   int ret = -EINVAL;
+
+   /* Allocate mm->pasid if necessary. */


this comment is for iommu_sva_alloc_pasid()


Updated.




+   max_pasids = dev->iommu->max_pasids;
+   if (!max_pasids)
+   return ERR_PTR(-EOPNOTSUPP);
+
+   ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
+   if (ret)
+   return ERR_PTR(ret);
+



...

+void iommu_sva_unbind_device(struct iommu_sva *handle)
+{
+   struct device *dev = handle->dev;
+   struct iommu_domain *domain =
+   container_of(handle, struct iommu_domain, bond);
+   ioasid_t pasid = iommu_sva_get_pasid(handle);
+
+   mutex_lock(&iommu_sva_lock);
+   if (refcount_dec_and_test(&domain->bond.users)) {
+   iommu_detach_device_pasid(domain, dev, pasid);
+   iommu_domain_free(domain);
+   }
+   mutex_unlock(&iommu_sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
+
+u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+{
+   struct iommu_domain *domain =
+   container_of(handle, struct iommu_domain, bond);
+
+   return domain->mm->pasid;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);


Looks this is only used by unbind_device. Just open code it.


It's part of current IOMMU/SVA interfaces for the device drivers, and
has been used in various drivers.

$ git grep iommu_sva_get_pasid
drivers/dma/idxd/cdev.c:pasid = iommu_sva_get_pasid(sva);
drivers/iommu/iommu-sva-lib.c:  ioasid_t pasid = iommu_sva_get_pasid(handle);
drivers/iommu/iommu-sva-lib.c:u32 iommu_sva_get_pasid(struct iommu_sva *handle)
drivers/iommu/iommu-sva-lib.c:EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
drivers/misc/uacce/uacce.c: pasid = iommu_sva_get_pasid(handle);
include/linux/iommu.h:u32 iommu_sva_get_pasid(struct iommu_sva *handle);
include/linux/iommu.h:static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)


Or did I miss anything?

Best regards,
baolu


Re: [PATCH v9 06/11] arm-smmu-v3/sva: Add SVA domain support

2022-06-27 Thread Baolu Lu

On 2022/6/27 19:50, Zhangfei Gao wrote:


On 2022/6/21 10:43 PM, Lu Baolu wrote:

Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 


Tested-by: Zhangfei Gao 
Have tested the series on aarch64.


Thank you very much! Your help is much appreciated!

Best regards,
baolu

Re: [PATCH v9 04/11] iommu: Add sva iommu_domain support

2022-06-27 Thread Baolu Lu

Hi Kevin,

On 2022/6/27 16:29, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Tuesday, June 21, 2022 10:44 PM

The sva iommu_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructure
to support SVA domain in the iommu common layer. It includes:

- Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA
domain
   type. The IOMMU drivers that support SVA should provide the sva
   domain specific iommu_domain_ops.
- Add a helper to allocate an SVA domain. The iommu_domain_free()
   is still used to free an SVA domain.
- Add helpers to attach an SVA domain to a device and the reverse
   operation.

Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, the attach/detach interfaces only apply to devices
belonging to the singleton groups, and the singleton is immutable in
fabric i.e. not affected by hotplug.

The iommu_attach/detach_device_pasid() can be used for other purposes,
such as kernel DMA with pasid, mediation device, etc.


I'd split this into two patches. One for adding iommu_attach/
detach_device_pasid() and set/block_dev_pasid ops, and the
other for adding SVA.


Yes. Make sense.




  struct iommu_domain {
unsigned type;
const struct iommu_domain_ops *ops;
unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */
-   iommu_fault_handler_t handler;
-   void *handler_token;
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+   union {
+   struct {/* IOMMU_DOMAIN_DMA */
+   iommu_fault_handler_t handler;
+   void *handler_token;
+   };


why is it DMA domain specific? What about unmanaged
domain? Unrecoverable fault can happen on any type
including SVA. Hence I think above should be domain type
agnostic.


The report_iommu_fault() should be replaced by the new
iommu_report_device_fault(). Jean has already started this work.

https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/

Currently this is only for DMA domains, hence Robin suggested to make it
exclude with the SVA domain things.

https://lore.kernel.org/linux-iommu/f3170016-4d7f-e78e-db48-68305f683...@arm.com/




+   struct {/* IOMMU_DOMAIN_SVA */
+   struct mm_struct *mm;
+   };
+   };
  };






+
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+   struct mm_struct *mm)
+{
+   const struct iommu_ops *ops = dev_iommu_ops(dev);
+   struct iommu_domain *domain;
+
+   domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
+   if (!domain)
+   return NULL;
+
+   domain->type = IOMMU_DOMAIN_SVA;


It's a bit weird that the type has been specified when calling
ops->domain_alloc while it still leaves to the caller to set the
type. But this is not caused by this series. could be cleaned
up separately.


Yes. Robin has patches to refactor the domain allocation interface,
let's wait and see what it looks like finally.




+
+   mutex_lock(&group->mutex);
+   curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
GFP_KERNEL);
+   if (curr)
+   goto out_unlock;


Need check xa_is_err(old).


Either

(1) old entry is a valid pointer, or
(2) xa_is_err(curr)

are failure cases. Hence, "curr == NULL" is the only check we need. Did
I miss anything?

Best regards,
baolu


Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS

2022-06-27 Thread Michael Schmitz

Hi Geert,

On 28.06.2022 at 09:12, Michael Schmitz wrote:

Hi Geert,

On 27/06/22 20:26, Geert Uytterhoeven wrote:

Hi Michael,

On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz 
wrote:

On 18.06.2022 at 00:57, Arnd Bergmann wrote:

From: Arnd Bergmann 

All architecture-independent users of virt_to_bus() and bus_to_virt()
have been fixed to use the dma mapping interfaces or have been
removed now.  This means the definitions on most architectures, and the
CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.

The only exceptions to this are a few network and scsi drivers for m68k
Amiga and VME machines and ppc32 Macintosh. These drivers work
correctly
with the old interfaces and are probably not worth changing.

The Amiga SCSI drivers are all old WD33C93 ones, and replacing
virt_to_bus by virt_to_phys in the dma_setup() function there would
cause no functional change at all.

FTR, the sgiwd93 driver uses dma_map_single().


Thanks! From what I see, it doesn't have to deal with bounce buffers
though?


Leaving the bounce buffer handling in place, and taking a few other 
liberties - this is what converting the easiest case (a3000 SCSI) might 
look like. Any obvious mistakes? The mvme147 driver would be very 
similar to handle (after conversion to a platform device).


The driver allocates bounce buffers using kmalloc if it hits an 
unaligned data buffer - can such buffers still even happen these days? 
If I understand dma_map_single() correctly, the resulting dma handle 
would be equally misaligned?


To allocate a bounce buffer, would it be OK to use dma_alloc_coherent() 
even though AFAIU memory used for DMA buffers generally isn't consistent 
on m68k?


Thinking ahead to the other two Amiga drivers - I wonder whether 
allocating a static bounce buffer or a DMA pool at driver init is likely 
to succeed if the kernel runs from the low 16 MB RAM chunk? It certainly 
won't succeed if the kernel runs from a higher memory address, so the 
present bounce buffer logic around amiga_chip_alloc() might still need 
to be used here.


Leaves the question whether converting the gvp11 and a2091 drivers is 
actually worth it, if bounce buffers still have to be handled explicitly.


Untested (except for compile testing), un-checkpatched, don't try this 
on any disk with valuable data ...


Cheers,

Michael
From e8c6aa068d27901c49dfb7442d4200cc966350a5 Mon Sep 17 00:00:00 2001
From: Michael Schmitz 
Date: Tue, 28 Jun 2022 12:45:08 +1200
Subject: [PATCH] scsi - convert m68k WD33C93 drivers to DMA API

Use dma_map_single() for the a3000 driver (leave bounce buffer logic unchanged).

Compile-tested only.

Signed-off-by: Michael Schmitz 
---
 drivers/scsi/a3000.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/a3000.c b/drivers/scsi/a3000.c
index dd161885eed1..3c62e8bafb8f 100644
--- a/drivers/scsi/a3000.c
+++ b/drivers/scsi/a3000.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -25,8 +26,11 @@
 struct a3000_hostdata {
 	struct WD33C93_hostdata wh;
 	struct a3000_scsiregs *regs;
+	struct device *dev;
 };
 
+#define DMA_DIR(d)   ((d == DATA_OUT_DIR) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)
+
 static irqreturn_t a3000_intr(int irq, void *data)
 {
 	struct Scsi_Host *instance = data;
@@ -49,12 +53,16 @@ static irqreturn_t a3000_intr(int irq, void *data)
 static int dma_setup(struct scsi_cmnd *cmd, int dir_in)
 {
 	struct scsi_pointer *scsi_pointer = WD33C93_scsi_pointer(cmd);
+	unsigned long len = scsi_pointer->this_residual;
 	struct Scsi_Host *instance = cmd->device->host;
 	struct a3000_hostdata *hdata = shost_priv(instance);
 	struct WD33C93_hostdata *wh = &hdata->wh;
 	struct a3000_scsiregs *regs = hdata->regs;
 	unsigned short cntr = CNTR_PDMD | CNTR_INTEN;
-	unsigned long addr = virt_to_bus(scsi_pointer->ptr);
+	dma_addr_t addr;
+
+	addr = dma_map_single(hdata->dev, scsi_pointer->ptr, len, DMA_DIR(dir_in));
+	scsi_pointer->dma_handle = addr;
 
 	/*
 	 * if the physical address has the wrong alignment, or if
@@ -79,7 +87,7 @@ static int dma_setup(struct scsi_cmnd *cmd, int dir_in)
 			   scsi_pointer->this_residual);
 		}
 
-		addr = virt_to_bus(wh->dma_bounce_buffer);
+		addr = virt_to_phys(wh->dma_bounce_buffer);
 	}
 
 	/* setup dma direction */
@@ -166,6 +174,10 @@ static void dma_stop(struct Scsi_Host *instance, struct scsi_cmnd *SCpnt,
 			wh->dma_bounce_len = 0;
 		}
 	}
+	dma_unmap_single(hdata->dev, scsi_pointer->dma_handle,
+			 scsi_pointer->this_residual,
+			 DMA_DIR(wh->dma_dir));
+
 }
 
 static struct scsi_host_template amiga_a3000_scsi_template = {
@@ -193,6 +205,11 @@ static int __init amiga_a3000_scsi_probe(struct platform_device *pdev)
 	wd33c93_regs wdregs;
 	struct a3000_hostdata *hdata;
 
+	if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(32))) {
+		dev_warn(&pdev->dev, "cannot use 32 bit DMA\n");
+		return -ENODEV;
+	}
+
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (!res)
 		return -E

Re: [PATCH v3 0/3] phase out CONFIG_VIRT_TO_BUS

2022-06-27 Thread Martin K. Petersen


Hi Arnd!

> If there are no more issues identified with this series, I'll merge it
> through the asm-generic tree. The SCSI patches can also get merged
> separately through the SCSI maintainers' tree if they prefer.

I put patches 1 and 2 in scsi-staging to see if anything breaks...

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v1 0/2] Fix console probe delay when stdout-path isn't set

2022-06-27 Thread Fabio Estevam
Hi Saravana,

On Mon, Jun 27, 2022 at 11:03 PM Saravana Kannan  wrote:
>
> Since the series that fixes console probe delay based on stdout-path[1] got
> pulled into driver-core-next, I made these patches on top of them.
>
> Even if stdout-path isn't set in DT, this patch should take console
> probe times back to how they were before the deferred_probe_timeout
> clean up series[2].
>
> Fabio/Ahmad/Sascha,
>
> Can you give this a shot please?

This series works fine for me (with and without stdout-path), thanks:

Tested-by: Fabio Estevam 


[PATCH v1 2/2] serial: Set probe_no_timeout for all DT based drivers

2022-06-27 Thread Saravana Kannan via iommu
With commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by
default") the probing of TTY consoles could get delayed if they have
optional suppliers that are listed in DT, but those suppliers don't
probe by the time kernel boot finishes. The console devices will probe
eventually after driver_probe_timeout expires.

However, since consoles are often used for debugging kernel issues, it
does not make sense to delay their probe. So, set the newly added
probe_no_timeout flag for all serial drivers that are DT based. This way,
fw_devlink will know not to delay the probing of the consoles past
kernel boot.

Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
Reported-by: Sascha Hauer 
Reported-by: Peng Fan 
Reported-by: Fabio Estevam 
Reported-by: Ahmad Fatoum 
Signed-off-by: Saravana Kannan 
---
 drivers/tty/ehv_bytechan.c  | 1 +
 drivers/tty/goldfish.c  | 1 +
 drivers/tty/hvc/hvc_opal.c  | 1 +
 drivers/tty/serial/8250/8250_acorn.c| 1 -
 drivers/tty/serial/8250/8250_aspeed_vuart.c | 1 +
 drivers/tty/serial/8250/8250_bcm2835aux.c   | 1 +
 drivers/tty/serial/8250/8250_bcm7271.c  | 1 +
 drivers/tty/serial/8250/8250_dw.c   | 1 +
 drivers/tty/serial/8250/8250_em.c   | 1 +
 drivers/tty/serial/8250/8250_ingenic.c  | 1 +
 drivers/tty/serial/8250/8250_lpc18xx.c  | 1 +
 drivers/tty/serial/8250/8250_mtk.c  | 1 +
 drivers/tty/serial/8250/8250_of.c   | 1 +
 drivers/tty/serial/8250/8250_omap.c | 1 +
 drivers/tty/serial/8250/8250_pxa.c  | 1 +
 drivers/tty/serial/8250/8250_tegra.c| 1 +
 drivers/tty/serial/8250/8250_uniphier.c | 1 +
 drivers/tty/serial/altera_jtaguart.c| 1 +
 drivers/tty/serial/altera_uart.c| 1 +
 drivers/tty/serial/amba-pl011.c | 1 +
 drivers/tty/serial/apbuart.c| 1 +
 drivers/tty/serial/ar933x_uart.c| 1 +
 drivers/tty/serial/arc_uart.c   | 1 +
 drivers/tty/serial/atmel_serial.c   | 1 +
 drivers/tty/serial/bcm63xx_uart.c   | 1 +
 drivers/tty/serial/clps711x.c   | 1 +
 drivers/tty/serial/cpm_uart/cpm_uart_core.c | 1 +
 drivers/tty/serial/digicolor-usart.c| 1 +
 drivers/tty/serial/fsl_linflexuart.c| 1 +
 drivers/tty/serial/fsl_lpuart.c | 1 +
 drivers/tty/serial/imx.c| 1 +
 drivers/tty/serial/lantiq.c | 1 +
 drivers/tty/serial/liteuart.c   | 1 +
 drivers/tty/serial/lpc32xx_hs.c | 1 +
 drivers/tty/serial/max310x.c| 1 +
 drivers/tty/serial/meson_uart.c | 1 +
 drivers/tty/serial/milbeaut_usio.c  | 1 +
 drivers/tty/serial/mpc52xx_uart.c   | 1 +
 drivers/tty/serial/mps2-uart.c  | 1 +
 drivers/tty/serial/msm_serial.c | 1 +
 drivers/tty/serial/mvebu-uart.c | 1 +
 drivers/tty/serial/mxs-auart.c  | 1 +
 drivers/tty/serial/omap-serial.c| 1 +
 drivers/tty/serial/owl-uart.c   | 1 +
 drivers/tty/serial/pic32_uart.c | 1 +
 drivers/tty/serial/pmac_zilog.c | 1 +
 drivers/tty/serial/pxa.c| 1 +
 drivers/tty/serial/qcom_geni_serial.c   | 1 +
 drivers/tty/serial/rda-uart.c   | 1 +
 drivers/tty/serial/samsung_tty.c| 1 +
 drivers/tty/serial/sc16is7xx.c  | 1 +
 drivers/tty/serial/serial-tegra.c   | 1 +
 drivers/tty/serial/sh-sci.c | 1 +
 drivers/tty/serial/sifive.c | 1 +
 drivers/tty/serial/sprd_serial.c| 1 +
 drivers/tty/serial/st-asc.c | 1 +
 drivers/tty/serial/stm32-usart.c| 1 +
 drivers/tty/serial/sunhv.c  | 1 +
 drivers/tty/serial/sunplus-uart.c   | 1 +
 drivers/tty/serial/sunsab.c | 1 +
 drivers/tty/serial/sunsu.c  | 1 +
 drivers/tty/serial/sunzilog.c   | 1 +
 drivers/tty/serial/tegra-tcu.c  | 1 +
 drivers/tty/serial/uartlite.c   | 1 +
 drivers/tty/serial/ucc_uart.c   | 1 +
 drivers/tty/serial/vt8500_serial.c  | 1 +
 drivers/tty/serial/xilinx_uartps.c  | 1 +
 67 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/ehv_bytechan.c b/drivers/tty/ehv_bytechan.c
index 19d32cb6af84..6de710da99be 100644
--- a/drivers/tty/ehv_bytechan.c
+++ b/drivers/tty/ehv_bytechan.c
@@ -739,6 +739,7 @@ static struct platform_driver ehv_bc_tty_driver = {
.driver = {
.name = "ehv-bc",
.of_match_table = ehv_bc_tty_of_ids,
+   .probe_no_timeout = true,
.suppress_bind_attrs = true,
},
.probe  = ehv_bc_tty_probe,
diff --git a/drivers/tty/goldfish.c b/drivers/tty/goldfish.c
index c7968aecd870..f9760598c836 100644
--- a/drivers/tty/goldfish.c
+++ b/drivers/tty/goldfish.c
@@ -474,6 +474,7 @@ static struct platform_drive

[PATCH v1 1/2] driver core: Add probe_no_timeout flag for drivers

2022-06-27 Thread Saravana Kannan via iommu
This flag only needs to be set for drivers of devices that meet all the
following conditions:
- Need to probe successfully before userspace init is started
- Have optional suppliers
- Can't wait for deferred_probe_timeout to expire

fw_devlink=on uses this info, as needed, to ignore dependencies on supplier
devices that have not been added or supplier devices that don't have any
drivers.  It's still up to the driver to decide which of the missing
suppliers are optional or not.

Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
Reported-by: Sascha Hauer 
Reported-by: Peng Fan 
Reported-by: Fabio Estevam 
Reported-by: Ahmad Fatoum 
Signed-off-by: Saravana Kannan 
---
 drivers/base/base.h   |  1 +
 drivers/base/core.c   |  7 +++
 drivers/base/dd.c |  3 +++
 include/linux/device.h|  7 +++
 include/linux/device/driver.h | 11 +++
 5 files changed, 29 insertions(+)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index b3a43a164dcd..149822d2086f 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -193,6 +193,7 @@ extern void device_links_no_driver(struct device *dev);
 extern bool device_links_busy(struct device *dev);
 extern void device_links_unbind_consumers(struct device *dev);
 extern void fw_devlink_drivers_done(void);
+extern void fw_devlink_probe_no_timeout(void);
 
 /* device pm support */
 void device_pm_move_to_tail(struct device *dev);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index ccdd5b4295de..8e18904a1584 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -54,6 +54,7 @@ static unsigned int defer_sync_state_count = 1;
 static DEFINE_MUTEX(fwnode_link_lock);
 static bool fw_devlink_is_permissive(void);
 static bool fw_devlink_drv_reg_done;
+static bool fw_devlink_no_timeout;
 static bool fw_devlink_best_effort;
 
 /**
@@ -969,6 +970,7 @@ static void device_links_missing_supplier(struct device *dev)
 static bool dev_is_best_effort(struct device *dev)
 {
return (fw_devlink_best_effort && dev->can_match) ||
+   (fw_devlink_no_timeout && dev->probe_no_timeout) ||
(dev->fwnode && (dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT));
 }
 
@@ -1688,6 +1690,11 @@ void fw_devlink_drivers_done(void)
device_links_write_unlock();
 }
 
+void fw_devlink_probe_no_timeout(void)
+{
+   fw_devlink_no_timeout = true;
+}
+
 /**
  * wait_for_init_devices_probe - Try to probe any device needed for init
  *
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index e600dd2afc35..9b0ef2b6a058 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -324,6 +324,8 @@ static int deferred_probe_initcall(void)
 
if (!IS_ENABLED(CONFIG_MODULES))
fw_devlink_drivers_done();
+   else
+   fw_devlink_probe_no_timeout();
 
/*
 * Trigger deferred probe again, this time we won't defer anything
@@ -734,6 +736,7 @@ static int __driver_probe_device(struct device_driver *drv, struct device *dev)
return -EBUSY;
 
dev->can_match = true;
+   dev->probe_no_timeout = drv->probe_no_timeout;
pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 drv->bus->name, __func__, dev_name(dev), drv->name);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index 424b55df0272..e6246b6cf6cf 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -536,6 +536,12 @@ struct device_physical_location {
  * @can_match: The device has matched with a driver at least once or it is in
  * a bus (like AMBA) which can't check for matching drivers until
  * other devices probe successfully.
+ * @probe_no_timeout: Set by driver core to indicate that this device's probe
+ * can't wait till driver_probe_timeout expires. This information
+ * is used by fw_devlink=on to avoid deferring the probe of this
+ * device to wait on supplier devices that haven't been added or
+ * probed successfully.
+ * See also: probe_no_timeout in struct driver.
  * @dma_coherent: this particular device is dma coherent, even if the
  * architecture supports non-coherent devices.
  * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
@@ -642,6 +648,7 @@ struct device {
boolof_node_reused:1;
boolstate_synced:1;
boolcan_match:1;
+   boolprobe_no_timeout:1;
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 7acaabde5396..2ce60e511504 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -55,6 +55,15 @@ enum probe_type {
  * @owner: The module owner.
  * @mod_name:  Used for

[PATCH v1 0/2] Fix console probe delay when stdout-path isn't set

2022-06-27 Thread Saravana Kannan via iommu
Since the series that fixes console probe delay based on stdout-path[1] got
pulled into driver-core-next, I made these patches on top of them.

Even if stdout-path isn't set in DT, this patch should take console
probe times back to how they were before the deferred_probe_timeout
clean up series[2].

Fabio/Ahmad/Sascha,

Can you give this a shot please?

[1] - https://lore.kernel.org/lkml/20220623080344.783549-1-sarava...@google.com/
[2] - 
https://lore.kernel.org/lkml/20220601070707.3946847-1-sarava...@google.com/

Thanks,
Saravana

cc: Rob Herring 
cc: Sascha Hauer 
cc: Peng Fan 
cc: Kevin Hilman 
cc: Ulf Hansson 
cc: Len Brown 
cc: Pavel Machek 
cc: Joerg Roedel 
cc: Will Deacon 
cc: Andrew Lunn 
cc: Heiner Kallweit 
cc: Russell King 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Linus Walleij 
cc: Hideaki YOSHIFUJI 
cc: David Ahern 
cc: kernel-t...@android.com
cc: linux-ker...@vger.kernel.org
cc: linux...@vger.kernel.org
cc: iommu@lists.linux-foundation.org
cc: net...@vger.kernel.org
cc: linux-g...@vger.kernel.org
Cc: ker...@pengutronix.de

Saravana Kannan (2):
  driver core: Add probe_no_timeout flag for drivers
  serial: Set probe_no_timeout for all DT based drivers

 drivers/base/base.h |  1 +
 drivers/base/core.c |  7 +++
 drivers/base/dd.c   |  3 +++
 drivers/tty/ehv_bytechan.c  |  1 +
 drivers/tty/goldfish.c  |  1 +
 drivers/tty/hvc/hvc_opal.c  |  1 +
 drivers/tty/serial/8250/8250_acorn.c|  1 -
 drivers/tty/serial/8250/8250_aspeed_vuart.c |  1 +
 drivers/tty/serial/8250/8250_bcm2835aux.c   |  1 +
 drivers/tty/serial/8250/8250_bcm7271.c  |  1 +
 drivers/tty/serial/8250/8250_dw.c   |  1 +
 drivers/tty/serial/8250/8250_em.c   |  1 +
 drivers/tty/serial/8250/8250_ingenic.c  |  1 +
 drivers/tty/serial/8250/8250_lpc18xx.c  |  1 +
 drivers/tty/serial/8250/8250_mtk.c  |  1 +
 drivers/tty/serial/8250/8250_of.c   |  1 +
 drivers/tty/serial/8250/8250_omap.c |  1 +
 drivers/tty/serial/8250/8250_pxa.c  |  1 +
 drivers/tty/serial/8250/8250_tegra.c|  1 +
 drivers/tty/serial/8250/8250_uniphier.c |  1 +
 drivers/tty/serial/altera_jtaguart.c|  1 +
 drivers/tty/serial/altera_uart.c|  1 +
 drivers/tty/serial/amba-pl011.c |  1 +
 drivers/tty/serial/apbuart.c|  1 +
 drivers/tty/serial/ar933x_uart.c|  1 +
 drivers/tty/serial/arc_uart.c   |  1 +
 drivers/tty/serial/atmel_serial.c   |  1 +
 drivers/tty/serial/bcm63xx_uart.c   |  1 +
 drivers/tty/serial/clps711x.c   |  1 +
 drivers/tty/serial/cpm_uart/cpm_uart_core.c |  1 +
 drivers/tty/serial/digicolor-usart.c|  1 +
 drivers/tty/serial/fsl_linflexuart.c|  1 +
 drivers/tty/serial/fsl_lpuart.c |  1 +
 drivers/tty/serial/imx.c|  1 +
 drivers/tty/serial/lantiq.c |  1 +
 drivers/tty/serial/liteuart.c   |  1 +
 drivers/tty/serial/lpc32xx_hs.c |  1 +
 drivers/tty/serial/max310x.c|  1 +
 drivers/tty/serial/meson_uart.c |  1 +
 drivers/tty/serial/milbeaut_usio.c  |  1 +
 drivers/tty/serial/mpc52xx_uart.c   |  1 +
 drivers/tty/serial/mps2-uart.c  |  1 +
 drivers/tty/serial/msm_serial.c |  1 +
 drivers/tty/serial/mvebu-uart.c |  1 +
 drivers/tty/serial/mxs-auart.c  |  1 +
 drivers/tty/serial/omap-serial.c|  1 +
 drivers/tty/serial/owl-uart.c   |  1 +
 drivers/tty/serial/pic32_uart.c |  1 +
 drivers/tty/serial/pmac_zilog.c |  1 +
 drivers/tty/serial/pxa.c|  1 +
 drivers/tty/serial/qcom_geni_serial.c   |  1 +
 drivers/tty/serial/rda-uart.c   |  1 +
 drivers/tty/serial/samsung_tty.c|  1 +
 drivers/tty/serial/sc16is7xx.c  |  1 +
 drivers/tty/serial/serial-tegra.c   |  1 +
 drivers/tty/serial/sh-sci.c |  1 +
 drivers/tty/serial/sifive.c |  1 +
 drivers/tty/serial/sprd_serial.c|  1 +
 drivers/tty/serial/st-asc.c |  1 +
 drivers/tty/serial/stm32-usart.c|  1 +
 drivers/tty/serial/sunhv.c  |  1 +
 drivers/tty/serial/sunplus-uart.c   |  1 +
 drivers/tty/serial/sunsab.c |  1 +
 drivers/tty/serial/sunsu.c  |  1 +
 drivers/tty/serial/sunzilog.c   |  1 +
 drivers/tty/serial/tegra-tcu.c  |  1 +
 drivers/tty/serial/uartlite.c   |  1 +
 drivers/tty/serial/ucc_uart.c   |  1 +
 drivers/tty/serial/vt8500_serial.c  |  1 +
 drivers/tty/serial/xilinx_uartps.c  |  1 +
 include/linux/device.h  |  7 +++
 include/linux/device/driver.h   | 11 +++
 72 

Re: [PATCH v4 5/5] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-06-27 Thread Damien Le Moal via iommu
On 6/28/22 00:25, John Garry wrote:
> ATA devices (struct ata_device) have a max_sectors field which is
> configured internally in libata. This is then used to (re)configure the
> associated sdev request queue max_sectors value from how it is earlier set
> in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set
> according to shost limits, which includes host DMA mapping limits.
> 
> Cap the ata_device max_sectors according to shost->max_sectors to respect
> this shost limit.
> 
> Signed-off-by: John Garry 
> Acked-by: Damien Le Moal 

Nit: please change the patch title to "ata: libata-scsi: Cap ..."

> ---
>  drivers/ata/libata-scsi.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 86dbb1cdfabd..24a43d540d9f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1060,6 +1060,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev, 
> struct ata_device *dev)
>   dev->flags |= ATA_DFLAG_NO_UNLOAD;
>  
>   /* configure max sectors */
> + dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);
>   blk_queue_max_hw_sectors(q, dev->max_sectors);
>  
>   if (dev->class == ATA_DEV_ATAPI) {


-- 
Damien Le Moal
Western Digital Research
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS

2022-06-27 Thread Michael Schmitz

Hi Geert,

On 27/06/22 20:26, Geert Uytterhoeven wrote:

Hi Michael,

On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz  wrote:

Am 18.06.2022 um 00:57 schrieb Arnd Bergmann:

From: Arnd Bergmann 

All architecture-independent users of virt_to_bus() and bus_to_virt()
have been fixed to use the dma mapping interfaces or have been
removed now.  This means the definitions on most architectures, and the
CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.

The only exceptions to this are a few network and scsi drivers for m68k
Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
with the old interfaces and are probably not worth changing.

The Amiga SCSI drivers are all old WD33C93 ones, and replacing
virt_to_bus by virt_to_phys in the dma_setup() function there would
cause no functional change at all.

FTR, the sgiwd93 driver uses dma_map_single().


Thanks! From what I see, it doesn't have to deal with bounce buffers 
though?


Cheers,

    Michael




Gr{oetje,eeting}s,

 Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
 -- Linus Torvalds


Re: [PATCH v3 1/2] vfio/type1: Simplify bus_type determination

2022-06-27 Thread Robin Murphy

On 2022-06-27 20:21, Alex Williamson wrote:

On Fri, 24 Jun 2022 18:51:44 +0100
Robin Murphy  wrote:


Since IOMMU groups are mandatory for drivers to support, it stands to
reason that any device which has been successfully added to a group
must be on a bus supported by that IOMMU driver, and therefore a domain
viable for any device in the group must be viable for all devices in
the group. This already has to be the case for the IOMMU API's internal
default domain, for instance. Thus even if the group contains devices on
different buses, that can only mean that the IOMMU driver actually
supports such an odd topology, and so without loss of generality we can
expect the bus type of any device in a group to be suitable for IOMMU
API calls.

Furthermore, scrutiny reveals a lack of protection for the bus being
removed while vfio_iommu_type1_attach_group() is using it; the reference
that VFIO holds on the iommu_group ensures that data remains valid, but
does not prevent the group's membership changing underfoot.

We can address both concerns by recycling vfio_bus_type() into some
superficially similar logic to indirect the IOMMU API calls themselves.
Each call is thus protected from races by the IOMMU group's own locking,
and we no longer need to hold group-derived pointers beyond that scope.
It also gives us an easy path for the IOMMU API's migration of bus-based
interfaces to device-based, of which we can already take the first step
with device_iommu_capable(). As with domains, any capability must in
practice be consistent for devices in a given group - and after all it's
still the same capability which was expected to be consistent across an
entire bus! - so there's no need for any complicated validation.

Signed-off-by: Robin Murphy 
---

v3: Complete rewrite yet again, and finally it doesn't feel like we're
stretching any abstraction boundaries the wrong way, and the diffstat
looks right too. I did think about embedding IOMMU_CAP_INTR_REMAP
directly in the callback, but decided I like the consistency of minimal
generic wrappers. And yes, if the capability isn't supported then it
does end up getting tested for the whole group, but meh, it's harmless.

  drivers/vfio/vfio_iommu_type1.c | 42 +
  1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e357..a77ff00c677b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1679,18 +1679,6 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
return ret;
  }
  
-static int vfio_bus_type(struct device *dev, void *data)

-{
-   struct bus_type **bus = data;
-
-   if (*bus && *bus != dev->bus)
-   return -EINVAL;
-
-   *bus = dev->bus;
-
-   return 0;
-}
-
  static int vfio_iommu_replay(struct vfio_iommu *iommu,
 struct vfio_domain *domain)
  {
@@ -2153,13 +2141,25 @@ static void vfio_iommu_iova_insert_copy(struct 
vfio_iommu *iommu,
list_splice_tail(iova_copy, iova);
  }
  


Any objection if I add the following comment:

/* Redundantly walks non-present capabilities to simplify caller */


Not at all, feel free - I guess if I felt it was worth pre-empting the 
review question then it probably is subtle enough to deserve a code comment!


Thanks,
Robin.



Thanks,
Alex


+static int vfio_iommu_device_capable(struct device *dev, void *data)
+{
+   return device_iommu_capable(dev, (enum iommu_cap)data);
+}
+
+static int vfio_iommu_domain_alloc(struct device *dev, void *data)
+{
+   struct iommu_domain **domain = data;
+
+   *domain = iommu_domain_alloc(dev->bus);
+   return 1; /* Don't iterate */
+}
+
  static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_group *iommu_group, enum vfio_group_type type)
  {
struct vfio_iommu *iommu = iommu_data;
struct vfio_iommu_group *group;
struct vfio_domain *domain, *d;
-   struct bus_type *bus = NULL;
bool resv_msi, msi_remap;
phys_addr_t resv_msi_base = 0;
struct iommu_domain_geometry *geo;
@@ -2192,18 +2192,19 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
goto out_unlock;
}
  
-	/* Determine bus_type in order to allocate a domain */

-   ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
-   if (ret)
-   goto out_free_group;
-
ret = -ENOMEM;
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!domain)
goto out_free_group;
  
+	/*

+* Going via the iommu_group iterator avoids races, and trivially gives
+* us a representative device for the IOMMU API call. We don't actually
+* want to iterate beyond the first device (if any).
+*/
ret = -EIO;
-   domain->domain = iommu_domain_alloc(bus);
+   iommu_group_for_each_dev(iommu_group, &domain->domain,

Re: [PATCH v3 1/2] vfio/type1: Simplify bus_type determination

2022-06-27 Thread Alex Williamson
On Fri, 24 Jun 2022 18:51:44 +0100
Robin Murphy  wrote:

> Since IOMMU groups are mandatory for drivers to support, it stands to
> reason that any device which has been successfully added to a group
> must be on a bus supported by that IOMMU driver, and therefore a domain
> viable for any device in the group must be viable for all devices in
> the group. This already has to be the case for the IOMMU API's internal
> default domain, for instance. Thus even if the group contains devices on
> different buses, that can only mean that the IOMMU driver actually
> supports such an odd topology, and so without loss of generality we can
> expect the bus type of any device in a group to be suitable for IOMMU
> API calls.
> 
> Furthermore, scrutiny reveals a lack of protection for the bus being
> removed while vfio_iommu_type1_attach_group() is using it; the reference
> that VFIO holds on the iommu_group ensures that data remains valid, but
> does not prevent the group's membership changing underfoot.
> 
> We can address both concerns by recycling vfio_bus_type() into some
> superficially similar logic to indirect the IOMMU API calls themselves.
> Each call is thus protected from races by the IOMMU group's own locking,
> and we no longer need to hold group-derived pointers beyond that scope.
> It also gives us an easy path for the IOMMU API's migration of bus-based
> interfaces to device-based, of which we can already take the first step
> with device_iommu_capable(). As with domains, any capability must in
> practice be consistent for devices in a given group - and after all it's
> still the same capability which was expected to be consistent across an
> entire bus! - so there's no need for any complicated validation.
> 
> Signed-off-by: Robin Murphy 
> ---
> 
> v3: Complete rewrite yet again, and finally it doesn't feel like we're
> stretching any abstraction boundaries the wrong way, and the diffstat
> looks right too. I did think about embedding IOMMU_CAP_INTR_REMAP
> directly in the callback, but decided I like the consistency of minimal
> generic wrappers. And yes, if the capability isn't supported then it
> does end up getting tested for the whole group, but meh, it's harmless.
> 
>  drivers/vfio/vfio_iommu_type1.c | 42 +
>  1 file changed, 22 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index c13b9290e357..a77ff00c677b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1679,18 +1679,6 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>   return ret;
>  }
>  
> -static int vfio_bus_type(struct device *dev, void *data)
> -{
> - struct bus_type **bus = data;
> -
> - if (*bus && *bus != dev->bus)
> - return -EINVAL;
> -
> - *bus = dev->bus;
> -
> - return 0;
> -}
> -
>  static int vfio_iommu_replay(struct vfio_iommu *iommu,
>struct vfio_domain *domain)
>  {
> @@ -2153,13 +2141,25 @@ static void vfio_iommu_iova_insert_copy(struct 
> vfio_iommu *iommu,
>   list_splice_tail(iova_copy, iova);
>  }
>  

Any objection if I add the following comment:

/* Redundantly walks non-present capabilities to simplify caller */

Thanks,
Alex

> +static int vfio_iommu_device_capable(struct device *dev, void *data)
> +{
> + return device_iommu_capable(dev, (enum iommu_cap)data);
> +}
> +
> +static int vfio_iommu_domain_alloc(struct device *dev, void *data)
> +{
> + struct iommu_domain **domain = data;
> +
> + *domain = iommu_domain_alloc(dev->bus);
> + return 1; /* Don't iterate */
> +}
> +
>  static int vfio_iommu_type1_attach_group(void *iommu_data,
>   struct iommu_group *iommu_group, enum vfio_group_type type)
>  {
>   struct vfio_iommu *iommu = iommu_data;
>   struct vfio_iommu_group *group;
>   struct vfio_domain *domain, *d;
> - struct bus_type *bus = NULL;
>   bool resv_msi, msi_remap;
>   phys_addr_t resv_msi_base = 0;
>   struct iommu_domain_geometry *geo;
> @@ -2192,18 +2192,19 @@ static int vfio_iommu_type1_attach_group(void 
> *iommu_data,
>   goto out_unlock;
>   }
>  
> - /* Determine bus_type in order to allocate a domain */
> - ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
> - if (ret)
> - goto out_free_group;
> -
>   ret = -ENOMEM;
>   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
>   if (!domain)
>   goto out_free_group;
>  
> + /*
> +  * Going via the iommu_group iterator avoids races, and trivially gives
> +  * us a representative device for the IOMMU API call. We don't actually
> +  * want to iterate beyond the first device (if any).
> +  */
>   ret = -EIO;
> - domain->domain = iommu_domain_alloc(bus);
> + iommu_group_for_each_dev(iommu_group, &domain->domain,
> +  vfio_iommu_domain_alloc);
>   if

Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-27 Thread Saravana Kannan via iommu
On Mon, Jun 27, 2022 at 10:50 AM Rob Herring  wrote:
>
> On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > enabled iommus and dmas dependency enforcement by default. On some
> > > systems, this caused the console device's probe to get delayed until the
> > > deferred_probe_timeout expires.
> > >
> > > We need consoles to work as soon as possible, so mark the console device
> > > node with FWNODE_FLAG_BEST_EFFORT so that fw_devlink knows not to delay
> > > the probe of the console device for suppliers without drivers. The
> > > driver can then make the decision on where it can probe without those
> > > suppliers or defer its probe.
> > >
> > > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > Reported-by: Sascha Hauer 
> > > Reported-by: Peng Fan 
> > > Signed-off-by: Saravana Kannan 
> > > Tested-by: Peng Fan 
> > > ---
> > >  drivers/of/base.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > index d4f98c8469ed..a19cd0c73644 100644
> > > --- a/drivers/of/base.c
> > > +++ b/drivers/of/base.c
> > > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
> > > align))
> > > of_property_read_string(of_aliases, "stdout", &name);
> > > if (name)
> > > of_stdout = of_find_node_opts_by_path(name, 
> > > &of_stdout_options);
> > > +   if (of_stdout)
> > > +   of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
> >
> > The device given in the stdout-path property doesn't necessarily have to
> > be consistent with the console= parameter. The former is usually
> > statically set in the device trees contained in the kernel while the
> > latter is dynamically set by the bootloader. So if you change the
> > console uart in the bootloader then you'll still run into this trap.
> >
> > It's problematic to consult only the device tree for dependencies. I
> > found several examples of drivers in the tree for which dma support
> > is optional. They use it if they can, but continue without it when
> > not available. "hwlock" is another property which several drivers
> > consider optional. Also consider SoCs in early upstreaming phases
> > when the device tree is merged with "dmas" or "hwlock" properties,
> > but the corresponding drivers are not yet upstreamed. It's not nice
> > to defer probing of all these devices for a long time.
> >
> > I wonder if it wouldn't be a better approach to just probe all devices
> > and record the device(node) they are waiting on. Then you know that you
> > don't need to probe them again until the device they are waiting for
> > is available.
>
> Can't we have a driver flag 'I have optional dependencies' that will
> trigger probe without all dependencies and then the driver can defer
> probe if required dependencies are not yet met.

Haha... that's kinda what I'm working on right now. But named
intentionally in a more limited sense of "I can't wait for the
timeout" where fw_devlink will relax and allow the driver to probe
(and have it make the call) once we hit late_initcall(). I'm
explicitly limiting it to "timeout" because we don't want everyone
adding this flag every time they hit an issue. That would defeat the
point of fw_devlink=on.

Also, setting the flag for a driver to fix one system might break
another system because in the other system the user might want to wait
for the timeout because the supplier drivers would be loaded before
the timeout.

Another option is to restart the timer (if it hasn't expired) when
filesystems get mounted (in addition to the current "when a new driver
gets registered"). That way, we might be able to drop the timeout from
10s to 5s.

-Saravana


Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

2022-06-27 Thread Rob Herring
On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > enabled iommus and dmas dependency enforcement by default. On some
> > systems, this caused the console device's probe to get delayed until the
> > deferred_probe_timeout expires.
> > 
> > We need consoles to work as soon as possible, so mark the console device
> > node with FWNODE_FLAG_BEST_EFFORT so that fw_devlink knows not to delay
> > the probe of the console device for suppliers without drivers. The
> > driver can then make the decision on where it can probe without those
> > suppliers or defer its probe.
> > 
> > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > Reported-by: Sascha Hauer 
> > Reported-by: Peng Fan 
> > Signed-off-by: Saravana Kannan 
> > Tested-by: Peng Fan 
> > ---
> >  drivers/of/base.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > index d4f98c8469ed..a19cd0c73644 100644
> > --- a/drivers/of/base.c
> > +++ b/drivers/of/base.c
> > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
> > align))
> > of_property_read_string(of_aliases, "stdout", &name);
> > if (name)
> > of_stdout = of_find_node_opts_by_path(name, 
> > &of_stdout_options);
> > +   if (of_stdout)
> > +   of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
> 
> The device given in the stdout-path property doesn't necessarily have to
> be consistent with the console= parameter. The former is usually
> statically set in the device trees contained in the kernel while the
> latter is dynamically set by the bootloader. So if you change the
> console uart in the bootloader then you'll still run into this trap.
> 
> It's problematic to consult only the device tree for dependencies. I
> found several examples of drivers in the tree for which dma support
> is optional. They use it if they can, but continue without it when
> not available. "hwlock" is another property which several drivers
> consider optional. Also consider SoCs in early upstreaming phases
> when the device tree is merged with "dmas" or "hwlock" properties,
> but the corresponding drivers are not yet upstreamed. It's not nice
> to defer probing of all these devices for a long time.
> 
> I wonder if it wouldn't be a better approach to just probe all devices
> and record the device(node) they are waiting on. Then you know that you
> don't need to probe them again until the device they are waiting for
> is available.

Can't we have a driver flag 'I have optional dependencies' that will 
trigger probe without all dependencies and then the driver can defer 
probe if required dependencies are not yet met.

Rob


[PATCH v4 5/5] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-06-27 Thread John Garry via iommu
ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value, overriding the value set
earlier in __scsi_init_queue(). In __scsi_init_queue() the max_sectors
value is set according to shost limits, which include host DMA mapping
limits.

Cap the ata_device max_sectors according to shost->max_sectors to respect
this shost limit.

Signed-off-by: John Garry 
Acked-by: Damien Le Moal 
---
 drivers/ata/libata-scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 86dbb1cdfabd..24a43d540d9f 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1060,6 +1060,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct 
ata_device *dev)
dev->flags |= ATA_DFLAG_NO_UNLOAD;
 
/* configure max sectors */
+   dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);
blk_queue_max_hw_sectors(q, dev->max_sectors);
 
if (dev->class == ATA_DEV_ATAPI) {
-- 
2.35.3



[PATCH 1/2] swiotlb: Split up single swiotlb lock

2022-06-27 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is returned to the area from which the memory was originally
allocated.

The number of areas can be set via swiotlb_adjust_nareas() and the
swiotlb= kernel parameter.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
 .../admin-guide/kernel-parameters.txt |   4 +-
 include/linux/swiotlb.h   |  27 +++
 kernel/dma/swiotlb.c  | 202 ++
 3 files changed, 194 insertions(+), 39 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
it if 0 is given (See 
Documentation/admin-guide/cgroup-v1/memory.rst)
 
swiotlb=[ARM,IA-64,PPC,MIPS,X86]
-   Format: {  | force | noforce }
+   Format: {  [,] | force | noforce }
 -- Number of I/O TLB slabs
+-- Second integer after comma. Number of swiotlb
+areas with their own lock. Must be power of 2.
force -- force using of bounce buffers even if they
 wouldn't be automatically used by the kernel
noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..7157428cf3ac 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,22 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
 
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB block.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int index;
+   spinlock_t lock;
+};
+
 /**
  * struct io_tlb_mem - IO TLB Memory Pool Descriptor
  *
@@ -89,6 +105,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @nareas:  The area number in the pool.
+ * @area_nslabs: The slot number in the area.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +120,9 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -130,6 +151,7 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+void __init swiotlb_adjust_nareas(unsigned int nareas);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +184,11 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_adjust_nareas(unsigned int nareas)
+{
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..17154abdfb34 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,7 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = 1;
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +80,10 @@ setup_io_tlb_npages(char *str)
default_nslabs =
ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
}
+   if (*str == ',')
+   ++str;
+   if (isdigit(*str))
+   swiotlb_adjust_na

[PATCH 0/2] swiotlb: Split up single swiotlb lock

2022-06-27 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

Patch 1 introduces the swiotlb area concept and splits up the single
swiotlb lock.
Patch 2 sets the swiotlb area number according to the number of lapic
entries in the MADT.


Tianyu Lan (2):
  swiotlb: Split up single swiotlb lock
  x86/ACPI: Set swiotlb area according to the number of lapic entry in
MADT

 .../admin-guide/kernel-parameters.txt |   4 +-
 arch/x86/kernel/acpi/boot.c   |   3 +
 include/linux/swiotlb.h   |  27 +++
 kernel/dma/swiotlb.c  | 202 ++
 4 files changed, 197 insertions(+), 39 deletions(-)

-- 
2.25.1



[PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT

2022-06-27 Thread Tianyu Lan
From: Tianyu Lan 

When the swiotlb bounce buffer is initialized, smp_init() has not yet
been called and the CPU count cannot be obtained from num_online_cpus().
Use the number of lapic entries to set the swiotlb area number and keep
the swiotlb area number equal to the CPU number on the x86 platform.

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/acpi/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 907cc98b1938..7e13499f2c10 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1131,6 +1132,8 @@ static int __init acpi_parse_madt_lapic_entries(void)
return count;
}
 
+   swiotlb_adjust_nareas(max(count, x2count));
+
x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC_NMI,
acpi_parse_x2apic_nmi, 0);
count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC_NMI,
-- 
2.25.1



[PATCH v4 4/5] scsi: scsi_transport_sas: Cap shost max_sectors according to DMA optimal mapping limit

2022-06-27 Thread John Garry via iommu
Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and frees an IOVA for each mapping, which may
affect performance.

For performance reasons set the request queue max_sectors from
dma_opt_mapping_size(), which knows this mapping limit.

Signed-off-by: John Garry 
---
 drivers/scsi/scsi_transport_sas.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/scsi_transport_sas.c 
b/drivers/scsi/scsi_transport_sas.c
index 12bff64dade6..1b45248748e0 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -225,6 +225,7 @@ static int sas_host_setup(struct transport_container *tc, 
struct device *dev,
 {
struct Scsi_Host *shost = dev_to_shost(dev);
struct sas_host_attrs *sas_host = to_sas_host_attrs(shost);
+   struct device *dma_dev = shost->dma_dev;
 
INIT_LIST_HEAD(&sas_host->rphy_list);
mutex_init(&sas_host->lock);
@@ -236,6 +237,11 @@ static int sas_host_setup(struct transport_container *tc, 
struct device *dev,
dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
   shost->host_no);
 
+   if (dma_dev) {
+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
+   }
+
return 0;
 }
 
-- 
2.35.3



[PATCH v4 2/5] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-06-27 Thread John Garry via iommu
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.

This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.

Signed-off-by: John Garry 
Reviewed-by: Damien Le Moal 
---
 drivers/iommu/dma-iommu.c | 6 ++
 drivers/iommu/iova.c  | 5 +
 include/linux/iova.h  | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f90251572a5d..9e1586447ee8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1459,6 +1459,11 @@ static unsigned long iommu_dma_get_merge_boundary(struct 
device *dev)
return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
 }
 
+static size_t iommu_dma_opt_mapping_size(void)
+{
+   return iova_rcache_range();
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
.alloc  = iommu_dma_alloc,
.free   = iommu_dma_free,
@@ -1479,6 +1484,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.map_resource   = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.get_merge_boundary = iommu_dma_get_merge_boundary,
+   .opt_mapping_size   = iommu_dma_opt_mapping_size,
 };
 
 /*
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..9f00b58d546e 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain 
*iovad,
 static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
 
+unsigned long iova_rcache_range(void)
+{
+   return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
+}
+
 static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
 {
struct iova_domain *iovad;
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 320a70e40233..c6ba6d95d79c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain 
*iovad, dma_addr_t iova)
 int iova_cache_get(void);
 void iova_cache_put(void);
 
+unsigned long iova_rcache_range(void);
+
 void free_iova(struct iova_domain *iovad, unsigned long pfn);
 void __free_iova(struct iova_domain *iovad, struct iova *iova);
 struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
-- 
2.35.3



[PATCH v4 3/5] scsi: core: Cap shost max_sectors according to DMA mapping limits only once

2022-06-27 Thread John Garry via iommu
The shost->max_sectors is repeatedly capped according to the host DMA
mapping limit for each sdev in __scsi_init_queue(). This is unnecessary, so
cap it only once, when adding the host.

Signed-off-by: John Garry 
---
 drivers/scsi/hosts.c| 5 +
 drivers/scsi/scsi_lib.c | 4 
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 8352f90d997d..d04bd2c7c9f1 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -236,6 +236,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 
shost->dma_dev = dma_dev;
 
+   if (dma_dev->dma_mask) {
+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_max_mapping_size(dma_dev) >> SECTOR_SHIFT);
+   }
+
error = scsi_mq_setup_tags(shost);
if (error)
goto fail;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6ffc9e4258a8..6ce8acea322a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1884,10 +1884,6 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
}
 
-   if (dev->dma_mask) {
-   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
-   dma_max_mapping_size(dev) >> SECTOR_SHIFT);
-   }
blk_queue_max_hw_sectors(q, shost->max_sectors);
blk_queue_segment_boundary(q, shost->dma_boundary);
dma_set_seg_boundary(dev, shost->dma_boundary);
-- 
2.35.3



[PATCH v4 1/5] dma-mapping: Add dma_opt_mapping_size()

2022-06-27 Thread John Garry via iommu
Streaming DMA mappings involving an IOMMU may be much slower for larger
total mapping sizes. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.

Provide an API for device drivers to know this "optimal" limit, so that
they may try to produce mappings which don't exceed it.

Signed-off-by: John Garry 
Reviewed-by: Damien Le Moal 
---
 Documentation/core-api/dma-api.rst |  9 +
 include/linux/dma-map-ops.h|  1 +
 include/linux/dma-mapping.h|  5 +
 kernel/dma/mapping.c   | 12 
 4 files changed, 27 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 6d6d0edd2d27..b3cd9763d28b 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -204,6 +204,15 @@ Returns the maximum size of a mapping for the device. The size parameter
 of the mapping functions like dma_map_single(), dma_map_page() and
 others should not be larger than the returned value.
 
+::
+
+   size_t
+   dma_opt_mapping_size(struct device *dev);
+
+Returns the maximum optimal size of a mapping for the device. Mapping large
+buffers may take longer so device drivers are advised to limit total DMA
+streaming mappings length to the returned value.
+
 ::
 
bool
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..98ceba6fa848 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -69,6 +69,7 @@ struct dma_map_ops {
int (*dma_supported)(struct device *dev, u64 mask);
u64 (*get_required_mask)(struct device *dev);
size_t (*max_mapping_size)(struct device *dev);
+   size_t (*opt_mapping_size)(void);
unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..fe3849434b2a 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -144,6 +144,7 @@ int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
 size_t dma_max_mapping_size(struct device *dev);
+size_t dma_opt_mapping_size(struct device *dev);
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
 unsigned long dma_get_merge_boundary(struct device *dev);
 struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
@@ -266,6 +267,10 @@ static inline size_t dma_max_mapping_size(struct device *dev)
 {
return 0;
 }
+static inline size_t dma_opt_mapping_size(struct device *dev)
+{
+   return 0;
+}
 static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
 {
return false;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index db7244291b74..1bfe11b1edb6 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -773,6 +773,18 @@ size_t dma_max_mapping_size(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dma_max_mapping_size);
 
+size_t dma_opt_mapping_size(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+   size_t size = SIZE_MAX;
+
+   if (ops && ops->opt_mapping_size)
+   size = ops->opt_mapping_size();
+
+   return min(dma_max_mapping_size(dev), size);
+}
+EXPORT_SYMBOL_GPL(dma_opt_mapping_size);
+
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.35.3



[PATCH v4 0/5] DMA mapping changes for SCSI core

2022-06-27 Thread John Garry via iommu
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.

This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.

Only the SCSI SAS transport code is modified to use this limit. For now I
did not want to touch other hosts, as I am concerned that this change
could cause a performance regression.

I also added a patch for libata-scsi as it does not currently honour the
shost max_sectors limit.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/f5b78c9c-312e-70ab-ecbb-f14623a4b...@arm.com/

Changes since v3:
- Apply the max DMA optimal limit to SAS hosts only
  Note: Even though "scsi: core: Cap shost max_sectors only once when
  adding" is a subset of a previous patch I did not transfer the RB tags
- Rebase on v5.19-rc4

Changes since v2:
- Rebase on v5.19-rc1
- Add Damien's tag to 2/4 (thanks)

Changes since v1:
- Relocate scsi_add_host_with_dma() dma_dev check (Reported by Dan)
- Add tags from Damien and Martin (thanks)
  - note: I only added Martin's tag to the SCSI patch

John Garry (5):
  dma-mapping: Add dma_opt_mapping_size()
  dma-iommu: Add iommu_dma_opt_mapping_size()
  scsi: core: Cap shost max_sectors according to DMA mapping limits only
once
  scsi: scsi_transport_sas: Cap shost max_sectors according to DMA
optimal mapping limit
  libata-scsi: Cap ata_device->max_sectors according to
shost->max_sectors

 Documentation/core-api/dma-api.rst |  9 +
 drivers/ata/libata-scsi.c  |  1 +
 drivers/iommu/dma-iommu.c  |  6 ++
 drivers/iommu/iova.c   |  5 +
 drivers/scsi/hosts.c   |  5 +
 drivers/scsi/scsi_lib.c|  4 
 drivers/scsi/scsi_transport_sas.c  |  6 ++
 include/linux/dma-map-ops.h|  1 +
 include/linux/dma-mapping.h|  5 +
 include/linux/iova.h   |  2 ++
 kernel/dma/mapping.c   | 12 
 11 files changed, 52 insertions(+), 4 deletions(-)

-- 
2.35.3



Re: [PATCH v3 4/5] iommu/io-pgtable-dart: Add DART PTE support for t6000

2022-06-27 Thread Robin Murphy

On 2022-06-21 08:18, Janne Grunau wrote:

From: Sven Peter 

The DARTs present in the M1 Pro/Max/Ultra SoCs use a different PTE format.
They support a 42-bit physical address space by shifting the paddr and
extending its mask inside the PTE.
They also now come with mandatory sub-page protection, which we just
configure to always allow access to the entire page. This feature is
already present, but optional, on the previous DARTs, which allows us to
configure it unconditionally.

Signed-off-by: Sven Peter 
Co-developed-by: Janne Grunau 
Signed-off-by: Janne Grunau 

---

Changes in v3:
- apply change to io-pgtable-dart.c
- handle pte <> paddr conversion based on the pte format instead of
   the output address size

Changes in v2:
- add APPLE_DART2 PTE format

  drivers/iommu/io-pgtable-dart.c | 51 +++--
  drivers/iommu/io-pgtable.c  |  1 +
  include/linux/io-pgtable.h  |  1 +
  3 files changed, 45 insertions(+), 8 deletions(-)


[...]

@@ -536,7 +571,7 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
if (!cfg->coherent_walk)
return NULL;
  
-	if (cfg->oas > 36)

+   if (cfg->oas != 36 && cfg->oas != 42)
return NULL;


Wouldn't it make sense to tie this to the format? Maybe 36-bit OAS is 
still valid with v2, but presumably 42-bit with v1 definitely isn't.


Robin.


Re: [PATCH v3 2/5] iommu/io-pgtable: Move Apple DART support to its own file

2022-06-27 Thread Robin Murphy

On 2022-06-21 08:18, Janne Grunau wrote:

The pte format used by the DARTs found in the Apple M1 (t8103) is not
fully compatible with io-pgtable-arm. The 24 MSBs are used for subpage
protection (mapping only parts of a page) and conflict with the address
mask. In addition, bit 1 is not available for tagging entries but instead
disables subpage protection. Subpage protection could be useful to support
a CPU granule of 4k with the fixed IOMMU page size of 16k.

The DARTs found on the Apple M1 Pro/Max/Ultra use yet another pte
format, which is even less compatible. To support an output address size
of 42 bits, the address is shifted down by 4 inside the PTE. Subpage
protection is mandatory and bit 1 signifies uncached mappings used by the
display controller.

It would be advantageous to share code for all known Apple DART
variants to support common features. The page table allocator for DARTs
is less complex, since it uses two levels of translation tables without
support for huge pages.

Signed-off-by: Janne Grunau 

---

Changes in v3:
- move APPLE_DART to its own io-pgtable implementation, copied from
   io-pgtable-arm and simplified

Changes in v2:
- add APPLE_DART2 io-pgtable format

  MAINTAINERS |   1 +
  drivers/iommu/Kconfig   |   1 -
  drivers/iommu/Makefile  |   2 +-
  drivers/iommu/io-pgtable-arm.c  |  63 
  drivers/iommu/io-pgtable-dart.c | 580 
  drivers/iommu/io-pgtable.c  |   2 +
  6 files changed, 584 insertions(+), 65 deletions(-)
  create mode 100644 drivers/iommu/io-pgtable-dart.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 1fc9ead83d2a..028b7e31e589 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1848,6 +1848,7 @@ F:drivers/clk/clk-apple-nco.c
  F:drivers/i2c/busses/i2c-pasemi-core.c
  F:drivers/i2c/busses/i2c-pasemi-platform.c
  F:drivers/iommu/apple-dart.c
+F: drivers/iommu/io-pgtable-dart.c
  F:drivers/irqchip/irq-apple-aic.c
  F:drivers/mailbox/apple-mailbox.c
  F:drivers/nvme/host/apple.c
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c79a0df090c0..ed6d8260f330 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -294,7 +294,6 @@ config APPLE_DART
tristate "Apple DART IOMMU Support"
depends on ARCH_APPLE || (COMPILE_TEST && !GENERIC_ATOMIC64)
select IOMMU_API
-   select IOMMU_IO_PGTABLE_LPAE


You still need to select the base IO_PGTABLE. FWIW I think that's 
probably the only crucial issue here - lots more comments below, but 
they're primarily things that could all be unpicked later.



default ARCH_APPLE
help
  Support for Apple DART (Device Address Resolution Table) IOMMUs
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 44475a9b3eea..bd68c93bbd62 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,4 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
  obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o
  obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
-obj-$(CONFIG_APPLE_DART) += apple-dart.o
+obj-$(CONFIG_APPLE_DART) += apple-dart.o io-pgtable-dart.o
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 94ff319ae8ac..d7f5e23da643 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -130,9 +130,6 @@
  #define ARM_MALI_LPAE_MEMATTR_IMP_DEF 0x88ULL
  #define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
  
-#define APPLE_DART_PTE_PROT_NO_WRITE (1<<7)

-#define APPLE_DART_PTE_PROT_NO_READ (1<<8)
-
  /* IOPTE accessors */
  #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
  
@@ -406,15 +403,6 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
  {
arm_lpae_iopte pte;
  
-	if (data->iop.fmt == APPLE_DART) {

-   pte = 0;
-   if (!(prot & IOMMU_WRITE))
-   pte |= APPLE_DART_PTE_PROT_NO_WRITE;
-   if (!(prot & IOMMU_READ))
-   pte |= APPLE_DART_PTE_PROT_NO_READ;
-   return pte;
-   }
-
if (data->iop.fmt == ARM_64_LPAE_S1 ||
data->iop.fmt == ARM_32_LPAE_S1) {
pte = ARM_LPAE_PTE_nG;
@@ -1107,52 +1095,6 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
return NULL;
  }
  
-static struct io_pgtable *

-apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
-{
-   struct arm_lpae_io_pgtable *data;
-   int i;
-
-   if (cfg->oas > 36)
-   return NULL;
-
-   data = arm_lpae_alloc_pgtable(cfg);
-   if (!data)
-   return NULL;
-
-   /*
-* The table format itself always uses two levels, but the total VA
-* space is mapped by four separate tables, making the MMIO registers
-* an effective "level 1". For simplicity, though, we treat this
-* equivalently to LPAE stage 2 concatenation at level 2, with the
-

RE: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node

2022-06-27 Thread Bjoern A. Zeeb

On Fri, 24 Jun 2022, Shameerali Kolothum Thodi wrote:

Hi,


-Original Message-
From: Steven Price [mailto:steven.pr...@arm.com]
Sent: 17 June 2022 13:42
To: Shameerali Kolothum Thodi ;
linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
iommu@lists.linux-foundation.org
Cc: Linuxarm ; lorenzo.pieral...@arm.com;
j...@8bytes.org; robin.mur...@arm.com; w...@kernel.org; wanghuiqiang
; Guohanjun (Hanjun Guo)
; sami.muja...@arm.com; j...@solid-run.com;
eric.au...@redhat.com; laurentiu.tu...@nxp.com; h...@infradead.org
Subject: Re: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node

On 15/06/2022 11:10, Shameer Kolothum wrote:

Hi

v12 --> v13
  -No changes. Rebased to 5.19-rc1.
  -Picked up tags received from Laurentiu, Hanjun and Will. Thanks!.


You've already got my Tested-by tags, but just to confirm I gave this a
spin and it works fine.


Thanks Steve.

I think the series is now in a good shape to be merged.

Hi Will/Robin,

Appreciate, if you could please take a look at the remaining SMMU related
patches(7-9) and provide your approval?

Thanks,
Shameer


First of all thanks to all of you for keeping this going.

I've read through most of this patch series and it doesn't read
like the best sunny days.

I do understand that there are incentives to get things right; sometimes
first make it work, then make it better? Running code often seems a
better alternative than wrong words on paper as users don't care about
the paper.  They only care if their hardware becomes a paperweight
because it's not working.

I was trying to find diplomatic words, but the general problem has become
so much bigger than just this change, as I am faced with the fact that
vendors are talking about giving up on maintaining Arm/ACPI and going
back to FDT exclusively, which I believe would be the wrong, but an
understandable, exit out of a roundabout.

For me this Arm/Linux/ACPI problem has double impact, as I am not
even a Linux person.  And part of what Arm/ACPI was solving was that
any OS can just work on Arm hardware; for a while people were hoping
it could make FDT the next Flash; it just seems it'll not be, because
people cannot get fixes or workarounds for real-world problems into
Linux in a timely manner.

So a very polite but firm prod towards Cambridge from here as well in
the hope that you can make a big change to this world by helping not
to miss the next merge window/release leading to way bigger impact.
It would be rather sad to close the Arm/ACPI chapter for good but it
seems that we may be standing on the verge of it if things do not move
quick now and different in the future.  It'll certainly need change from
all sides but the good things is that at the end of the day we all want
to make the world a better place.

As I mentioned, I have no stakes in this Linux change.
I just care about Arm and ACPI because I saw light and a chance in it
and I would love to see it stay.
Let's all work together in one direction and make it a brighter future
for everyone.  Can we?  Are you in?


May God bless you and your work,
Bjoern


Re: [PATCH v9 0/8] Add support for HiSilicon PCIe Tune and Trace device

2022-06-27 Thread Peter Zijlstra
On Mon, Jun 27, 2022 at 09:25:42PM +0800, Yicong Yang wrote:
> On 2022/6/27 21:12, Greg KH wrote:
> > On Mon, Jun 27, 2022 at 07:18:12PM +0800, Yicong Yang wrote:
> >> Hi Greg,
> >>
> >> Since the kernel side of this device has been reviewed for 8 versions with
> >> all comments addressed and no more comment since v9 posted in 5.19-rc1,
> >> is it ok to merge it first (for Patch 1-3 and 7-8)?
> > 
> > I am not the maintainer of this subsystem, so I do not understand why
> > you are asking me :(
> > 
> 
> I checked the log of drivers/hwtracing and it seems patches for
> coresight/intel_th/stm are applied by different maintainers, and I see
> you applied some patches for intel_th/stm.
> Could any of these three maintainers, or you, help apply this?

I was hoping Mark would have a look, since he knows this ARM stuff
better than me. But ISTR he's somewhat busy atm too. But an ACK from the
CoreSight people would also be appreciated.

And Arnaldo usually doesn't pick up the userspace perf bits until the
kernel side is sorted.


Re: [PATCH v6 02/10] dt-bindings: display: tegra: Convert to json-schema

2022-06-27 Thread Rob Herring
On Fri, Jun 24, 2022 at 11:26 AM Rob Herring  wrote:
>
> On Tue, 21 Jun 2022 18:10:14 +0300, Mikko Perttunen wrote:
> > From: Thierry Reding 
> >
> > Convert the Tegra host1x controller bindings from the free-form text
> > format to json-schema.
> >
> > This also adds the missing display-hub DT bindings that were not
> > previously documented.
> >
> > Reviewed-by: Rob Herring 
> > Signed-off-by: Thierry Reding 
> > ---
> >  .../display/tegra/nvidia,tegra114-mipi.txt|  41 --
> >  .../display/tegra/nvidia,tegra114-mipi.yaml   |  74 ++
> >  .../display/tegra/nvidia,tegra124-dpaux.yaml  | 149 
> >  .../display/tegra/nvidia,tegra124-sor.yaml| 206 ++
> >  .../display/tegra/nvidia,tegra124-vic.yaml|  71 ++
> >  .../display/tegra/nvidia,tegra186-dc.yaml |  85 +++
> >  .../tegra/nvidia,tegra186-display.yaml| 310 
> >  .../tegra/nvidia,tegra186-dsi-padctl.yaml |  45 ++
> >  .../display/tegra/nvidia,tegra20-dc.yaml  | 181 +
> >  .../display/tegra/nvidia,tegra20-dsi.yaml | 159 +
> >  .../display/tegra/nvidia,tegra20-epp.yaml |  70 ++
> >  .../display/tegra/nvidia,tegra20-gr2d.yaml|  73 ++
> >  .../display/tegra/nvidia,tegra20-gr3d.yaml| 214 ++
> >  .../display/tegra/nvidia,tegra20-hdmi.yaml| 126 
> >  .../display/tegra/nvidia,tegra20-host1x.txt   | 675 --
> >  .../display/tegra/nvidia,tegra20-host1x.yaml  | 347 +
> >  .../display/tegra/nvidia,tegra20-isp.yaml |  67 ++
> >  .../display/tegra/nvidia,tegra20-mpe.yaml |  73 ++
> >  .../display/tegra/nvidia,tegra20-tvo.yaml |  58 ++
> >  .../display/tegra/nvidia,tegra20-vi.yaml  | 163 +
> >  .../display/tegra/nvidia,tegra210-csi.yaml|  52 ++
> >  .../pinctrl/nvidia,tegra124-dpaux-padctl.txt  |  59 --
> >  22 files changed, 2523 insertions(+), 775 deletions(-)
> >  delete mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra114-mipi.txt
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra114-mipi.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-dpaux.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-sor.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-vic.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-dc.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-display.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-dsi-padctl.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dsi.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-epp.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-gr2d.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-gr3d.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-hdmi.yaml
> >  delete mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-isp.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-mpe.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-tvo.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-vi.yaml
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/tegra/nvidia,tegra210-csi.yaml
> >  delete mode 100644 
> > Documentation/devicetree/bindings/pinctrl/nvidia,tegra124-dpaux-padctl.txt
> >
>
> My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> on your patch (DT_CHECKER_FLAGS is new in v5.13):
>
> yamllint warnings/errors:
>
> dtschema/dtc warnings/errors:
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-sor.yaml:
>  allOf:1:if:not:properties: {'contains': {'const': 'nvidia,panel'}} should 
> not be valid under {'$ref': '#/definitions/sub-schemas'}
> hint: A json-schema keyword was found instead of a DT property name.
> from schema $id: http://devicetree.org/meta-schemas/keywords.yaml#
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-sor.yaml:
>  ignoring, error in schema: allOf: 1: if: not: properties
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-gr3d.example.dtb:
>  

Re: [PATCH v9 0/8] Add support for HiSilicon PCIe Tune and Trace device

2022-06-27 Thread Yicong Yang via iommu
On 2022/6/27 21:12, Greg KH wrote:
> On Mon, Jun 27, 2022 at 07:18:12PM +0800, Yicong Yang wrote:
>> Hi Greg,
>>
>> Since the kernel side of this device has been reviewed for 8 versions with
>> all comments addressed and no more comment since v9 posted in 5.19-rc1,
>> is it ok to merge it first (for Patch 1-3 and 7-8)?
> 
> I am not the maintainer of this subsystem, so I do not understand why
> you are asking me :(
> 

I checked the log of drivers/hwtracing and it seems patches for
coresight/intel_th/stm are applied by different maintainers, and I see
you applied some patches for intel_th/stm.
Could any of these three maintainers, or you, help apply this?

Thanks.


Re: [PATCH v9 0/8] Add support for HiSilicon PCIe Tune and Trace device

2022-06-27 Thread Greg KH
On Mon, Jun 27, 2022 at 07:18:12PM +0800, Yicong Yang wrote:
> Hi Greg,
> 
> Since the kernel side of this device has been reviewed for 8 versions with
> all comments addressed and no more comment since v9 posted in 5.19-rc1,
> is it ok to merge it first (for Patch 1-3 and 7-8)?

I am not the maintainer of this subsystem, so I do not understand why
you are asking me :(

thanks,

greg k-h


Re: [PATCH v9 10/11] iommu: Per-domain I/O page fault handling

2022-06-27 Thread Ethan Zhao

Hi,

在 2022/6/21 22:43, Lu Baolu 写道:

Tweak the I/O page fault handling framework to route page faults to
the domain and call the page fault handler retrieved from the domain.
This makes it possible for the I/O page fault handling framework to
serve more usage scenarios, as long as they have an IOMMU domain and
install a page fault handler in it. Some unused functions are also
removed to avoid dead code.

iommu_get_domain_for_dev_pasid(), which retrieves the attached domain
for a {device, PASID} pair, is used by the page fault handling framework,
which knows the {device, PASID} reported from the iommu
driver. We have a guarantee that the SVA domain doesn't go away during
IOPF handling, because unbind() waits for pending faults with
iopf_queue_flush_dev() before freeing the domain. Hence, there's no need
to synchronize life cycle of the iommu domains between the unbind() and
the interrupt threads.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
---
  drivers/iommu/io-pgfault.c | 64 +-
  1 file changed, 7 insertions(+), 57 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index aee9e033012f..4f24ec703479 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
return iommu_page_response(dev, &resp);
  }
  
-static enum iommu_page_response_code

-iopf_handle_single(struct iopf_fault *iopf)
-{
-   vm_fault_t ret;
-   struct mm_struct *mm;
-   struct vm_area_struct *vma;
-   unsigned int access_flags = 0;
-   unsigned int fault_flags = FAULT_FLAG_REMOTE;
-   struct iommu_fault_page_request *prm = &iopf->fault.prm;
-   enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
-
-   if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
-   return status;
-
-   mm = iommu_sva_find(prm->pasid);
-   if (IS_ERR_OR_NULL(mm))
-   return status;
-
-   mmap_read_lock(mm);
-
-   vma = find_extend_vma(mm, prm->addr);
-   if (!vma)
-   /* Unmapped area */
-   goto out_put_mm;
-
-   if (prm->perm & IOMMU_FAULT_PERM_READ)
-   access_flags |= VM_READ;
-
-   if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
-   access_flags |= VM_WRITE;
-   fault_flags |= FAULT_FLAG_WRITE;
-   }
-
-   if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
-   access_flags |= VM_EXEC;
-   fault_flags |= FAULT_FLAG_INSTRUCTION;
-   }
-
-   if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
-   fault_flags |= FAULT_FLAG_USER;
-
-   if (access_flags & ~vma->vm_flags)
-   /* Access fault */
-   goto out_put_mm;
-
-   ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
-   status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
-   IOMMU_PAGE_RESP_SUCCESS;
-
-out_put_mm:
-   mmap_read_unlock(mm);
-   mmput(mm);
-
-   return status;
-}
-


Once iopf_handle_single() is removed, the name of iopf_handle_group()
looks a little weird and confusing. Does this "group" mean the iommu
group (domain)? It took me a few minutes of looking into the code to see
that it means a batch/list/queue of iopfs, and that iopf_handle_group()
becomes a generic iopf_handler().

Does it make sense to revise the names of iopf_handle_group(),
iopf_complete_group() and iopf_group in this patch set?


Thanks,

Ethan


  static void iopf_handle_group(struct work_struct *work)
  {
struct iopf_group *group;
+   struct iommu_domain *domain;
struct iopf_fault *iopf, *next;
enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
  
  	group = container_of(work, struct iopf_group, work);

+   domain = iommu_get_domain_for_dev_pasid(group->dev,
+   group->last_fault.fault.prm.pasid);
+   if (!domain || !domain->iopf_handler)
+   status = IOMMU_PAGE_RESP_INVALID;
  
  	list_for_each_entry_safe(iopf, next, &group->faults, list) {

/*
@@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct *work)
 * faults in the group if there is an error.
 */
if (status == IOMMU_PAGE_RESP_SUCCESS)
-   status = iopf_handle_single(iopf);
+   status = domain->iopf_handler(&iopf->fault,
+ domain->fault_data);
  
  		if (!(iopf->fault.prm.flags &

  IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))


--
"firm, enduring, strong, and long-lived"


[PATCH] ACPI: VIOT: Fix ACS setup

2022-06-27 Thread Eric Auger
Currently acpi_viot_init() is called after the PCI devices
have been scanned and pci_enable_acs() has been called,
so pci_request_acs() fails to be taken into account, leading
to wrong single-iommu-group topologies when dealing with
multi-function root ports, for instance.

We cannot simply move acpi_viot_init() earlier, as is done for
the IORT init, because the VIOT parsing relies on the PCI
scan. However, we can detect earlier that a VIOT is present and, in
such a case, request ACS. Introduce a new acpi_viot_early_init()
routine that allows pci_request_acs() to be called before the scan.

Fixes: 3cf485540e7b ("ACPI: Add driver for the VIOT table")
Signed-off-by: Eric Auger 
Reported-by: Jin Liu 
---
 drivers/acpi/bus.c|  1 +
 drivers/acpi/viot.c   | 23 +--
 include/linux/acpi_viot.h |  2 ++
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 86fa61a21826..906ad8153fd9 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1400,6 +1400,7 @@ static int __init acpi_init(void)
 
pci_mmcfg_late_init();
acpi_iort_init();
+   acpi_viot_early_init();
acpi_hest_init();
acpi_ghes_init();
acpi_scan_init();
diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
index d2256326c73a..3c1be123e4d6 100644
--- a/drivers/acpi/viot.c
+++ b/drivers/acpi/viot.c
@@ -248,6 +248,23 @@ static int __init viot_parse_node(const struct acpi_viot_header *hdr)
return ret;
 }
 
+/**
+ * acpi_viot_early_init - Test the presence of VIOT and enable ACS
+ *
+ * If the VIOT does exist, ACS must be enabled. This cannot be
+ * done in acpi_viot_init() which is called after the bus scan
+ */
+void __init acpi_viot_early_init(void)
+{
+   acpi_status status;
+   struct acpi_table_header *hdr;
+
+   status = acpi_get_table(ACPI_SIG_VIOT, 0, &hdr);
+   if (!ACPI_FAILURE(status))
+   pci_request_acs();
+   acpi_put_table(hdr);
+}
+
 /**
  * acpi_viot_init - Parse the VIOT table
  *
@@ -319,12 +336,6 @@ static int viot_pci_dev_iommu_init(struct pci_dev *pdev, u16 dev_id, void *data)
epid = ((domain_nr - ep->segment_start) << 16) +
dev_id - ep->bdf_start + ep->endpoint_id;
 
-   /*
-* If we found a PCI range managed by the viommu, we're
-* the one that has to request ACS.
-*/
-   pci_request_acs();
-
return viot_dev_iommu_init(&pdev->dev, ep->viommu,
   epid);
}
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
index 1eb8ee5b0e5f..e58d60f8ff2e 100644
--- a/include/linux/acpi_viot.h
+++ b/include/linux/acpi_viot.h
@@ -6,10 +6,12 @@
 #include 
 
 #ifdef CONFIG_ACPI_VIOT
+void __init acpi_viot_early_init(void);
 void __init acpi_viot_init(void);
 int viot_iommu_configure(struct device *dev);
 #else
 static inline void acpi_viot_init(void) {}
+static inline void acpi_viot_early_init(void) {}
 static inline int viot_iommu_configure(struct device *dev)
 {
return -ENODEV;
-- 
2.35.3



Re: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node

2022-06-27 Thread Robin Murphy

On 2022-06-24 16:44, Shameerali Kolothum Thodi via iommu wrote:




-Original Message-
From: Steven Price [mailto:steven.pr...@arm.com]
Sent: 17 June 2022 13:42
To: Shameerali Kolothum Thodi ;
linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
iommu@lists.linux-foundation.org
Cc: Linuxarm ; lorenzo.pieral...@arm.com;
j...@8bytes.org; robin.mur...@arm.com; w...@kernel.org; wanghuiqiang
; Guohanjun (Hanjun Guo)
; sami.muja...@arm.com; j...@solid-run.com;
eric.au...@redhat.com; laurentiu.tu...@nxp.com; h...@infradead.org
Subject: Re: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node

On 15/06/2022 11:10, Shameer Kolothum wrote:

Hi

v12 --> v13
   -No changes. Rebased to 5.19-rc1.
   -Picked up tags received from Laurentiu, Hanjun and Will. Thanks!


You've already got my Tested-by tags, but just to confirm I gave this a
spin and it works fine.


Thanks Steve.

I think the series is now in a good shape to be merged.

Hi Will/Robin,

I'd appreciate it if you could take a look at the remaining SMMU-related
patches (7-9) and provide your approval.


I said v12 looked fine, but for the avoidance of doubt, here it is 
again, as formally as can be:


Acked-by: Robin Murphy 

Thanks,
Robin.


Re: [PATCH v9 06/11] arm-smmu-v3/sva: Add SVA domain support

2022-06-27 Thread Zhangfei Gao



On 2022/6/21 10:43 PM, Lu Baolu wrote:

Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 


Tested-by: Zhangfei Gao 
Have tested the series on aarch64.

Thanks


---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  6 ++
  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 69 +++
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  3 +
  3 files changed, 78 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d2ba86470c42..96399dd3a67a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -758,6 +758,7 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, 
struct mm_struct *mm);
  void arm_smmu_sva_unbind(struct iommu_sva *handle);
  u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
  void arm_smmu_sva_notifier_synchronize(void);
+struct iommu_domain *arm_smmu_sva_domain_alloc(void);
  #else /* CONFIG_ARM_SMMU_V3_SVA */
  static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
  {
@@ -803,5 +804,10 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva 
*handle)
  }
  
  static inline void arm_smmu_sva_notifier_synchronize(void) {}

+
+static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+   return NULL;
+}
  #endif /* CONFIG_ARM_SMMU_V3_SVA */
  #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index f155d406c5d5..fc4555dac5b4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -549,3 +549,72 @@ void arm_smmu_sva_notifier_synchronize(void)
 */
mmu_notifier_synchronize();
  }
+
+static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+   int ret = 0;
+   struct mm_struct *mm;
+   struct iommu_sva *handle;
+
+   if (domain->type != IOMMU_DOMAIN_SVA)
+   return -EINVAL;
+
+   mm = domain->mm;
+   if (WARN_ON(!mm))
+   return -ENODEV;
+
+   mutex_lock(&sva_lock);
+   handle = __arm_smmu_sva_bind(dev, mm);
+   if (IS_ERR(handle))
+   ret = PTR_ERR(handle);
+   mutex_unlock(&sva_lock);
+
+   return ret;
+}
+
+static void arm_smmu_sva_block_dev_pasid(struct iommu_domain *domain,
+struct device *dev, ioasid_t id)
+{
+   struct mm_struct *mm = domain->mm;
+   struct arm_smmu_bond *bond = NULL, *t;
+   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+   mutex_lock(&sva_lock);
+   list_for_each_entry(t, &master->bonds, list) {
+   if (t->mm == mm) {
+   bond = t;
+   break;
+   }
+   }
+
+   if (!WARN_ON(!bond) && refcount_dec_and_test(&bond->refs)) {
+   list_del(&bond->list);
+   arm_smmu_mmu_notifier_put(bond->smmu_mn);
+   kfree(bond);
+   }
+   mutex_unlock(&sva_lock);
+}
+
+static void arm_smmu_sva_domain_free(struct iommu_domain *domain)
+{
+   kfree(domain);
+}
+
+static const struct iommu_domain_ops arm_smmu_sva_domain_ops = {
+   .set_dev_pasid  = arm_smmu_sva_set_dev_pasid,
+   .block_dev_pasid= arm_smmu_sva_block_dev_pasid,
+   .free   = arm_smmu_sva_domain_free,
+};
+
+struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+   struct iommu_domain *domain;
+
+   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+   if (!domain)
+   return NULL;
+   domain->ops = &arm_smmu_sva_domain_ops;
+
+   return domain;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ae8ec8df47c1..a30b252e2f95 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1999,6 +1999,9 @@ static struct iommu_domain 
*arm_smmu_domain_alloc(unsigned type)
  {
struct arm_smmu_domain *smmu_domain;
  
+	if (type == IOMMU_DOMAIN_SVA)

+   return arm_smmu_sva_domain_alloc();
+
if (type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_DMA &&
type != IOMMU_DOMAIN_DMA_FQ &&



Re: [PATCH v9 0/8] Add support for HiSilicon PCIe Tune and Trace device

2022-06-27 Thread Yicong Yang via iommu
Hi Greg,

Since the kernel side of this device has been reviewed for 8 versions with
all comments addressed, and there have been no further comments since v9 was
posted in 5.19-rc1, is it OK to merge it first (Patches 1-3 and 7-8)?

Thanks.

On 2022/6/6 19:55, Yicong Yang wrote:
> HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
> integrated Endpoint (RCiEP) device, providing the capability
> to dynamically monitor and tune the PCIe traffic (tune),
> and trace the TLP headers (trace).
> 
> PTT tune is designed for monitoring and adjusting PCIe link parameters.
> We provide several parameters of the PCIe link. Through the driver,
> users can adjust the value of a certain parameter to affect the PCIe link
> for the purpose of enhancing the performance in certain situations.
> 
> PTT trace is designed for dumping the TLP headers to the memory, which
> can be used to analyze the transactions and usage condition of the PCIe
> Link. Users can choose filters to trace headers, by either requester
> ID, or those downstream of a set of Root Ports on the same core of the
> PTT device. It's also supported to trace the headers of certain type and
> of certain direction.
> 
> The driver registers a PMU device for each PTT device. The trace can
> be used through `perf record` and the traced headers can be decoded
> by `perf report`. The perf command support for the device is also
> added in this patchset. The tune can be used through the sysfs
> attributes of related PMU device. See the documentation for the
> detailed usage.
> 
> Change since v8:
> - Cleanups and one minor fix from Jonathan and John, thanks
> Link: 
> https://lore.kernel.org/lkml/20220516125223.32012-1-yangyic...@hisilicon.com/
> 
> Change since v7:
> - Configure the DMA in probe rather than in runtime. Also use devres to manage
>   PMU device as we have no order problem now
> - Refactor the config validation function per John and Leo
> - Use a spinlock hisi_ptt::pmu_lock instead of mutex to serialize the perf 
> process
>   in pmu::start as it's in atomic context
> - Only commit the traced data when stop, per Leo and James
> - Drop the dynamic filter updating patch from this series to simplify the
> review
>   of the driver. That patch will be sent separately.
> - add a cpumask sysfs attribute and handle the cpu hotplug events, follow the
>   uncore PMU convention
> - Other cleanups and fixes, both in driver and perf tool
> Link: 
> https://lore.kernel.org/lkml/20220407125841.3678-1-yangyic...@hisilicon.com/
> 
> Change since v6:
> - Fix W=1 errors reported by lkp test, thanks
> 
> Change since v5:
> - Squash the PMU patch into PATCH 2 suggested by John
> - refine the commit message of PATCH 1 and some comments
> Link: 
> https://lore.kernel.org/lkml/20220308084930.5142-1-yangyic...@hisilicon.com/
> 
> Change since v4:
> Address the comments from Jonathan, John and Ma Ca, thanks.
> - Use devm* also for allocating the DMA buffers
> - Remove the IRQ handler stub in Patch 2
> - Make functions waiting for hardware state return boolean
> - Manual remove the PMU device as it should be removed first
> - Modify the order of probe and removal operations so that they match well
> - Make available {directions,type,format} array const and non-global
> - Using the right filter list in filters show and well protect the
>   list with mutex
> - Record the trace status with a boolean @started rather than enum
> - Optimize the process of finding the PTT devices of the perf-tool
> Link: 
> https://lore.kernel.org/linux-pci/20220221084307.33712-1-yangyic...@hisilicon.com/
> 
> Change since v3:
> Address the comments from Jonathan and John, thanks.
> - drop members in the common struct which can be get on the fly
> - reduce buffer struct and organize the buffers with array instead of list
> - reduce the DMA reset wait time to avoid long time busy loop
> - split the available_filters sysfs attribute into two files, for root port
>   and requester respectively. Update the documentation accordingly
> - make IOMMU mapping check earlier in probe to avoid race condition. Also
>   make IOMMU quirk patch prior to driver in the series
> - Cleanups and typos fixes from John and Jonathan
> Link: 
> https://lore.kernel.org/linux-pci/20220124131118.17887-1-yangyic...@hisilicon.com/
> 
> Change since v2:
> - address the comments from Mathieu, thanks.
>   - rename the directory to ptt to match the function of the device
>   - spinoff the declarations to a separate header
>   - split the trace function to several patches
>   - some other comments.
> - make default smmu domain type of PTT device to identity
>   Drop the RMR as it's not recommended and use an iommu_def_domain_type
>   quirk to passthrough the device DMA as suggested by Robin. 
> Link: 
> https://lore.kernel.org/linux-pci/2026090625.53702-1-yangyic...@hisilicon.com/
> 
> Change since v1:
> - switch the user interface of trace to perf from debugfs
> - switch the user interface of tune to sysfs from debugfs
> - add perf tool support to

Re: [PATCH v1 3/7] iommu/amd: Fix sparse warning

2022-06-27 Thread Vasant Hegde via iommu


On 6/23/2022 3:12 PM, Robin Murphy wrote:
> On 2022-06-23 09:03, Joerg Roedel wrote:
>> On Fri, Jun 03, 2022 at 04:51:03PM +0530, Vasant Hegde wrote:
>>> Fix below sparse warning:
>>>    CHECK   drivers/iommu/amd/iommu.c
>>>    drivers/iommu/amd/iommu.c:73:24: warning: symbol 'amd_iommu_ops' was not 
>>> declared. Should it be static?
>>>
>>> Also we are going to introduce v2 page table which has different
>>> pgsize_bitmaps. Hence remove 'const' qualifier.
>>
>> I am not a fan of removing the consts. Please use separate ops
>> structures for v2 page-tables and make them const as well. This probably
>> also has some optimization potential in the future when we can make the
>> ops call-back functions page-table specific.
> 
> TBH it's probably time to retire iommu_ops->pgsize_bitmap anyway. At the very 
> least it would be logical to move it to iommu_domain_ops now, but maybe we 
> could skip ahead and just rely on drivers initialising domain->pgsize_bitmap 
> directly in their .domain_alloc?
> 

Robin,

Something like below? If yes, I will cleanup and get proper fix.


-Vasant


diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 840831d5d2ad..32dd84a7c1da 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1916,6 +1916,7 @@ static int protection_domain_init_v1(struct 
protection_domain *domain, int mode)
return -ENOMEM;
}
 
+   domain->domain.pgsize_bitmap= AMD_IOMMU_PGSIZES;
amd_iommu_domain_set_pgtable(domain, pt_root, mode);
 
return 0;
@@ -2282,7 +2283,6 @@ const struct iommu_ops amd_iommu_ops = {
.get_resv_regions = amd_iommu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
.is_attach_deferred = amd_iommu_is_attach_deferred,
-   .pgsize_bitmap  = AMD_IOMMU_PGSIZES,
.def_domain_type = amd_iommu_def_domain_type,
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = amd_iommu_attach_device,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 847ad47a2dfd..73cfba6a6728 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1915,8 +1915,6 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
return NULL;
 
domain->type = type;
-   /* Assume all sizes by default; the driver may override this later */
-   domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
if (!domain->ops)
domain->ops = bus->iommu_ops->default_domain_ops;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e1afe169549..0c028aa71b96 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -255,7 +255,6 @@ struct iommu_ops {
int (*def_domain_type)(struct device *dev);
 
const struct iommu_domain_ops *default_domain_ops;
-   unsigned long pgsize_bitmap;
struct module *owner;
 };
 


> (and FWIW I'm leaning towards the same for the domain->ops as well; the more 
> I look at ops->default_domain_ops, the more it starts looking like a stupid 
> mess... my stupid mess... sorry about that)
> 
> Robin.

RE: [PATCH v9 07/11] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct
> mm_struct *mm)
> +{
> + struct iommu_domain *domain;
> + ioasid_t max_pasids;
> + int ret = -EINVAL;
> +
> + /* Allocate mm->pasid if necessary. */

this comment is for iommu_sva_alloc_pasid()

> + max_pasids = dev->iommu->max_pasids;
> + if (!max_pasids)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
> + if (ret)
> + return ERR_PTR(ret);
> +


...
> +void iommu_sva_unbind_device(struct iommu_sva *handle)
> +{
> + struct device *dev = handle->dev;
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> + ioasid_t pasid = iommu_sva_get_pasid(handle);
> +
> + mutex_lock(&iommu_sva_lock);
> + if (refcount_dec_and_test(&domain->bond.users)) {
> + iommu_detach_device_pasid(domain, dev, pasid);
> + iommu_domain_free(domain);
> + }
> + mutex_unlock(&iommu_sva_lock);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> +
> +u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> +{
> + struct iommu_domain *domain =
> + container_of(handle, struct iommu_domain, bond);
> +
> + return domain->mm->pasid;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);

Looks this is only used by unbind_device. Just open code it.


Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-06-27 Thread Tony Lindgren
* Saravana Kannan  [220623 08:17]:
> On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren  wrote:
> >
> > * Saravana Kannan  [220622 19:05]:
> > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren  wrote:
> > > > This issue is not directly related to fw_devlink. It is a side effect of
> > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > >
> > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > was deleted because fw_devlink=on should have short circuited the
> > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > probe function and hitting this -ENOENT failure. That's why I was
> > > asking the other questions.
> >
> > OK. So where is the -EPROBE_DEFER supposed to happen without
> > driver_deferred_probe_check_state() then?
> 
> device_links_check_suppliers() call inside really_probe() would short
> circuit and return an -EPROBE_DEFER if the device links are created as
> expected.

OK

> > Hmm so I'm not seeing any supplier for the top level ocp device in
> > the booting case without your patches. I see the suppliers for the
> > ocp child device instances only.
> 
> Hmmm... this is strange (that the device link isn't there), but this
> is what I suspected.

Yup, maybe it's because of the supplier being a device in the child
interconnect for the ocp.

> Now we need to figure out why it's missing. There are only a few
> things that could cause this and I don't see any of those. I already
> checked to make sure the power domain in this instance had a proper
> driver with a probe() function -- if it didn't, then that's one thing
> that'd could have caused the missing device link. The device does seem
> to have a proper driver, so looks like I can rule that out.
> 
> Can you point me to the dts file that corresponds to the specific
> board you are testing this one? I probably won't find anything, but I
> want to rule out some of the possibilities.

You can use the beaglebone black dts for example, that's
arch/arm/boot/dts/am335x-boneblack.dts and uses am33xx.dtsi for
ocp interconnect with simple-pm-bus.

> All the device link creation logic is inside drivers/base/core.c. So
> if you can look at the existing messages or add other stuff to figure
> out why the device link isn't getting created, that'd be handy. In
> either case, I'll continue staring at the DT and code to see what
> might be happening here.

In device_links_check_suppliers() I see these ocp suppliers:

platform ocp: device_links_check_suppliers: 1024: supplier 44e00d00.prm: 
link->status: 0 link->flags: 01c0
platform ocp: device_links_check_suppliers: 1024: supplier 44e01000.prm: 
link->status: 0 link->flags: 01c0
platform ocp: device_links_check_suppliers: 1024: supplier 44e00c00.prm: 
link->status: 0 link->flags: 01c0
platform ocp: device_links_check_suppliers: 1024: supplier 44e00e00.prm: 
link->status: 0 link->flags: 01c0
platform ocp: device_links_check_suppliers: 1024: supplier 44e01100.prm: 
link->status: 0 link->flags: 01c0
platform ocp: device_links_check_suppliers: 1024: supplier fixedregulator0: 
link->status: 1 link->flags: 01c0

No -EPROBE_DEFER is returned in device_links_check_suppliers() for
44e00c00.prm supplier for beaglebone black for example, 0 gets
returned.

Regards,

Tony


[PATCH -next] iommu/dma: Fix missing mutex_init() in iommu_get_msi_cookie()

2022-06-27 Thread Yang Yingliang via iommu
cookie_alloc() is called by both iommu_get_dma_cookie() and
iommu_get_msi_cookie(), but the mutex is only initialized in
iommu_get_dma_cookie(). Move mutex_init() into cookie_alloc() to make sure
the mutex is always initialized.

Fixes: ac9a5d522bb8 ("iommu/dma: Fix race condition during iova_domain 
initialization")
Reported-by: Hulk Robot 
Signed-off-by: Yang Yingliang 
---
 drivers/iommu/dma-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1910f4f1612b..e29157380c48 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -294,6 +294,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum 
iommu_dma_cookie_type type)
if (cookie) {
INIT_LIST_HEAD(&cookie->msi_page_list);
cookie->type = type;
+   mutex_init(&cookie->mutex);
}
return cookie;
 }
@@ -311,7 +312,6 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
if (!domain->iova_cookie)
return -ENOMEM;
 
-   mutex_init(&domain->iova_cookie->mutex);
return 0;
 }
 
-- 
2.25.1



Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS

2022-06-27 Thread Geert Uytterhoeven
Hi Michael,

On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz  wrote:
> Am 18.06.2022 um 00:57 schrieb Arnd Bergmann:
> > From: Arnd Bergmann 
> >
> > All architecture-independent users of virt_to_bus() and bus_to_virt()
> > have been fixed to use the dma mapping interfaces or have been
> > removed now.  This means the definitions on most architectures, and the
> > CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.
> >
> > The only exceptions to this are a few network and scsi drivers for m68k
> > Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
> > with the old interfaces and are probably not worth changing.
>
> The Amiga SCSI drivers are all old WD33C93 ones, and replacing
> virt_to_bus by virt_to_phys in the dma_setup() function there would
> cause no functional change at all.

FTR, the sgiwd93 driver uses dma_map_single().

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


RE: [PATCH v9 05/11] iommu/vt-d: Add SVA domain support

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> Add support for SVA domain allocation and provide an SVA-specific
> iommu_domain_ops.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 


RE: [PATCH v9 04/11] iommu: Add sva iommu_domain support

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> The sva iommu_domain represents a hardware pagetable that the IOMMU
> hardware could use for SVA translation. This adds some infrastructure
> to support SVA domain in the iommu common layer. It includes:
> 
> - Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA
> domain
>   type. The IOMMU drivers that support SVA should provide the sva
>   domain specific iommu_domain_ops.
> - Add a helper to allocate an SVA domain. The iommu_domain_free()
>   is still used to free an SVA domain.
> - Add helpers to attach an SVA domain to a device and the reverse
>   operation.
> 
> Some buses, like PCI, route packets without considering the PASID value.
> Thus a DMA target address with PASID might be treated as P2P if the
> address falls into the MMIO BAR of other devices in the group. To make
> things simple, the attach/detach interfaces only apply to devices
> belonging to the singleton groups, and the singleton is immutable in
> fabric i.e. not affected by hotplug.
> 
> The iommu_attach/detach_device_pasid() can be used for other purposes,
> such as kernel DMA with pasid, mediation device, etc.

I'd split this into two patches. One for adding iommu_attach/
detach_device_pasid() and set/block_dev_pasid ops, and the
other for adding SVA.

>  struct iommu_domain {
>   unsigned type;
>   const struct iommu_domain_ops *ops;
>   unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */
> - iommu_fault_handler_t handler;
> - void *handler_token;
>   struct iommu_domain_geometry geometry;
>   struct iommu_dma_cookie *iova_cookie;
> + union {
> + struct {/* IOMMU_DOMAIN_DMA */
> + iommu_fault_handler_t handler;
> + void *handler_token;
> + };

why is it DMA domain specific? What about unmanaged 
domain? Unrecoverable fault can happen on any type
including SVA. Hence I think above should be domain type
agnostic.

> + struct {/* IOMMU_DOMAIN_SVA */
> + struct mm_struct *mm;
> + };
> + };
>  };
> 



> +
> +struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
> + struct mm_struct *mm)
> +{
> + const struct iommu_ops *ops = dev_iommu_ops(dev);
> + struct iommu_domain *domain;
> +
> + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
> + if (!domain)
> + return NULL;
> +
> + domain->type = IOMMU_DOMAIN_SVA;

It's a bit weird that the type has been specified when calling
ops->domain_alloc while it is still left to the caller to set the
type. But this is not caused by this series; it could be cleaned
up separately.

> +
> + mutex_lock(&group->mutex);
> + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
> GFP_KERNEL);
> + if (curr)
> + goto out_unlock;

Need check xa_is_err(old).


Re: [PATCH v9 5/8] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

2022-06-27 Thread Yicong Yang via iommu
On 2022/6/27 10:02, Leo Yan wrote:
> On Mon, Jun 06, 2022 at 07:55:52PM +0800, Yicong Yang wrote:
>> From: Qi Liu 
>>
>> HiSilicon PCIe tune and trace device (PTT) could dynamically tune
>> the PCIe link's events, and trace the TLP headers.
>>
>> This patch adds support for the PTT device in the perf tool, so users can
>> use 'perf record' to get TLP header trace data.
>>
>> Signed-off-by: Qi Liu 
>> Signed-off-by: Yicong Yang 
> 
> Just one minor comment.
> 
> [...]
> 
>> diff --git a/tools/perf/util/hisi-ptt.h b/tools/perf/util/hisi-ptt.h
>> new file mode 100644
>> index ..2db9b4056214
>> --- /dev/null
>> +++ b/tools/perf/util/hisi-ptt.h
>> @@ -0,0 +1,19 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * HiSilicon PCIe Trace and Tuning (PTT) support
>> + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd.
>> + */
>> +
>> +#ifndef INCLUDE__PERF_HISI_PTT_H__
>> +#define INCLUDE__PERF_HISI_PTT_H__
>> +
>> +#define HISI_PTT_PMU_NAME   "hisi_ptt"
>> +#define HISI_PTT_AUXTRACE_PRIV_SIZE sizeof(u64)
>> +
>> +struct auxtrace_record *hisi_ptt_recording_init(int *err,
>> +struct perf_pmu *hisi_ptt_pmu);
>> +
>> +int hisi_ptt_process_auxtrace_info(union perf_event *event,
>> +   struct perf_session *session);
> 
> The function hisi_ptt_process_auxtrace_info() is introduced by the next
> patch to support PTT decoding. As good practice (e.g. to keep bisection
> working), it's better to introduce a function's declaration and definition
> in the same patch.
> 

Thanks for catching this. Something went wrong when splitting the patches.
Will fix this!

> With this fixing, this patch looks good to me:
> 
> Reviewed-by: Leo Yan 
> 

Thanks.


Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS

2022-06-27 Thread Michael Schmitz

Arnd,

Am 26.06.2022 um 20:36 schrieb Arnd Bergmann:

There are no platform specific header files other than asm/amigahw.h and
asm/mvme147hw.h, currently only holding register address definitions.
Would it be OK to add m68k_virt_to_bus() in there if it can't remain in
asm/virtconvert.h, Geert?


In that case, I would just leave it under the current name and not change
m68k at all. I don't like the m68k_virt_to_bus() name because there is
not anything CPU specific in what it does, and keeping it in a common
header does nothing to prevent it from being used on other platforms
either.


Fair enough.


32bit powerpc is a different matter though.


It's similar, but unrelated. The two apple ethernet drivers
(bmac and mace) can again either get changed to use the
dma-mapping interfaces, or get a custom pmac_virt_to_bus()/
pmac_bus_to_virt() helper.


Hmmm - I see Finn had done the DMA API conversion on macmace.c which
might give some hints on what to do about mace.c ... no idea about
bmac.c though. And again, haven't got hardware to test, so custom
helpers is it, then.


Ok.


Again, no platform specific headers to shift renamed helpers to, so may 
as well keep this as-is.


Cheers,

Michael




  Arnd




RE: [PATCH v9 02/11] iommu: Add max_pasids field in struct dev_iommu

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> Use this field to save the number of PASIDs that a device is able to
> consume. It is a generic attribute of a device and lifting it into the
> per-device dev_iommu struct could help to avoid the boilerplate code
> in various IOMMU drivers.
> 
> Signed-off-by: Lu Baolu 
> ---
>  include/linux/iommu.h |  2 ++
>  drivers/iommu/iommu.c | 20 
>  2 files changed, 22 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 03fbb1b71536..d50afb2c9a09 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -364,6 +364,7 @@ struct iommu_fault_param {
>   * @fwspec:   IOMMU fwspec data
>   * @iommu_dev:IOMMU device this device is linked to
>   * @priv: IOMMU Driver private data
> + * @max_pasids:  number of PASIDs device can consume

... PASIDs *this* device can consume

Reviewed-by: Kevin Tian 


RE: [PATCH v9 01/11] iommu: Add max_pasids field in struct iommu_device

2022-06-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 21, 2022 10:44 PM
> 
> Use this field to keep the number of supported PASIDs that an IOMMU
> hardware is able to support. This is a generic attribute of an IOMMU
> and lifting it into the per-IOMMU device structure makes it possible
> to allocate a PASID for a device without calls into the IOMMU drivers.
> Any iommu driver that supports PASID related features should set this
> field before enabling them on the devices.
> 
> In the Intel IOMMU driver, intel_iommu_sm is moved to
> CONFIG_INTEL_IOMMU
> enclave so that the pasid_supported() helper could be used in dmar.c
> without compilation errors.
> 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jean-Philippe Brucker 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 2/2] vfio: Use device_iommu_capable()

2022-06-27 Thread Tian, Kevin
> From: Robin Murphy
> Sent: Saturday, June 25, 2022 2:00 AM
> 
> Use the new interface to check the capabilities for our device
> specifically.
> 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Robin Murphy 

Reviewed-by: Kevin Tian 


RE: [PATCH v3 1/2] vfio/type1: Simplify bus_type determination

2022-06-27 Thread Tian, Kevin
> From: Robin Murphy
> Sent: Saturday, June 25, 2022 1:52 AM
> 
> Since IOMMU groups are mandatory for drivers to support, it stands to
> reason that any device which has been successfully added to a group
> must be on a bus supported by that IOMMU driver, and therefore a domain
> viable for any device in the group must be viable for all devices in
> the group. This already has to be the case for the IOMMU API's internal
> default domain, for instance. Thus even if the group contains devices on
> different buses, that can only mean that the IOMMU driver actually
> supports such an odd topology, and so without loss of generality we can
> expect the bus type of any device in a group to be suitable for IOMMU
> API calls.
> 
> Furthermore, scrutiny reveals a lack of protection for the bus being
> removed while vfio_iommu_type1_attach_group() is using it; the reference
> that VFIO holds on the iommu_group ensures that data remains valid, but
> does not prevent the group's membership changing underfoot.
> 
> We can address both concerns by recycling vfio_bus_type() into some
> superficially similar logic to indirect the IOMMU API calls themselves.
> Each call is thus protected from races by the IOMMU group's own locking,
> and we no longer need to hold group-derived pointers beyond that scope.
> It also gives us an easy path for the IOMMU API's migration of bus-based
> interfaces to device-based, of which we can already take the first step
> with device_iommu_capable(). As with domains, any capability must in
> practice be consistent for devices in a given group - and after all it's
> still the same capability which was expected to be consistent across an
> entire bus! - so there's no need for any complicated validation.
> 
> Signed-off-by: Robin Murphy 
> ---
> 
> v3: Complete rewrite yet again, and finally it doesn't feel like we're
> stretching any abstraction boundaries the wrong way, and the diffstat
> looks right too. I did think about embedding IOMMU_CAP_INTR_REMAP
> directly in the callback, but decided I like the consistency of minimal
> generic wrappers. And yes, if the capability isn't supported then it
> does end up getting tested for the whole group, but meh, it's harmless.
> 

Reviewed-by: Kevin Tian 