Re: [PATCH v9 10/11] iommu: Per-domain I/O page fault handling
Hi Ethan, On 2022/6/27 21:03, Ethan Zhao wrote: Hi, On 2022/6/21 22:43, Lu Baolu wrote: Tweak the I/O page fault handling framework to route the page faults to the domain and call the page fault handler retrieved from the domain. This makes it possible for the I/O page fault handling framework to serve more usage scenarios, as long as they have an IOMMU domain and install a page fault handler in it. Some unused functions are also removed to avoid dead code. The iommu_get_domain_for_dev_pasid(), which retrieves the attached domain for a {device, PASID} pair, is used. It will be used by the page fault handling framework, which knows the {device, PASID} reported from the iommu driver. We have a guarantee that the SVA domain doesn't go away during IOPF handling, because unbind() waits for pending faults with iopf_queue_flush_dev() before freeing the domain. Hence, there's no need to synchronize the life cycle of the iommu domains between the unbind() and the interrupt threads. Signed-off-by: Lu Baolu Reviewed-by: Jean-Philippe Brucker --- drivers/iommu/io-pgfault.c | 64 +- 1 file changed, 7 insertions(+), 57 deletions(-) diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index aee9e033012f..4f24ec703479 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf, return iommu_page_response(dev, &resp); } -static enum iommu_page_response_code -iopf_handle_single(struct iopf_fault *iopf) -{ - vm_fault_t ret; - struct mm_struct *mm; - struct vm_area_struct *vma; - unsigned int access_flags = 0; - unsigned int fault_flags = FAULT_FLAG_REMOTE; - struct iommu_fault_page_request *prm = &iopf->fault.prm; - enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID; - - if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)) - return status; - - mm = iommu_sva_find(prm->pasid); - if (IS_ERR_OR_NULL(mm)) - return status; - - mmap_read_lock(mm); - - vma = find_extend_vma(mm,
prm->addr); - if (!vma) - /* Unmapped area */ - goto out_put_mm; - - if (prm->perm & IOMMU_FAULT_PERM_READ) - access_flags |= VM_READ; - - if (prm->perm & IOMMU_FAULT_PERM_WRITE) { - access_flags |= VM_WRITE; - fault_flags |= FAULT_FLAG_WRITE; - } - - if (prm->perm & IOMMU_FAULT_PERM_EXEC) { - access_flags |= VM_EXEC; - fault_flags |= FAULT_FLAG_INSTRUCTION; - } - - if (!(prm->perm & IOMMU_FAULT_PERM_PRIV)) - fault_flags |= FAULT_FLAG_USER; - - if (access_flags & ~vma->vm_flags) - /* Access fault */ - goto out_put_mm; - - ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL); - status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID : - IOMMU_PAGE_RESP_SUCCESS; - -out_put_mm: - mmap_read_unlock(mm); - mmput(mm); - - return status; -} - Once iopf_handle_single() is removed, the name of iopf_handle_group() looks a little weird and confusing. Does this group mean the iommu group (domain)? It took me some minutes of looking into the code to see that it means a batch/list/queue of iopfs, and that iopf_handle_group() becomes a generic iopf_handler(). Does it make sense to revise the names of iopf_handle_group(), iopf_complete_group() and iopf_group in this patch set? No. This is not the iommu group. It is the page request group defined by the PCI SIG spec. Multiple page requests can be put in a group with the same group ID, and all page requests in a group can be responded to the device in one shot. Best regards, baolu
Thanks, Ethan static void iopf_handle_group(struct work_struct *work) { struct iopf_group *group; + struct iommu_domain *domain; struct iopf_fault *iopf, *next; enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS; group = container_of(work, struct iopf_group, work); + domain = iommu_get_domain_for_dev_pasid(group->dev, + group->last_fault.fault.prm.pasid); + if (!domain || !domain->iopf_handler) + status = IOMMU_PAGE_RESP_INVALID; list_for_each_entry_safe(iopf, next, &group->faults, list) { /* @@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct *work) * faults in the group if there is an error. */ if (status == IOMMU_PAGE_RESP_SUCCESS) - status = iopf_handle_single(iopf); + status = domain->iopf_handler(&iopf->fault, + domain->fault_data); if (!(iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
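As an aside for readers following the thread: the per-group semantics discussed above can be modelled in a few lines of plain C. The following is a simplified userspace sketch with stand-in types (not the kernel's definitions), showing only that one response is produced per page request group and that the first failing fault causes the remaining faults in the group to be skipped, as the comment in the patch describes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in types -- NOT the kernel's definitions. */
enum resp_code { RESP_SUCCESS, RESP_INVALID };

struct fault {
	int pasid;
	int fails;	/* 1 if the handler should reject this fault */
};

/*
 * Userspace model of iopf_handle_group(): walk every fault in one
 * PCI page request group; after the first failure, stop calling the
 * handler (mirroring the "abort all subsequent faults" comment in
 * the patch) and return a single status for the whole group.
 */
static enum resp_code handle_group(const struct fault *faults, size_t n,
				   enum resp_code (*handler)(const struct fault *))
{
	enum resp_code status = RESP_SUCCESS;
	size_t i;

	for (i = 0; i < n; i++) {
		if (status == RESP_SUCCESS)
			status = handler(&faults[i]);
	}
	return status;	/* one response per group, not one per fault */
}

static enum resp_code demo_handler(const struct fault *f)
{
	return f->fails ? RESP_INVALID : RESP_SUCCESS;
}
```

This also makes the naming question concrete: the "group" here is purely a batch of faults sharing a group ID, with no relation to struct iommu_group.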
Re: [PATCH v9 06/11] arm-smmu-v3/sva: Add SVA domain support
On 2022/6/27 19:50, Zhangfei Gao wrote: On 2022/6/21 10:43 PM, Lu Baolu wrote: Add support for SVA domain allocation and provide an SVA-specific iommu_domain_ops. Signed-off-by: Lu Baolu Reviewed-by: Jean-Philippe Brucker Tested-by: Zhangfei Gao Have tested the series on aarch64. Tony has been helping to validate this series on Intel's platform. Tony, can I add your Tested-by as well in this series? Best regards, baolu
Re: [PATCH v9 07/11] iommu/sva: Refactoring iommu_sva_bind/unbind_device()
On 2022/6/27 18:14, Tian, Kevin wrote: From: Lu Baolu Sent: Tuesday, June 21, 2022 10:44 PM +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm) +{ + struct iommu_domain *domain; + ioasid_t max_pasids; + int ret = -EINVAL; + + /* Allocate mm->pasid if necessary. */ this comment is for iommu_sva_alloc_pasid() Updated. + max_pasids = dev->iommu->max_pasids; + if (!max_pasids) + return ERR_PTR(-EOPNOTSUPP); + + ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1); + if (ret) + return ERR_PTR(ret); + ... +void iommu_sva_unbind_device(struct iommu_sva *handle) +{ + struct device *dev = handle->dev; + struct iommu_domain *domain = + container_of(handle, struct iommu_domain, bond); + ioasid_t pasid = iommu_sva_get_pasid(handle); + + mutex_lock(&iommu_sva_lock); + if (refcount_dec_and_test(&domain->bond.users)) { + iommu_detach_device_pasid(domain, dev, pasid); + iommu_domain_free(domain); + } + mutex_unlock(&iommu_sva_lock); +} +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device); + +u32 iommu_sva_get_pasid(struct iommu_sva *handle) +{ + struct iommu_domain *domain = + container_of(handle, struct iommu_domain, bond); + + return domain->mm->pasid; +} +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid); Looks this is only used by unbind_device. Just open code it. It's part of current IOMMU/SVA interfaces for the device drivers, and has been used in various drivers. $ git grep iommu_sva_get_pasid drivers/dma/idxd/cdev.c:pasid = iommu_sva_get_pasid(sva); drivers/iommu/iommu-sva-lib.c: ioasid_t pasid = iommu_sva_get_pasid(handle); drivers/iommu/iommu-sva-lib.c:u32 iommu_sva_get_pasid(struct iommu_sva *handle) drivers/iommu/iommu-sva-lib.c:EXPORT_SYMBOL_GPL(iommu_sva_get_pasid); drivers/misc/uacce/uacce.c: pasid = iommu_sva_get_pasid(handle); include/linux/iommu.h:u32 iommu_sva_get_pasid(struct iommu_sva *handle); include/linux/iommu.h:static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle) Or, I missed anything? 
Best regards, baolu
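The container_of() pattern behind iommu_sva_get_pasid() in the quoted patch can be illustrated with a small userspace model. The struct layouts and field values below are simplified stand-ins, not the kernel's actual definitions; the point is only that the iommu_sva handle handed back to drivers is embedded in the SVA domain, so the callee can recover the domain, and from it the mm's PASID:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace reimplementation of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy stand-ins for the kernel structures. */
struct iommu_sva { int users; };		/* the "bond" handle */
struct mm_struct { unsigned int pasid; };
struct iommu_domain {
	struct mm_struct *mm;
	struct iommu_sva bond;			/* handle embedded in the domain */
};

/* Model of iommu_sva_get_pasid(): recover the domain from the
 * embedded handle, then read the PASID from its mm. */
static unsigned int sva_get_pasid(struct iommu_sva *handle)
{
	struct iommu_domain *domain =
		container_of(handle, struct iommu_domain, bond);

	return domain->mm->pasid;
}
```

This is why the helper needs no lookup table: the pointer arithmetic alone relates the handle to its domain.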
Re: [PATCH v9 06/11] arm-smmu-v3/sva: Add SVA domain support
On 2022/6/27 19:50, Zhangfei Gao wrote: On 2022/6/21 10:43 PM, Lu Baolu wrote: Add support for SVA domain allocation and provide an SVA-specific iommu_domain_ops. Signed-off-by: Lu Baolu Reviewed-by: Jean-Philippe Brucker Tested-by: Zhangfei Gao Have tested the series on aarch64. Thank you very much! Your help is very much appreciated! Best regards, baolu
Re: [PATCH v9 04/11] iommu: Add sva iommu_domain support
Hi Kevin, On 2022/6/27 16:29, Tian, Kevin wrote: From: Lu Baolu Sent: Tuesday, June 21, 2022 10:44 PM The sva iommu_domain represents a hardware pagetable that the IOMMU hardware could use for SVA translation. This adds some infrastructure to support SVA domain in the iommu common layer. It includes: - Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA domain type. The IOMMU drivers that support SVA should provide the sva domain specific iommu_domain_ops. - Add a helper to allocate an SVA domain. The iommu_domain_free() is still used to free an SVA domain. - Add helpers to attach an SVA domain to a device and the reverse operation. Some buses, like PCI, route packets without considering the PASID value. Thus a DMA target address with PASID might be treated as P2P if the address falls into the MMIO BAR of other devices in the group. To make things simple, the attach/detach interfaces only apply to devices belonging to the singleton groups, and the singleton is immutable in fabric i.e. not affected by hotplug. The iommu_attach/detach_device_pasid() can be used for other purposes, such as kernel DMA with pasid, mediation device, etc. I'd split this into two patches. One for adding iommu_attach/ detach_device_pasid() and set/block_dev_pasid ops, and the other for adding SVA. Yes. Make sense. struct iommu_domain { unsigned type; const struct iommu_domain_ops *ops; unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */ - iommu_fault_handler_t handler; - void *handler_token; struct iommu_domain_geometry geometry; struct iommu_dma_cookie *iova_cookie; + union { + struct {/* IOMMU_DOMAIN_DMA */ + iommu_fault_handler_t handler; + void *handler_token; + }; why is it DMA domain specific? What about unmanaged domain? Unrecoverable fault can happen on any type including SVA. Hence I think above should be domain type agnostic. The report_iommu_fault() should be replaced by the new iommu_report_device_fault(). Jean has already started this work. 
https://lore.kernel.org/linux-iommu/Yo4Nw9QyllT1RZbd@myrica/ Currently this is only for DMA domains, hence Robin suggested making it mutually exclusive with the SVA domain fields. https://lore.kernel.org/linux-iommu/f3170016-4d7f-e78e-db48-68305f683...@arm.com/ + struct {/* IOMMU_DOMAIN_SVA */ + struct mm_struct *mm; + }; + }; }; + +struct iommu_domain *iommu_sva_domain_alloc(struct device *dev, + struct mm_struct *mm) +{ + const struct iommu_ops *ops = dev_iommu_ops(dev); + struct iommu_domain *domain; + + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); + if (!domain) + return NULL; + + domain->type = IOMMU_DOMAIN_SVA; It's a bit weird that the type has been specified when calling ops->domain_alloc while it still leaves it to the caller to set the type. But this is not caused by this series. Could be cleaned up separately. Yes. Robin has patches to refactor the domain allocation interface, let's wait and see what it finally looks like. + + mutex_lock(&group->mutex); + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL); + if (curr) + goto out_unlock; Need check xa_is_err(old). Both (1) the old entry being a valid pointer and (2) xa_is_err(curr) are failure cases. Hence, "curr == NULL" is the only check we need. Did I miss anything? Best regards, baolu
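For readers unfamiliar with the XArray convention at play here, a simplified userspace model shows why a bare NULL check on the return value covers both failure cases. The error-tagging scheme below is illustrative only, not the kernel's exact internal encoding; the key property it shares with the real xa_cmpxchg() is that error returns and conflicting-entry returns are both non-NULL, while success on an empty slot returns NULL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag an error value so it is never NULL (loosely mimicking the
 * XArray's internal-entry encoding; the exact scheme is made up). */
static void *mk_err(int errno_val)
{
	return (void *)(((uintptr_t)errno_val << 2) | 2);
}

/*
 * Toy compare-exchange with xa_cmpxchg()-like return semantics:
 *  - internal failure  -> tagged non-NULL error entry,
 *  - slot already used -> the conflicting (non-NULL) old entry,
 *  - success           -> the old value, i.e. NULL for an empty slot.
 */
static void *fake_cmpxchg(void **slot, void *old, void *new_entry,
			  int fail_errno)
{
	if (fail_errno)
		return mk_err(fail_errno);	/* e.g. allocation failure */
	if (*slot != old)
		return *slot;			/* conflicting existing entry */
	*slot = new_entry;
	return old;				/* NULL when the slot was empty */
}
```

Since every failure path yields a non-NULL pointer, `if (curr) goto out_unlock;` in the patch handles both cases at once.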
Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS
Hi Geert, On 28.06.2022 at 09:12, Michael Schmitz wrote: Hi Geert, On 27/06/22 20:26, Geert Uytterhoeven wrote: Hi Michael, On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz wrote: On 18.06.2022 at 00:57, Arnd Bergmann wrote: From: Arnd Bergmann All architecture-independent users of virt_to_bus() and bus_to_virt() have been fixed to use the dma mapping interfaces or have been removed now. This means the definitions on most architectures, and the CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed. The only exceptions to this are a few network and scsi drivers for m68k Amiga and VME machines and ppc32 Macintosh. These drivers work correctly with the old interfaces and are probably not worth changing. The Amiga SCSI drivers are all old WD33C93 ones, and replacing virt_to_bus by virt_to_phys in the dma_setup() function there would cause no functional change at all. FTR, the sgiwd93 driver uses dma_map_single(). Thanks! From what I see, it doesn't have to deal with bounce buffers though? Leaving the bounce buffer handling in place, and taking a few other liberties - this is what converting the easiest case (a3000 SCSI) might look like. Any obvious mistakes? The mvme147 driver would be very similar to handle (after conversion to a platform device). The driver allocates bounce buffers using kmalloc if it hits an unaligned data buffer - can such buffers still even happen these days? If I understand dma_map_single() correctly, the resulting dma handle would be equally misaligned? To allocate a bounce buffer, would it be OK to use dma_alloc_coherent() even though AFAIU memory used for DMA buffers generally isn't consistent on m68k? Thinking ahead to the other two Amiga drivers - I wonder whether allocating a static bounce buffer or a DMA pool at driver init is likely to succeed if the kernel runs from the low 16 MB RAM chunk?
It certainly won't succeed if the kernel runs from a higher memory address, so the present bounce buffer logic around amiga_chip_alloc() might still need to be used here. Leaves the question whether converting the gvp11 and a2091 drivers is actually worth it, if bounce buffers still have to be handled explicitly. Untested (except for compile testing), un-checkpatched, don't try this on any disk with valuable data ... Cheers, Michael >From e8c6aa068d27901c49dfb7442d4200cc966350a5 Mon Sep 17 00:00:00 2001 From: Michael Schmitz Date: Tue, 28 Jun 2022 12:45:08 +1200 Subject: [PATCH] scsi - convert m68k WD33C93 drivers to DMA API Use dma_map_single() for gvp11 driver (leave bounce buffer logic unchanged). Compile-tested only. Signed-off-by: Michael Schmitz --- drivers/scsi/a3000.c | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/a3000.c b/drivers/scsi/a3000.c index dd161885eed1..3c62e8bafb8f 100644 --- a/drivers/scsi/a3000.c +++ b/drivers/scsi/a3000.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -25,8 +26,11 @@ struct a3000_hostdata { struct WD33C93_hostdata wh; struct a3000_scsiregs *regs; + struct device *dev; }; +#define DMA_DIR(d) ((d == DATA_OUT_DIR) ? 
DMA_TO_DEVICE : DMA_FROM_DEVICE) + static irqreturn_t a3000_intr(int irq, void *data) { struct Scsi_Host *instance = data; @@ -49,12 +53,16 @@ static irqreturn_t a3000_intr(int irq, void *data) static int dma_setup(struct scsi_cmnd *cmd, int dir_in) { struct scsi_pointer *scsi_pointer = WD33C93_scsi_pointer(cmd); + unsigned long len = scsi_pointer->this_residual; struct Scsi_Host *instance = cmd->device->host; struct a3000_hostdata *hdata = shost_priv(instance); struct WD33C93_hostdata *wh = &hdata->wh; struct a3000_scsiregs *regs = hdata->regs; unsigned short cntr = CNTR_PDMD | CNTR_INTEN; - unsigned long addr = virt_to_bus(scsi_pointer->ptr); + dma_addr_t addr; + + addr = dma_map_single(hdata->dev, scsi_pointer->ptr, len, DMA_DIR(dir_in)); + scsi_pointer->dma_handle = addr; /* * if the physical address has the wrong alignment, or if @@ -79,7 +87,7 @@ static int dma_setup(struct scsi_cmnd *cmd, int dir_in) scsi_pointer->this_residual); } - addr = virt_to_bus(wh->dma_bounce_buffer); + addr = virt_to_phys(wh->dma_bounce_buffer); } /* setup dma direction */ @@ -166,6 +174,10 @@ static void dma_stop(struct Scsi_Host *instance, struct scsi_cmnd *SCpnt, wh->dma_bounce_len = 0; } } + dma_unmap_single(hdata->dev, scsi_pointer->dma_handle, + scsi_pointer->this_residual, + DMA_DIR(wh->dma_dir)); + } static struct scsi_host_template amiga_a3000_scsi_template = { @@ -193,6 +205,11 @@ static int __init amiga_a3000_scsi_probe(struct platform_device *pdev) wd33c93_regs wdregs; struct a3000_hostdata *hdata; + if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(32))) { + dev_warn(&pdev->dev, "cannot use 32 bit DMA\n"); + return -ENODEV; + } + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); if (!res) return -E
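The direction and alignment decisions in the converted dma_setup() boil down to two small pieces of logic, which can be modelled in plain C. The mask value and names below are illustrative stand-ins (the real driver uses its own hardware-specific alignment mask), but they capture the point Michael raises above: on a platform with a linear mapping, dma_map_single() preserves the buffer's low address bits, so a misaligned CPU buffer is assumed to yield an equally misaligned DMA handle and would still need the bounce path:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the driver's direction constants. */
#define DATA_OUT_DIR 1
enum dma_data_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Mirrors the DMA_DIR() macro added by the patch: writes to the
 * device map TO_DEVICE, everything else FROM_DEVICE. */
#define DMA_DIR(d) (((d) == DATA_OUT_DIR) ? DMA_TO_DEVICE : DMA_FROM_DEVICE)

/* Illustrative alignment mask -- the real hardware mask differs. */
#define XFER_MASK 0x03u

/* Model of the "wrong alignment" test: a handle with low bits set
 * cannot be fed to the DMA engine directly and needs a bounce buffer. */
static int needs_bounce(uintptr_t dma_handle)
{
	return (dma_handle & XFER_MASK) != 0;
}
```

Under these assumptions, mapping a misaligned pointer cannot fix its alignment, which is why the bounce-buffer logic has to stay even after the virt_to_bus() conversion.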
Re: [PATCH v3 0/3] phase out CONFIG_VIRT_TO_BUS
Hi Arnd! > If there are no more issues identified with this series, I'll merge it > through the asm-generic tree. The SCSI patches can also get merged > separately through the SCSI maintainers' tree if they prefer. I put patches 1 and 2 in scsi-staging to see if anything breaks... -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v1 0/2] Fix console probe delay when stdout-path isn't set
Hi Saravana, On Mon, Jun 27, 2022 at 11:03 PM Saravana Kannan wrote: > > Since the series that fixes console probe delay based on stdout-path[1] got > pulled into driver-core-next, I made these patches on top of them. > > Even if stdout-path isn't set in DT, this patch should take console > probe times back to how they were before the deferred_probe_timeout > clean up series[2]. > > Fabio/Ahmad/Sascha, > > Can you give this a shot please? This series works fine for me (with and without stdout-path), thanks: Tested-by: Fabio Estevam
[PATCH v1 2/2] serial: Set probe_no_timeout for all DT based drivers
With commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") the probing of TTY consoles could get delayed if they have optional suppliers that are listed in DT, but those suppliers don't probe by the time kernel boot finishes. The console devices will probe eventually after driver_probe_timeout expires. However, since consoles are often used for debugging kernel issues, it does not make sense to delay their probe. So, set the newly added probe_no_timeout flag for all serial drivers that at DT based. This way, fw_devlink will know not to delay the probing of the consoles past kernel boot. Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") Reported-by: Sascha Hauer Reported-by: Peng Fan Reported-by: Fabio Estevam Reported-by: Ahmad Fatoum Signed-off-by: Saravana Kannan --- drivers/tty/ehv_bytechan.c | 1 + drivers/tty/goldfish.c | 1 + drivers/tty/hvc/hvc_opal.c | 1 + drivers/tty/serial/8250/8250_acorn.c| 1 - drivers/tty/serial/8250/8250_aspeed_vuart.c | 1 + drivers/tty/serial/8250/8250_bcm2835aux.c | 1 + drivers/tty/serial/8250/8250_bcm7271.c | 1 + drivers/tty/serial/8250/8250_dw.c | 1 + drivers/tty/serial/8250/8250_em.c | 1 + drivers/tty/serial/8250/8250_ingenic.c | 1 + drivers/tty/serial/8250/8250_lpc18xx.c | 1 + drivers/tty/serial/8250/8250_mtk.c | 1 + drivers/tty/serial/8250/8250_of.c | 1 + drivers/tty/serial/8250/8250_omap.c | 1 + drivers/tty/serial/8250/8250_pxa.c | 1 + drivers/tty/serial/8250/8250_tegra.c| 1 + drivers/tty/serial/8250/8250_uniphier.c | 1 + drivers/tty/serial/altera_jtaguart.c| 1 + drivers/tty/serial/altera_uart.c| 1 + drivers/tty/serial/amba-pl011.c | 1 + drivers/tty/serial/apbuart.c| 1 + drivers/tty/serial/ar933x_uart.c| 1 + drivers/tty/serial/arc_uart.c | 1 + drivers/tty/serial/atmel_serial.c | 1 + drivers/tty/serial/bcm63xx_uart.c | 1 + drivers/tty/serial/clps711x.c | 1 + drivers/tty/serial/cpm_uart/cpm_uart_core.c | 1 + drivers/tty/serial/digicolor-usart.c| 1 + drivers/tty/serial/fsl_linflexuart.c| 1 + 
drivers/tty/serial/fsl_lpuart.c | 1 + drivers/tty/serial/imx.c| 1 + drivers/tty/serial/lantiq.c | 1 + drivers/tty/serial/liteuart.c | 1 + drivers/tty/serial/lpc32xx_hs.c | 1 + drivers/tty/serial/max310x.c| 1 + drivers/tty/serial/meson_uart.c | 1 + drivers/tty/serial/milbeaut_usio.c | 1 + drivers/tty/serial/mpc52xx_uart.c | 1 + drivers/tty/serial/mps2-uart.c | 1 + drivers/tty/serial/msm_serial.c | 1 + drivers/tty/serial/mvebu-uart.c | 1 + drivers/tty/serial/mxs-auart.c | 1 + drivers/tty/serial/omap-serial.c| 1 + drivers/tty/serial/owl-uart.c | 1 + drivers/tty/serial/pic32_uart.c | 1 + drivers/tty/serial/pmac_zilog.c | 1 + drivers/tty/serial/pxa.c| 1 + drivers/tty/serial/qcom_geni_serial.c | 1 + drivers/tty/serial/rda-uart.c | 1 + drivers/tty/serial/samsung_tty.c| 1 + drivers/tty/serial/sc16is7xx.c | 1 + drivers/tty/serial/serial-tegra.c | 1 + drivers/tty/serial/sh-sci.c | 1 + drivers/tty/serial/sifive.c | 1 + drivers/tty/serial/sprd_serial.c| 1 + drivers/tty/serial/st-asc.c | 1 + drivers/tty/serial/stm32-usart.c| 1 + drivers/tty/serial/sunhv.c | 1 + drivers/tty/serial/sunplus-uart.c | 1 + drivers/tty/serial/sunsab.c | 1 + drivers/tty/serial/sunsu.c | 1 + drivers/tty/serial/sunzilog.c | 1 + drivers/tty/serial/tegra-tcu.c | 1 + drivers/tty/serial/uartlite.c | 1 + drivers/tty/serial/ucc_uart.c | 1 + drivers/tty/serial/vt8500_serial.c | 1 + drivers/tty/serial/xilinx_uartps.c | 1 + 67 files changed, 66 insertions(+), 1 deletion(-) diff --git a/drivers/tty/ehv_bytechan.c b/drivers/tty/ehv_bytechan.c index 19d32cb6af84..6de710da99be 100644 --- a/drivers/tty/ehv_bytechan.c +++ b/drivers/tty/ehv_bytechan.c @@ -739,6 +739,7 @@ static struct platform_driver ehv_bc_tty_driver = { .driver = { .name = "ehv-bc", .of_match_table = ehv_bc_tty_of_ids, + .probe_no_timeout = true, .suppress_bind_attrs = true, }, .probe = ehv_bc_tty_probe, diff --git a/drivers/tty/goldfish.c b/drivers/tty/goldfish.c index c7968aecd870..f9760598c836 100644 --- a/drivers/tty/goldfish.c +++ 
b/drivers/tty/goldfish.c @@ -474,6 +474,7 @@ static struct platform_drive
[PATCH v1 1/2] driver core: Add probe_no_timeout flag for drivers
This flag only needs to be set for drivers of devices that meet all the following conditions: - Need to probe successfully before userspace init in started - Have optional suppliers - Can't wait for deferred_probe_timeout to expire fw_devlink=on uses this info, as needed, to ignore dependencies on supplier devices that have not been added or supplier devices that don't have any drivers. It's still up to the driver to decide which of the missing suppliers are optional or not. Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") Reported-by: Sascha Hauer Reported-by: Peng Fan Reported-by: Fabio Estevam Reported-by: Ahmad Fatoum Signed-off-by: Saravana Kannan --- drivers/base/base.h | 1 + drivers/base/core.c | 7 +++ drivers/base/dd.c | 3 +++ include/linux/device.h| 7 +++ include/linux/device/driver.h | 11 +++ 5 files changed, 29 insertions(+) diff --git a/drivers/base/base.h b/drivers/base/base.h index b3a43a164dcd..149822d2086f 100644 --- a/drivers/base/base.h +++ b/drivers/base/base.h @@ -193,6 +193,7 @@ extern void device_links_no_driver(struct device *dev); extern bool device_links_busy(struct device *dev); extern void device_links_unbind_consumers(struct device *dev); extern void fw_devlink_drivers_done(void); +extern void fw_devlink_probe_no_timeout(void); /* device pm support */ void device_pm_move_to_tail(struct device *dev); diff --git a/drivers/base/core.c b/drivers/base/core.c index ccdd5b4295de..8e18904a1584 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -54,6 +54,7 @@ static unsigned int defer_sync_state_count = 1; static DEFINE_MUTEX(fwnode_link_lock); static bool fw_devlink_is_permissive(void); static bool fw_devlink_drv_reg_done; +static bool fw_devlink_no_timeout; static bool fw_devlink_best_effort; /** @@ -969,6 +970,7 @@ static void device_links_missing_supplier(struct device *dev) static bool dev_is_best_effort(struct device *dev) { return (fw_devlink_best_effort && dev->can_match) || + (fw_devlink_no_timeout && 
dev->probe_no_timeout) || (dev->fwnode && (dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT)); } @@ -1688,6 +1690,11 @@ void fw_devlink_drivers_done(void) device_links_write_unlock(); } +void fw_devlink_probe_no_timeout(void) +{ + fw_devlink_no_timeout = true; +} + /** * wait_for_init_devices_probe - Try to probe any device needed for init * diff --git a/drivers/base/dd.c b/drivers/base/dd.c index e600dd2afc35..9b0ef2b6a058 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -324,6 +324,8 @@ static int deferred_probe_initcall(void) if (!IS_ENABLED(CONFIG_MODULES)) fw_devlink_drivers_done(); + else + fw_devlink_probe_no_timeout(); /* * Trigger deferred probe again, this time we won't defer anything @@ -734,6 +736,7 @@ static int __driver_probe_device(struct device_driver *drv, struct device *dev) return -EBUSY; dev->can_match = true; + dev->probe_no_timeout = drv->probe_no_timeout; pr_debug("bus: '%s': %s: matched device %s with driver %s\n", drv->bus->name, __func__, dev_name(dev), drv->name); diff --git a/include/linux/device.h b/include/linux/device.h index 424b55df0272..e6246b6cf6cf 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -536,6 +536,12 @@ struct device_physical_location { * @can_match: The device has matched with a driver at least once or it is in * a bus (like AMBA) which can't check for matching drivers until * other devices probe successfully. + * @probe_no_timeout: Set by driver core to indicate that this device's probe + * can't wait till driver_probe_timeout expires. This information + * is used by fw_devlink=on to avoid deferring the probe of this + * device to wait on supplier devices that haven't been added or + * probed successfully. + * See also: probe_no_timeout in struct driver. * @dma_coherent: this particular device is dma coherent, even if the * architecture supports non-coherent devices. 
* @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the @@ -642,6 +648,7 @@ struct device { boolof_node_reused:1; boolstate_synced:1; boolcan_match:1; + boolprobe_no_timeout:1; #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \ defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \ defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index 7acaabde5396..2ce60e511504 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -55,6 +55,15 @@ enum probe_type { * @owner: The module owner. * @mod_name: Used for
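The three-way condition that this patch extends in dev_is_best_effort() is easy to misread in diff form, so here is a standalone model of just that boolean logic. Struct and parameter names are simplified stand-ins for the kernel's (the fwnode flag is reduced to a plain bool), but the condition itself follows the patched function:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device model -- field names follow the patch, not the real
 * struct device layout. */
struct dev_model {
	bool can_match;		/* matched a driver at least once */
	bool probe_no_timeout;	/* copied from the driver's new flag */
	bool fwnode_best_effort;/* stand-in for FWNODE_FLAG_BEST_EFFORT */
};

/* Model of dev_is_best_effort(): fw_devlink stops waiting on missing
 * optional suppliers once any of the three conditions holds. */
static bool dev_is_best_effort(const struct dev_model *dev,
			       bool fw_devlink_best_effort,
			       bool fw_devlink_no_timeout)
{
	return (fw_devlink_best_effort && dev->can_match) ||
	       (fw_devlink_no_timeout && dev->probe_no_timeout) ||
	       dev->fwnode_best_effort;
}
```

The middle clause is the one this patch adds: once modules are enabled and fw_devlink_probe_no_timeout() has been called, any device whose driver sets probe_no_timeout is probed without waiting on absent suppliers.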
[PATCH v1 0/2] Fix console probe delay when stdout-path isn't set
Since the series that fixes console probe delay based on stdout-path[1] got pulled into driver-core-next, I made these patches on top of them. Even if stdout-path isn't set in DT, this patch should take console probe times back to how they were before the deferred_probe_timeout clean up series[2]. Fabio/Ahmad/Sascha, Can you give this a shot please? [1] - https://lore.kernel.org/lkml/20220623080344.783549-1-sarava...@google.com/ [2] - https://lore.kernel.org/lkml/20220601070707.3946847-1-sarava...@google.com/ Thanks, Saravana cc: Rob Herring cc: sascha hauer cc: peng fan cc: kevin hilman cc: ulf hansson cc: len brown cc: pavel machek cc: joerg roedel cc: will deacon cc: andrew lunn cc: heiner kallweit cc: russell king cc: "david s. miller" cc: eric dumazet cc: jakub kicinski cc: paolo abeni cc: linus walleij cc: hideaki yoshifuji cc: david ahern cc: kernel-t...@android.com cc: linux-ker...@vger.kernel.org cc: linux...@vger.kernel.org cc: iommu@lists.linux-foundation.org cc: net...@vger.kernel.org cc: linux-g...@vger.kernel.org Cc: ker...@pengutronix.de Saravana Kannan (2): driver core: Add probe_no_timeout flag for drivers serial: Set probe_no_timeout for all DT based drivers drivers/base/base.h | 1 + drivers/base/core.c | 7 +++ drivers/base/dd.c | 3 +++ drivers/tty/ehv_bytechan.c | 1 + drivers/tty/goldfish.c | 1 + drivers/tty/hvc/hvc_opal.c | 1 + drivers/tty/serial/8250/8250_acorn.c| 1 - drivers/tty/serial/8250/8250_aspeed_vuart.c | 1 + drivers/tty/serial/8250/8250_bcm2835aux.c | 1 + drivers/tty/serial/8250/8250_bcm7271.c | 1 + drivers/tty/serial/8250/8250_dw.c | 1 + drivers/tty/serial/8250/8250_em.c | 1 + drivers/tty/serial/8250/8250_ingenic.c | 1 + drivers/tty/serial/8250/8250_lpc18xx.c | 1 + drivers/tty/serial/8250/8250_mtk.c | 1 + drivers/tty/serial/8250/8250_of.c | 1 + drivers/tty/serial/8250/8250_omap.c | 1 + drivers/tty/serial/8250/8250_pxa.c | 1 + drivers/tty/serial/8250/8250_tegra.c| 1 + drivers/tty/serial/8250/8250_uniphier.c | 1 + 
drivers/tty/serial/altera_jtaguart.c| 1 + drivers/tty/serial/altera_uart.c| 1 + drivers/tty/serial/amba-pl011.c | 1 + drivers/tty/serial/apbuart.c| 1 + drivers/tty/serial/ar933x_uart.c| 1 + drivers/tty/serial/arc_uart.c | 1 + drivers/tty/serial/atmel_serial.c | 1 + drivers/tty/serial/bcm63xx_uart.c | 1 + drivers/tty/serial/clps711x.c | 1 + drivers/tty/serial/cpm_uart/cpm_uart_core.c | 1 + drivers/tty/serial/digicolor-usart.c| 1 + drivers/tty/serial/fsl_linflexuart.c| 1 + drivers/tty/serial/fsl_lpuart.c | 1 + drivers/tty/serial/imx.c| 1 + drivers/tty/serial/lantiq.c | 1 + drivers/tty/serial/liteuart.c | 1 + drivers/tty/serial/lpc32xx_hs.c | 1 + drivers/tty/serial/max310x.c| 1 + drivers/tty/serial/meson_uart.c | 1 + drivers/tty/serial/milbeaut_usio.c | 1 + drivers/tty/serial/mpc52xx_uart.c | 1 + drivers/tty/serial/mps2-uart.c | 1 + drivers/tty/serial/msm_serial.c | 1 + drivers/tty/serial/mvebu-uart.c | 1 + drivers/tty/serial/mxs-auart.c | 1 + drivers/tty/serial/omap-serial.c| 1 + drivers/tty/serial/owl-uart.c | 1 + drivers/tty/serial/pic32_uart.c | 1 + drivers/tty/serial/pmac_zilog.c | 1 + drivers/tty/serial/pxa.c| 1 + drivers/tty/serial/qcom_geni_serial.c | 1 + drivers/tty/serial/rda-uart.c | 1 + drivers/tty/serial/samsung_tty.c| 1 + drivers/tty/serial/sc16is7xx.c | 1 + drivers/tty/serial/serial-tegra.c | 1 + drivers/tty/serial/sh-sci.c | 1 + drivers/tty/serial/sifive.c | 1 + drivers/tty/serial/sprd_serial.c| 1 + drivers/tty/serial/st-asc.c | 1 + drivers/tty/serial/stm32-usart.c| 1 + drivers/tty/serial/sunhv.c | 1 + drivers/tty/serial/sunplus-uart.c | 1 + drivers/tty/serial/sunsab.c | 1 + drivers/tty/serial/sunsu.c | 1 + drivers/tty/serial/sunzilog.c | 1 + drivers/tty/serial/tegra-tcu.c | 1 + drivers/tty/serial/uartlite.c | 1 + drivers/tty/serial/ucc_uart.c | 1 + drivers/tty/serial/vt8500_serial.c | 1 + drivers/tty/serial/xilinx_uartps.c | 1 + include/linux/device.h | 7 +++ include/linux/device/driver.h | 11 +++ 72
Re: [PATCH v4 5/5] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors
On 6/28/22 00:25, John Garry wrote: > ATA devices (struct ata_device) have a max_sectors field which is > configured internally in libata. This is then used to (re)configure the > associated sdev request queue max_sectors value from how it is earlier set > in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set > according to shost limits, which includes host DMA mapping limits. > > Cap the ata_device max_sectors according to shost->max_sectors to respect > this shost limit. > > Signed-off-by: John Garry > Acked-by: Damien Le Moal Nit: please change the patch title to "ata: libata-scsi: Cap ..." > --- > drivers/ata/libata-scsi.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 86dbb1cdfabd..24a43d540d9f 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -1060,6 +1060,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev, > struct ata_device *dev) > dev->flags |= ATA_DFLAG_NO_UNLOAD; > > /* configure max sectors */ > + dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors); > blk_queue_max_hw_sectors(q, dev->max_sectors); > > if (dev->class == ATA_DEV_ATAPI) { -- Damien Le Moal Western Digital Research
Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS
Hi Geert, On 27/06/22 20:26, Geert Uytterhoeven wrote: Hi Michael, On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz wrote: On 18.06.2022 at 00:57, Arnd Bergmann wrote: From: Arnd Bergmann All architecture-independent users of virt_to_bus() and bus_to_virt() have been fixed to use the dma mapping interfaces or have been removed now. This means the definitions on most architectures, and the CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed. The only exceptions to this are a few network and scsi drivers for m68k Amiga and VME machines and ppc32 Macintosh. These drivers work correctly with the old interfaces and are probably not worth changing. The Amiga SCSI drivers are all old WD33C93 ones, and replacing virt_to_bus by virt_to_phys in the dma_setup() function there would cause no functional change at all. FTR, the sgiwd93 driver uses dma_map_single(). Thanks! From what I see, it doesn't have to deal with bounce buffers though? Cheers, Michael Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v3 1/2] vfio/type1: Simplify bus_type determination
On 2022-06-27 20:21, Alex Williamson wrote: On Fri, 24 Jun 2022 18:51:44 +0100 Robin Murphy wrote: Since IOMMU groups are mandatory for drivers to support, it stands to reason that any device which has been successfully added to a group must be on a bus supported by that IOMMU driver, and therefore a domain viable for any device in the group must be viable for all devices in the group. This already has to be the case for the IOMMU API's internal default domain, for instance. Thus even if the group contains devices on different buses, that can only mean that the IOMMU driver actually supports such an odd topology, and so without loss of generality we can expect the bus type of any device in a group to be suitable for IOMMU API calls. Furthermore, scrutiny reveals a lack of protection for the bus being removed while vfio_iommu_type1_attach_group() is using it; the reference that VFIO holds on the iommu_group ensures that data remains valid, but does not prevent the group's membership changing underfoot. We can address both concerns by recycling vfio_bus_type() into some superficially similar logic to indirect the IOMMU API calls themselves. Each call is thus protected from races by the IOMMU group's own locking, and we no longer need to hold group-derived pointers beyond that scope. It also gives us an easy path for the IOMMU API's migration of bus-based interfaces to device-based, of which we can already take the first step with device_iommu_capable(). As with domains, any capability must in practice be consistent for devices in a given group - and after all it's still the same capability which was expected to be consistent across an entire bus! - so there's no need for any complicated validation. Signed-off-by: Robin Murphy --- v3: Complete rewrite yet again, and finally it doesn't feel like we're stretching any abstraction boundaries the wrong way, and the diffstat looks right too. 
I did think about embedding IOMMU_CAP_INTR_REMAP directly in the callback, but decided I like the consistency of minimal generic wrappers. And yes, if the capability isn't supported then it does end up getting tested for the whole group, but meh, it's harmless. drivers/vfio/vfio_iommu_type1.c | 42 + 1 file changed, 22 insertions(+), 20 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index c13b9290e357..a77ff00c677b 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1679,18 +1679,6 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, return ret; } -static int vfio_bus_type(struct device *dev, void *data) -{ - struct bus_type **bus = data; - - if (*bus && *bus != dev->bus) - return -EINVAL; - - *bus = dev->bus; - - return 0; -} - static int vfio_iommu_replay(struct vfio_iommu *iommu, struct vfio_domain *domain) { @@ -2153,13 +2141,25 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu, list_splice_tail(iova_copy, iova); } Any objection if I add the following comment: /* Redundantly walks non-present capabilities to simplify caller */ Not at all, feel free - I guess if I felt it was worth pre-empting the review question then it probably is subtle enough to deserve a code comment! Thanks, Robin. 
Thanks, Alex +static int vfio_iommu_device_capable(struct device *dev, void *data) +{ + return device_iommu_capable(dev, (enum iommu_cap)data); +} + +static int vfio_iommu_domain_alloc(struct device *dev, void *data) +{ + struct iommu_domain **domain = data; + + *domain = iommu_domain_alloc(dev->bus); + return 1; /* Don't iterate */ +} + static int vfio_iommu_type1_attach_group(void *iommu_data, struct iommu_group *iommu_group, enum vfio_group_type type) { struct vfio_iommu *iommu = iommu_data; struct vfio_iommu_group *group; struct vfio_domain *domain, *d; - struct bus_type *bus = NULL; bool resv_msi, msi_remap; phys_addr_t resv_msi_base = 0; struct iommu_domain_geometry *geo; @@ -2192,18 +2192,19 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, goto out_unlock; } - /* Determine bus_type in order to allocate a domain */ - ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type); - if (ret) - goto out_free_group; - ret = -ENOMEM; domain = kzalloc(sizeof(*domain), GFP_KERNEL); if (!domain) goto out_free_group; + /* +* Going via the iommu_group iterator avoids races, and trivially gives +* us a representative device for the IOMMU API call. We don't actually +* want to iterate beyond the first device (if any). +*/ ret = -EIO; - domain->domain = iommu_domain_alloc(bus); + iommu_group_for_each_dev(iommu_group, &domain->domain,
Re: [PATCH v3 1/2] vfio/type1: Simplify bus_type determination
On Fri, 24 Jun 2022 18:51:44 +0100 Robin Murphy wrote: > Since IOMMU groups are mandatory for drivers to support, it stands to > reason that any device which has been successfully added to a group > must be on a bus supported by that IOMMU driver, and therefore a domain > viable for any device in the group must be viable for all devices in > the group. This already has to be the case for the IOMMU API's internal > default domain, for instance. Thus even if the group contains devices on > different buses, that can only mean that the IOMMU driver actually > supports such an odd topology, and so without loss of generality we can > expect the bus type of any device in a group to be suitable for IOMMU > API calls. > > Furthermore, scrutiny reveals a lack of protection for the bus being > removed while vfio_iommu_type1_attach_group() is using it; the reference > that VFIO holds on the iommu_group ensures that data remains valid, but > does not prevent the group's membership changing underfoot. > > We can address both concerns by recycling vfio_bus_type() into some > superficially similar logic to indirect the IOMMU API calls themselves. > Each call is thus protected from races by the IOMMU group's own locking, > and we no longer need to hold group-derived pointers beyond that scope. > It also gives us an easy path for the IOMMU API's migration of bus-based > interfaces to device-based, of which we can already take the first step > with device_iommu_capable(). As with domains, any capability must in > practice be consistent for devices in a given group - and after all it's > still the same capability which was expected to be consistent across an > entire bus! - so there's no need for any complicated validation. > > Signed-off-by: Robin Murphy > --- > > v3: Complete rewrite yet again, and finally it doesn't feel like we're > stretching any abstraction boundaries the wrong way, and the diffstat > looks right too. 
I did think about embedding IOMMU_CAP_INTR_REMAP > directly in the callback, but decided I like the consistency of minimal > generic wrappers. And yes, if the capability isn't supported then it > does end up getting tested for the whole group, but meh, it's harmless. > > drivers/vfio/vfio_iommu_type1.c | 42 + > 1 file changed, 22 insertions(+), 20 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index c13b9290e357..a77ff00c677b 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -1679,18 +1679,6 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > return ret; > } > > -static int vfio_bus_type(struct device *dev, void *data) > -{ > - struct bus_type **bus = data; > - > - if (*bus && *bus != dev->bus) > - return -EINVAL; > - > - *bus = dev->bus; > - > - return 0; > -} > - > static int vfio_iommu_replay(struct vfio_iommu *iommu, >struct vfio_domain *domain) > { > @@ -2153,13 +2141,25 @@ static void vfio_iommu_iova_insert_copy(struct > vfio_iommu *iommu, > list_splice_tail(iova_copy, iova); > } > Any objection if I add the following comment: /* Redundantly walks non-present capabilities to simplify caller */ Thanks, Alex > +static int vfio_iommu_device_capable(struct device *dev, void *data) > +{ > + return device_iommu_capable(dev, (enum iommu_cap)data); > +} > + > +static int vfio_iommu_domain_alloc(struct device *dev, void *data) > +{ > + struct iommu_domain **domain = data; > + > + *domain = iommu_domain_alloc(dev->bus); > + return 1; /* Don't iterate */ > +} > + > static int vfio_iommu_type1_attach_group(void *iommu_data, > struct iommu_group *iommu_group, enum vfio_group_type type) > { > struct vfio_iommu *iommu = iommu_data; > struct vfio_iommu_group *group; > struct vfio_domain *domain, *d; > - struct bus_type *bus = NULL; > bool resv_msi, msi_remap; > phys_addr_t resv_msi_base = 0; > struct iommu_domain_geometry *geo; > @@ -2192,18 +2192,19 @@ static int 
vfio_iommu_type1_attach_group(void > *iommu_data, > goto out_unlock; > } > > - /* Determine bus_type in order to allocate a domain */ > - ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type); > - if (ret) > - goto out_free_group; > - > ret = -ENOMEM; > domain = kzalloc(sizeof(*domain), GFP_KERNEL); > if (!domain) > goto out_free_group; > > + /* > + * Going via the iommu_group iterator avoids races, and trivially gives > + * us a representative device for the IOMMU API call. We don't actually > + * want to iterate beyond the first device (if any). > + */ > ret = -EIO; > - domain->domain = iommu_domain_alloc(bus); > + iommu_group_for_each_dev(iommu_group, &domain->domain, > + vfio_iommu_domain_alloc); > if
Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1
On Mon, Jun 27, 2022 at 10:50 AM Rob Herring wrote: > > On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote: > > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote: > > > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") > > > enabled iommus and dmas dependency enforcement by default. On some > > > systems, this caused the console device's probe to get delayed until the > > > deferred_probe_timeout expires. > > > > > > We need consoles to work as soon as possible, so mark the console device > > > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay > > > the probe of the console device for suppliers without drivers. The > > > driver can then make the decision on where it can probe without those > > > suppliers or defer its probe. > > > > > > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") > > > Reported-by: Sascha Hauer > > > Reported-by: Peng Fan > > > Signed-off-by: Saravana Kannan > > > Tested-by: Peng Fan > > > --- > > > drivers/of/base.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/drivers/of/base.c b/drivers/of/base.c > > > index d4f98c8469ed..a19cd0c73644 100644 > > > --- a/drivers/of/base.c > > > +++ b/drivers/of/base.c > > > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 > > > align)) > > > of_property_read_string(of_aliases, "stdout", &name); > > > if (name) > > > of_stdout = of_find_node_opts_by_path(name, > > > &of_stdout_options); > > > + if (of_stdout) > > > + of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT; > > > > The device given in the stdout-path property doesn't necessarily have to > > be consistent with the console= parameter. The former is usually > > statically set in the device trees contained in the kernel while the > > latter is dynamically set by the bootloader. So if you change the > > console uart in the bootloader then you'll still run into this trap. 
> > > > It's problematic to consult only the device tree for dependencies. I > > found several examples of drivers in the tree for which dma support > > is optional. They use it if they can, but continue without it when > > not available. "hwlock" is another property which consider several > > drivers as optional. Also consider SoCs in early upstreaming phases > > when the device tree is merged with "dmas" or "hwlock" properties, > > but the corresponding drivers are not yet upstreamed. It's not nice > > to defer probing of all these devices for a long time. > > > > I wonder if it wouldn't be a better approach to just probe all devices > > and record the device(node) they are waiting on. Then you know that you > > don't need to probe them again until the device they are waiting for > > is available. > > Can't we have a driver flag 'I have optional dependencies' that will > trigger probe without all dependencies and then the driver can defer > probe if required dependencies are not yet met. Haha... that's kinda what I'm working on right now. But it's intentionally named in a more limited sense of "I can't wait for the timeout" where fw_devlink will relax and allow the driver to probe (and have it make the call) once we hit late_initcall(). I'm explicitly limiting it to "timeout" because we don't want everyone adding this flag every time they hit an issue. That'll defeat the point of fw_devlink=on. Also, setting the flag for a driver to fix one system might break another system because in the other system the user might want to wait for the timeout because the supplier drivers would be loaded before the timeout. Another option is to restart the timer (if it hasn't expired) when filesystems get mounted (in addition to the current "when new driver gets registered"). That way, we might be able to drop the timeout from 10s to 5s. -Saravana
Re: [PATCH v2 2/2] of: base: Avoid console probe delay when fw_devlink.strict=1
On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote: > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote: > > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") > > enabled iommus and dmas dependency enforcement by default. On some > > systems, this caused the console device's probe to get delayed until the > > deferred_probe_timeout expires. > > > > We need consoles to work as soon as possible, so mark the console device > > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay > > the probe of the console device for suppliers without drivers. The > > driver can then make the decision on where it can probe without those > > suppliers or defer its probe. > > > > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") > > Reported-by: Sascha Hauer > > Reported-by: Peng Fan > > Signed-off-by: Saravana Kannan > > Tested-by: Peng Fan > > --- > > drivers/of/base.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/of/base.c b/drivers/of/base.c > > index d4f98c8469ed..a19cd0c73644 100644 > > --- a/drivers/of/base.c > > +++ b/drivers/of/base.c > > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 > > align)) > > of_property_read_string(of_aliases, "stdout", &name); > > if (name) > > of_stdout = of_find_node_opts_by_path(name, > > &of_stdout_options); > > + if (of_stdout) > > + of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT; > > The device given in the stdout-path property doesn't necessarily have to > be consistent with the console= parameter. The former is usually > statically set in the device trees contained in the kernel while the > latter is dynamically set by the bootloader. So if you change the > console uart in the bootloader then you'll still run into this trap. > > It's problematic to consult only the device tree for dependencies. I > found several examples of drivers in the tree for which dma support > is optional. 
They use it if they can, but continue without it when > not available. "hwlock" is another property which consider several > drivers as optional. Also consider SoCs in early upstreaming phases > when the device tree is merged with "dmas" or "hwlock" properties, > but the corresponding drivers are not yet upstreamed. It's not nice > to defer probing of all these devices for a long time. > > I wonder if it wouldn't be a better approach to just probe all devices > and record the device(node) they are waiting on. Then you know that you > don't need to probe them again until the device they are waiting for > is available. Can't we have a driver flag 'I have optional dependencies' that will trigger probe without all dependencies and then the driver can defer probe if required dependencies are not yet met. Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 5/5] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors
ATA devices (struct ata_device) have a max_sectors field which is configured internally in libata. This is then used to (re)configure the associated sdev request queue max_sectors value from how it is earlier set in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set according to shost limits, which includes host DMA mapping limits. Cap the ata_device max_sectors according to shost->max_sectors to respect this shost limit. Signed-off-by: John Garry Acked-by: Damien Le Moal --- drivers/ata/libata-scsi.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 86dbb1cdfabd..24a43d540d9f 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1060,6 +1060,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) dev->flags |= ATA_DFLAG_NO_UNLOAD; /* configure max sectors */ + dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors); blk_queue_max_hw_sectors(q, dev->max_sectors); if (dev->class == ATA_DEV_ATAPI) { -- 2.35.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/2] swiotlb: Split up single swiotlb lock
From: Tianyu Lan Traditionally swiotlb was not performance critical because it was only used for slow devices. But in some setups, like TDX/SEV confidential guests, all IO has to go through swiotlb. Currently swiotlb only has a single lock. Under high IO load with multiple CPUs this can lead to significant lock contention on the swiotlb lock. This patch splits the swiotlb bounce buffer pool into individual areas which have their own lock. Each CPU tries to allocate in its own area first. Only if that fails does it search other areas. On freeing, the allocation is returned to the area where the memory was originally allocated from. The number of areas can be set via swiotlb_adjust_nareas() and the swiotlb kernel parameter. This idea comes from Andi Kleen's patch (https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45). Based-on-idea-by: Andi Kleen Signed-off-by: Tianyu Lan --- .../admin-guide/kernel-parameters.txt | 4 +- include/linux/swiotlb.h | 27 +++ kernel/dma/swiotlb.c | 202 ++ 3 files changed, 194 insertions(+), 39 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 2522b11e593f..4a6ad177d4b8 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5904,8 +5904,10 @@ it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst) swiotlb=[ARM,IA-64,PPC,MIPS,X86] - Format: { | force | noforce } + Format: { [,] | force | noforce } -- Number of I/O TLB slabs +-- Second integer after comma. Number of swiotlb +areas with their own lock. Must be power of 2. 
force -- force using of bounce buffers even if they wouldn't be automatically used by the kernel noforce -- Never use bounce buffers (for debugging) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 7ed35dd3de6e..7157428cf3ac 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -62,6 +62,22 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, #ifdef CONFIG_SWIOTLB extern enum swiotlb_force swiotlb_force; +/** + * struct io_tlb_area - IO TLB memory area descriptor + * + * This is a single area with a single lock. + * + * @used: The number of used IO TLB block. + * @index: The slot index to start searching in this area for next round. + * @lock: The lock to protect the above data structures in the map and + * unmap calls. + */ +struct io_tlb_area { + unsigned long used; + unsigned int index; + spinlock_t lock; +}; + /** * struct io_tlb_mem - IO TLB Memory Pool Descriptor * @@ -89,6 +105,8 @@ extern enum swiotlb_force swiotlb_force; * @late_alloc:%true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation + * @nareas: The area number in the pool. + * @area_nslabs: The slot number in the area. 
*/ struct io_tlb_mem { phys_addr_t start; @@ -102,6 +120,9 @@ struct io_tlb_mem { bool late_alloc; bool force_bounce; bool for_alloc; + unsigned int nareas; + unsigned int area_nslabs; + struct io_tlb_area *areas; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; @@ -130,6 +151,7 @@ unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); bool is_swiotlb_active(struct device *dev); void __init swiotlb_adjust_size(unsigned long size); +void __init swiotlb_adjust_nareas(unsigned int nareas); #else static inline void swiotlb_init(bool addressing_limited, unsigned int flags) { @@ -162,6 +184,11 @@ static inline bool is_swiotlb_active(struct device *dev) static inline void swiotlb_adjust_size(unsigned long size) { } + +static inline void swiotlb_adjust_nareas(unsigned int nareas) +{ +} + #endif /* CONFIG_SWIOTLB */ extern void swiotlb_print_info(void); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index cb50f8d38360..17154abdfb34 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -70,6 +70,7 @@ struct io_tlb_mem io_tlb_default_mem; phys_addr_t swiotlb_unencrypted_base; static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT; +static unsigned long default_nareas = 1; static int __init setup_io_tlb_npages(char *str) @@ -79,6 +80,10 @@ setup_io_tlb_npages(char *str) default_nslabs = ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE); } + if (*str == ',') + ++str; + if (isdigit(*str)) + swiotlb_adjust_na
[PATCH 0/2] swiotlb: Split up single swiotlb lock
From: Tianyu Lan Traditionally swiotlb was not performance critical because it was only used for slow devices. But in some setups, like TDX/SEV confidential guests, all IO has to go through swiotlb. Currently swiotlb only has a single lock. Under high IO load with multiple CPUs this can lead to significant lock contention on the swiotlb lock. Patch 1 introduces the swiotlb area concept and splits up the single swiotlb lock. Patch 2 sets the number of swiotlb areas according to the number of lapic entries. Tianyu Lan (2): swiotlb: Split up single swiotlb lock x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT .../admin-guide/kernel-parameters.txt | 4 +- arch/x86/kernel/acpi/boot.c | 3 + include/linux/swiotlb.h | 27 +++ kernel/dma/swiotlb.c | 202 ++ 4 files changed, 197 insertions(+), 39 deletions(-) -- 2.25.1
[PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT
From: Tianyu Lan When the swiotlb bounce buffer is initialized, smp_init() has not been called yet, so the cpu count cannot be obtained from num_online_cpus(). Use the number of lapic entries to set the number of swiotlb areas instead, which keeps the area count equal to the cpu count on the x86 platform. Based-on-idea-by: Andi Kleen Signed-off-by: Tianyu Lan --- arch/x86/kernel/acpi/boot.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 907cc98b1938..7e13499f2c10 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include @@ -1131,6 +1132,8 @@ static int __init acpi_parse_madt_lapic_entries(void) return count; } + swiotlb_adjust_nareas(max(count, x2count)); + x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC_NMI, acpi_parse_x2apic_nmi, 0); count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC_NMI, -- 2.25.1
[PATCH v4 4/5] scsi: scsi_transport_sas: Cap shost max_sectors according to DMA optimal mapping limit
Streaming DMA mappings may be considerably slower when mappings go through an IOMMU and the total mapping length is somewhat long. This is because the IOMMU IOVA code allocates and frees an IOVA for each mapping, which may affect performance. For performance reasons set the request queue max_sectors from dma_opt_mapping_size(), which knows this mapping limit. Signed-off-by: John Garry --- drivers/scsi/scsi_transport_sas.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c index 12bff64dade6..1b45248748e0 100644 --- a/drivers/scsi/scsi_transport_sas.c +++ b/drivers/scsi/scsi_transport_sas.c @@ -225,6 +225,7 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev, { struct Scsi_Host *shost = dev_to_shost(dev); struct sas_host_attrs *sas_host = to_sas_host_attrs(shost); + struct device *dma_dev = shost->dma_dev; INIT_LIST_HEAD(&sas_host->rphy_list); mutex_init(&sas_host->lock); @@ -236,6 +237,11 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev, dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n", shost->host_no); + if (dma_dev) { + shost->max_sectors = min_t(unsigned int, shost->max_sectors, + dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT); + } + return 0; } -- 2.35.3
[PATCH v4 2/5] dma-iommu: Add iommu_dma_opt_mapping_size()
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit must always be newly allocated, which may be quite slow. Signed-off-by: John Garry Reviewed-by: Damien Le Moal --- drivers/iommu/dma-iommu.c | 6 ++ drivers/iommu/iova.c | 5 + include/linux/iova.h | 2 ++ 3 files changed, 13 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index f90251572a5d..9e1586447ee8 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1459,6 +1459,11 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev) return (1UL << __ffs(domain->pgsize_bitmap)) - 1; } +static size_t iommu_dma_opt_mapping_size(void) +{ + return iova_rcache_range(); +} + static const struct dma_map_ops iommu_dma_ops = { .alloc = iommu_dma_alloc, .free = iommu_dma_free, @@ -1479,6 +1484,7 @@ static const struct dma_map_ops iommu_dma_ops = { .map_resource = iommu_dma_map_resource, .unmap_resource = iommu_dma_unmap_resource, .get_merge_boundary = iommu_dma_get_merge_boundary, + .opt_mapping_size = iommu_dma_opt_mapping_size, }; /* diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index db77aa675145..9f00b58d546e 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad, static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad); static void free_iova_rcaches(struct iova_domain *iovad); +unsigned long iova_rcache_range(void) +{ + return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1); +} + static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node) { struct iova_domain *iovad; diff --git a/include/linux/iova.h b/include/linux/iova.h index 320a70e40233..c6ba6d95d79c 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -79,6 +79,8 
@@ static inline unsigned long iova_pfn(struct iova_domain *iovad, dma_addr_t iova) int iova_cache_get(void); void iova_cache_put(void); +unsigned long iova_rcache_range(void); + void free_iova(struct iova_domain *iovad, unsigned long pfn); void __free_iova(struct iova_domain *iovad, struct iova *iova); struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size, -- 2.35.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 3/5] scsi: core: Cap shost max_sectors according to DMA mapping limits only once
The shost->max_sectors is repeatedly capped according to the host DMA mapping limit for each sdev in __scsi_init_queue(). This is unnecessary, so set only once when adding the host. Signed-off-by: John Garry --- drivers/scsi/hosts.c| 5 + drivers/scsi/scsi_lib.c | 4 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 8352f90d997d..d04bd2c7c9f1 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -236,6 +236,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, shost->dma_dev = dma_dev; + if (dma_dev->dma_mask) { + shost->max_sectors = min_t(unsigned int, shost->max_sectors, + dma_max_mapping_size(dma_dev) >> SECTOR_SHIFT); + } + error = scsi_mq_setup_tags(shost); if (error) goto fail; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 6ffc9e4258a8..6ce8acea322a 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1884,10 +1884,6 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q) blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize); } - if (dev->dma_mask) { - shost->max_sectors = min_t(unsigned int, shost->max_sectors, - dma_max_mapping_size(dev) >> SECTOR_SHIFT); - } blk_queue_max_hw_sectors(q, shost->max_sectors); blk_queue_segment_boundary(q, shost->dma_boundary); dma_set_seg_boundary(dev, shost->dma_boundary); -- 2.35.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 1/5] dma-mapping: Add dma_opt_mapping_size()
Streaming DMA mapping involving an IOMMU may be much slower for larger total mapping size. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for device drivers to know this "optimal" limit, such that they may try to produce mapping which don't exceed it. Signed-off-by: John Garry Reviewed-by: Damien Le Moal --- Documentation/core-api/dma-api.rst | 9 + include/linux/dma-map-ops.h| 1 + include/linux/dma-mapping.h| 5 + kernel/dma/mapping.c | 12 4 files changed, 27 insertions(+) diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst index 6d6d0edd2d27..b3cd9763d28b 100644 --- a/Documentation/core-api/dma-api.rst +++ b/Documentation/core-api/dma-api.rst @@ -204,6 +204,15 @@ Returns the maximum size of a mapping for the device. The size parameter of the mapping functions like dma_map_single(), dma_map_page() and others should not be larger than the returned value. +:: + + size_t + dma_opt_mapping_size(struct device *dev); + +Returns the maximum optimal size of a mapping for the device. Mapping large +buffers may take longer so device drivers are advised to limit total DMA +streaming mappings length to the returned value. 
+ :: bool diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 0d5b06b3a4a6..98ceba6fa848 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -69,6 +69,7 @@ struct dma_map_ops { int (*dma_supported)(struct device *dev, u64 mask); u64 (*get_required_mask)(struct device *dev); size_t (*max_mapping_size)(struct device *dev); + size_t (*opt_mapping_size)(void); unsigned long (*get_merge_boundary)(struct device *dev); }; diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index dca2b1355bb1..fe3849434b2a 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -144,6 +144,7 @@ int dma_set_mask(struct device *dev, u64 mask); int dma_set_coherent_mask(struct device *dev, u64 mask); u64 dma_get_required_mask(struct device *dev); size_t dma_max_mapping_size(struct device *dev); +size_t dma_opt_mapping_size(struct device *dev); bool dma_need_sync(struct device *dev, dma_addr_t dma_addr); unsigned long dma_get_merge_boundary(struct device *dev); struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size, @@ -266,6 +267,10 @@ static inline size_t dma_max_mapping_size(struct device *dev) { return 0; } +static inline size_t dma_opt_mapping_size(struct device *dev) +{ + return 0; +} static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr) { return false; diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index db7244291b74..1bfe11b1edb6 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -773,6 +773,18 @@ size_t dma_max_mapping_size(struct device *dev) } EXPORT_SYMBOL_GPL(dma_max_mapping_size); +size_t dma_opt_mapping_size(struct device *dev) +{ + const struct dma_map_ops *ops = get_dma_ops(dev); + size_t size = SIZE_MAX; + + if (ops && ops->opt_mapping_size) + size = ops->opt_mapping_size(); + + return min(dma_max_mapping_size(dev), size); +} +EXPORT_SYMBOL_GPL(dma_opt_mapping_size); + bool dma_need_sync(struct device *dev, dma_addr_t 
dma_addr) { const struct dma_map_ops *ops = get_dma_ops(dev); -- 2.35.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
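The fallback logic of the new helper can be sketched outside the kernel. The types below are illustrative stand-ins for the real dma_map_ops plumbing, not the kernel API: when the backend reports no optimal size, the helper degrades to the hard maximum from dma_max_mapping_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's dma_map_ops; only the new callback is modelled. */
struct dma_map_ops_sketch {
	size_t (*opt_mapping_size)(void);
};

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Mirrors the patch: start from SIZE_MAX, take the backend's optimal
 * limit if one exists, then clamp by the hard maximum. */
static size_t dma_opt_mapping_size_sketch(const struct dma_map_ops_sketch *ops,
					  size_t hard_max)
{
	size_t size = SIZE_MAX;

	if (ops && ops->opt_mapping_size)
		size = ops->opt_mapping_size();

	return min_size(hard_max, size);
}

/* Assumed IOVA caching limit, purely for illustration. */
static size_t iova_cache_limit(void) { return 256 * 1024; }

static size_t demo_with_iommu(void)
{
	struct dma_map_ops_sketch ops = { iova_cache_limit };

	return dma_opt_mapping_size_sketch(&ops, SIZE_MAX);
}
```

With a backend limit installed, the smaller of the two values wins; with no backend, callers simply see the existing hard maximum, so drivers can use the helper unconditionally.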
[PATCH v4 0/5] DMA mapping changes for SCSI core
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching limit may see a big performance hit. This series introduces a new DMA mapping API, dma_opt_mapping_size(), so that drivers may know this limit when performance is a factor in the mapping. The SCSI SAS transport code is modified only to use this limit. For now I did not want to touch other hosts as I have a concern that this change could cause a performance regression. I also added a patch for libata-scsi as it does not currently honour the shost max_sectors limit. [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ [1] https://lore.kernel.org/linux-iommu/f5b78c9c-312e-70ab-ecbb-f14623a4b...@arm.com/ Changes since v3: - Apply max DMA optimal limit to SAS hosts only Note: Even though "scsi: core: Cap shost max_sectors only once when adding" is a subset of a previous patch I did not transfer the RB tags - Rebase on v5.19-rc4 Changes since v2: - Rebase on v5.19-rc1 - Add Damien's tag to 2/4 (thanks) Changes since v1: - Relocate scsi_add_host_with_dma() dma_dev check (Reported by Dan) - Add tags from Damien and Martin (thanks) - note: I only added Martin's tag to the SCSI patch John Garry (5): dma-mapping: Add dma_opt_mapping_size() dma-iommu: Add iommu_dma_opt_mapping_size() scsi: core: Cap shost max_sectors according to DMA mapping limits only once scsi: scsi_transport_sas: Cap shost max_sectors according to DMA optimal mapping limit libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors Documentation/core-api/dma-api.rst | 9 + drivers/ata/libata-scsi.c | 1 + drivers/iommu/dma-iommu.c | 6 ++ drivers/iommu/iova.c | 5 + drivers/scsi/hosts.c | 5 + drivers/scsi/scsi_lib.c| 4 drivers/scsi/scsi_transport_sas.c | 6 ++ include/linux/dma-map-ops.h| 1 + include/linux/dma-mapping.h| 5 + include/linux/iova.h | 2 ++ kernel/dma/mapping.c | 12 11 files changed, 52 insertions(+), 4 deletions(-) -- 2.35.3
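The SAS-host capping the cover letter describes can be sketched as a pure function. This is an illustration, not the series' code: opt_mapping_size would come from dma_opt_mapping_size() on the host's DMA device, and the 512-byte sector shift is the usual SCSI convention.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Clamp a host's max_sectors by the optimal DMA mapping size, in the
 * spirit of the scsi_transport_sas patch; purely illustrative. */
static unsigned int cap_max_sectors(unsigned int max_sectors,
				    size_t opt_mapping_size)
{
	size_t opt_sectors = opt_mapping_size >> SECTOR_SHIFT;

	if (opt_sectors && max_sectors > opt_sectors)
		max_sectors = (unsigned int)opt_sectors;

	return max_sectors;
}
```

A host advertising 65535 sectors against a 128 KiB optimal mapping size would be capped to 256 sectors, while a host already below the limit is left alone.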
Re: [PATCH v3 4/5] iommu/io-pgtable-dart: Add DART PTE support for t6000
On 2022-06-21 08:18, Janne Grunau wrote: From: Sven Peter The DARTs present in the M1 Pro/Max/Ultra SoC use a different PTE format. They support a 42-bit physical address space by shifting the paddr and extending its mask inside the PTE. They also come with mandatory sub-page protection now, which we just configure to always allow access to the entire page. This feature is already present but optional on the previous DARTs, which allows us to configure it unconditionally. Signed-off-by: Sven Peter Co-developed-by: Janne Grunau Signed-off-by: Janne Grunau --- Changes in v3: - apply change to io-pgtable-dart.c - handle pte <> paddr conversion based on the pte format instead of the output address size Changes in v2: - add APPLE_DART2 PTE format drivers/iommu/io-pgtable-dart.c | 51 +++-- drivers/iommu/io-pgtable.c | 1 + include/linux/io-pgtable.h | 1 + 3 files changed, 45 insertions(+), 8 deletions(-) [...] @@ -536,7 +571,7 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) if (!cfg->coherent_walk) return NULL; - if (cfg->oas > 36) + if (cfg->oas != 36 && cfg->oas != 42) return NULL; Wouldn't it make sense to tie this to the format? Maybe 36-bit OAS is still valid with v2, but presumably 42-bit with v1 definitely isn't. Robin.
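The pte <> paddr conversion being discussed can be sketched as a pair of helpers. Only the shift-by-4 comes from the commit message; the exact field placement and mask below are assumptions for illustration, not the real DART2 layout.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative DART2-style conversion: the physical address is shifted
 * down by 4 so 42 bits of paddr fit in a narrower PTE field.  The mask
 * width (38 bits, enough for a 42-bit paddr >> 4) is an assumption. */
#define DART2_PADDR_SHIFT 4
#define DART2_PADDR_MASK  0x3fffffffffULL

static uint64_t dart2_paddr_to_pte(uint64_t paddr)
{
	return (paddr >> DART2_PADDR_SHIFT) & DART2_PADDR_MASK;
}

static uint64_t dart2_pte_to_paddr(uint64_t pte)
{
	return (pte & DART2_PADDR_MASK) << DART2_PADDR_SHIFT;
}

/* Round-trip a 16k-aligned address near the top of the 42-bit space;
 * low bits below the page size are never stored in the PTE here. */
static int dart2_roundtrip_ok(void)
{
	uint64_t pa = (1ULL << 41) | (3ULL << 14);

	return dart2_pte_to_paddr(dart2_paddr_to_pte(pa)) == pa;
}
```

Dispatching on the PTE format rather than the output address size, as the v3 changelog says, keeps these two helpers per-format instead of per-OAS, which also lines up with Robin's comment on the oas check.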
Re: [PATCH v3 2/5] iommu/io-pgtable: Move Apple DART support to its own file
On 2022-06-21 08:18, Janne Grunau wrote: The pte format used by the DARTs found in the Apple M1 (t8103) is not fully compatible with io-pgtable-arm. The 24 MSB are used for subpage protection (mapping only parts of a page) and conflict with the address mask. In addition bit 1 is not available for tagging entries but disables subpage protection. Subpage protection could be useful to support a CPU granule of 4k with the fixed IOMMU page size of 16k. The DARTs found on Apple M1 Pro/Max/Ultra use another different pte format which is even less compatible. To support an output address size of 42 bit the address is shifted down by 4. Subpage protection is mandatory and bit 1 signifies uncached mappings used by the display controller. It would be advantageous to share code for all known Apple DART variants to support common features. The page table allocator for DARTs is less complex since it uses two levels of translation tables without support for huge pages. Signed-off-by: Janne Grunau --- Changes in v3: - move APPLE_DART to its own io-pgtable implementation, copied from io-pgtable-arm and simplified Changes in v2: - add APPLE_DART2 io-pgtable format MAINTAINERS | 1 + drivers/iommu/Kconfig | 1 - drivers/iommu/Makefile | 2 +- drivers/iommu/io-pgtable-arm.c | 63 drivers/iommu/io-pgtable-dart.c | 580 drivers/iommu/io-pgtable.c | 2 + 6 files changed, 584 insertions(+), 65 deletions(-) create mode 100644 drivers/iommu/io-pgtable-dart.c diff --git a/MAINTAINERS b/MAINTAINERS index 1fc9ead83d2a..028b7e31e589 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1848,6 +1848,7 @@ F:drivers/clk/clk-apple-nco.c F:drivers/i2c/busses/i2c-pasemi-core.c F:drivers/i2c/busses/i2c-pasemi-platform.c F:drivers/iommu/apple-dart.c +F: drivers/iommu/io-pgtable-dart.c F:drivers/irqchip/irq-apple-aic.c F:drivers/mailbox/apple-mailbox.c F:drivers/nvme/host/apple.c diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index c79a0df090c0..ed6d8260f330 100644 --- a/drivers/iommu/Kconfig +++
b/drivers/iommu/Kconfig @@ -294,7 +294,6 @@ config APPLE_DART tristate "Apple DART IOMMU Support" depends on ARCH_APPLE || (COMPILE_TEST && !GENERIC_ATOMIC64) select IOMMU_API - select IOMMU_IO_PGTABLE_LPAE You still need to select the base IO_PGTABLE. FWIW I think that's probably the only crucial issue here - lots more comments below, but they're primarily things that could all be unpicked later. default ARCH_APPLE help Support for Apple DART (Device Address Resolution Table) IOMMUs diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 44475a9b3eea..bd68c93bbd62 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -29,4 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o -obj-$(CONFIG_APPLE_DART) += apple-dart.o +obj-$(CONFIG_APPLE_DART) += apple-dart.o io-pgtable-dart.o diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 94ff319ae8ac..d7f5e23da643 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -130,9 +130,6 @@ #define ARM_MALI_LPAE_MEMATTR_IMP_DEF 0x88ULL #define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL -#define APPLE_DART_PTE_PROT_NO_WRITE (1<<7) -#define APPLE_DART_PTE_PROT_NO_READ (1<<8) - /* IOPTE accessors */ #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d)) @@ -406,15 +403,6 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, { arm_lpae_iopte pte; - if (data->iop.fmt == APPLE_DART) { - pte = 0; - if (!(prot & IOMMU_WRITE)) - pte |= APPLE_DART_PTE_PROT_NO_WRITE; - if (!(prot & IOMMU_READ)) - pte |= APPLE_DART_PTE_PROT_NO_READ; - return pte; - } - if (data->iop.fmt == ARM_64_LPAE_S1 || data->iop.fmt == ARM_32_LPAE_S1) { pte = ARM_LPAE_PTE_nG; @@ -1107,52 +1095,6 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) return NULL; } -static struct io_pgtable * 
-apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) -{ - struct arm_lpae_io_pgtable *data; - int i; - - if (cfg->oas > 36) - return NULL; - - data = arm_lpae_alloc_pgtable(cfg); - if (!data) - return NULL; - - /* -* The table format itself always uses two levels, but the total VA -* space is mapped by four separate tables, making the MMIO registers -* an effective "level 1". For simplicity, though, we treat this -* equivalently to LPAE stage 2 concatenation at level 2, with the -
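The two-level, 16k-granule walk described above can be sketched with plain index arithmetic. These helpers are an illustration of the geometry (16384-byte tables of 8-byte PTEs give 2048 entries, i.e. 11 index bits per level over a 14-bit page offset), not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry for a 16k IOMMU page with 8-byte PTEs, as in the DART
 * discussion; each table covers 11 bits of IOVA per level. */
#define DART_PAGE_SHIFT     14
#define DART_BITS_PER_LEVEL 11
#define DART_PTES_PER_TABLE (1u << DART_BITS_PER_LEVEL)	/* 2048 */

static unsigned int dart_l1_index(uint64_t iova)
{
	return (iova >> (DART_PAGE_SHIFT + DART_BITS_PER_LEVEL)) &
	       (DART_PTES_PER_TABLE - 1);
}

static unsigned int dart_l2_index(uint64_t iova)
{
	return (iova >> DART_PAGE_SHIFT) & (DART_PTES_PER_TABLE - 1);
}
```

Two levels cover 11 + 11 + 14 = 36 bits of VA per table tree; the commit's point about four separate top-level tables behaving like an effective "level 1" extends this without adding a real third walk level.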
RE: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node
On Fri, 24 Jun 2022, Shameerali Kolothum Thodi wrote: Hi, -Original Message- From: Steven Price [mailto:steven.pr...@arm.com] Sent: 17 June 2022 13:42 To: Shameerali Kolothum Thodi ; linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org; iommu@lists.linux-foundation.org Cc: Linuxarm ; lorenzo.pieral...@arm.com; j...@8bytes.org; robin.mur...@arm.com; w...@kernel.org; wanghuiqiang ; Guohanjun (Hanjun Guo) ; sami.muja...@arm.com; j...@solid-run.com; eric.au...@redhat.com; laurentiu.tu...@nxp.com; h...@infradead.org Subject: Re: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node On 15/06/2022 11:10, Shameer Kolothum wrote: Hi v12 --> v13 -No changes. Rebased to 5.19-rc1. -Picked up tags received from Laurentiu, Hanjun and Will. Thanks!. You've already got my Tested-by tags, but just to confirm I gave this a spin and it works fine. Thanks Steve. I think the series is now in a good shape to be merged. Hi Will/Robin, Appreciate, if you could please take a look at the remaining SMMU related patches(7-9) and provide your approval? Thanks, Shameer First of all thanks to all of you for keeping this going. I've read through most of this patch series and it doesn't read like the best sunny days. I do understand that there are incentives to get things right; sometimes first make it work, then make it better? Running code often seems a better alternative than wrong words on paper as users don't care about the paper. They only care if their hardware becomes a paperweight because it's not working. I was trying to find diplomatic words but the general problem has become so much bigger than just this change as I am faced with the fact that vendors are talking to give up maintaining Arm/ACPI and go back to FDT exclusively, which I believe would be the wrong but an understandable exit out of a roundabout. For me this Arm/Linux/ACPI problem becomes double-impact, as I am not even a Linux person. 
And part of what Arm/ACPI was solving was that any OS can just work on Arm hardware; for a while people were hoping it could make FDT the next Flash; it just seems it'll not be because people cannot get fixes or workarounds for real world problems into Linux timely? So a very polite but firm prod towards Cambridge from here as well in the hope that you can make a big change to this world by helping not to miss the next merge window/release leading to way bigger impact. It would be rather sad to close the Arm/ACPI chapter for good but it seems that we may be standing on the verge of it if things do not move quickly now and differently in the future. It'll certainly need change from all sides but the good thing is that at the end of the day we all want to make the world a better place. As I mentioned, I have no stakes in this Linux change. I just care about Arm and ACPI because I saw light and a chance in it and I would love to see it stay. Let's all work together in one direction and make it a brighter future for everyone. Can we? Are you in? May God bless you and your work, Bjoern
Re: [PATCH v9 0/8] Add support for HiSilicon PCIe Tune and Trace device
On Mon, Jun 27, 2022 at 09:25:42PM +0800, Yicong Yang wrote: > On 2022/6/27 21:12, Greg KH wrote: > > On Mon, Jun 27, 2022 at 07:18:12PM +0800, Yicong Yang wrote: > >> Hi Greg, > >> > >> Since the kernel side of this device has been reviewed for 8 versions with > >> all comments addressed and no more comment since v9 posted in 5.19-rc1, > >> is it ok to merge it first (for Patch 1-3 and 7-8)? > > > > I am not the maintainer of this subsystem, so I do not understand why > > you are asking me :( > > > > I checked the log of drivers/hwtracing and seems patches of > coresight/intel_th/stm > applied by different maintainers and I see you applied some patches of > intel_th/stm. > Should any of these three maintainers or you can help applied this? I was hoping Mark would have a look, since he knows this ARM stuff better than me. But ISTR he's somewhat busy atm too. But an ACK from the CoreSight people would also be appreciated. And Arnaldo usually doesn't pick up the userspace perf bits until the kernel side is sorted.
Re: [PATCH v6 02/10] dt-bindings: display: tegra: Convert to json-schema
On Fri, Jun 24, 2022 at 11:26 AM Rob Herring wrote: > > On Tue, 21 Jun 2022 18:10:14 +0300, Mikko Perttunen wrote: > > From: Thierry Reding > > > > Convert the Tegra host1x controller bindings from the free-form text > > format to json-schema. > > > > This also adds the missing display-hub DT bindings that were not > > previously documented. > > > > Reviewed-by: Rob Herring > > Signed-off-by: Thierry Reding > > --- > > .../display/tegra/nvidia,tegra114-mipi.txt| 41 -- > > .../display/tegra/nvidia,tegra114-mipi.yaml | 74 ++ > > .../display/tegra/nvidia,tegra124-dpaux.yaml | 149 > > .../display/tegra/nvidia,tegra124-sor.yaml| 206 ++ > > .../display/tegra/nvidia,tegra124-vic.yaml| 71 ++ > > .../display/tegra/nvidia,tegra186-dc.yaml | 85 +++ > > .../tegra/nvidia,tegra186-display.yaml| 310 > > .../tegra/nvidia,tegra186-dsi-padctl.yaml | 45 ++ > > .../display/tegra/nvidia,tegra20-dc.yaml | 181 + > > .../display/tegra/nvidia,tegra20-dsi.yaml | 159 + > > .../display/tegra/nvidia,tegra20-epp.yaml | 70 ++ > > .../display/tegra/nvidia,tegra20-gr2d.yaml| 73 ++ > > .../display/tegra/nvidia,tegra20-gr3d.yaml| 214 ++ > > .../display/tegra/nvidia,tegra20-hdmi.yaml| 126 > > .../display/tegra/nvidia,tegra20-host1x.txt | 675 -- > > .../display/tegra/nvidia,tegra20-host1x.yaml | 347 + > > .../display/tegra/nvidia,tegra20-isp.yaml | 67 ++ > > .../display/tegra/nvidia,tegra20-mpe.yaml | 73 ++ > > .../display/tegra/nvidia,tegra20-tvo.yaml | 58 ++ > > .../display/tegra/nvidia,tegra20-vi.yaml | 163 + > > .../display/tegra/nvidia,tegra210-csi.yaml| 52 ++ > > .../pinctrl/nvidia,tegra124-dpaux-padctl.txt | 59 -- > > 22 files changed, 2523 insertions(+), 775 deletions(-) > > delete mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra114-mipi.txt > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra114-mipi.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-dpaux.yaml > > create mode 100644 > 
> Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-sor.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-vic.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-dc.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-display.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra186-dsi-padctl.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dsi.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-epp.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-gr2d.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-gr3d.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-hdmi.yaml > > delete mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-isp.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-mpe.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-tvo.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-vi.yaml > > create mode 100644 > > Documentation/devicetree/bindings/display/tegra/nvidia,tegra210-csi.yaml > > delete mode 100644 > > Documentation/devicetree/bindings/pinctrl/nvidia,tegra124-dpaux-padctl.txt > > > > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check' > on your patch (DT_CHECKER_FLAGS is 
new in v5.13): > > yamllint warnings/errors: > > dtschema/dtc warnings/errors: > /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-sor.yaml: > allOf:1:if:not:properties: {'contains': {'const': 'nvidia,panel'}} should > not be valid under {'$ref': '#/definitions/sub-schemas'} > hint: A json-schema keyword was found instead of a DT property name. > from schema $id: http://devicetree.org/meta-schemas/keywords.yaml# > /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/tegra/nvidia,tegra124-sor.yaml: > ignoring, error in schema: allOf: 1: if: not: properties > /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-gr3d.example.dtb: >
Re: [PATCH v9 0/8] Add support for HiSilicon PCIe Tune and Trace device
On 2022/6/27 21:12, Greg KH wrote: > On Mon, Jun 27, 2022 at 07:18:12PM +0800, Yicong Yang wrote: >> Hi Greg, >> >> Since the kernel side of this device has been reviewed for 8 versions with >> all comments addressed and no more comment since v9 posted in 5.19-rc1, >> is it ok to merge it first (for Patch 1-3 and 7-8)? > > I am not the maintainer of this subsystem, so I do not understand why > you are asking me :( > I checked the log of drivers/hwtracing and it seems patches for coresight/intel_th/stm are applied by different maintainers, and I see you applied some patches for intel_th/stm. Could any of these three maintainers, or you, help apply this? Thanks.
Re: [PATCH v9 0/8] Add support for HiSilicon PCIe Tune and Trace device
On Mon, Jun 27, 2022 at 07:18:12PM +0800, Yicong Yang wrote: > Hi Greg, > > Since the kernel side of this device has been reviewed for 8 versions with > all comments addressed and no more comment since v9 posted in 5.19-rc1, > is it ok to merge it first (for Patch 1-3 and 7-8)? I am not the maintainer of this subsystem, so I do not understand why you are asking me :( thanks, greg k-h
Re: [PATCH v9 10/11] iommu: Per-domain I/O page fault handling
Hi, On 2022/6/21 22:43, Lu Baolu wrote: Tweak the I/O page fault handling framework to route the page faults to the domain and call the page fault handler retrieved from the domain. This makes the I/O page fault handling framework possible to serve more usage scenarios as long as they have an IOMMU domain and install a page fault handler in it. Some unused functions are also removed to avoid dead code. The iommu_get_domain_for_dev_pasid() which retrieves attached domain for a {device, PASID} pair is used. It will be used by the page fault handling framework which knows {device, PASID} reported from the iommu driver. We have a guarantee that the SVA domain doesn't go away during IOPF handling, because unbind() waits for pending faults with iopf_queue_flush_dev() before freeing the domain. Hence, there's no need to synchronize life cycle of the iommu domains between the unbind() and the interrupt threads. Signed-off-by: Lu Baolu Reviewed-by: Jean-Philippe Brucker --- drivers/iommu/io-pgfault.c | 64 +- 1 file changed, 7 insertions(+), 57 deletions(-) diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index aee9e033012f..4f24ec703479 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf, return iommu_page_response(dev, &resp); } -static enum iommu_page_response_code -iopf_handle_single(struct iopf_fault *iopf) -{ - vm_fault_t ret; - struct mm_struct *mm; - struct vm_area_struct *vma; - unsigned int access_flags = 0; - unsigned int fault_flags = FAULT_FLAG_REMOTE; - struct iommu_fault_page_request *prm = &iopf->fault.prm; - enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID; - - if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)) - return status; - - mm = iommu_sva_find(prm->pasid); - if (IS_ERR_OR_NULL(mm)) - return status; - - mmap_read_lock(mm); - - vma = find_extend_vma(mm, prm->addr); - if (!vma) - /* Unmapped area */ - 
goto out_put_mm; - - if (prm->perm & IOMMU_FAULT_PERM_READ) - access_flags |= VM_READ; - - if (prm->perm & IOMMU_FAULT_PERM_WRITE) { - access_flags |= VM_WRITE; - fault_flags |= FAULT_FLAG_WRITE; - } - - if (prm->perm & IOMMU_FAULT_PERM_EXEC) { - access_flags |= VM_EXEC; - fault_flags |= FAULT_FLAG_INSTRUCTION; - } - - if (!(prm->perm & IOMMU_FAULT_PERM_PRIV)) - fault_flags |= FAULT_FLAG_USER; - - if (access_flags & ~vma->vm_flags) - /* Access fault */ - goto out_put_mm; - - ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL); - status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID : - IOMMU_PAGE_RESP_SUCCESS; - -out_put_mm: - mmap_read_unlock(mm); - mmput(mm); - - return status; -} - Once iopf_handle_single() is removed, the name of iopf_handle_group() looks a little weird and confusing: does this group mean the iommu group (domain)? It took me some minutes of looking into the code to see that it means a batch / list / queue of iopfs, and iopf_handle_group() becomes a generic iopf_handler(). Does it make sense to revise the names of iopf_handle_group(), iopf_complete_group and iopf_group in this patch set? Thanks, Ethan static void iopf_handle_group(struct work_struct *work) { struct iopf_group *group; + struct iommu_domain *domain; struct iopf_fault *iopf, *next; enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS; group = container_of(work, struct iopf_group, work); + domain = iommu_get_domain_for_dev_pasid(group->dev, + group->last_fault.fault.prm.pasid); + if (!domain || !domain->iopf_handler) + status = IOMMU_PAGE_RESP_INVALID; list_for_each_entry_safe(iopf, next, &group->faults, list) { /* @@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct *work) * faults in the group if there is an error. 
*/ if (status == IOMMU_PAGE_RESP_SUCCESS) - status = iopf_handle_single(iopf); + status = domain->iopf_handler(&iopf->fault, + domain->fault_data); if (!(iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) -- "firm, enduring, strong, and long-lived"
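The reworked dispatch in the patch above boils down to "route the fault to the handler installed in the attached domain, or answer INVALID". A minimal model of that flow, with stand-in types rather than the kernel's structures:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the per-domain IOPF dispatch; all names are stand-ins. */
enum resp_sketch { RESP_SUCCESS, RESP_INVALID };

struct fault_sketch { unsigned long addr; };

struct domain_sketch {
	enum resp_sketch (*iopf_handler)(struct fault_sketch *f, void *data);
	void *fault_data;
};

/* No attached domain, or a domain without a handler, yields INVALID;
 * otherwise the domain's own handler decides. */
static enum resp_sketch dispatch_fault(struct domain_sketch *d,
				       struct fault_sketch *f)
{
	if (!d || !d->iopf_handler)
		return RESP_INVALID;
	return d->iopf_handler(f, d->fault_data);
}

static enum resp_sketch ok_handler(struct fault_sketch *f, void *data)
{
	(void)f; (void)data;
	return RESP_SUCCESS;
}

static int demo_dispatch(void)
{
	struct domain_sketch d = { ok_handler, NULL };
	struct fault_sketch f = { 0 };

	return dispatch_fault(&d, &f) == RESP_SUCCESS &&
	       dispatch_fault(NULL, &f) == RESP_INVALID;
}
```

This is why the old mm-walking iopf_handle_single() can go away: the SVA case becomes just one possible iopf_handler installed in one kind of domain.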
[PATCH] ACPI: VIOT: Fix ACS setup
Currently acpi_viot_init() gets called after the pci device has been scanned and pci_enable_acs() has been called. So pci_request_acs() fails to be taken into account leading to wrong single iommu group topologies when dealing with multi-function root ports for instance. We cannot simply move the acpi_viot_init() earlier, similarly as the IORT init because the VIOT parsing relies on the pci scan. However we can detect VIOT is present earlier and in such a case, request ACS. Introduce a new acpi_viot_early_init() routine that allows to call pci_request_acs() before the scan. Fixes: 3cf485540e7b ("ACPI: Add driver for the VIOT table") Signed-off-by: Eric Auger Reported-by: Jin Liu --- drivers/acpi/bus.c| 1 + drivers/acpi/viot.c | 23 +-- include/linux/acpi_viot.h | 2 ++ 3 files changed, 20 insertions(+), 6 deletions(-) diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index 86fa61a21826..906ad8153fd9 100644 --- a/drivers/acpi/bus.c +++ b/drivers/acpi/bus.c @@ -1400,6 +1400,7 @@ static int __init acpi_init(void) pci_mmcfg_late_init(); acpi_iort_init(); + acpi_viot_early_init(); acpi_hest_init(); acpi_ghes_init(); acpi_scan_init(); diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c index d2256326c73a..3c1be123e4d6 100644 --- a/drivers/acpi/viot.c +++ b/drivers/acpi/viot.c @@ -248,6 +248,23 @@ static int __init viot_parse_node(const struct acpi_viot_header *hdr) return ret; } +/** + * acpi_viot_early_init - Test the presence of VIOT and enable ACS + * + * If the VIOT does exist, ACS must be enabled. 
This cannot be + * done in acpi_viot_init() which is called after the bus scan + */ +void __init acpi_viot_early_init(void) +{ + acpi_status status; + struct acpi_table_header *hdr; + + status = acpi_get_table(ACPI_SIG_VIOT, 0, &hdr); + if (!ACPI_FAILURE(status)) + pci_request_acs(); + acpi_put_table(hdr); +} + /** * acpi_viot_init - Parse the VIOT table * @@ -319,12 +336,6 @@ static int viot_pci_dev_iommu_init(struct pci_dev *pdev, u16 dev_id, void *data) epid = ((domain_nr - ep->segment_start) << 16) + dev_id - ep->bdf_start + ep->endpoint_id; - /* -* If we found a PCI range managed by the viommu, we're -* the one that has to request ACS. -*/ - pci_request_acs(); - return viot_dev_iommu_init(&pdev->dev, ep->viommu, epid); } diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h index 1eb8ee5b0e5f..e58d60f8ff2e 100644 --- a/include/linux/acpi_viot.h +++ b/include/linux/acpi_viot.h @@ -6,10 +6,12 @@ #include #ifdef CONFIG_ACPI_VIOT +void __init acpi_viot_early_init(void); void __init acpi_viot_init(void); int viot_iommu_configure(struct device *dev); #else static inline void acpi_viot_init(void) {} +static inline void acpi_viot_early_init(void) {} static inline int viot_iommu_configure(struct device *dev) { return -ENODEV; -- 2.35.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
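The ordering bug the patch fixes can be modelled with a few flags: ACS only takes effect if it is requested before the bus scan, so the VIOT presence test has to run early while the full parse can stay late. All names below are stand-ins for illustration, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the init ordering in the VIOT ACS fix. */
static bool viot_present, acs_requested, scanned, acs_effective;

static void pci_request_acs_stub(void) { acs_requested = true; }

/* Mirrors acpi_viot_early_init(): only test for the table's presence
 * and request ACS; the real parse still happens later. */
static void viot_early_init_stub(void)
{
	if (viot_present)
		pci_request_acs_stub();
}

static void pci_scan_stub(void)
{
	/* ACS setup only sticks if it was requested before the scan. */
	acs_effective = acs_requested;
	scanned = true;
}

static bool boot_with_viot_sketch(void)
{
	viot_present = true;
	acs_requested = scanned = acs_effective = false;
	viot_early_init_stub();	/* new call site: before the scan */
	pci_scan_stub();
	return scanned && acs_effective;
}
```

With the old ordering, the pci_request_acs() call inside the late VIOT parse would land after pci_scan_stub(), leaving acs_effective false, which is the wrong-single-group symptom the commit message describes.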
Re: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node
On 2022-06-24 16:44, Shameerali Kolothum Thodi via iommu wrote: -Original Message- From: Steven Price [mailto:steven.pr...@arm.com] Sent: 17 June 2022 13:42 To: Shameerali Kolothum Thodi ; linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org; iommu@lists.linux-foundation.org Cc: Linuxarm ; lorenzo.pieral...@arm.com; j...@8bytes.org; robin.mur...@arm.com; w...@kernel.org; wanghuiqiang ; Guohanjun (Hanjun Guo) ; sami.muja...@arm.com; j...@solid-run.com; eric.au...@redhat.com; laurentiu.tu...@nxp.com; h...@infradead.org Subject: Re: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node On 15/06/2022 11:10, Shameer Kolothum wrote: Hi v12 --> v13 -No changes. Rebased to 5.19-rc1. -Picked up tags received from Laurentiu, Hanjun and Will. Thanks!. You've already got my Tested-by tags, but just to confirm I gave this a spin and it works fine. Thanks Steve. I think the series is now in a good shape to be merged. Hi Will/Robin, Appreciate, if you could please take a look at the remaining SMMU related patches(7-9) and provide your approval? I said v12 looked fine, but for the avoidance of doubt, here it is again, as formally as can be: Acked-by: Robin Murphy Thanks, Robin.
Re: [PATCH v9 06/11] arm-smmu-v3/sva: Add SVA domain support
On 2022/6/21 下午10:43, Lu Baolu wrote: Add support for SVA domain allocation and provide an SVA-specific iommu_domain_ops. Signed-off-by: Lu Baolu Reviewed-by: Jean-Philippe Brucker Tested-by: Zhangfei Gao Have tested the series on aarch64. Thanks --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++ .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 69 +++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 + 3 files changed, 78 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index d2ba86470c42..96399dd3a67a 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -758,6 +758,7 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm); void arm_smmu_sva_unbind(struct iommu_sva *handle); u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle); void arm_smmu_sva_notifier_synchronize(void); +struct iommu_domain *arm_smmu_sva_domain_alloc(void); #else /* CONFIG_ARM_SMMU_V3_SVA */ static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu) { @@ -803,5 +804,10 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle) } static inline void arm_smmu_sva_notifier_synchronize(void) {} + +static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void) +{ + return NULL; +} #endif /* CONFIG_ARM_SMMU_V3_SVA */ #endif /* _ARM_SMMU_V3_H */ diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index f155d406c5d5..fc4555dac5b4 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -549,3 +549,72 @@ void arm_smmu_sva_notifier_synchronize(void) */ mmu_notifier_synchronize(); } + +static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t id) +{ + int ret = 0; + struct mm_struct *mm; + struct iommu_sva *handle; + + if (domain->type != IOMMU_DOMAIN_SVA) + return -EINVAL; + 
+ mm = domain->mm; + if (WARN_ON(!mm)) + return -ENODEV; + + mutex_lock(&sva_lock); + handle = __arm_smmu_sva_bind(dev, mm); + if (IS_ERR(handle)) + ret = PTR_ERR(handle); + mutex_unlock(&sva_lock); + + return ret; +} + +static void arm_smmu_sva_block_dev_pasid(struct iommu_domain *domain, +struct device *dev, ioasid_t id) +{ + struct mm_struct *mm = domain->mm; + struct arm_smmu_bond *bond = NULL, *t; + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + + mutex_lock(&sva_lock); + list_for_each_entry(t, &master->bonds, list) { + if (t->mm == mm) { + bond = t; + break; + } + } + + if (!WARN_ON(!bond) && refcount_dec_and_test(&bond->refs)) { + list_del(&bond->list); + arm_smmu_mmu_notifier_put(bond->smmu_mn); + kfree(bond); + } + mutex_unlock(&sva_lock); +} + +static void arm_smmu_sva_domain_free(struct iommu_domain *domain) +{ + kfree(domain); +} + +static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { + .set_dev_pasid = arm_smmu_sva_set_dev_pasid, + .block_dev_pasid= arm_smmu_sva_block_dev_pasid, + .free = arm_smmu_sva_domain_free, +}; + +struct iommu_domain *arm_smmu_sva_domain_alloc(void) +{ + struct iommu_domain *domain; + + domain = kzalloc(sizeof(*domain), GFP_KERNEL); + if (!domain) + return NULL; + domain->ops = &arm_smmu_sva_domain_ops; + + return domain; +} diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ae8ec8df47c1..a30b252e2f95 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1999,6 +1999,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) { struct arm_smmu_domain *smmu_domain; + if (type == IOMMU_DOMAIN_SVA) + return arm_smmu_sva_domain_alloc(); + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_DMA_FQ && ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v9 0/8] Add support for HiSilicon PCIe Tune and Trace device
Hi Greg, Since the kernel side of this device has been reviewed for 8 versions with all comments addressed and no more comment since v9 posted in 5.19-rc1, is it ok to merge it first (for Patch 1-3 and 7-8)? Thanks. On 2022/6/6 19:55, Yicong Yang wrote: > HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex > integrated Endpoint (RCiEP) device, providing the capability > to dynamically monitor and tune the PCIe traffic (tune), > and trace the TLP headers (trace). > > PTT tune is designed for monitoring and adjusting PCIe link parameters. > We provide several parameters of the PCIe link. Through the driver, > user can adjust the value of certain parameter to affect the PCIe link > for the purpose of enhancing the performance in certian situation. > > PTT trace is designed for dumping the TLP headers to the memory, which > can be used to analyze the transactions and usage condition of the PCIe > Link. Users can choose filters to trace headers, by either requester > ID, or those downstream of a set of Root Ports on the same core of the > PTT device. It's also supported to trace the headers of certain type and > of certain direction. > > The driver registers a PMU device for each PTT device. The trace can > be used through `perf record` and the traced headers can be decoded > by `perf report`. The perf command support for the device is also > added in this patchset. The tune can be used through the sysfs > attributes of related PMU device. See the documentation for the > detailed usage. > > Change since v8: > - Cleanups and one minor fix from Jonathan and John, thanks > Link: > https://lore.kernel.org/lkml/20220516125223.32012-1-yangyic...@hisilicon.com/ > > Change since v7: > - Configure the DMA in probe rather than in runtime. 
Also use devres to manage > the PMU device as we have no ordering problem now > - Refactor the config validation function per John and Leo > - Use a spinlock hisi_ptt::pmu_lock instead of a mutex to serialize the perf > process > in pmu::start as it's in atomic context > - Only commit the traced data on stop, per Leo and James > - Drop the filter dynamic-updating patch from this series to simplify the > review > of the driver. That patch will be sent separately. > - Add a cpumask sysfs attribute and handle CPU hotplug events, following the > uncore PMU convention > - Other cleanups and fixes, both in driver and perf tool > Link: > https://lore.kernel.org/lkml/20220407125841.3678-1-yangyic...@hisilicon.com/ > > Change since v6: > - Fix W=1 errors reported by lkp test, thanks > > Change since v5: > - Squash the PMU patch into PATCH 2 as suggested by John > - Refine the commit message of PATCH 1 and some comments > Link: > https://lore.kernel.org/lkml/20220308084930.5142-1-yangyic...@hisilicon.com/ > > Change since v4: > Address the comments from Jonathan, John and Ma Ca, thanks. > - Use devm* also for allocating the DMA buffers > - Remove the IRQ handler stub in Patch 2 > - Make functions waiting for hardware state return boolean > - Manually remove the PMU device as it should be removed first > - Modify the orders in probe and removal to make them match well > - Make the available {directions,type,format} arrays const and non-global > - Use the right filter list in filters show and protect the > list with a mutex > - Record the trace status with a boolean @started rather than an enum > - Optimize the process of finding the PTT devices in the perf tool > Link: > https://lore.kernel.org/linux-pci/20220221084307.33712-1-yangyic...@hisilicon.com/ > > Change since v3: > Address the comments from Jonathan and John, thanks. 
> - Drop members in the common struct which can be obtained on the fly > - Reduce the buffer struct and organize the buffers with an array instead of a list > - Reduce the DMA reset wait time to avoid a long busy loop > - Split the available_filters sysfs attribute into two files, for root port > and requester respectively. Update the documentation accordingly > - Make the IOMMU mapping check earlier in probe to avoid a race condition. Also > make the IOMMU quirk patch prior to the driver in the series > - Cleanups and typo fixes from John and Jonathan > Link: > https://lore.kernel.org/linux-pci/20220124131118.17887-1-yangyic...@hisilicon.com/ > > Change since v2: > - Address the comments from Mathieu, thanks. > - Rename the directory to ptt to match the function of the device > - Spin off the declarations to a separate header > - Split the trace function into several patches > - Some other comments. > - Make the default SMMU domain type of the PTT device identity > Drop the RMR as it's not recommended and use an iommu_def_domain_type > quirk to pass through the device DMA as suggested by Robin. > Link: > https://lore.kernel.org/linux-pci/2026090625.53702-1-yangyic...@hisilicon.com/ > > Change since v1: > - Switch the user interface of trace to perf from debugfs > - Switch the user interface of tune to sysfs from debugfs > - Add perf tool support to
Re: [PATCH v1 3/7] iommu/amd: Fix sparse warning
On 6/23/2022 3:12 PM, Robin Murphy wrote: > On 2022-06-23 09:03, Joerg Roedel wrote: >> On Fri, Jun 03, 2022 at 04:51:03PM +0530, Vasant Hegde wrote: >>> Fix the below sparse warning: >>> CHECK drivers/iommu/amd/iommu.c >>> drivers/iommu/amd/iommu.c:73:24: warning: symbol 'amd_iommu_ops' was not >>> declared. Should it be static? >>> >>> Also we are going to introduce a v2 page table which has different >>> pgsize_bitmaps. Hence remove the 'const' qualifier. >> >> I am not a fan of removing the consts. Please use separate ops >> structures for v2 page-tables and make them const as well. This probably >> also has some optimization potential in the future when we can make the >> ops call-back functions page-table specific. > > TBH it's probably time to retire iommu_ops->pgsize_bitmap anyway. At the very > least it would be logical to move it to iommu_domain_ops now, but maybe we > could skip ahead and just rely on drivers initialising domain->pgsize_bitmap > directly in their .domain_alloc? > Robin, Something like below? If yes, I will clean it up and send a proper fix. 
-Vasant diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 840831d5d2ad..32dd84a7c1da 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1916,6 +1916,7 @@ static int protection_domain_init_v1(struct protection_domain *domain, int mode) return -ENOMEM; } + domain->domain.pgsize_bitmap= AMD_IOMMU_PGSIZES; amd_iommu_domain_set_pgtable(domain, pt_root, mode); return 0; @@ -2282,7 +2283,6 @@ const struct iommu_ops amd_iommu_ops = { .get_resv_regions = amd_iommu_get_resv_regions, .put_resv_regions = generic_iommu_put_resv_regions, .is_attach_deferred = amd_iommu_is_attach_deferred, - .pgsize_bitmap = AMD_IOMMU_PGSIZES, .def_domain_type = amd_iommu_def_domain_type, .default_domain_ops = &(const struct iommu_domain_ops) { .attach_dev = amd_iommu_attach_device, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 847ad47a2dfd..73cfba6a6728 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1915,8 +1915,6 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus, return NULL; domain->type = type; - /* Assume all sizes by default; the driver may override this later */ - domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; if (!domain->ops) domain->ops = bus->iommu_ops->default_domain_ops; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 5e1afe169549..0c028aa71b96 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -255,7 +255,6 @@ struct iommu_ops { int (*def_domain_type)(struct device *dev); const struct iommu_domain_ops *default_domain_ops; - unsigned long pgsize_bitmap; struct module *owner; }; > (and FWIW I'm leaning towards the same for the domain->ops as well; the more > I look at ops->default_domain_ops, the more it starts looking like a stupid > mess... my stupid mess... sorry about that) > > Robin.
RE: [PATCH v9 07/11] iommu/sva: Refactoring iommu_sva_bind/unbind_device()
> From: Lu Baolu > Sent: Tuesday, June 21, 2022 10:44 PM > +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct > mm_struct *mm) > +{ > + struct iommu_domain *domain; > + ioasid_t max_pasids; > + int ret = -EINVAL; > + > + /* Allocate mm->pasid if necessary. */ This comment belongs with iommu_sva_alloc_pasid() > + max_pasids = dev->iommu->max_pasids; > + if (!max_pasids) > + return ERR_PTR(-EOPNOTSUPP); > + > + ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1); > + if (ret) > + return ERR_PTR(ret); > + ... > +void iommu_sva_unbind_device(struct iommu_sva *handle) > +{ > + struct device *dev = handle->dev; > + struct iommu_domain *domain = > + container_of(handle, struct iommu_domain, bond); > + ioasid_t pasid = iommu_sva_get_pasid(handle); > + > + mutex_lock(&iommu_sva_lock); > + if (refcount_dec_and_test(&domain->bond.users)) { > + iommu_detach_device_pasid(domain, dev, pasid); > + iommu_domain_free(domain); > + } > + mutex_unlock(&iommu_sva_lock); > +} > +EXPORT_SYMBOL_GPL(iommu_sva_unbind_device); > + > +u32 iommu_sva_get_pasid(struct iommu_sva *handle) > +{ > + struct iommu_domain *domain = > + container_of(handle, struct iommu_domain, bond); > + > + return domain->mm->pasid; > +} > +EXPORT_SYMBOL_GPL(iommu_sva_get_pasid); Looks like this is only used by unbind_device. Just open code it.
Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
* Saravana Kannan [220623 08:17]: > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren wrote: > > > > * Saravana Kannan [220622 19:05]: > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren wrote: > > > > This issue is not directly related to fw_devlink. It is a side effect of > > > > removing driver_deferred_probe_check_state(). We no longer return > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state(). > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state() > > > was deleted because fw_devlink=on should have short circuited the > > > probe attempt with an -EPROBE_DEFER before reaching the bus/driver > > > probe function and hitting this -ENOENT failure. That's why I was > > > asking the other questions. > > > > OK. So where is the -EPROBE_DEFER supposed to happen without > > driver_deferred_probe_check_state() then? > > device_links_check_suppliers() call inside really_probe() would short > circuit and return an -EPROBE_DEFER if the device links are created as > expected. OK > > Hmm so I'm not seeing any supplier for the top level ocp device in > > the booting case without your patches. I see the suppliers for the > > ocp child device instances only. > > Hmmm... this is strange (that the device link isn't there), but this > is what I suspected. Yup, maybe it's because of the supplier being a device in the child interconnect for the ocp. > Now we need to figure out why it's missing. There are only a few > things that could cause this and I don't see any of those. I already > checked to make sure the power domain in this instance had a proper > driver with a probe() function -- if it didn't, then that's one thing > that could have caused the missing device link. The device does seem > to have a proper driver, so it looks like I can rule that out. > > Can you point me to the dts file that corresponds to the specific > board you are testing this on? I probably won't find anything, but I > want to rule out some of the possibilities. 
You can use the beaglebone black dts for example, that's arch/arm/boot/dts/am335x-boneblack.dts and uses am33xx.dtsi for ocp interconnect with simple-pm-bus. > All the device link creation logic is inside drivers/base/core.c. So > if you can look at the existing messages or add other stuff to figure > out why the device link isn't getting created, that'd be handy. In > either case, I'll continue staring at the DT and code to see what > might be happening here. In device_links_check_suppliers() I see these ocp suppliers: platform ocp: device_links_check_suppliers: 1024: supplier 44e00d00.prm: link->status: 0 link->flags: 01c0 platform ocp: device_links_check_suppliers: 1024: supplier 44e01000.prm: link->status: 0 link->flags: 01c0 platform ocp: device_links_check_suppliers: 1024: supplier 44e00c00.prm: link->status: 0 link->flags: 01c0 platform ocp: device_links_check_suppliers: 1024: supplier 44e00e00.prm: link->status: 0 link->flags: 01c0 platform ocp: device_links_check_suppliers: 1024: supplier 44e01100.prm: link->status: 0 link->flags: 01c0 platform ocp: device_links_check_suppliers: 1024: supplier fixedregulator0: link->status: 1 link->flags: 01c0 No -EPROBE_DEFER is returned in device_links_check_suppliers() for 44e00c00.prm supplier for beaglebone black for example, 0 gets returned. Regards, Tony
[PATCH -next] iommu/dma: Fix missing mutex_init() in iommu_get_msi_cookie()
cookie_alloc() is called by iommu_get_dma_cookie() and iommu_get_msi_cookie(), but the mutex is only initialized in iommu_get_dma_cookie(). Move mutex_init() into cookie_alloc() to make sure the mutex is always initialized. Fixes: ac9a5d522bb8 ("iommu/dma: Fix race condition during iova_domain initialization") Reported-by: Hulk Robot Signed-off-by: Yang Yingliang --- drivers/iommu/dma-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 1910f4f1612b..e29157380c48 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -294,6 +294,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type) if (cookie) { INIT_LIST_HEAD(&cookie->msi_page_list); cookie->type = type; + mutex_init(&cookie->mutex); } return cookie; } @@ -311,7 +312,6 @@ int iommu_get_dma_cookie(struct iommu_domain *domain) if (!domain->iova_cookie) return -ENOMEM; - mutex_init(&domain->iova_cookie->mutex); return 0; } -- 2.25.1
Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS
Hi Michael, On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz wrote: > Am 18.06.2022 um 00:57 schrieb Arnd Bergmann: > > From: Arnd Bergmann > > > > All architecture-independent users of virt_to_bus() and bus_to_virt() > > have been fixed to use the dma mapping interfaces or have been > > removed now. This means the definitions on most architectures, and the > > CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed. > > > > The only exceptions to this are a few network and scsi drivers for m68k > > Amiga and VME machines and ppc32 Macintosh. These drivers work correctly > > with the old interfaces and are probably not worth changing. > > The Amiga SCSI drivers are all old WD33C93 ones, and replacing > virt_to_bus by virt_to_phys in the dma_setup() function there would > cause no functional change at all. FTR, the sgiwd93 driver uses dma_map_single(). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
RE: [PATCH v9 05/11] iommu/vt-d: Add SVA domain support
> From: Lu Baolu > Sent: Tuesday, June 21, 2022 10:44 PM > > Add support for SVA domain allocation and provide an SVA-specific > iommu_domain_ops. > > Signed-off-by: Lu Baolu Reviewed-by: Kevin Tian
RE: [PATCH v9 04/11] iommu: Add sva iommu_domain support
> From: Lu Baolu > Sent: Tuesday, June 21, 2022 10:44 PM > > The sva iommu_domain represents a hardware pagetable that the IOMMU > hardware could use for SVA translation. This adds some infrastructure > to support SVA domain in the iommu common layer. It includes: > > - Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA > domain > type. The IOMMU drivers that support SVA should provide the sva > domain specific iommu_domain_ops. > - Add a helper to allocate an SVA domain. The iommu_domain_free() > is still used to free an SVA domain. > - Add helpers to attach an SVA domain to a device and the reverse > operation. > > Some buses, like PCI, route packets without considering the PASID value. > Thus a DMA target address with PASID might be treated as P2P if the > address falls into the MMIO BAR of other devices in the group. To make > things simple, the attach/detach interfaces only apply to devices > belonging to the singleton groups, and the singleton is immutable in > fabric i.e. not affected by hotplug. > > The iommu_attach/detach_device_pasid() can be used for other purposes, > such as kernel DMA with pasid, mediation device, etc. I'd split this into two patches. One for adding iommu_attach/ detach_device_pasid() and set/block_dev_pasid ops, and the other for adding SVA. > struct iommu_domain { > unsigned type; > const struct iommu_domain_ops *ops; > unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */ > - iommu_fault_handler_t handler; > - void *handler_token; > struct iommu_domain_geometry geometry; > struct iommu_dma_cookie *iova_cookie; > + union { > + struct {/* IOMMU_DOMAIN_DMA */ > + iommu_fault_handler_t handler; > + void *handler_token; > + }; Why is it DMA-domain specific? What about unmanaged domains? Unrecoverable faults can happen on any type including SVA. Hence I think the above should be domain-type agnostic. 
> + struct {/* IOMMU_DOMAIN_SVA */ > + struct mm_struct *mm; > + }; > + }; > }; > > + > +struct iommu_domain *iommu_sva_domain_alloc(struct device *dev, > + struct mm_struct *mm) > +{ > + const struct iommu_ops *ops = dev_iommu_ops(dev); > + struct iommu_domain *domain; > + > + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); > + if (!domain) > + return NULL; > + > + domain->type = IOMMU_DOMAIN_SVA; It's a bit weird that the type has been specified when calling ops->domain_alloc while it is still left to the caller to set the type. But this is not caused by this series. It could be cleaned up separately. > + > + mutex_lock(&group->mutex); > + curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, > GFP_KERNEL); > + if (curr) > + goto out_unlock; Need to check xa_is_err(curr).
Re: [PATCH v9 5/8] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver
On 2022/6/27 10:02, Leo Yan wrote: > On Mon, Jun 06, 2022 at 07:55:52PM +0800, Yicong Yang wrote: >> From: Qi Liu >> >> HiSilicon PCIe tune and trace device (PTT) could dynamically tune >> the PCIe link's events, and trace the TLP headers. >> >> This patch adds support for the PTT device in the perf tool, so users could >> use 'perf record' to get TLP header trace data. >> >> Signed-off-by: Qi Liu >> Signed-off-by: Yicong Yang > > Just one minor comment. > > [...] > >> diff --git a/tools/perf/util/hisi-ptt.h b/tools/perf/util/hisi-ptt.h >> new file mode 100644 >> index ..2db9b4056214 >> --- /dev/null >> +++ b/tools/perf/util/hisi-ptt.h >> @@ -0,0 +1,19 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */ >> +/* >> + * HiSilicon PCIe Trace and Tuning (PTT) support >> + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd. >> + */ >> + >> +#ifndef INCLUDE__PERF_HISI_PTT_H__ >> +#define INCLUDE__PERF_HISI_PTT_H__ >> + >> +#define HISI_PTT_PMU_NAME "hisi_ptt" >> +#define HISI_PTT_AUXTRACE_PRIV_SIZE sizeof(u64) >> + >> +struct auxtrace_record *hisi_ptt_recording_init(int *err, >> +struct perf_pmu *hisi_ptt_pmu); >> + >> +int hisi_ptt_process_auxtrace_info(union perf_event *event, >> + struct perf_session *session); > > The function hisi_ptt_process_auxtrace_info() is introduced by the next > patch to support PTT decoding; for good practice (e.g. keeping > bisection working), it's good to introduce a function's declaration and definition > in one patch. > Thanks for catching this. Something went wrong when doing the patch splitting. Will fix this! > With this fixing, this patch looks good to me: > > Reviewed-by: Leo Yan > Thanks.
Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS
Arnd, Am 26.06.2022 um 20:36 schrieb Arnd Bergmann: There are no platform specific header files other than asm/amigahw.h and asm/mvme147hw.h, currently only holding register address definitions. Would it be OK to add m68k_virt_to_bus() in there if it can't remain in asm/virtconvert.h, Geert? In that case, I would just leave it under the current name and not change m68k at all. I don't like the m68k_virt_to_bus() name because there is not anything CPU specific in what it does, and keeping it in a common header does nothing to prevent it from being used on other platforms either. Fair enough. 32bit powerpc is a different matter though. It's similar, but unrelated. The two apple ethernet drivers (bmac and mace) can again either get changed to use the dma-mapping interfaces, or get a custom pmac_virt_to_bus()/ pmac_bus_to_virt() helper. Hmmm - I see Finn had done the DMA API conversion on macmace.c which might give some hints on what to do about mace.c ... no idea about bmac.c though. And again, haven't got hardware to test, so custom helpers is it, then. Ok. Again, no platform specific headers to shift renamed helpers to, so may as well keep this as-is. Cheers, Michael Arnd
RE: [PATCH v9 02/11] iommu: Add max_pasids field in struct dev_iommu
> From: Lu Baolu > Sent: Tuesday, June 21, 2022 10:44 PM > > Use this field to save the number of PASIDs that a device is able to > consume. It is a generic attribute of a device and lifting it into the > per-device dev_iommu struct could help to avoid the boilerplate code > in various IOMMU drivers. > > Signed-off-by: Lu Baolu > --- > include/linux/iommu.h | 2 ++ > drivers/iommu/iommu.c | 20 > 2 files changed, 22 insertions(+) > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 03fbb1b71536..d50afb2c9a09 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -364,6 +364,7 @@ struct iommu_fault_param { > * @fwspec: IOMMU fwspec data > * @iommu_dev:IOMMU device this device is linked to > * @priv: IOMMU Driver private data > + * @max_pasids: number of PASIDs device can consume ... PASIDs *this* device can consume Reviewed-by: Kevin Tian
RE: [PATCH v9 01/11] iommu: Add max_pasids field in struct iommu_device
> From: Lu Baolu > Sent: Tuesday, June 21, 2022 10:44 PM > > Use this field to keep the number of supported PASIDs that an IOMMU > hardware is able to support. This is a generic attribute of an IOMMU > and lifting it into the per-IOMMU device structure makes it possible > to allocate a PASID for device without calls into the IOMMU drivers. > Any iommu driver that supports PASID related features should set this > field before enabling them on the devices. > > In the Intel IOMMU driver, intel_iommu_sm is moved to > CONFIG_INTEL_IOMMU > enclave so that the pasid_supported() helper could be used in dmar.c > without compilation errors. > > Signed-off-by: Lu Baolu > Reviewed-by: Jean-Philippe Brucker Reviewed-by: Kevin Tian
RE: [PATCH v3 2/2] vfio: Use device_iommu_capable()
> From: Robin Murphy > Sent: Saturday, June 25, 2022 2:00 AM > > Use the new interface to check the capabilities for our device > specifically. > > Reviewed-by: Lu Baolu > Signed-off-by: Robin Murphy Reviewed-by: Kevin Tian
RE: [PATCH v3 1/2] vfio/type1: Simplify bus_type determination
> From: Robin Murphy > Sent: Saturday, June 25, 2022 1:52 AM > > Since IOMMU groups are mandatory for drivers to support, it stands to > reason that any device which has been successfully added to a group > must be on a bus supported by that IOMMU driver, and therefore a domain > viable for any device in the group must be viable for all devices in > the group. This already has to be the case for the IOMMU API's internal > default domain, for instance. Thus even if the group contains devices on > different buses, that can only mean that the IOMMU driver actually > supports such an odd topology, and so without loss of generality we can > expect the bus type of any device in a group to be suitable for IOMMU > API calls. > > Furthermore, scrutiny reveals a lack of protection for the bus being > removed while vfio_iommu_type1_attach_group() is using it; the reference > that VFIO holds on the iommu_group ensures that data remains valid, but > does not prevent the group's membership changing underfoot. > > We can address both concerns by recycling vfio_bus_type() into some > superficially similar logic to indirect the IOMMU API calls themselves. > Each call is thus protected from races by the IOMMU group's own locking, > and we no longer need to hold group-derived pointers beyond that scope. > It also gives us an easy path for the IOMMU API's migration of bus-based > interfaces to device-based, of which we can already take the first step > with device_iommu_capable(). As with domains, any capability must in > practice be consistent for devices in a given group - and after all it's > still the same capability which was expected to be consistent across an > entire bus! - so there's no need for any complicated validation. > > Signed-off-by: Robin Murphy > --- > > v3: Complete rewrite yet again, and finally it doesn't feel like we're > stretching any abstraction boundaries the wrong way, and the diffstat > looks right too. 
I did think about embedding IOMMU_CAP_INTR_REMAP > directly in the callback, but decided I like the consistency of minimal > generic wrappers. And yes, if the capability isn't supported then it > does end up getting tested for the whole group, but meh, it's harmless. > Reviewed-by: Kevin Tian