RE: [PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting
> From: Steve Wahl > Sent: Thursday, May 19, 2022 3:58 AM > > On Fri, May 13, 2022 at 10:09:46AM +0800, Baolu Lu wrote: > > On 2022/5/13 07:12, Steve Wahl wrote: > > > On Thu, May 12, 2022 at 10:13:09AM -0500, Steve Wahl wrote: > > > > To support up to 64 sockets with 10 DMAR units each (640), make the > > > > value of DMAR_UNITS_SUPPORTED adjustable by a config variable, > > > > CONFIG_DMAR_UNITS_SUPPORTED, and make it's default 1024 when > MAXSMP is > > > > set. > > > > > > > > If the available hardware exceeds DMAR_UNITS_SUPPORTED > (previously set > > > > to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to > > > > allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ > > > > remapping doesn't support X2APIC mode x2apic disabled"; and the > system > > > > fails to boot properly. > > > > > > > > Signed-off-by: Steve Wahl > > > > > > I've received a report from the kernel test robot , > > > that this patch causes an error (shown below) when > > > CONFIG_IOMMU_SUPPORT is not set. > > > > > > In my opinion, this is because include/linux/dmar.h and > > > include/linux/intel-iommu are being #included when they are not really > > > being used. > > > > > > I've tried placing the contents of intel-iommu.h within an #ifdef > > > CONFIG_INTEL_IOMMU, and that fixes the problem. > > > > > > Two questions: > > > > > > A) Is this the desired approach to to the fix? > > > > Most part of include/linux/intel-iommu.h is private to Intel IOMMU > > driver. They should be put in a header like drivers/iommu/intel > > /iommu.h. Eventually, we should remove include/linux/intel-iommu.h > > and device drivers interact with iommu subsystem through the IOMMU > > kAPIs. > > > > Best regards, > > baolu > > Baolu's recent patch to move intel-iommu.h private still allows my > [PATCH v2] to apply with no changes, and gets rid of the compile > errors when CONFIG_IOMMU_SUPPORT is not set, so the kernel test robot > should be happy now. > > Is there another step I should do to bring this patch back into focus? > This looks good to me. Reviewed-by: Kevin Tian ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 4/6] iommu: Add PASID support for DMA mapping API users
Hi Kevin, On Mon, 23 May 2022 08:25:33 +, "Tian, Kevin" wrote: > > From: Jacob Pan > > Sent: Thursday, May 19, 2022 2:21 AM > > > > DMA mapping API is the de facto standard for in-kernel DMA. It operates > > on a per device/RID basis which is not PASID-aware. > > > > Some modern devices such as Intel Data Streaming Accelerator, PASID is > > required for certain work submissions. To allow such devices use DMA > > mapping API, we need the following functionalities: > > 1. Provide device a way to retrieve a PASID for work submission within > > the kernel > > 2. Enable the kernel PASID on the IOMMU for the device > > 3. Attach the kernel PASID to the device's default DMA domain, let it > > be IOVA or physical address in case of pass-through. > > > > This patch introduces a driver facing API that enables DMA API > > PASID usage. Once enabled, device drivers can continue to use DMA APIs > > as is. There is no difference in dma_handle between without PASID and > > with PASID. > > > > Signed-off-by: Jacob Pan > > --- > > drivers/iommu/dma-iommu.c | 114 > > ++ > > include/linux/dma-iommu.h | 3 + > > 2 files changed, 117 insertions(+) > > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > > index 1ca85d37eeab..6ad7ba619ef0 100644 > > --- a/drivers/iommu/dma-iommu.c > > +++ b/drivers/iommu/dma-iommu.c > > @@ -34,6 +34,8 @@ struct iommu_dma_msi_page { > > phys_addr_t phys; > > }; > > > > +static DECLARE_IOASID_SET(iommu_dma_pasid); > > + > > enum iommu_dma_cookie_type { > > IOMMU_DMA_IOVA_COOKIE, > > IOMMU_DMA_MSI_COOKIE, > > @@ -370,6 +372,118 @@ void iommu_put_dma_cookie(struct > > iommu_domain *domain) > > domain->iova_cookie = NULL; > > } > > > > +/* Protect iommu_domain DMA PASID data */ > > +static DEFINE_MUTEX(dma_pasid_lock); > > +/** > > + * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the > > device's > > + * DMA domain. > > + * @dev: Device to be enabled > > + * @pasid: The returned kernel PASID to be used for DMA > > + * > > + * DMA request with PASID will be mapped the same way as the legacy > > DMA. > > + * If the device is in pass-through, PASID will also pass-through. If > > the > > + * device is in IOVA, the PASID will point to the same IOVA page table. > > + * > > + * @return err code or 0 on success > > + */ > > +int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid) > > iommu_attach_dma_domain_pasid? 'dma_pasid' is too broad from > a API p.o.v. > I agree dma_pasid is too broad, technically it is dma_api_pasid but seems too long. My concern with dma_domain_pasid is that the pasid can also be used for identity domain. > > +{ > > + struct iommu_domain *dom; > > + ioasid_t id, max; > > + int ret = 0; > > + > > + dom = iommu_get_domain_for_dev(dev); > > + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid) > > + return -ENODEV; > > + > > + /* Only support domain types that DMA API can be used */ > > + if (dom->type == IOMMU_DOMAIN_UNMANAGED || > > + dom->type == IOMMU_DOMAIN_BLOCKED) { > > + dev_warn(dev, "Invalid domain type %d", dom->type); > > + return -EPERM; > > + } > > WARN_ON. > > and probably we can just check whether domain is default domain here. > good point, I will just use struct iommu_domain *def_domain = iommu_get_dma_domain(dev); > > + > > + mutex_lock(_pasid_lock); > > + id = dom->dma_pasid; > > + if (!id) { > > + /* > > +* First device to use PASID in its DMA domain, > > allocate > > +* a single PASID per DMA domain is all we need, it is > > also > > +* good for performance when it comes down to IOTLB > > flush. > > +*/ > > + max = 1U << dev->iommu->pasid_bits; > > + if (!max) { > > + ret = -EINVAL; > > + goto done_unlock; > > + } > > + > > + id = ioasid_alloc(_dma_pasid, 1, max, dev); > > + if (id == INVALID_IOASID) { > > + ret = -ENOMEM; > > + goto done_unlock; > > + } > > + > > + dom->dma_pasid = id; > > + atomic_set(>dma_pasid_users, 1); > > this is always accessed with lock held hence no need to be atomic. > good catch, will fix > > + } > > + > > + ret = iommu_attach_device_pasid(dom, dev, id); > > + if (!ret) { > > + *pasid = id; > > + atomic_inc(>dma_pasid_users); > > + goto done_unlock; > > + } > > + > > + if (atomic_dec_and_test(>dma_pasid_users)) { > > + ioasid_free(id); > > + dom->dma_pasid = 0; > > + } > > +done_unlock: > > + mutex_unlock(_pasid_lock); > > + return ret; > > +} > > +EXPORT_SYMBOL(iommu_attach_dma_pasid); > > + > > +/** > > + * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID > > + * @dev: Device's PASID DMA to be disabled > > + * > > + * It is the device
Re: [PATCH 0/2] dma-mapping, remoteproc: Fix dma_mem leak after rproc_shutdown
> >> Sigh. In theory drivers should never declare coherent memory like > >> this, and there has been some work to fix remoteproc in that regard. > >> > >> But I guess until that is merged we'll need somthing like this fix. > > > > Hi, > > > > Thanks for your comment. > > As I didn't see other fix of this issue, should we use this patch > > before the remoteproc work you mentioned is merged? > > TBH I think it would be better "fixed" with a kmemleak_ignore() and a > big comment, rather than adding API cruft for something that isn't a > real problem. I'm quite sure that no real-world user is unbinding > remoteproc drivers frequently enough that leaking a 48-byte allocation > each time has any practical significance. > Actually we stop 2 remote processors using the same remoteproc driver when system suspend or screen off on our arm64 platform. And the 48-byte slab allocation will be aligned up to 128 bytes on arm64. So the leak can be significant in our use case especially in stress test... We really don't want to ignore it. Maybe there're other way to fix it without adding APIs? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V2 0/6] IOMMU-DMA - support DMA_ATTR_LOW_ADDRESS attribute
Ping! On Thu, May 12, 2022 at 9:09 AM Ajay Kumar wrote: > > This patchset is a rebase of original patches from Marek Szyprowski: > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2321261.html > > The patches have been rebased on Joro's IOMMU tree "next" branch: > https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git > > This patchset is needed to address the IOVA address dependency issue between > firmware buffers and other buffers in Samsung s5p-mfc driver. > > There have been few discussions in the past on how to find a generic > soultion for this issue, ranging from adding an entirely new API to choose > IOVA window[1], to adding a DMA attribute DMA_ATTR_LOW_ADDRESS which handles > buffer allocation from lower address[2]. > This is a continuation of initial work from Marek for approach [2]. > Patches have been tested with latest version of Samsung s5p-mfc driver. > > Changes since V1: > [PATCH V2 1/6] > - Rebase on latest tree. > > [PATCH V2 2/6] > - Rebase on latest tree. > Added a missing check for iova_pfn in __iova_rcache_get() > Discard changes from drivers/iommu/intel/iommu.c which are not necessary > > [PATCH V2 3/6] > - Rebase on latest tree. > > [PATCH V2 4/6] > - Rebase on latest tree > > [PATCH V2 5/6] > - Rebase on latest tree > > [PATCH V2 6/6] > - Rebase on latest tree. > > Marek Szyprowski (6): > dma-mapping: add DMA_ATTR_LOW_ADDRESS attribute > iommu: iova: properly handle 0 as a valid IOVA address > iommu: iova: add support for 'first-fit' algorithm > iommu: dma-iommu: refactor iommu_dma_alloc_iova() > iommu: dma-iommu: add support for DMA_ATTR_LOW_ADDRESS > media: platform: s5p-mfc: use DMA_ATTR_LOW_ADDRESS > > References: > [1] > https://lore.kernel.org/linux-iommu/20200811054912.ga...@infradead.org/ > > [2] > https://lore.kernel.org/linux-mm/bff57cbe-2247-05e1-9059-d9c66d64c...@arm.com > > drivers/iommu/dma-iommu.c | 77 +++- > drivers/iommu/iova.c | 91 ++- > .../media/platform/samsung/s5p-mfc/s5p_mfc.c | 8 +- > include/linux/dma-mapping.h | 6 ++ > include/linux/iova.h | 3 + > 5 files changed, 156 insertions(+), 29 deletions(-) > > > base-commit: faf93cfaadfaaff2a5c35d6301b45aa2f6e4ddb2 > -- > 2.17.1 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Avoid leak OF node on error
On 2022-05-23 12:54, Johan Hovold wrote: On Mon, May 23, 2022 at 11:11:45AM +, cgel@gmail.com wrote: From: Minghao Chi The OF node should be put before returning error in ipmmu_init(), otherwise node's refcount will be leaked. Reported-by: Zeal Robot Signed-off-by: Minghao Chi --- drivers/iommu/ipmmu-vmsa.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 8fdb84b3642b..f6440b106f46 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -1179,11 +1179,10 @@ static int __init ipmmu_init(void) return 0; np = of_find_matching_node(NULL, ipmmu_of_ids); + of_node_put(np); if (!np) return 0; - of_node_put(np); - ret = platform_driver_register(_driver); if (ret < 0) return ret; NAK Indeed. How exactly can we hold a refcount on NULL, let alone leak it? Static checkers are great for flagging up code that *might* have issues, but please actually *look* at the code and apply some thought before sending a patch. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v1] driver core: Extend deferred probe timeout on driver registration
On Fri, May 20, 2022 at 05:15:55PM -0700, Saravana Kannan wrote: > On Fri, May 20, 2022 at 5:04 PM Nathan Chancellor wrote: > > > > On Fri, May 20, 2022 at 04:49:48PM -0700, Saravana Kannan wrote: > > > On Fri, May 20, 2022 at 4:30 PM Nathan Chancellor > > > wrote: > > > > > > > > Hi Saravana, > > > > > > > > On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote: > > > > > The deferred probe timer that's used for this currently starts at > > > > > late_initcall and runs for driver_deferred_probe_timeout seconds. The > > > > > assumption being that all available drivers would be loaded and > > > > > registered before the timer expires. This means, the > > > > > driver_deferred_probe_timeout has to be pretty large for it to cover > > > > > the > > > > > worst case. But if we set the default value for it to cover the worst > > > > > case, it would significantly slow down the average case. For this > > > > > reason, the default value is set to 0. > > > > > > > > > > Also, with CONFIG_MODULES=y and the current default values of > > > > > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with > > > > > missing > > > > > drivers will cause their consumer devices to always defer their > > > > > probes. > > > > > This is because device links created by fw_devlink defer the probe > > > > > even > > > > > before the consumer driver's probe() is called. > > > > > > > > > > Instead of a fixed timeout, if we extend an unexpired deferred probe > > > > > timer on every successful driver registration, with the expectation > > > > > more > > > > > modules would be loaded in the near future, then the default value of > > > > > driver_deferred_probe_timeout only needs to be as long as the worst > > > > > case > > > > > time difference between two consecutive module loads. > > > > > > > > > > So let's implement that and set the default value to 10 seconds when > > > > > CONFIG_MODULES=y. > > > > > > > > > > Cc: Greg Kroah-Hartman > > > > > Cc: "Rafael J. Wysocki" > > > > > Cc: Rob Herring > > > > > Cc: Linus Walleij > > > > > Cc: Will Deacon > > > > > Cc: Ulf Hansson > > > > > Cc: Kevin Hilman > > > > > Cc: Thierry Reding > > > > > Cc: Mark Brown > > > > > Cc: Pavel Machek > > > > > Cc: Geert Uytterhoeven > > > > > Cc: Yoshihiro Shimoda > > > > > Cc: Paul Kocialkowski > > > > > Cc: linux-g...@vger.kernel.org > > > > > Cc: linux...@vger.kernel.org > > > > > Cc: iommu@lists.linux-foundation.org > > > > > Signed-off-by: Saravana Kannan > > > > > > > > I bisected a boot hang with ARCH=s390 defconfig in QEMU down to this > > > > change as commit 2b28a1a84a0e ("driver core: Extend deferred probe > > > > timeout on driver registration") in next-20220520 (bisect log below). > > > > > > > > $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- > > > > defconfig bzImage > > > > > > > > $ timeout --foreground 15m stdbuf -oL -eL \ > > > > qemu-system-s390x \ > > > > -initrd ... \ > > > > -M s390-ccw-virtio \ > > > > -display none \ > > > > -kernel arch/s390/boot/bzImage \ > > > > -m 512m \ > > > > -nodefaults \ > > > > -serial mon:stdio > > > > ... > > > > [2.077303] In-situ OAM (IOAM) with IPv6 > > > > [2.077639] NET: Registered PF_PACKET protocol family > > > > [2.078063] bridge: filtering via arp/ip/ip6tables is no longer > > > > available by default. Update your scripts to load br_netfilter if you > > > > need this. > > > > [2.078795] Key type dns_resolver registered > > > > [2.079317] cio: Channel measurement facility initialized using > > > > format extended (mode autodetected) > > > > [2.081494] Discipline DIAG cannot be used without z/VM > > > > [ 260.626363] random: crng init done > > > > qemu-system-s390x: terminating on signal 15 from pid 3815762 (timeout) > > > > > > > > We have a simple rootfs available if necessary: > > > > > > > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst > > > > > > > > If there is any other information I can provide, please let me know! > > > > > > Hmm... strange. Can you please try the following command line options > > > and tell me which of these has the issue and which don't? > > > > Sure thing! > > > > > 1) deferred_probe_timeout=0 > > > > No issue. > > > > > 2) deferred_probe_timeout=1 > > > 3) deferred_probe_timeout=300 > > > > Both of these appear to hang in the same way, I let each sit for five > > minutes. > > Strange that a sufficiently large timeout isn't helping. Is it trying > to boot off a network mount? I'll continue looking into this next > week. I don't think so, it seems like doing that requires some extra flags that we do not have: https://wiki.qemu.org/Features/S390xNetworkBoot If you need any additional information or want something tested, please let me know! Cheers, Nathan ___ iommu mailing list iommu@lists.linux-foundation.org
Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility
On 4/29/22 00:53, David Gibson wrote: On Thu, Mar 24, 2022 at 04:04:03PM -0600, Alex Williamson wrote: On Wed, 23 Mar 2022 21:33:42 -0300 Jason Gunthorpe wrote: On Wed, Mar 23, 2022 at 04:51:25PM -0600, Alex Williamson wrote: My overall question here would be whether we can actually achieve a compatibility interface that has sufficient feature transparency that we can dump vfio code in favor of this interface, or will there be enough niche use cases that we need to keep type1 and vfio containers around through a deprecation process? Other than SPAPR, I think we can. Does this mean #ifdef CONFIG_PPC in vfio core to retain infrastructure for POWER support? There are a few different levels to consider for dealing with PPC. For a suitable long term interface for ppc hosts and guests dropping this is fine: the ppc specific iommu model was basically an ill-conceived idea from the beginning, because none of us had sufficiently understood what things were general and what things where iommu model/hw specific. ..mostly. There are several points of divergence for the ppc iommu model. 1) Limited IOVA windows. This one turned out to not really be ppc specific, and is (rightly) handlded generically in the new interface. No problem here. 2) Costly GUPs. pseries (the most common ppc machine type) always expects a (v)IOMMU. That means that unlike the common x86 model of a host with IOMMU, but guests with no-vIOMMU, guest initiated maps/unmaps can be a hot path. Accounting in that path can be prohibitive (and on POWER8 in particular it prevented us from optimizing that path the way we wanted). We had two solutions for that, in v1 the explicit ENABLE/DISABLE calls, which preaccounted based on the IOVA window sizes. That was improved in the v2 which used the concept of preregistration. IIUC iommufd can achieve the same effect as preregistration using IOAS_COPY, so this one isn't really a problem either. I am getting rid of those POWER8-related realmode handlers as POWER9 has MMU enabled when hcalls are handled. Costly GUP problem is still there though (which base IOAS should solve?). 3) "dynamic DMA windows" (DDW). The IBM IOMMU hardware allows for 2 IOVA windows, which aren't contiguous with each other. The base addresses of each of these are fixed, but the size of each window, the pagesize (i.e. granularity) of each window and the number of levels in the IOMMU pagetable are runtime configurable. Because it's true in the hardware, it's also true of the vIOMMU interface defined by the IBM hypervisor (and adpoted by KVM as well). So, guests can request changes in how these windows are handled. Typical Linux guests will use the "low" window (IOVA 0..2GiB) dynamically, and the high window (IOVA 1<<60..???) to map all of RAM. However, as a hypervisor we can't count on that; the guest can use them however it wants. The guest actually does this already. AIX has always been like that, Linux is forced to do that for SRIOV VFs as there can be many VFs and TCEs (==IOPTEs) are limited resource. The today's pseries IOMMU code first tried mapping 1:1 (as it has been for ages) but if there is not enough TCEs - it removes the first window (which increases the TCE budget), creates a new 64bit window (as big as possible but not necessarily enough for 1:1, 64K/2M IOMMU page sizes allowed) and does map/unmap as drivers go. Which means the guest RAM does not need to be all mapped in that base IOAS suggested down this thread as that would mean all memory is pinned and powervm won't be able to swap it out (yeah, it can do such thing now!). Not sure if we really want to support this or stick to a simpler design. (3) still needs a plan for how to fit it into the /dev/iommufd model. This is a secondary reason that in the past I advocated for the user requesting specific DMA windows which the kernel would accept or refuse, rather than having a query function - it connects easily to the DDW model. With the query-first model we'd need some sort of extension here, not really sure what it should look like. Then, there's handling existing qemu (or other software) that is using the VFIO SPAPR_TCE interfaces. First, it's not entirely clear if this should be a goal or not: as others have noted, working actively to port qemu to the new interface at the same time as making a comprehensive in-kernel compat layer is arguably redundant work. That said, if we did want to handle this in an in-kernel compat layer, here's roughly what you'd need for SPAPR_TCE v2: - VFIO_IOMMU_SPAPR_TCE_GET_INFO I think this should be fairly straightforward; the information you need should be in the now generic IOVA window stuff and would just need massaging into the expected format. - VFIO_IOMMU_SPAPR_REGISTER_MEMORY / VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY IIUC, these could be traslated into map/unmap operations onto a second implicit IOAS which represents the preregistered memory
[PATCH] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
TLB sync timeouts can be due to various reasons such as TBU power down or pending TCU/TBU invalidation/sync and so on. Debugging these often require dumping of some implementation defined registers to know the status of TBU/TCU operations and some of these registers are not accessible in non-secure world such as from kernel and requires SMC calls to read them in the secure world. So, add this debug support to dump implementation defined registers for TLB sync timeout issues. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 50 ++ drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + drivers/iommu/arm/arm-smmu/arm-smmu.h | 4 ++ 3 files changed, 56 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index 7820711c4560..22e9a0085475 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -5,11 +5,19 @@ #include #include +#include #include #include #include "arm-smmu.h" +#define QCOM_DUMMY_VAL -1 + +/* Implementation Defined Register Space 0 registers */ +#define QCOM_SMMU_STATS_SYNC_INV_TBU_ACK 0x5dc +#define QCOM_SMMU_TBU_PWR_STATUS 0x2204 +#define QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR 0x2670 + struct qcom_smmu { struct arm_smmu_device smmu; bool bypass_quirk; @@ -22,6 +30,46 @@ static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu) return container_of(smmu, struct qcom_smmu, smmu); } +static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, + int sync, int status) +{ + u32 sync_inv_ack, sync_inv_progress, tbu_pwr_status; + unsigned int spin_cnt, delay; + u32 reg; + int ret; + + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); + for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { + for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { + reg = arm_smmu_readl(smmu, page, status); + if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE)) + return; + cpu_relax(); + } + udelay(delay); + } + + sync_inv_ack = arm_smmu_readl(smmu, ARM_SMMU_IMPL_DEF0, + QCOM_SMMU_STATS_SYNC_INV_TBU_ACK); + + ret = qcom_scm_io_readl(smmu->ioaddr + QCOM_SMMU_TBU_PWR_STATUS, + _pwr_status); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read TBU power status: %d\n", ret); + + ret = qcom_scm_io_readl(smmu->ioaddr + QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR, + _inv_progress); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read SAFE WAIT counter: %d\n", ret); + + dev_err_ratelimited(smmu->dev, + "TLB sync timed out -- SMMU may be deadlocked\n" + "TBU: sync_inv_ack %#x power_status %#x sync_inv_progress %#x\n", + sync_inv_ack, tbu_pwr_status, sync_inv_progress); +} + static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx, u32 reg) { @@ -374,6 +422,7 @@ static const struct arm_smmu_impl qcom_smmu_impl = { .def_domain_type = qcom_smmu_def_domain_type, .reset = qcom_smmu500_reset, .write_s2cr = qcom_smmu_write_s2cr, + .tlb_sync = qcom_smmu_tlb_sync, }; static const struct arm_smmu_impl qcom_adreno_smmu_impl = { @@ -382,6 +431,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = { .reset = qcom_smmu500_reset, .alloc_context_bank = qcom_adreno_smmu_alloc_context_bank, .write_sctlr = qcom_adreno_smmu_write_sctlr, + .tlb_sync = qcom_smmu_tlb_sync, }; static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu, diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 2ed3594f384e..4c5b51109835 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2099,6 +2099,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev) if (IS_ERR(smmu->base)) return PTR_ERR(smmu->base); ioaddr = res->start; + smmu->ioaddr = ioaddr; + /* * The resource size should effectively match the value of SMMU_TOP; * stash that temporarily until we know PAGESIZE to validate it with. diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h index 2b9b42fb6f30..8cf6567d970f 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h @@ -278,6 +278,7 @@ struct arm_smmu_device { struct device *dev;
Re: [PATCH V2 2/6] iommu: iova: properly handle 0 as a valid IOVA address
On 2022-05-11 13:15, Ajay Kumar wrote: From: Marek Szyprowski Zero is a valid DMA and IOVA address on many architectures, so adjust the IOVA management code to properly handle it. A new value IOVA_BAD_ADDR (~0UL) is introduced as a generic value for the error case. Adjust all callers of the alloc_iova_fast() function for the new return value. And when does anything actually need this? In fact if you were to stop iommu-dma from reserving IOVA 0 - which you don't - it would only show how patch #3 is broken. Also note that it's really nothing to do with architectures either way; iommu-dma simply chooses to reserve IOVA 0 for its own convenience, mostly because it can. Much the same way that 0 is typically a valid CPU VA, but mapping something meaningful there is just asking for a world of pain debugging NULL-dereference bugs. Robin. Signed-off-by: Marek Szyprowski Signed-off-by: Ajay Kumar --- drivers/iommu/dma-iommu.c | 16 +--- drivers/iommu/iova.c | 13 + include/linux/iova.h | 1 + 3 files changed, 19 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 1ca85d37eeab..16218d6a0703 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -605,7 +605,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain, { struct iommu_dma_cookie *cookie = domain->iova_cookie; struct iova_domain *iovad = >iovad; - unsigned long shift, iova_len, iova = 0; + unsigned long shift, iova_len, iova = IOVA_BAD_ADDR; if (cookie->type == IOMMU_DMA_MSI_COOKIE) { cookie->msi_iova += size; @@ -625,11 +625,13 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain, iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift, false); - if (!iova) + if (iova == IOVA_BAD_ADDR) iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift, true); - return (dma_addr_t)iova << shift; + if (iova != IOVA_BAD_ADDR) + return (dma_addr_t)iova << shift; + return DMA_MAPPING_ERROR; } static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie, @@ -688,7 +690,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, size = iova_align(iovad, size + iova_off); iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev); - if (!iova) + if (iova == DMA_MAPPING_ERROR) return DMA_MAPPING_ERROR; if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) { @@ -799,7 +801,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev, size = iova_align(iovad, size); iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev); - if (!iova) + if (iova == DMA_MAPPING_ERROR) goto out_free_pages; if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL)) @@ -1204,7 +1206,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, } iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev); - if (!iova) { + if (iova == DMA_MAPPING_ERROR) { ret = -ENOMEM; goto out_restore_sg; } @@ -1516,7 +1518,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, return NULL; iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); - if (!iova) + if (iova == DMA_MAPPING_ERROR) goto out_free_page; if (iommu_map(domain, iova, msi_addr, size, prot)) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index db77aa675145..ae0fe0a6714e 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -429,6 +429,8 @@ EXPORT_SYMBOL_GPL(free_iova); * This function tries to satisfy an iova allocation from the rcache, * and falls back to regular allocation on failure. If regular allocation * fails too and the flush_rcache flag is set then the rcache will be flushed. + * Returns a pfn the allocated iova starts at or IOVA_BAD_ADDR in the case + * of a failure. */ unsigned long alloc_iova_fast(struct iova_domain *iovad, unsigned long size, @@ -447,7 +449,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size, size = roundup_pow_of_two(size); iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1); - if (iova_pfn) + if (iova_pfn != IOVA_BAD_ADDR) return iova_pfn; retry: @@ -456,7 +458,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size, unsigned int cpu; if (!flush_rcache) - return 0; + return IOVA_BAD_ADDR; /* Try replenishing IOVAs by flushing rcache. */ flush_rcache = false; @@
Re: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain
Hi Kevin, On Mon, 23 May 2022 09:14:04 +, "Tian, Kevin" wrote: > > From: Tian, Kevin > > Sent: Monday, May 23, 2022 3:55 PM > > > > > From: Jacob Pan > > > +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct > > > iommu_domain *domain) > > > +{ > > > + struct iommu_domain *tdomain; > > > + struct iommu_group *group; > > > + unsigned long index; > > > + ioasid_t pasid = INVALID_IOASID; > > > + > > > + group = iommu_group_get(dev); > > > + if (!group) > > > + return pasid; > > > + > > > + xa_for_each(>pasid_array, index, tdomain) { > > > + if (domain == tdomain) { > > > + pasid = index; > > > + break; > > > + } > > > + } > > > > Don't we need to acquire the group lock here? > > pasid_array is under RCU read lock so it is protected though may have stale data. It also used in atomic context for TLB flush, cannot take the group mutex. If the caller does detach_dev_pasid while doing TLB flush, it could result in extra flush but harmless. > > Btw the intention of this function is a bit confusing. Patch01 already > > stores the pasid under domain hence it's redundant to get it > > indirectly from xarray index. You could simply introduce a flag bit > > (e.g. dma_pasid_enabled) in device_domain_info and then directly > > use domain->dma_pasid once the flag is true. > > > > Just saw your discussion with Jason about v3. While it makes sense > to not specialize DMA domain in iommu driver, the use of this function > should only be that when the call chain doesn't pass down a pasid > value e.g. when doing cache invalidation for domain map/unmap. If > the upper interface already carries a pasid e.g. in detach_dev_pasid() > iommu driver can simply verify that the corresponding pasid xarray > entry points to the specified domain instead of using this function to > loop xarray and then verify the returned pasid (as done in patch03/04). Excellent point, I could just use xa_load(pasid) to compare the domain instead of loop through xa. I will add another helper. bool iommu_is_pasid_domain_attached(struct device *dev, struct iommu_domain *domain, ioasid_t pasid) { struct iommu_group *group; bool ret = false; group = iommu_group_get(dev); if (WARN_ON(!group)) return false; if (domain == xa_load(>pasid_array, pasid)) ret = true; iommu_group_put(group); return ret; } Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v1] driver core: Extend deferred probe timeout on driver registration
On Mon, May 23, 2022 at 8:17 AM Nathan Chancellor wrote: > > On Fri, May 20, 2022 at 05:15:55PM -0700, Saravana Kannan wrote: > > On Fri, May 20, 2022 at 5:04 PM Nathan Chancellor wrote: > > > > > > On Fri, May 20, 2022 at 04:49:48PM -0700, Saravana Kannan wrote: > > > > On Fri, May 20, 2022 at 4:30 PM Nathan Chancellor > > > > wrote: > > > > > > > > > > Hi Saravana, > > > > > > > > > > On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote: > > > > > > The deferred probe timer that's used for this currently starts at > > > > > > late_initcall and runs for driver_deferred_probe_timeout seconds. > > > > > > The > > > > > > assumption being that all available drivers would be loaded and > > > > > > registered before the timer expires. This means, the > > > > > > driver_deferred_probe_timeout has to be pretty large for it to > > > > > > cover the > > > > > > worst case. But if we set the default value for it to cover the > > > > > > worst > > > > > > case, it would significantly slow down the average case. For this > > > > > > reason, the default value is set to 0. > > > > > > > > > > > > Also, with CONFIG_MODULES=y and the current default values of > > > > > > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with > > > > > > missing > > > > > > drivers will cause their consumer devices to always defer their > > > > > > probes. > > > > > > This is because device links created by fw_devlink defer the probe > > > > > > even > > > > > > before the consumer driver's probe() is called. > > > > > > > > > > > > Instead of a fixed timeout, if we extend an unexpired deferred probe > > > > > > timer on every successful driver registration, with the expectation > > > > > > more > > > > > > modules would be loaded in the near future, then the default value > > > > > > of > > > > > > driver_deferred_probe_timeout only needs to be as long as the worst > > > > > > case > > > > > > time difference between two consecutive module loads. > > > > > > > > > > > > So let's implement that and set the default value to 10 seconds when > > > > > > CONFIG_MODULES=y. > > > > > > > > > > > > Cc: Greg Kroah-Hartman > > > > > > Cc: "Rafael J. Wysocki" > > > > > > Cc: Rob Herring > > > > > > Cc: Linus Walleij > > > > > > Cc: Will Deacon > > > > > > Cc: Ulf Hansson > > > > > > Cc: Kevin Hilman > > > > > > Cc: Thierry Reding > > > > > > Cc: Mark Brown > > > > > > Cc: Pavel Machek > > > > > > Cc: Geert Uytterhoeven > > > > > > Cc: Yoshihiro Shimoda > > > > > > Cc: Paul Kocialkowski > > > > > > Cc: linux-g...@vger.kernel.org > > > > > > Cc: linux...@vger.kernel.org > > > > > > Cc: iommu@lists.linux-foundation.org > > > > > > Signed-off-by: Saravana Kannan > > > > > > > > > > I bisected a boot hang with ARCH=s390 defconfig in QEMU down to this > > > > > change as commit 2b28a1a84a0e ("driver core: Extend deferred probe > > > > > timeout on driver registration") in next-20220520 (bisect log below). > > > > > > > > > > $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- > > > > > defconfig bzImage > > > > > > > > > > $ timeout --foreground 15m stdbuf -oL -eL \ > > > > > qemu-system-s390x \ > > > > > -initrd ... \ > > > > > -M s390-ccw-virtio \ > > > > > -display none \ > > > > > -kernel arch/s390/boot/bzImage \ > > > > > -m 512m \ > > > > > -nodefaults \ > > > > > -serial mon:stdio > > > > > ... > > > > > [2.077303] In-situ OAM (IOAM) with IPv6 > > > > > [2.077639] NET: Registered PF_PACKET protocol family > > > > > [2.078063] bridge: filtering via arp/ip/ip6tables is no longer > > > > > available by default. Update your scripts to load br_netfilter if you > > > > > need this. > > > > > [2.078795] Key type dns_resolver registered > > > > > [2.079317] cio: Channel measurement facility initialized using > > > > > format extended (mode autodetected) > > > > > [2.081494] Discipline DIAG cannot be used without z/VM > > > > > [ 260.626363] random: crng init done > > > > > qemu-system-s390x: terminating on signal 15 from pid 3815762 (timeout) > > > > > > > > > > We have a simple rootfs available if necessary: > > > > > > > > > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst > > > > > > > > > > If there is any other information I can provide, please let me know! > > > > > > > > Hmm... strange. Can you please try the following command line options > > > > and tell me which of these has the issue and which don't? > > > > > > Sure thing! > > > > > > > 1) deferred_probe_timeout=0 > > > > > > No issue. > > > > > > > 2) deferred_probe_timeout=1 > > > > 3) deferred_probe_timeout=300 > > > > > > Both of these appear to hang in the same way, I let each sit for five > > > minutes. > > > > Strange that a sufficiently large timeout isn't helping. Is it trying > > to boot off a network mount? I'll continue looking into this next > > week. > > I don't think so, it seems like doing that requires some extra flags
Re: [PATCH 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()
On 21/05/2022 00:33, Damien Le Moal wrote: Hi Damien, +unsigned long iova_rcache_range(void) Why not a size_t return type ? The IOVA code generally uses unsigned long for size/range while dam-iommu uses size_t as appropiate, so I'm just sticking to that. +{ + return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1); +} + Thanks, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain
> From: Jacob Pan > Sent: Thursday, May 19, 2022 2:21 AM > > IOMMU group maintains a PASID array which stores the associated IOMMU > domains. This patch introduces a helper function to do domain to PASID > look up. It will be used by TLB flush and device-PASID attach verification. > > Signed-off-by: Jacob Pan > --- > drivers/iommu/iommu.c | 22 ++ > include/linux/iommu.h | 6 +- > 2 files changed, 27 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 00d0262a1fe9..22f44833db64 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -3199,3 +3199,25 @@ struct iommu_domain > *iommu_get_domain_for_iopf(struct device *dev, > > return domain; > } > + > +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct > iommu_domain *domain) > +{ > + struct iommu_domain *tdomain; > + struct iommu_group *group; > + unsigned long index; > + ioasid_t pasid = INVALID_IOASID; > + > + group = iommu_group_get(dev); > + if (!group) > + return pasid; > + > + xa_for_each(>pasid_array, index, tdomain) { > + if (domain == tdomain) { > + pasid = index; > + break; > + } > + } Don't we need to acquire the group lock here? Btw the intention of this function is a bit confusing. Patch01 already stores the pasid under domain hence it's redundant to get it indirectly from xarray index. You could simply introduce a flag bit (e.g. dma_pasid_enabled) in device_domain_info and then directly use domain->dma_pasid once the flag is true. > + iommu_group_put(group); > + > + return pasid; > +} > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 36ad007084cc..c0440a4be699 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -694,7 +694,7 @@ void iommu_detach_device_pasid(struct > iommu_domain *domain, > struct device *dev, ioasid_t pasid); > struct iommu_domain * > iommu_get_domain_for_iopf(struct device *dev, ioasid_t pasid); > - > +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct > iommu_domain *domain); > #else /* CONFIG_IOMMU_API */ > > struct iommu_ops {}; > @@ -1070,6 +1070,10 @@ iommu_get_domain_for_iopf(struct device *dev, > ioasid_t pasid) > { > return NULL; > } > +static ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct > iommu_domain *domain) > +{ > + return INVALID_IOASID; > +} > #endif /* CONFIG_IOMMU_API */ > > #ifdef CONFIG_IOMMU_SVA > -- > 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 4/6] iommu: Add PASID support for DMA mapping API users
> From: Jacob Pan > Sent: Thursday, May 19, 2022 2:21 AM > > DMA mapping API is the de facto standard for in-kernel DMA. It operates > on a per device/RID basis which is not PASID-aware. > > Some modern devices such as Intel Data Streaming Accelerator, PASID is > required for certain work submissions. To allow such devices use DMA > mapping API, we need the following functionalities: > 1. Provide device a way to retrieve a PASID for work submission within > the kernel > 2. Enable the kernel PASID on the IOMMU for the device > 3. Attach the kernel PASID to the device's default DMA domain, let it > be IOVA or physical address in case of pass-through. > > This patch introduces a driver facing API that enables DMA API > PASID usage. Once enabled, device drivers can continue to use DMA APIs as > is. There is no difference in dma_handle between without PASID and with > PASID. > > Signed-off-by: Jacob Pan > --- > drivers/iommu/dma-iommu.c | 114 > ++ > include/linux/dma-iommu.h | 3 + > 2 files changed, 117 insertions(+) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 1ca85d37eeab..6ad7ba619ef0 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -34,6 +34,8 @@ struct iommu_dma_msi_page { > phys_addr_t phys; > }; > > +static DECLARE_IOASID_SET(iommu_dma_pasid); > + > enum iommu_dma_cookie_type { > IOMMU_DMA_IOVA_COOKIE, > IOMMU_DMA_MSI_COOKIE, > @@ -370,6 +372,118 @@ void iommu_put_dma_cookie(struct > iommu_domain *domain) > domain->iova_cookie = NULL; > } > > +/* Protect iommu_domain DMA PASID data */ > +static DEFINE_MUTEX(dma_pasid_lock); > +/** > + * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the > device's > + * DMA domain. > + * @dev: Device to be enabled > + * @pasid: The returned kernel PASID to be used for DMA > + * > + * DMA request with PASID will be mapped the same way as the legacy DMA. > + * If the device is in pass-through, PASID will also pass-through. If the > + * device is in IOVA, the PASID will point to the same IOVA page table. > + * > + * @return err code or 0 on success > + */ > +int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid) iommu_attach_dma_domain_pasid? 'dma_pasid' is too broad from a API p.o.v. > +{ > + struct iommu_domain *dom; > + ioasid_t id, max; > + int ret = 0; > + > + dom = iommu_get_domain_for_dev(dev); > + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid) > + return -ENODEV; > + > + /* Only support domain types that DMA API can be used */ > + if (dom->type == IOMMU_DOMAIN_UNMANAGED || > + dom->type == IOMMU_DOMAIN_BLOCKED) { > + dev_warn(dev, "Invalid domain type %d", dom->type); > + return -EPERM; > + } WARN_ON. and probably we can just check whether domain is default domain here. > + > + mutex_lock(_pasid_lock); > + id = dom->dma_pasid; > + if (!id) { > + /* > + * First device to use PASID in its DMA domain, allocate > + * a single PASID per DMA domain is all we need, it is also > + * good for performance when it comes down to IOTLB flush. > + */ > + max = 1U << dev->iommu->pasid_bits; > + if (!max) { > + ret = -EINVAL; > + goto done_unlock; > + } > + > + id = ioasid_alloc(_dma_pasid, 1, max, dev); > + if (id == INVALID_IOASID) { > + ret = -ENOMEM; > + goto done_unlock; > + } > + > + dom->dma_pasid = id; > + atomic_set(>dma_pasid_users, 1); this is always accessed with lock held hence no need to be atomic. > + } > + > + ret = iommu_attach_device_pasid(dom, dev, id); > + if (!ret) { > + *pasid = id; > + atomic_inc(>dma_pasid_users); > + goto done_unlock; > + } > + > + if (atomic_dec_and_test(>dma_pasid_users)) { > + ioasid_free(id); > + dom->dma_pasid = 0; > + } > +done_unlock: > + mutex_unlock(_pasid_lock); > + return ret; > +} > +EXPORT_SYMBOL(iommu_attach_dma_pasid); > + > +/** > + * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID > + * @dev: Device's PASID DMA to be disabled > + * > + * It is the device driver's responsibility to ensure no more incoming DMA > + * requests with the kernel PASID before calling this function. IOMMU driver > + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and > + * drained. > + * > + */ > +void iommu_detach_dma_pasid(struct device *dev) > +{ > + struct iommu_domain *dom; > + ioasid_t pasid; > + > + dom = iommu_get_domain_for_dev(dev); > + if (WARN_ON(!dom || !dom->ops || !dom->ops->detach_dev_pasid)) > + return; > + > + /* Only support DMA API managed domain type
Re: [PATCH 0/2] dma-mapping, remoteproc: Fix dma_mem leak after rproc_shutdown
> Sigh. In theory drivers should never declare coherent memory like > this, and there has been some work to fix remoteproc in that regard. > > But I guess until that is merged we'll need somthing like this fix. Hi, Thanks for your comment. As I didn't see other fix of this issue, should we use this patch before the remoteproc work you mentioned is merged? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/ipmmu-vmsa: Avoid leak OF node on error
From: Minghao Chi The OF node should be put before returning error in ipmmu_init(), otherwise node's refcount will be leaked. Reported-by: Zeal Robot Signed-off-by: Minghao Chi --- drivers/iommu/ipmmu-vmsa.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 8fdb84b3642b..f6440b106f46 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -1179,11 +1179,10 @@ static int __init ipmmu_init(void) return 0; np = of_find_matching_node(NULL, ipmmu_of_ids); + of_node_put(np); if (!np) return 0; - of_node_put(np); - ret = platform_driver_register(_driver); if (ret < 0) return ret; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Avoid leak OF node on error
On Mon, May 23, 2022 at 11:11:45AM +, cgel@gmail.com wrote: > From: Minghao Chi > > The OF node should be put before returning error in ipmmu_init(), > otherwise node's refcount will be leaked. > > Reported-by: Zeal Robot > Signed-off-by: Minghao Chi > --- > drivers/iommu/ipmmu-vmsa.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c > index 8fdb84b3642b..f6440b106f46 100644 > --- a/drivers/iommu/ipmmu-vmsa.c > +++ b/drivers/iommu/ipmmu-vmsa.c > @@ -1179,11 +1179,10 @@ static int __init ipmmu_init(void) > return 0; > > np = of_find_matching_node(NULL, ipmmu_of_ids); > + of_node_put(np); > if (!np) > return 0; > > - of_node_put(np); > - > ret = platform_driver_register(_driver); > if (ret < 0) > return ret; NAK ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/4] DMA mapping changes for SCSI core
On 22/05/2022 23:22, Damien Le Moal wrote: On 2022/05/22 22:13, Christoph Hellwig wrote: The whole series looks fine to me. I'll happily queue it up in the dma-mapping tree if the SCSI and ATA maintainers are ok with that. Fine with me. I sent an acked-by for the libata bit. Thanks, I'm going to have to post a v2 and I figure that with the timing that I'll have to wait for v5.20 now. Cheers, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits
On 21/05/2022 00:30, Damien Le Moal wrote: diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index f69b77cbf538..a3ae6345473b 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -225,6 +225,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, shost->cmd_per_lun = min_t(int, shost->cmd_per_lun, shost->can_queue); Hi Damien, + if (dma_dev->dma_mask) { + shost->max_sectors = min_t(unsigned int, shost->max_sectors, + dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT); + } Nit: you could drop the curly brackets here. Some people prefer this style - multi-line statements have curly brackets, while single-line statements conform to the official coding style (and don't use brackets). I'll just stick with what we have unless there is a consensus to change. Thanks, John + error = scsi_init_sense_cache(shost); if (error) goto fail; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()
On 2022/05/20 17:23, John Garry wrote: > Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which > allows the drivers to know the optimal mapping limit and thus limit the > requested IOVA lengths. > > This value is based on the IOVA rcache range limit, as IOVAs allocated > above this limit must always be newly allocated, which may be quite slow. > > Signed-off-by: John Garry > --- > drivers/iommu/dma-iommu.c | 6 ++ > drivers/iommu/iova.c | 5 + > include/linux/iova.h | 2 ++ > 3 files changed, 13 insertions(+) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 09f6e1c0f9c0..f619e41b9172 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -1442,6 +1442,11 @@ static unsigned long > iommu_dma_get_merge_boundary(struct device *dev) > return (1UL << __ffs(domain->pgsize_bitmap)) - 1; > } > > +static size_t iommu_dma_opt_mapping_size(void) > +{ > + return iova_rcache_range(); > +} > + > static const struct dma_map_ops iommu_dma_ops = { > .alloc = iommu_dma_alloc, > .free = iommu_dma_free, > @@ -1462,6 +1467,7 @@ static const struct dma_map_ops iommu_dma_ops = { > .map_resource = iommu_dma_map_resource, > .unmap_resource = iommu_dma_unmap_resource, > .get_merge_boundary = iommu_dma_get_merge_boundary, > + .opt_mapping_size = iommu_dma_opt_mapping_size, > }; > > /* > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c > index db77aa675145..9f00b58d546e 100644 > --- a/drivers/iommu/iova.c > +++ b/drivers/iommu/iova.c > @@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain > *iovad, > static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain > *iovad); > static void free_iova_rcaches(struct iova_domain *iovad); > > +unsigned long iova_rcache_range(void) > +{ > + return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1); > +} > + > static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node) > { > struct iova_domain *iovad; > diff --git a/include/linux/iova.h b/include/linux/iova.h > index 320a70e40233..c6ba6d95d79c 100644 > --- a/include/linux/iova.h > +++ b/include/linux/iova.h > @@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain > *iovad, dma_addr_t iova) > int iova_cache_get(void); > void iova_cache_put(void); > > +unsigned long iova_rcache_range(void); > + > void free_iova(struct iova_domain *iovad, unsigned long pfn); > void __free_iova(struct iova_domain *iovad, struct iova *iova); > struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size, Reviewed-by: Damien Le Moal -- Damien Le Moal Western Digital Research ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits
On 2022/05/23 15:53, John Garry wrote: > On 21/05/2022 00:30, Damien Le Moal wrote: >>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c >>> index f69b77cbf538..a3ae6345473b 100644 >>> --- a/drivers/scsi/hosts.c >>> +++ b/drivers/scsi/hosts.c >>> @@ -225,6 +225,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, >>> struct device *dev, >>> shost->cmd_per_lun = min_t(int, shost->cmd_per_lun, >>>shost->can_queue); >>> > > Hi Damien, > >>> + if (dma_dev->dma_mask) { >>> + shost->max_sectors = min_t(unsigned int, shost->max_sectors, >>> + dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT); >>> + } >> Nit: you could drop the curly brackets here. > > Some people prefer this style - multi-line statements have curly > brackets, while single-line statements conform to the official coding > style (and don't use brackets). OK. > > I'll just stick with what we have unless there is a consensus to change. > > Thanks, > John > >> >>> + >>> error = scsi_init_sense_cache(shost); >>> if (error) >>> goto fail; > -- Damien Le Moal Western Digital Research ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain
> From: Tian, Kevin > Sent: Monday, May 23, 2022 3:55 PM > > > From: Jacob Pan > > +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct > > iommu_domain *domain) > > +{ > > + struct iommu_domain *tdomain; > > + struct iommu_group *group; > > + unsigned long index; > > + ioasid_t pasid = INVALID_IOASID; > > + > > + group = iommu_group_get(dev); > > + if (!group) > > + return pasid; > > + > > + xa_for_each(>pasid_array, index, tdomain) { > > + if (domain == tdomain) { > > + pasid = index; > > + break; > > + } > > + } > > Don't we need to acquire the group lock here? > > Btw the intention of this function is a bit confusing. Patch01 already > stores the pasid under domain hence it's redundant to get it > indirectly from xarray index. You could simply introduce a flag bit > (e.g. dma_pasid_enabled) in device_domain_info and then directly > use domain->dma_pasid once the flag is true. > Just saw your discussion with Jason about v3. While it makes sense to not specialize DMA domain in iommu driver, the use of this function should only be that when the call chain doesn't pass down a pasid value e.g. when doing cache invalidation for domain map/unmap. If the upper interface already carries a pasid e.g. in detach_dev_pasid() iommu driver can simply verify that the corresponding pasid xarray entry points to the specified domain instead of using this function to loop xarray and then verify the returned pasid (as done in patch03/04). Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits
Hi John, url: https://github.com/intel-lab-lkp/linux/commits/John-Garry/DMA-mapping-changes-for-SCSI-core/20220520-163049 base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next config: s390-randconfig-m031-20220519 (https://download.01.org/0day-ci/archive/20220521/202205210545.gks834ds-...@intel.com/config) compiler: s390-linux-gcc (GCC) 11.3.0 If you fix the issue, kindly add following tag where applicable Reported-by: kernel test robot Reported-by: Dan Carpenter smatch warnings: drivers/scsi/hosts.c:243 scsi_add_host_with_dma() warn: variable dereferenced before check 'dma_dev' (see line 228) vim +/dma_dev +243 drivers/scsi/hosts.c d139b9bd0e52dd James Bottomley 2009-11-05 209 int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, d139b9bd0e52dd James Bottomley 2009-11-05 210 struct device *dma_dev) ^1da177e4c3f41 Linus Torvalds2005-04-16 211 { ^1da177e4c3f41 Linus Torvalds2005-04-16 212struct scsi_host_template *sht = shost->hostt; ^1da177e4c3f41 Linus Torvalds2005-04-16 213int error = -EINVAL; ^1da177e4c3f41 Linus Torvalds2005-04-16 214 91921e016a2199 Hannes Reinecke 2014-06-25 215shost_printk(KERN_INFO, shost, "%s\n", ^1da177e4c3f41 Linus Torvalds2005-04-16 216 sht->info ? sht->info(shost) : sht->name); ^1da177e4c3f41 Linus Torvalds2005-04-16 217 ^1da177e4c3f41 Linus Torvalds2005-04-16 218if (!shost->can_queue) { 91921e016a2199 Hannes Reinecke 2014-06-25 219 shost_printk(KERN_ERR, shost, 91921e016a2199 Hannes Reinecke 2014-06-25 220 "can_queue = 0 no longer supported\n"); 542bd1377a9630 James Bottomley 2008-04-21 221goto fail; ^1da177e4c3f41 Linus Torvalds2005-04-16 222} ^1da177e4c3f41 Linus Torvalds2005-04-16 223 50b6cb3516365c Dexuan Cui2021-10-07 224/* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */ 50b6cb3516365c Dexuan Cui2021-10-07 225shost->cmd_per_lun = min_t(int, shost->cmd_per_lun, ea2f0f77538c50 John Garry2021-05-19 226 shost->can_queue); ea2f0f77538c50 John Garry2021-05-19 227 2ad7ba6ca08593 John Garry2022-05-20 @228if (dma_dev->dma_mask) { ^ The patch adds a new unchecked dereference 2ad7ba6ca08593 John Garry2022-05-20 229 shost->max_sectors = min_t(unsigned int, shost->max_sectors, 2ad7ba6ca08593 John Garry2022-05-20 230 dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT); 2ad7ba6ca08593 John Garry2022-05-20 231} 2ad7ba6ca08593 John Garry2022-05-20 232 0a6ac4ee7c2109 Christoph Hellwig 2017-01-03 233error = scsi_init_sense_cache(shost); 0a6ac4ee7c2109 Christoph Hellwig 2017-01-03 234if (error) 0a6ac4ee7c2109 Christoph Hellwig 2017-01-03 235goto fail; 0a6ac4ee7c2109 Christoph Hellwig 2017-01-03 236 d285203cf647d7 Christoph Hellwig 2014-01-17 237error = scsi_mq_setup_tags(shost); 542bd1377a9630 James Bottomley 2008-04-21 238if (error) 542bd1377a9630 James Bottomley 2008-04-21 239goto fail; d285203cf647d7 Christoph Hellwig 2014-01-17 240 ^1da177e4c3f41 Linus Torvalds2005-04-16 241if (!shost->shost_gendev.parent) ^1da177e4c3f41 Linus Torvalds2005-04-16 242 shost->shost_gendev.parent = dev ? dev : _bus; 3c8d9a957d0ae6 James Bottomley 2012-05-04 @243if (!dma_dev) The old code checked for NULL 3c8d9a957d0ae6 James Bottomley 2012-05-04 244dma_dev = shost->shost_gendev.parent; 3c8d9a957d0ae6 James Bottomley 2012-05-04 245 d139b9bd0e52dd James Bottomley 2009-11-05 246shost->dma_dev = dma_dev; ^1da177e4c3f41 Linus Torvalds2005-04-16 247 5c6fab9d558470 Mika Westerberg 2016-02-18 248/* 5c6fab9d558470 Mika Westerberg 2016-02-18 249 * Increase usage count temporarily here so that calling 5c6fab9d558470 Mika Westerberg 2016-02-18 250 * scsi_autopm_put_host() will trigger runtime idle if there is 5c6fab9d558470 Mika Westerberg 2016-02-18 251 * nothing else preventing suspending the device. 5c6fab9d558470 Mika Westerberg 2016-02-18 252 */ 5c6fab9d558470 Mika Westerberg 2016-02-18 253 pm_runtime_get_noresume(>shost_gendev); bc4f24014de58f Alan Stern2010-06-17 254 pm_runtime_set_active(>shost_gendev); bc4f24014de58f Alan Stern2010-06-17 255 pm_runtime_enable(>shost_gendev); bc4f24014de58f Alan Stern2010-06-17 256 device_enable_async_suspend(>shost_gendev); bc4f24014de58f Alan Stern
Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support
On 2022/5/19 15:20, Lu Baolu wrote: The iommu_sva_domain represents a hardware pagetable that the IOMMU hardware could use for SVA translation. This adds some infrastructure to support SVA domain in the iommu common layer. It includes: - Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA domain type. - Add a new domain ops pointer in iommu_ops. The IOMMU drivers that support SVA should provide the callbacks. - Add helpers to allocate and free an SVA domain. - Add helpers to set an SVA domain to a device and the reverse operation. Some buses, like PCI, route packets without considering the PASID value. Thus a DMA target address with PASID might be treated as P2P if the address falls into the MMIO BAR of other devices in the group. To make things simple, the attach/detach interfaces only apply to devices belonging to the singleton groups, and the singleton is immutable in fabric i.e. not affected by hotplug. The iommu_set/block_device_pasid() can be used for other purposes, such as kernel DMA with pasid, mediation device, etc. Hence, it is put in the iommu.c. Suggested-by: Jean-Philippe Brucker Suggested-by: Jason Gunthorpe Signed-off-by: Lu Baolu --- include/linux/iommu.h | 51 + drivers/iommu/iommu-sva-lib.h | 15 drivers/iommu/iommu-sva-lib.c | 48 +++ drivers/iommu/iommu.c | 71 +++ 4 files changed, 185 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0c358b7c583b..e8cf82d46ce1 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -64,6 +64,9 @@ struct iommu_domain_geometry { #define __IOMMU_DOMAIN_PT (1U << 2) /* Domain is identity mapped */ #define __IOMMU_DOMAIN_DMA_FQ (1U << 3) /* DMA-API uses flush queue*/ +#define __IOMMU_DOMAIN_SHARED (1U << 4) /* Page table shared from CPU */ +#define __IOMMU_DOMAIN_HOST_VA (1U << 5) /* Host CPU virtual address */ + /* * This are the possible domain-types * @@ -86,6 +89,8 @@ struct iommu_domain_geometry { #define IOMMU_DOMAIN_DMA_FQ (__IOMMU_DOMAIN_PAGING |\ __IOMMU_DOMAIN_DMA_API | \ __IOMMU_DOMAIN_DMA_FQ) +#define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SHARED |\ +__IOMMU_DOMAIN_HOST_VA) struct iommu_domain { unsigned type; @@ -254,6 +259,7 @@ struct iommu_ops { int (*def_domain_type)(struct device *dev); const struct iommu_domain_ops *default_domain_ops; + const struct iommu_domain_ops *sva_domain_ops; Per Joerg's comment in anther thread, https://lore.kernel.org/linux-iommu/yodvj7ervpidw...@8bytes.org/ adding a sva_domain_ops here is not the right way to go. If no objection, I will make the sva domain go through the generic domain_alloc/free() callbacks in the next version. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()
On 2022/05/23 16:01, John Garry wrote: > On 21/05/2022 00:33, Damien Le Moal wrote: > > Hi Damien, > >>> +unsigned long iova_rcache_range(void) >> Why not a size_t return type ? > > The IOVA code generally uses unsigned long for size/range while > dam-iommu uses size_t as appropiate, so I'm just sticking to that. OK. > >> >>> +{ >>> + return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1); >>> +} >>> + > > Thanks, > John -- Damien Le Moal Western Digital Research ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] dma-mapping, remoteproc: Fix dma_mem leak after rproc_shutdown
On 2022-05-23 11:15, Mark-PK Tsai wrote: Sigh. In theory drivers should never declare coherent memory like this, and there has been some work to fix remoteproc in that regard. But I guess until that is merged we'll need somthing like this fix. Hi, Thanks for your comment. As I didn't see other fix of this issue, should we use this patch before the remoteproc work you mentioned is merged? TBH I think it would be better "fixed" with a kmemleak_ignore() and a big comment, rather than adding API cruft for something that isn't a real problem. I'm quite sure that no real-world user is unbinding remoteproc drivers frequently enough that leaking a 48-byte allocation each time has any practical significance. Thanks, Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits
On 23/05/2022 12:08, Dan Carpenter wrote: Thanks for the report 50b6cb3516365c Dexuan Cui2021-10-07 224/* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */ 50b6cb3516365c Dexuan Cui2021-10-07 225shost->cmd_per_lun = min_t(int, shost->cmd_per_lun, ea2f0f77538c50 John Garry2021-05-19 226 shost->can_queue); ea2f0f77538c50 John Garry2021-05-19 227 2ad7ba6ca08593 John Garry2022-05-20 @228if (dma_dev->dma_mask) { ^ I knew that we fixed up dma_dev to be non-NULL, but I thought it was earlier in this function... The patch adds a new unchecked dereference 2ad7ba6ca08593 John Garry2022-05-20 229shost->max_sectors = min_t(unsigned int, shost->max_sectors, 2ad7ba6ca08593 John Garry2022-05-20 230 dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT); 2ad7ba6ca08593 John Garry2022-05-20 231} 2ad7ba6ca08593 John Garry2022-05-20 232 0a6ac4ee7c2109 Christoph Hellwig 2017-01-03 233error = scsi_init_sense_cache(shost); 0a6ac4ee7c2109 Christoph Hellwig 2017-01-03 234if (error) 0a6ac4ee7c2109 Christoph Hellwig 2017-01-03 235goto fail; 0a6ac4ee7c2109 Christoph Hellwig 2017-01-03 236 d285203cf647d7 Christoph Hellwig 2014-01-17 237error = scsi_mq_setup_tags(shost); 542bd1377a9630 James Bottomley 2008-04-21 238if (error) 542bd1377a9630 James Bottomley 2008-04-21 239goto fail; d285203cf647d7 Christoph Hellwig 2014-01-17 240 ^1da177e4c3f41 Linus Torvalds2005-04-16 241if (!shost->shost_gendev.parent) ^1da177e4c3f41 Linus Torvalds2005-04-16 242 shost->shost_gendev.parent = dev ? dev : _bus; 3c8d9a957d0ae6 James Bottomley 2012-05-04 @243if (!dma_dev) Cheers, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
On 5/23/2022 10:48 PM, Sai Prakash Ranjan wrote: TLB sync timeouts can be due to various reasons such as TBU power down or pending TCU/TBU invalidation/sync and so on. Debugging these often require dumping of some implementation defined registers to know the status of TBU/TCU operations and some of these registers are not accessible in non-secure world such as from kernel and requires SMC calls to read them in the secure world. So, add this debug support to dump implementation defined registers for TLB sync timeout issues. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 50 ++ drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + drivers/iommu/arm/arm-smmu/arm-smmu.h | 4 ++ 3 files changed, 56 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index 7820711c4560..22e9a0085475 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -5,11 +5,19 @@ #include #include +#include #include #include #include "arm-smmu.h" +#define QCOM_DUMMY_VAL -1 + +/* Implementation Defined Register Space 0 registers */ +#define QCOM_SMMU_STATS_SYNC_INV_TBU_ACK 0x5dc +#define QCOM_SMMU_TBU_PWR_STATUS 0x2204 +#define QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR 0x2670 + struct qcom_smmu { struct arm_smmu_device smmu; bool bypass_quirk; @@ -22,6 +30,46 @@ static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu) return container_of(smmu, struct qcom_smmu, smmu); } +static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, + int sync, int status) +{ + u32 sync_inv_ack, sync_inv_progress, tbu_pwr_status; + unsigned int spin_cnt, delay; + u32 reg; + int ret; + + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); + for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { + for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { + reg = arm_smmu_readl(smmu, page, status); + if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE)) + return; + cpu_relax(); + } + udelay(delay); + } + + sync_inv_ack = arm_smmu_readl(smmu, ARM_SMMU_IMPL_DEF0, + QCOM_SMMU_STATS_SYNC_INV_TBU_ACK); Sorry, this doesn't work always, looks like on earlier chipsets this is a secure register and reading it from non-secure world would probably blow. Also this register can be in other implementation defined space for different chipsets. So I think we can use SCM call here and have a device specific data based on already existing compatible for QCOM SoCs to identify IMP_DEF space used. + ret = qcom_scm_io_readl(smmu->ioaddr + QCOM_SMMU_TBU_PWR_STATUS, + _pwr_status); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read TBU power status: %d\n", ret); + + ret = qcom_scm_io_readl(smmu->ioaddr + QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR, + _inv_progress); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read SAFE WAIT counter: %d\n", ret); + + dev_err_ratelimited(smmu->dev, + "TLB sync timed out -- SMMU may be deadlocked\n" + "TBU: sync_inv_ack %#x power_status %#x sync_inv_progress %#x\n", + sync_inv_ack, tbu_pwr_status, sync_inv_progress); +} + static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx, u32 reg) { @@ -374,6 +422,7 @@ static const struct arm_smmu_impl qcom_smmu_impl = { .def_domain_type = qcom_smmu_def_domain_type, .reset = qcom_smmu500_reset, .write_s2cr = qcom_smmu_write_s2cr, + .tlb_sync = qcom_smmu_tlb_sync, }; static const struct arm_smmu_impl qcom_adreno_smmu_impl = { @@ -382,6 +431,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = { .reset = qcom_smmu500_reset, .alloc_context_bank = qcom_adreno_smmu_alloc_context_bank, .write_sctlr = qcom_adreno_smmu_write_sctlr, + .tlb_sync = qcom_smmu_tlb_sync, }; static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu, diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 2ed3594f384e..4c5b51109835 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2099,6 +2099,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev) if (IS_ERR(smmu->base)) return PTR_ERR(smmu->base); ioaddr = res->start; + smmu->ioaddr = ioaddr; + /*
Re: [PATCH 0/4] DMA mapping changes for SCSI core
Christoph, > The whole series looks fine to me. I'll happily queue it up in the > dma-mapping tree if the SCSI and ATA maintainers are ok with that. Works for me. Reviewed-by: Martin K. Petersen -- Martin K. Petersen Oracle Linux Engineering ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v1] driver core: Extend deferred probe timeout on driver registration
virtio \ > > > > > > -display none \ > > > > > > -kernel arch/s390/boot/bzImage \ > > > > > > -m 512m \ > > > > > > -nodefaults \ > > > > > > -serial mon:stdio > > > > > > ... > > > > > > [2.077303] In-situ OAM (IOAM) with IPv6 > > > > > > [2.077639] NET: Registered PF_PACKET protocol family > > > > > > [2.078063] bridge: filtering via arp/ip/ip6tables is no longer > > > > > > available by default. Update your scripts to load br_netfilter if > > > > > > you need this. > > > > > > [2.078795] Key type dns_resolver registered > > > > > > [2.079317] cio: Channel measurement facility initialized using > > > > > > format extended (mode autodetected) > > > > > > [2.081494] Discipline DIAG cannot be used without z/VM > > > > > > [ 260.626363] random: crng init done > > > > > > qemu-system-s390x: terminating on signal 15 from pid 3815762 > > > > > > (timeout) > > > > > > > > > > > > We have a simple rootfs available if necessary: > > > > > > > > > > > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst > > > > > > > > > > > > If there is any other information I can provide, please let me know! > > > > > > > > > > Hmm... strange. Can you please try the following command line options > > > > > and tell me which of these has the issue and which don't? > > > > > > > > Sure thing! > > > > > > > > > 1) deferred_probe_timeout=0 > > > > > > > > No issue. > > > > > > > > > 2) deferred_probe_timeout=1 > > > > > 3) deferred_probe_timeout=300 > > > > > > > > Both of these appear to hang in the same way, I let each sit for five > > > > minutes. > > > > > > Strange that a sufficiently large timeout isn't helping. Is it trying > > > to boot off a network mount? I'll continue looking into this next > > > week. > > > > I don't think so, it seems like doing that requires some extra flags > > that we do not have: > > > > https://wiki.qemu.org/Features/S390xNetworkBoot > > > > If you need any additional information or want something tested, please > > let me know! > > I'll try to get qemu going on my end, but I'm not too confident I'll > be able to get to it in a timely fashion. So if you can help figure > out where this boot process is hanging, that'd be very much > appreciated. Sure thing! Information included below, I am more than happy to continue to test and debug as you need. > Couple of suggestions for debugging: > > Can you add a log to "wait_for_device_probe()" and see if that's > getting called right before the boot process hangs? If it does, can > you get a stacktrace (I just add a WARN_ON(1) when I need a stack > trace)? It's unlikely this is the case because > deferred_probe_timeout=1 still causes an issue for you, but I'd be > good to rule out. If I add a pr_info() call at the top of wait_for_device_probe(), I see it right before the process hangs. Adding WARN_ON(1) right below that reveals dasd_eckd_init() in drivers/s390/block/dasd_eckd.c calls wait_for_device_probe(): [4.610397] [ cut here ] [4.610520] WARNING: CPU: 0 PID: 1 at drivers/base/dd.c:742 wait_for_device_probe+0x28/0x110 [4.611134] Modules linked in: [4.611593] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-next-20220523-dirty #1 [4.611830] Hardware name: QEMU 8561 QEMU (KVM/Linux) [4.612017] Krnl PSW : 0704c0018000 00ce4b3c (wait_for_device_probe+0x2c/0x110) [4.612258]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [4.612387] Krnl GPRS: 8000f071 0027 000c 017f91d8 [4.612457]f071 017f9218 01a655a0 0006 [4.612521]0002 01965810 019d51a0 [4.612585]02218000 0125bcc8 00ce4b38 0380bc80 [4.614814] Krnl Code: 00ce4b2c: e3e0f0980024stg %r14,152(%r15) [4.614814]00ce4b32: c0e594cbbrasl %r14,00cd74c8 [4.614814] #00ce4b38: af00mc 0,0 [4.614814] >00ce4b3c: c0100054d1falarl %r1,0177ef30 [4.614814]00ce4b42: e3101012lt %r1,0(%r1) [4.614814]00ce4b48: a784002dbrc 8,00ce4ba2 [4.614814]00ce4b4c: d727f0a0f0a0xc 160(40,%r15),160(%r15) [4.614814]00ce4b52: 41b0f0a0la %r11,160(%r15) [4.615698] Call Trace: [4.616559] [<00ce4b3c>] wait_for_device_probe+0x2c/0x110 [4.616744] ([<00ce4b38>] wait_for_device_probe+0x28/0x110) [4.616841] [<0196593e>] dasd_eckd_init+0x12e/0x178 [4.616913] [<00100936>] do_one_initcall+0x46/0x1e8 [4.616983] [<01920706>] do_initcalls+0x126/0x150 [4.617046] [<0192095e>] kernel_init_freeable+0x1ae/0x1f0 [4.617110] [<00ce85a6>] kernel_init+0x2e/0x168 [4.617171] [<00103320>] __ret_from_fork+0x40/0x58 [4.617233] [<00cf5eaa>] ret_from_fork+0xa/0x40 [4.617352] Last Breaking-Event-Address: [4.617393] [<00e0e098>] __s390_indirect_jump_r14+0x0/0xc [4.617481] ---[ end trace ]--- > Let's try to rule out if deferred_probe_extend_timeout() is causing > some issues. So, without my patch, what happens if you set: > deferred_probe_timeout=1 > deferred_probe_timeout=300 At commit 6ee60e9c9f2f ("MAINTAINERS: add Russ Weight as a firmware loader maintainer"), both deferred_probe_timeout=1 and deferred_probe_timeout=300 hang the boot. > If deferred_probe_timeout=1 causes an issue even without my patch, > then in addition, can you try commenting out the call to > fw_devlink_drivers_done() inside deferred_probe_timeout_work_func() > and try again? Sure, that does not appear to make a difference with deferred_probe_timeout=1. Cheers, Nathan ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v1] driver core: Extend deferred probe timeout on driver registration
> > timeout on driver registration") in next-20220520 (bisect log > > > > > > > below). > > > > > > > > > > > > > > $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- > > > > > > > defconfig bzImage > > > > > > > > > > > > > > $ timeout --foreground 15m stdbuf -oL -eL \ > > > > > > > qemu-system-s390x \ > > > > > > > -initrd ... \ > > > > > > > -M s390-ccw-virtio \ > > > > > > > -display none \ > > > > > > > -kernel arch/s390/boot/bzImage \ > > > > > > > -m 512m \ > > > > > > > -nodefaults \ > > > > > > > -serial mon:stdio > > > > > > > ... > > > > > > > [2.077303] In-situ OAM (IOAM) with IPv6 > > > > > > > [2.077639] NET: Registered PF_PACKET protocol family > > > > > > > [2.078063] bridge: filtering via arp/ip/ip6tables is no > > > > > > > longer available by default. Update your scripts to load > > > > > > > br_netfilter if you need this. > > > > > > > [2.078795] Key type dns_resolver registered > > > > > > > [2.079317] cio: Channel measurement facility initialized > > > > > > > using format extended (mode autodetected) > > > > > > > [2.081494] Discipline DIAG cannot be used without z/VM > > > > > > > [ 260.626363] random: crng init done > > > > > > > qemu-system-s390x: terminating on signal 15 from pid 3815762 > > > > > > > (timeout) > > > > > > > > > > > > > > We have a simple rootfs available if necessary: > > > > > > > > > > > > > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst > > > > > > > > > > > > > > If there is any other information I can provide, please let me > > > > > > > know! > > > > > > > > > > > > Hmm... strange. Can you please try the following command line > > > > > > options > > > > > > and tell me which of these has the issue and which don't? > > > > > > > > > > Sure thing! > > > > > > > > > > > 1) deferred_probe_timeout=0 > > > > > > > > > > No issue. > > > > > > > > > > > 2) deferred_probe_timeout=1 > > > > > > 3) deferred_probe_timeout=300 > > > > > > > > > > Both of these appear to hang in the same way, I let each sit for five > > > > > minutes. > > > > > > > > Strange that a sufficiently large timeout isn't helping. Is it trying > > > > to boot off a network mount? I'll continue looking into this next > > > > week. > > > > > > I don't think so, it seems like doing that requires some extra flags > > > that we do not have: > > > > > > https://wiki.qemu.org/Features/S390xNetworkBoot > > > > > > If you need any additional information or want something tested, please > > > let me know! > > > > I'll try to get qemu going on my end, but I'm not too confident I'll > > be able to get to it in a timely fashion. So if you can help figure > > out where this boot process is hanging, that'd be very much > > appreciated. > > Sure thing! Information included below, I am more than happy to continue > to test and debug as you need. > > > Couple of suggestions for debugging: > > > > Can you add a log to "wait_for_device_probe()" and see if that's > > getting called right before the boot process hangs? If it does, can > > you get a stacktrace (I just add a WARN_ON(1) when I need a stack > > trace)? It's unlikely this is the case because > > deferred_probe_timeout=1 still causes an issue for you, but I'd be > > good to rule out. > > If I add a pr_info() call at the top of wait_for_device_probe(), I see > it right before the process hangs. Adding WARN_ON(1) right below that > reveals dasd_eckd_init() in drivers/s390/block/dasd_eckd.c calls > wait_for_device_probe(): > > [4.610397] [ cut here ] > [4.610520] WARNING: CPU: 0 PID: 1 at drivers/base/dd.c:742 > wait_for_device_probe+0x28/0x110 > [4.611134] Modules linked in: > [4.611593] CPU: