Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.
On Sat, 2 Oct 2021 22:48:24 +0530 Ajay Garg wrote:
> Thanks Lu for the reply.
>
> > Isn't the domain should be switched from a default domain to an
> > unmanaged domain when the device is assigned to the guest?
> >
> > Even you want to r-setup the same mappings, you need to un-map all
> > existing mappings, right?
>
> Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
> need to take.
> May be the patch could suppress the flooding till then?

No, this is wrong. The pte values should not exist, it doesn't matter
that they're the same. Is the host driver failing to remove mappings
and somehow they persist in the new vfio owned domain? There's
definitely a bug beyond logging going on here.

Thanks,
Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions
On 04.10.2021 22:23, Thierry Reding wrote:
> On Sun, Oct 03, 2021 at 04:09:56AM +0300, Dmitry Osipenko wrote:
>> On 23.04.2021 19:32, Thierry Reding wrote:
>>> I've made corresponding changes in the proprietary bootloader, added a
>>> compatibility shim in U-Boot (which forwards information created by the
>>> proprietary bootloader to the kernel) and the attached patches to test
>>> this on Jetson TX1, Jetson TX2 and Jetson AGX Xavier.
>>
>> Could you please tell what downstream kernel does for coping with the
>> identity mappings in conjunction with the original proprietary bootloader?
>>
>> If there is some other method of passing mappings to kernel, could it be
>> supported by upstream? Putting burden on users to upgrade bootloader
>> feels a bit inconvenient.
>
> It depends on the chip generation. As far as I know there have been
> several iterations. The earliest was to pass this information via a
> command-line option, but more recent versions use device tree to pass
> this information in a similar way as described here. However, these
> use non-standard DT bindings, so I don't think we can just implement
> them as-is.

Is it possible to boot an upstream kernel with that original bootloader?

I remember seeing other platforms, like QCOM, supporting downstream
quirks in the upstream kernel on the side, i.e. they are undocumented,
but the additional support code is there. That is what "normal" people
want. You should consider doing that for Tegra too, if possible.
Re: [PATCH v2 0/5] iommu: Support identity mappings of reserved-memory regions
On Sun, Oct 03, 2021 at 04:09:56AM +0300, Dmitry Osipenko wrote:
> On 23.04.2021 19:32, Thierry Reding wrote:
> > I've made corresponding changes in the proprietary bootloader, added a
> > compatibility shim in U-Boot (which forwards information created by the
> > proprietary bootloader to the kernel) and the attached patches to test
> > this on Jetson TX1, Jetson TX2 and Jetson AGX Xavier.
>
> Could you please tell what downstream kernel does for coping with the
> identity mappings in conjunction with the original proprietary bootloader?
>
> If there is some other method of passing mappings to kernel, could it be
> supported by upstream? Putting burden on users to upgrade bootloader
> feels a bit inconvenient.

It depends on the chip generation. As far as I know there have been
several iterations. The earliest was to pass this information via a
command-line option, but more recent versions use device tree to pass
this information in a similar way as described here. However, these
use non-standard DT bindings, so I don't think we can just implement
them as-is.

Thierry
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Mon, Oct 04, 2021 at 09:40:03AM -0700, Jacob Pan wrote:
> Hi Barry,
>
> On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cn...@gmail.com> wrote:
> > > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > > the page table of the kernel and sharing the whole kernel space. But
> > > > will users be glad to accept this mode?
> > >
> > > You can avoid the lock by identity mapping the physical address space
> > > of the kernel and making map/unmap a NOP.
> > >
> > > KVA is just a different way to achieve this identity map with slightly
> > > different security properties than the normal way, but it doesn't
> > > reach the same security level as proper map/unmap.
> > >
> > > I'm not sure anyone who cares about DMA security would see value in
> > > the slight difference between KVA and a normal identity map.
> >
> > yes. This is an important question. if users want a high security level,
> > kva might not be their choice; if users don't want the security, they are
> > using iommu passthrough. So when will users choose KVA?
>
> Right, KVAs sit in the middle in terms of performance and security.
> Performance is better than IOVA due to the IOTLB flush, as you mentioned.
> Also not too far behind pass-through.

The IOTLB flush is not on a DMA path but on a vmap path, so it is very
hard to compare the two things. Maybe vmap can be made to do lazy IOTLB
flush or something and it could be closer.

> Security-wise, KVA respects kernel mapping, so permissions are better
> enforced than with pass-through and identity mapping.

Is this meaningful? Isn't the entire physical map still in the KVA and
isn't it entirely RW?

Jason
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi Barry,

On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cn...@gmail.com> wrote:
> > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > the page table of the kernel and sharing the whole kernel space. But
> > > will users be glad to accept this mode?
> >
> > You can avoid the lock by identity mapping the physical address space
> > of the kernel and making map/unmap a NOP.
> >
> > KVA is just a different way to achieve this identity map with slightly
> > different security properties than the normal way, but it doesn't
> > reach the same security level as proper map/unmap.
> >
> > I'm not sure anyone who cares about DMA security would see value in
> > the slight difference between KVA and a normal identity map.
>
> yes. This is an important question. if users want a high security level,
> kva might not be their choice; if users don't want the security, they are
> using iommu passthrough. So when will users choose KVA?

Right, KVAs sit in the middle in terms of performance and security.
Performance is better than IOVA due to the IOTLB flush, as you mentioned.
Also not too far behind pass-through.

Security-wise, KVA respects kernel mapping, so permissions are better
enforced than with pass-through and identity mapping.

To balance performance and security, we are proposing that KVA only be
supported on trusted devices. On an Intel platform, it would be based on
the ACPI SATC (SoC Integrated Address Translation Cache reporting
structure, VT-d spec 8.2). I am also adding a kernel iommu parameter to
allow user override.

Thanks,
Jacob
Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()
On 04.10.21 at 15:27, Jason Gunthorpe wrote:
> On Mon, Oct 04, 2021 at 03:22:22PM +0200, Christian König wrote:
> > That use case is completely unrelated to GUP and when this doesn't
> > work we have quite a problem.
>
> My read is that unmap_mapping_range() guarantees the physical TLB
> hardware is serialized across all CPUs upon return.

Thanks, that's what I wanted to make sure.

Christian.

> It also guarantees GUP slow is serialized due to the page table
> spinlocks.
>
> Jason
Re: [PATCH 0/5] iommu: Some IOVA code reorganisation
On 2021-10-04 12:44, Will Deacon wrote:
> On Fri, Sep 24, 2021 at 06:01:52PM +0800, John Garry wrote:
>> The IOVA domain structure is a bit overloaded, holding:
>> - IOVA tree management
>> - FQ control
>> - IOVA rcache memories
>>
>> Indeed only a couple of IOVA users use the rcache, and only dma-iommu.c
>> uses the FQ feature.
>>
>> This series separates out that structure. In addition, it moves the FQ
>> code into dma-iommu.c . This is not strictly necessary, but it does make
>> it easier for the FQ domain lookup the rcache domain.
>>
>> The rcache code stays where it is, as it may be reworked in future, so
>> there is not much point in relocating and then discarding.
>>
>> This topic was initially discussed and suggested (I think) by Robin here:
>> https://lore.kernel.org/linux-iommu/1d06eda1-9961-d023-f5e7-fe87e768f...@arm.com/
>
> It would be useful to have Robin's Ack on patches 2-4. The
> implementation looks straightforward to me, but the thread above isn't
> very clear about what is being suggested.

FWIW I actually got about half-way through writing my own equivalent of
patches 2-3, except tackling it from the other direction - simplifying
the FQ code *before* moving whatever was left to iommu-dma, then I got
side-tracked trying to make io-pgtable use that freelist properly, and
then I've been on holiday for the last 2 weeks. I've got other things to
catch up on first, but I'll try to get to this later this week.

> To play devil's advocate: there aren't many direct users of the iovad
> code: either they'll die out entirely (and everybody will use the
> dma-iommu code) and it's fine having the flush queue code where it is,
> or we'll get more users and the likelihood of somebody else wanting
> flush queues increases.

I think the FQ code is mostly just here as a historical artefact, since
the IOVA allocator was the only thing common to the Intel and AMD DMA
ops when the common FQ implementation was factored out of those, so
although it's essentially orthogonal it was still related enough that it
was an easy place to stick it.

Cheers,
Robin.
Re: [PATCH 0/5] iommu: Some IOVA code reorganisation
On 04/10/2021 12:44, Will Deacon wrote:
> On Fri, Sep 24, 2021 at 06:01:52PM +0800, John Garry wrote:
>> The IOVA domain structure is a bit overloaded, holding:
>> - IOVA tree management
>> - FQ control
>> - IOVA rcache memories
>>
>> Indeed only a couple of IOVA users use the rcache, and only dma-iommu.c
>> uses the FQ feature.
>>
>> This series separates out that structure. In addition, it moves the FQ
>> code into dma-iommu.c . This is not strictly necessary, but it does make
>> it easier for the FQ domain lookup the rcache domain.
>>
>> The rcache code stays where it is, as it may be reworked in future, so
>> there is not much point in relocating and then discarding.
>>
>> This topic was initially discussed and suggested (I think) by Robin here:
>> https://lore.kernel.org/linux-iommu/1d06eda1-9961-d023-f5e7-fe87e768f...@arm.com/

Hi Will,

> It would be useful to have Robin's Ack on patches 2-4. The
> implementation looks straightforward to me, but the thread above isn't
> very clear about what is being suggested.

Sure, I intentionally didn't add names to the patches to avoid possible
incorrect attribution.

> To play devil's advocate: there aren't many direct users of the iovad
> code: either they'll die out entirely (and everybody will use the
> dma-iommu code) and it's fine having the flush queue code where it is,
> or we'll get more users and the likelihood of somebody else wanting
> flush queues increases.

I make it 5x direct users (including vdpa).

Anyway, as I mentioned, I'm not totally determined to relocate the FQ
code. It's just that dma-iommu is the only user today and co-locating
makes the iova rcache domain info lookup easier from the FQ code.

Thanks,
John
Re: [PATCH 5/5] iommu/iova: Avoid double-negatives in magazine helpers
On 04/10/2021 12:38, Will Deacon wrote:

Hi Will,

>> To avoid this, stop using double-negatives, like !iova_magazine_full()
>> and !iova_magazine_empty(), and use positive tests, like
>> iova_magazine_has_space() and iova_magazine_has_pfns(), respectively;
>> these can safely deal with cpu_rcache->{loaded, prev} = NULL.
>
> I don't understand why you're saying that things like
> !iova_magazine_empty() are double-negatives. What about e.g.
> !list_empty() elsewhere in the kernel?

IMO, a check for an empty magazine is a negative check, as opposed to a
check for availability. But I'm not saying that patterns like
!list_empty() are bad practice. I'm just saying that they are not NULL
safe, and that matters in this case as we can potentially pass a NULL
pointer.

> The crux of the fix seems to be:
>
>> @@ -783,8 +787,9 @@ static bool __iova_rcache_insert(struct iova_caching_domain *rcached,
>>  	if (new_mag) {
>>  		spin_lock(&rcache->lock);
>>  		if (rcache->depot_size < MAX_GLOBAL_MAGS) {
>> -			rcache->depot[rcache->depot_size++] =
>> -					cpu_rcache->loaded;
>> +			if (cpu_rcache->loaded)
>> +				rcache->depot[rcache->depot_size++] =
>> +						cpu_rcache->loaded;
>
> Which could be independent of the renaming?

If cpu_rcache->loaded was NULL, then we would crash before we reach this
code.

Anyway, since I earlier reworked init_iova_rcaches() to properly handle
failed mem allocations for rcache->cpu_rcaches, I can rework it further
to fail the init on failed mem allocations for cpu_rcaches->loaded, so
we don't need this change.

Thanks,
John
Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()
On Mon, Oct 04, 2021 at 03:22:22PM +0200, Christian König wrote:
> That use case is completely unrelated to GUP and when this doesn't work
> we have quite a problem.

My read is that unmap_mapping_range() guarantees the physical TLB
hardware is serialized across all CPUs upon return.

It also guarantees GUP slow is serialized due to the page table
spinlocks.

Jason
Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()
On 04.10.21 at 15:11, Jason Gunthorpe wrote:
> On Mon, Oct 04, 2021 at 08:58:35AM +0200, Christian König wrote:
> > I'm not following this discussion too closely, but try to look into
> > it from time to time.
> >
> > On 01.10.21 at 19:45, Jason Gunthorpe wrote:
> > > On Fri, Oct 01, 2021 at 11:01:49AM -0600, Logan Gunthorpe wrote:
> > > > In device-dax, the refcount is only used to prevent the device,
> > > > and therefore the pages, from going away on device unbind. Pages
> > > > cannot be recycled, as you say, as they are mapped linearly within
> > > > the device. The address space invalidation is done only when the
> > > > device is unbound.
> > >
> > > By address space invalidation I mean invalidation of the VMA that is
> > > pointing to those pages.
> > >
> > > device-dax may not have an issue with use-after-VMA-invalidation by
> > > its very nature since every PFN always points to the same thing.
> > > fsdax and this p2p stuff are different though.
> > >
> > > > Before the invalidation, an active flag is cleared to ensure no
> > > > new mappings can be created while the unmap is proceeding.
> > > > unmap_mapping_range() should sequence itself with the TLB flush
> > > > and
> > >
> > > AFAIK unmap_mapping_range() kicks off the TLB flush and then
> > > returns. It doesn't always wait for the flush to fully finish. Ie
> > > some cases use RCU to lock the page table against GUP fast and so
> > > the put_page() doesn't happen until the call_rcu completes - after a
> > > grace period. The unmap_mapping_range() does not wait for grace
> > > periods.
> >
> > Wow, wait a second. That is quite a bummer. At least in all GEM/TTM
> > based graphics drivers that could potentially cause a lot of trouble.
> >
> > I've just double checked and we certainly have the assumption that
> > when unmap_mapping_range() returns the pte is gone and the TLB flush
> > completed in quite a number of places.
> >
> > Do you have more information when and why that can happen?
>
> There are two things to keep in mind, flushing the PTEs from the HW and
> serializing against gup_fast.
>
> If you start at unmap_mapping_range() the page is eventually discovered
> in zap_pte_range() and the PTE cleared. It is then passed into
> __tlb_remove_page() which puts it on the batch->pages list.
>
> The page free happens in tlb_batch_pages_flush() via
> free_pages_and_swap_cache().
>
> The tlb_batch_pages_flush() happens via zap_page_range() ->
> tlb_finish_mmu(), presumably after the HW has wiped the TLBs on all
> CPUs. On x86 this is done with an IPI and also serializes gup fast, so
> OK.
>
> The interesting case is CONFIG_MMU_GATHER_RCU_TABLE_FREE which doesn't
> rely on IPIs anymore to synchronize with gup-fast. In this
> configuration it means when unmap_mapping_range() returns the TLB will
> have been flushed, but no serialization with GUP fast was done.
>
> This is OK if the GUP fast cannot return the page at all. I assume this
> generally describes the DRM cases?

Yes, exactly that. GUP is completely forbidden for such mappings.

But what about accesses by other CPUs? In other words our use case is
like the following:

1. We found that we need exclusive access to the higher level object a
   page belongs to.

2. The lock of the higher level object is taken. The lock is also taken
   in the fault handler for the VMA which inserts the PTE in the first
   place.

3. unmap_mapping_range() for the range of the object is called, the
   expectation is that when that function returns only the kernel can
   have a mapping of the pages backing the object.

4. The kernel has exclusive access to the pages and we know that
   userspace can't mess with them any more.

That use case is completely unrelated to GUP and when this doesn't work
we have quite a problem.

I should probably note that we recently switched from VM_MIXEDMAP to
using VM_PFNMAP because the former didn't prevent GUP on all
architectures.

Christian.

> However, if the GUP fast can return the page then something, somewhere,
> needs to serialize the page free with the RCU as the GUP fast can be
> observing the old PTE before it was zap'd until the RCU grace expires.
>
> Relying on the page ref being !0 to protect GUP fast is not safe
> because the page ref can be incr'd immediately upon page re-use.
>
> Interestingly I looked around for this on PPC and I only found RCU
> delayed freeing of the page table level, not RCU delayed freeing of
> pages themselves. I wonder if it was missed?
>
> There is a path on PPC (tlb_remove_table_sync_one) that triggers an IPI
> but it looks like an exception, and we wouldn't need the RCU at all if
> we used IPI to serialize GUP fast...
>
> It makes logical sense if the RCU also frees the pages on
> CONFIG_MMU_GATHER_RCU_TABLE_FREE so anything returnable by GUP fast
> must be refcounted and freed by tlb_batch_pages_flush(), not by the
> caller of unmap_mapping_range().
>
> If we expect to allow the caller of unmap_mapping_range() to free then
> CONFIG_MMU_GATHER_RCU_TABLE_FREE can't really exist, we always need to
> trigger a serializing IPI during tlb_batch_pages_flush()
>
> AFAICT, at least
>
> Jason
Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()
On Mon, Oct 04, 2021 at 08:58:35AM +0200, Christian König wrote:
> I'm not following this discussion too closely, but try to look into it
> from time to time.
>
> On 01.10.21 at 19:45, Jason Gunthorpe wrote:
> > On Fri, Oct 01, 2021 at 11:01:49AM -0600, Logan Gunthorpe wrote:
> >
> > > In device-dax, the refcount is only used to prevent the device, and
> > > therefore the pages, from going away on device unbind. Pages cannot
> > > be recycled, as you say, as they are mapped linearly within the
> > > device. The address space invalidation is done only when the device
> > > is unbound.
> >
> > By address space invalidation I mean invalidation of the VMA that is
> > pointing to those pages.
> >
> > device-dax may not have an issue with use-after-VMA-invalidation by
> > its very nature since every PFN always points to the same thing.
> > fsdax and this p2p stuff are different though.
> >
> > > Before the invalidation, an active flag is cleared to ensure no new
> > > mappings can be created while the unmap is proceeding.
> > > unmap_mapping_range() should sequence itself with the TLB flush and
> >
> > AFAIK unmap_mapping_range() kicks off the TLB flush and then returns.
> > It doesn't always wait for the flush to fully finish. Ie some cases
> > use RCU to lock the page table against GUP fast and so the put_page()
> > doesn't happen until the call_rcu completes - after a grace period.
> > The unmap_mapping_range() does not wait for grace periods.
>
> Wow, wait a second. That is quite a bummer. At least in all GEM/TTM
> based graphics drivers that could potentially cause a lot of trouble.
>
> I've just double checked and we certainly have the assumption that when
> unmap_mapping_range() returns the pte is gone and the TLB flush
> completed in quite a number of places.
>
> Do you have more information when and why that can happen?

There are two things to keep in mind, flushing the PTEs from the HW and
serializing against gup_fast.

If you start at unmap_mapping_range() the page is eventually discovered
in zap_pte_range() and the PTE cleared. It is then passed into
__tlb_remove_page() which puts it on the batch->pages list.

The page free happens in tlb_batch_pages_flush() via
free_pages_and_swap_cache().

The tlb_batch_pages_flush() happens via zap_page_range() ->
tlb_finish_mmu(), presumably after the HW has wiped the TLBs on all
CPUs. On x86 this is done with an IPI and also serializes gup fast, so
OK.

The interesting case is CONFIG_MMU_GATHER_RCU_TABLE_FREE which doesn't
rely on IPIs anymore to synchronize with gup-fast. In this configuration
it means when unmap_mapping_range() returns the TLB will have been
flushed, but no serialization with GUP fast was done.

This is OK if the GUP fast cannot return the page at all. I assume this
generally describes the DRM cases?

However, if the GUP fast can return the page then something, somewhere,
needs to serialize the page free with the RCU as the GUP fast can be
observing the old PTE before it was zap'd until the RCU grace expires.

Relying on the page ref being !0 to protect GUP fast is not safe because
the page ref can be incr'd immediately upon page re-use.

Interestingly I looked around for this on PPC and I only found RCU
delayed freeing of the page table level, not RCU delayed freeing of
pages themselves. I wonder if it was missed?

There is a path on PPC (tlb_remove_table_sync_one) that triggers an IPI
but it looks like an exception, and we wouldn't need the RCU at all if
we used IPI to serialize GUP fast...

It makes logical sense if the RCU also frees the pages on
CONFIG_MMU_GATHER_RCU_TABLE_FREE so anything returnable by GUP fast must
be refcounted and freed by tlb_batch_pages_flush(), not by the caller of
unmap_mapping_range().

If we expect to allow the caller of unmap_mapping_range() to free then
CONFIG_MMU_GATHER_RCU_TABLE_FREE can't really exist, we always need to
trigger a serializing IPI during tlb_batch_pages_flush()

AFAICT, at least

Jason
Re: [PATCH v2 2/2] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd()
On Wed, Aug 18, 2021 at 04:04:52PM +0800, Zhen Lei wrote:
> Although the parameter 'cmd' is always passed by a local array variable,
> and only this function modifies it, the compiler does not know this. The
> compiler almost always reads the value of cmd[i] from memory rather than
> directly using the value cached in the register. This generates many
> useless instruction operations and affects the performance to some extent.
>
> To guide the compiler for proper optimization, 'cmd' is defined as a local
> array variable, marked as register, and copied to the output parameter at
> a time when the function is returned.
>
> The optimization effect can be viewed by running the "size arm-smmu-v3.o"
> command.
>
> Before:
>    text    data     bss     dec     hex
>   26954    1348      56   28358    6ec6
>
> After:
>    text    data     bss     dec     hex
>   26762    1348      56   28166    6e06
>
> Signed-off-by: Zhen Lei
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 01e95b56ffa07d1..7cec0c967f71d86 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -234,10 +234,12 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
>  }
>
>  /* High-level queue accessors */
> -static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> +static int arm_smmu_cmdq_build_cmd(u64 *out_cmd, struct arm_smmu_cmdq_ent *ent)
>  {
> -	memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
> -	cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
> +	register u64 cmd[CMDQ_ENT_DWORDS];

'register' is pretty badly specified outside of a handful of limited
uses in conjunction with inline asm, so I really don't think we should
be using it here.

> +	cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);

This generates a compiler warning about taking the address of a
'register' variable.

Will
Re: [PATCH] iommu/arm-smmu-v3: Stop pre-zeroing batch commands in arm_smmu_atc_inv_master()
On Tue, 17 Aug 2021 19:34:11 +0800, Zhen Lei wrote:
> Pre-zeroing the batched commands structure is inefficient, as individual
> commands are zeroed later in arm_smmu_cmdq_build_cmd(). Therefore, only
> the member 'num' needs to be initialized to 0.

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu-v3: Stop pre-zeroing batch commands in arm_smmu_atc_inv_master()
      https://git.kernel.org/will/c/93f9f7958f12

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
Re: [PATCH v2 0/2] iommu/arm-smmu-v3: Perform some simple optimizations for arm_smmu_cmdq_build_cmd()
On Wed, 18 Aug 2021 16:04:50 +0800, Zhen Lei wrote:
> v1 --> v2:
> 1. Add patch 1, Properly handle the return value of arm_smmu_cmdq_build_cmd()
> 2. Remove arm_smmu_cmdq_copy_cmd(). In addition, when build command fails,
>    out_cmd is not filled.
>
> Zhen Lei (2):
>   iommu/arm-smmu-v3: Properly handle the return value of
>     arm_smmu_cmdq_build_cmd()
>   iommu/arm-smmu-v3: Simplify useless instructions in
>     arm_smmu_cmdq_build_cmd()
>
> [...]

Applied first patch only to will (for-joerg/arm-smmu/updates), thanks!

[1/2] iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd()
      https://git.kernel.org/will/c/59d9bd727495

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
Re: [PATCH 1/2] dt-bindings: arm-smmu: Add compatible for SM6350 SoC
On Fri, 20 Aug 2021 22:29:04 +0200, Konrad Dybcio wrote:
> Add the SoC specific compatible for SM6350 implementing
> arm,mmu-500.

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/2] dt-bindings: arm-smmu: Add compatible for SM6350 SoC
      https://git.kernel.org/will/c/e4a40f15b031
[2/2] iommu/arm-smmu-qcom: Add SM6350 SMMU compatible
      https://git.kernel.org/will/c/bc53c8b8b087

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
Re: [PATCH 1/2] iommu: arm-smmu-qcom: Add compatible for qcm2290
On Fri, 1 Oct 2021 16:00:31 +0200, Loic Poulain wrote:
>

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/2] iommu: arm-smmu-qcom: Add compatible for qcm2290
      https://git.kernel.org/will/c/756a622c8f06
[2/2] dt-bindings: arm-smmu: Add qcm2290 compatible strings
      https://git.kernel.org/will/c/f1edce3db543

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
Re: [PATCH 0/5] iommu: Some IOVA code reorganisation
On Fri, Sep 24, 2021 at 06:01:52PM +0800, John Garry wrote:
> The IOVA domain structure is a bit overloaded, holding:
> - IOVA tree management
> - FQ control
> - IOVA rcache memories
>
> Indeed only a couple of IOVA users use the rcache, and only dma-iommu.c
> uses the FQ feature.
>
> This series separates out that structure. In addition, it moves the FQ
> code into dma-iommu.c . This is not strictly necessary, but it does make
> it easier for the FQ domain lookup the rcache domain.
>
> The rcache code stays where it is, as it may be reworked in future, so
> there is not much point in relocating and then discarding.
>
> This topic was initially discussed and suggested (I think) by Robin here:
> https://lore.kernel.org/linux-iommu/1d06eda1-9961-d023-f5e7-fe87e768f...@arm.com/

It would be useful to have Robin's Ack on patches 2-4. The
implementation looks straightforward to me, but the thread above isn't
very clear about what is being suggested.

To play devil's advocate: there aren't many direct users of the iovad
code: either they'll die out entirely (and everybody will use the
dma-iommu code) and it's fine having the flush queue code where it is,
or we'll get more users and the likelihood of somebody else wanting
flush queues increases.

Will
Re: [PATCH 5/5] iommu/iova: Avoid double-negatives in magazine helpers
On Fri, Sep 24, 2021 at 06:01:57PM +0800, John Garry wrote:
> A similar crash to the following could be observed if initial CPU rcache
> magazine allocations fail in init_iova_rcaches():
>
>   Unable to handle kernel NULL pointer dereference at virtual address
>   Mem abort info:
>
>   free_iova_fast+0xfc/0x280
>   iommu_dma_free_iova+0x64/0x70
>   __iommu_dma_unmap+0x9c/0xf8
>   iommu_dma_unmap_sg+0xa8/0xc8
>   dma_unmap_sg_attrs+0x28/0x50
>   cq_thread_v3_hw+0x2dc/0x528
>   irq_thread_fn+0x2c/0xa0
>   irq_thread+0x130/0x1e0
>   kthread+0x154/0x158
>   ret_from_fork+0x10/0x34
>
> The issue is that the expression !iova_magazine_full(NULL) evaluates
> true; this falls over in __iova_rcache_insert() when we attempt to cache
> a mag and cpu_rcache->loaded == NULL:
>
>   if (!iova_magazine_full(cpu_rcache->loaded)) {
>           can_insert = true;
>           ...
>
>   if (can_insert)
>           iova_magazine_push(cpu_rcache->loaded, iova_pfn);
>
> As above, can_insert is evaluated true, which it shouldn't be, and we
> try to insert pfns in a NULL mag, which is not safe.
>
> To avoid this, stop using double-negatives, like !iova_magazine_full()
> and !iova_magazine_empty(), and use positive tests, like
> iova_magazine_has_space() and iova_magazine_has_pfns(), respectively;
> these can safely deal with cpu_rcache->{loaded, prev} = NULL.

I don't understand why you're saying that things like
!iova_magazine_empty() are double-negatives. What about e.g.
!list_empty() elsewhere in the kernel?

The crux of the fix seems to be:

> @@ -783,8 +787,9 @@ static bool __iova_rcache_insert(struct iova_caching_domain *rcached,
>  	if (new_mag) {
>  		spin_lock(&rcache->lock);
>  		if (rcache->depot_size < MAX_GLOBAL_MAGS) {
> -			rcache->depot[rcache->depot_size++] =
> -					cpu_rcache->loaded;
> +			if (cpu_rcache->loaded)
> +				rcache->depot[rcache->depot_size++] =
> +						cpu_rcache->loaded;

Which could be independent of the renaming?

Will
Re: [PATCH 1/5] iova: Move fast alloc size roundup into alloc_iova_fast()
On Fri, Sep 24, 2021 at 06:01:53PM +0800, John Garry wrote:
> It really is a property of the IOVA rcache code that we need to alloc a
> power-of-2 size, so relocate the functionality to resize into
> alloc_iova_fast(), rather than the callsites.
>
> Signed-off-by: John Garry
> ---
>  drivers/iommu/dma-iommu.c            | 8
>  drivers/iommu/iova.c                 | 9
>  drivers/vdpa/vdpa_user/iova_domain.c | 8
>  3 files changed, 9 insertions(+), 16 deletions(-)

Acked-by: Will Deacon

Will
[PATCH v2] iommu/amd: Recover from event log overflow
The AMD IOMMU logs I/O page faults and such to a ring buffer in system
memory, and this ring buffer can overflow. The AMD IOMMU spec has the
following to say about the interrupt status bit that signals this
overflow condition:

  EventOverflow: Event log overflow. RW1C. Reset 0b. 1 = IOMMU event log
  overflow has occurred. This bit is set when a new event is to be
  written to the event log and there is no usable entry in the event
  log, causing the new event information to be discarded. An interrupt
  is generated when EventOverflow = 1b and MMIO Offset
  0018h[EventIntEn] = 1b. No new event log entries are written while
  this bit is set. Software Note: To resume logging, clear EventOverflow
  (W1C), and write a 1 to MMIO Offset 0018h[EventLogEn].

The AMD IOMMU driver doesn't currently implement this recovery sequence,
meaning that if a ring buffer overflow occurs, logging of EVT/PPR/GA
events will cease entirely.

This patch implements the spec-mandated reset sequence, with the minor
tweak that the hardware seems to want a 0 written to MMIO Offset
0018h[EventLogEn] first, before a 1 is written into this field, or the
IOMMU won't actually resume logging events.

Signed-off-by: Lennert Buytenhek
---
Tested on v5.15-rc4 on a Ryzen 3700X.

Changes for v2:
- Rate limit event log overflow messages.
 drivers/iommu/amd/amd_iommu.h       |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/init.c            | 10 ++++++++++
 drivers/iommu/amd/iommu.c           | 10 ++++++++--
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 416815a525d6..bb95edf74415 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -14,6 +14,7 @@ extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
 extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern void amd_iommu_reset_cmd_buffer(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 8dbe61e2b3c1..095076029f8d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -110,6 +110,7 @@
 #define PASID_MASK		0x

 /* MMIO status bits */
+#define MMIO_STATUS_EVT_OVERFLOW_INT_MASK	(1 << 0)
 #define MMIO_STATUS_EVT_INT_MASK	(1 << 1)
 #define MMIO_STATUS_COM_WAIT_INT_MASK	(1 << 2)
 #define MMIO_STATUS_PPR_INT_MASK	(1 << 6)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 2a822b229bd0..f1c5eb023bda 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -654,6 +654,16 @@ static int __init alloc_command_buffer(struct amd_iommu *iommu)
 	return iommu->cmd_buf ? 0 : -ENOMEM;
 }

+/*
+ * This function restarts event logging in case the IOMMU experienced
+ * an event log buffer overflow.
+ */
+void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
+{
+	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+	iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
+}
+
 /*
  * This function resets the command buffer if the IOMMU stopped fetching
  * commands from it.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1722bb161841..f46eb7397021 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -742,7 +742,8 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu) { }
 #endif /* !CONFIG_IRQ_REMAP */

 #define AMD_IOMMU_INT_MASK	\
-	(MMIO_STATUS_EVT_INT_MASK | \
+	(MMIO_STATUS_EVT_OVERFLOW_INT_MASK | \
+	 MMIO_STATUS_EVT_INT_MASK | \
 	 MMIO_STATUS_PPR_INT_MASK | \
 	 MMIO_STATUS_GALOG_INT_MASK)

@@ -752,7 +753,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 	u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);

 	while (status & AMD_IOMMU_INT_MASK) {
-		/* Enable EVT and PPR and GA interrupts again */
+		/* Enable interrupt sources again */
 		writel(AMD_IOMMU_INT_MASK,
 		       iommu->mmio_base + MMIO_STATUS_OFFSET);

@@ -773,6 +774,11 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
 	}
 #endif

+	if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
+		pr_info_ratelimited("IOMMU event log overflow\n");
+		amd_iommu_restart_event_logging(iommu);
+	}
+
 	/*
 	 * Hardware bug: ERBT1312
 	 * When re-enabling interrupt (by writing 1
--
2.31.1
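[Editor's note: for readers unfamiliar with RW1C ("write 1 to clear") status registers, the interaction between the interrupt thread's status write and the restart helper can be modeled in plain standalone C. This is a toy model under assumed semantics, not real MMIO access or the kernel's actual helpers: writing 1 to a set status bit clears it, and EventLogEn is toggled off and back on to resume logging, as the patch description explains.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MMIO_STATUS_EVT_OVERFLOW_INT_MASK (1u << 0)

/* Toy device state standing in for the IOMMU's MMIO registers. */
static uint32_t mmio_status;
static bool event_log_en;

/* RW1C semantics: writing 1 to a bit position clears that status bit. */
static void write_status(uint32_t val)
{
	mmio_status &= ~val;
}

/*
 * Models amd_iommu_restart_event_logging(): toggle EventLogEn off and
 * back on so the device resumes writing event log entries.
 */
static void restart_event_logging(void)
{
	event_log_en = false;
	event_log_en = true;
}

/*
 * Models the overflow branch the patch adds to amd_iommu_int_thread():
 * acknowledge the overflow interrupt (W1C), then restart logging.
 */
static void handle_status(void)
{
	uint32_t status = mmio_status;

	if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
		write_status(MMIO_STATUS_EVT_OVERFLOW_INT_MASK);
		restart_event_logging();
	}
}
```

Without the restart step, the model mirrors the bug the patch fixes: the overflow bit gets acknowledged, but `event_log_en` never comes back on and logging stays dead.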