Re: [PATCHv3] iommu/arm-smmu: Optimize ->tlb_flush_walk() for qcom implementation
Hi Robin, Will,

On 2021-07-12 09:39, Sai Prakash Ranjan wrote:

Hi Robin,

On 2021-06-23 19:12, Sai Prakash Ranjan wrote:

Currently, for iommu_unmap() of a large scatter-gather list with page-size elements, the majority of time is spent flushing partial walks in __arm_lpae_unmap(), which is a VA-based TLB invalidation that invalidates page-by-page on IOMMUs like arm-smmu-v2 (TLBIVA).

For example: to unmap a 32MB scatter-gather list with page-size elements (8192 entries), there are 16 2MB buffer unmaps based on the pgsize (2MB for a 4K granule), and each 2MB unmap further results in 512 TLBIVAs (2MB/4K), for a total of 8192 TLBIVAs (512*16) across the 16 2MB buffers, causing huge overhead.

The qcom implementation has several hardware performance improvements for TLB cache invalidations, such as wait-for-safe (for realtime clients like camera and display) and a few others that allow cache lookups/updates while a TLBI is in progress for the same context bank. So the cost of over-invalidation is low compared to the unmap latency in several use cases, such as camera, which deals with large buffers. ASID-based TLB invalidations (TLBIASID) can therefore be used to invalidate the entire context on a partial walk flush, improving the unmap latency.

Non-strict mode can use this by default on all platforms, given that it is all about over-invalidation to save time on individual unmaps, and is generally non-deterministic anyway.

For this example of a 32MB scatter-gather list unmap, this change results in just 16 ASID-based TLB invalidations (TLBIASIDs) as opposed to 8192 TLBIVAs, drastically increasing unmap performance.
Tested on a QTI SM8150 SoC with 10 iterations of iommu_{map_sg}/unmap (values are averages over the 10 iterations):

Before this optimization:

  size    iommu_map_sg    iommu_unmap
    4K        2.067 us       1.854 us
   64K        9.598 us       8.802 us
    1M      148.890 us     130.718 us
    2M      305.864 us      67.291 us
   12M     1793.604 us     390.838 us
   16M     2386.848 us     518.187 us
   24M     3563.296 us     775.989 us
   32M     4747.171 us    1033.364 us

After this optimization:

  size    iommu_map_sg    iommu_unmap
    4K        1.723 us       1.765 us
   64K        9.880 us       8.869 us
    1M      155.364 us     135.223 us
    2M      303.906 us       5.385 us
   12M     1786.557 us      21.250 us
   16M     2391.890 us      27.437 us
   24M     3570.895 us      39.937 us
   32M     4755.234 us      51.797 us

This is further reduced once the map/unmap_pages() support gets in, which will result in just 1 TLBIASID as compared to 16 TLBIASIDs.

Real-world data also shows a big difference in unmap performance: there were reports of camera frame drops caused by the high iommu unmap overhead without this optimization, with frequent unmaps issued by the camera at about 100MB/s taking more than 100ms and thereby causing frame drops.
Signed-off-by: Sai Prakash Ranjan
---
Changes in v3:
 * Move the logic to arm-smmu driver from io-pgtable (Robin)
 * Use a new set of iommu_flush_ops->arm_smmu_s1_tlb_impl_ops and use it
   for qcom impl

Changes in v2:
 * Add a quirk to choose tlb_flush_all in partial walk flush
 * Set the quirk for QTI SoC implementation
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 13 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 17 -
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 7771d40176de..218c71465819 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -10,6 +10,8 @@

 #include "arm-smmu.h"

+extern const struct iommu_flush_ops arm_smmu_s1_tlb_impl_ops;
+
 struct qcom_smmu {
 	struct arm_smmu_device smmu;
 	bool bypass_quirk;
@@ -146,6 +148,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 {
 	struct adreno_smmu_priv *priv;

+	pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
+
 	/* Only enable split pagetables for the GPU device (SID 0) */
 	if (!qcom_adreno_smmu_is_gpu_device(dev))
 		return 0;
@@ -185,6 +189,14 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 	{ }
 };

+static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
+{
+	pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
+
+	return 0;
+}
+
 static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
 {
 	unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
@@ -308,6 +320,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
 }

 static const struct arm_smmu_impl qcom_smmu_impl = {
+	.init_context = qcom_smmu_init_conte
RE: [RFC v2] /dev/iommu uAPI proposal
> From: Shenming Lu > Sent: Friday, July 16, 2021 8:20 PM > > On 2021/7/16 9:20, Tian, Kevin wrote: > > To summarize, for vIOMMU we can work with the spec owner to > > define a proper interface to feedback such restriction into the guest > > if necessary. For the kernel part, it's clear that IOMMU fd should > > disallow two devices attached to a single [RID] or [RID, PASID] slot > > in the first place. > > > > Then the next question is how to communicate such restriction > > to the userspace. It sounds like a group, but different in concept. > > An iommu group describes the minimal isolation boundary thus all > > devices in the group can be only assigned to a single user. But this > > case is opposite - the two mdevs (both support ENQCMD submission) > > with the same parent have problem when assigned to a single VM > > (in this case vPASID is vm-wide translated thus a same pPASID will be > > used cross both mdevs) while they instead work pretty well when > > assigned to different VMs (completely different vPASID spaces thus > > different pPASIDs). > > > > One thought is to have vfio device driver deal with it. In this proposal > > it is the vfio device driver to define the PASID virtualization policy and > > report it to userspace via VFIO_DEVICE_GET_INFO. The driver understands > > the restriction thus could just hide the vPASID capability when the user > > calls GET_INFO on the 2nd mdev in above scenario. In this way the > > user even doesn't need to know such restriction at all and both mdevs > > can be assigned to a single VM w/o any problem. > > > > The restriction only probably happens when two mdevs are assigned to one > VM, > how could the vfio device driver get to know this info to accurately hide > the vPASID capability for the 2nd mdev when VFIO_DEVICE_GET_INFO? > There is no > need to do this in other cases. > I suppose the driver can detect it via whether two mdevs are opened by a single process. 
Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC v2] /dev/iommu uAPI proposal
> From: Jason Gunthorpe > Sent: Saturday, July 17, 2021 2:30 AM > > On Fri, Jul 16, 2021 at 01:20:15AM +, Tian, Kevin wrote: > > > One thought is to have vfio device driver deal with it. In this proposal > > it is the vfio device driver to define the PASID virtualization policy and > > report it to userspace via VFIO_DEVICE_GET_INFO. The driver understands > > the restriction thus could just hide the vPASID capability when the user > > calls GET_INFO on the 2nd mdev in above scenario. In this way the > > user even doesn't need to know such restriction at all and both mdevs > > can be assigned to a single VM w/o any problem. > > I think it makes more sense to expose some kind of "pasid group" to > qemu that identifies that each PASID must be unique across the > group. For vIOMMUs that are doing funky things with the RID This means > a single PASID group must not be exposed as two RIDs to the guest. > It's an interesting idea. Some aspects are still unclear to me now e.g. how to describe such restriction in a way that it's applied only to a single user owning the group (not the case when assigned to different users), whether it can be generalized cross subsystems (vPASID being a vfio-managed resource), etc. Let's refine it when working on the actual implementation. > If the kernel blocks it then it can never be fixed by updating the > vIOMMU design. > But the mdev driver can choose to do so. Should we prevent it? btw just be curious whether you have got a chance to have a full review on v2. I wonder when might be a good time to discuss the execution plan following this proposal, if no major open remains... Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/5] iommu/vt-d: Disallow SVA if devices don't support 64-bit address
Hi Robin,

Thanks a lot for reviewing my patch!

On 7/20/21 5:27 PM, Robin Murphy wrote:
> On 2021-07-20 02:38, Lu Baolu wrote:
>> When the device and CPU share an address space (such as SVA), the
>> device must support the same addressing capability as the CPU. The
>> CPU does not consider the addressing ability of any device when
>> managing the page table of a process, so the device must have enough
>> addressing ability to bind the page table of the process.
>>
>> Signed-off-by: Lu Baolu
>> ---
>>  drivers/iommu/intel/iommu.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index f45c80ce2381..f3cca1dd384d 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -5372,6 +5372,9 @@ static int intel_iommu_enable_sva(struct device *dev)
>>  	if (!(iommu->flags & VTD_FLAG_SVM_CAPABLE))
>>  		return -ENODEV;
>>
>> +	if (!dev->dma_mask || *dev->dma_mask != DMA_BIT_MASK(64))
>
> Careful - VFIO doesn't set DMA masks (since it doesn't use the DMA API),

SVA doesn't work through the VFIO framework.

> so this appears to be relying on another driver having bound previously,

Yes. You are right.

> otherwise the mask would still be the default 32-bit one from
> pci_setup_device(). I'm not sure that's an entirely robust assumption.

Currently the SVA implementation always requires a native kernel driver. The assumption is that drivers should check and set 64-bit addressing capability before calling the iommu_sva_xxx() APIs.

> Robin.

>> +		return -ENODEV;
>> +
>>  	if (intel_iommu_enable_pasid(iommu, dev))
>>  		return -ENODEV;

Best regards,
baolu
Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node
On Mon, Jul 19, 2021 at 05:14:28PM +0100, John Garry wrote:
> On 09/07/2021 15:24, Ming Lei wrote:
> > > associated compromises.
> > Follows the log of 'perf report'
> >
> > 1) good(run fio from cpus in the nvme's numa node)
>
> Hi Ming,
>
> If you're still interested in this issue, as an experiment only you can try
> my rebased patches here:
>
> https://github.com/hisilicon/kernel-dev/commits/private-topic-smmu-5.14-cmdq-4
>
> I think that you should see a significant performance boost.

There is a build issue, please check your tree:

  MODPOST vmlinux.symvers
  MODINFO modules.builtin.modinfo
  GEN     modules.builtin
  LD      .tmp_vmlinux.btf
ld: Unexpected GOT/PLT entries detected!
ld: Unexpected run-time procedure linkages detected!
ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.o: in function `smmu_test_store':
/root/git/linux/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3892: undefined reference to `smmu_test_core'
  BTF     .btf.vmlinux.bin.o
pahole: .tmp_vmlinux.btf: No such file or directory
  LD      .tmp_vmlinux.kallsyms1
.btf.vmlinux.bin.o: file not recognized: file format not recognized
make: *** [Makefile:1177: vmlinux] Error 1

Thanks,
Ming
[PATCH -next] iommu/arm-smmu-v3: Add suspend and resume support
Add suspend and resume support for arm-smmu-v3 by low-power mode. When the smmu is suspended, it is powered off and the registers are cleared. So saves the msi_msg context during msi interrupt initialization of smmu. When resume happens it calls arm_smmu_device_reset() to restore the registers. Signed-off-by: Bixuan Cui Reviewed-by: Wei Yongjun Reviewed-by: Zhen Lei Reviewed-by: Ding Tianhong Reviewed-by: Hanjun Guo --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 72 ++--- 1 file changed, 64 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 235f9bdaeaf2..bf1163acbcb1 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -40,6 +40,7 @@ MODULE_PARM_DESC(disable_bypass, static bool disable_msipolling; module_param(disable_msipolling, bool, 0444); +static bool bypass; MODULE_PARM_DESC(disable_msipolling, "Disable MSI-based polling for CMD_SYNC completion."); @@ -3129,11 +3130,37 @@ static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg) doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo; doorbell &= MSI_CFG0_ADDR_MASK; + /* Saves the msg context for resume if desc->msg is empty */ + if (desc->msg.address_lo == 0 && desc->msg.address_hi == 0) { + desc->msg.address_lo = msg->address_lo; + desc->msg.address_hi = msg->address_hi; + desc->msg.data = msg->data; + } + writeq_relaxed(doorbell, smmu->base + cfg[0]); writel_relaxed(msg->data, smmu->base + cfg[1]); writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE, smmu->base + cfg[2]); } +static void arm_smmu_resume_msis(struct arm_smmu_device *smmu) +{ + struct msi_desc *desc; + struct device *dev = smmu->dev; + + for_each_msi_entry(desc, dev) { + switch (desc->platform.msi_index) { + case EVTQ_MSI_INDEX: + case GERROR_MSI_INDEX: + case PRIQ_MSI_INDEX: + arm_smmu_write_msi_msg(desc, &(desc->msg)); + break; + default: + continue; + + } + } +} + static void 
arm_smmu_setup_msis(struct arm_smmu_device *smmu) { struct msi_desc *desc; @@ -3184,11 +3211,17 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu) devm_add_action(dev, arm_smmu_free_msis, dev); } -static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu) +static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu, bool resume_mode) { int irq, ret; - arm_smmu_setup_msis(smmu); + if (!resume_mode) + arm_smmu_setup_msis(smmu); + else { + /* The irq doesn't need to be re-requested during resume */ + arm_smmu_resume_msis(smmu); + return; + } /* Request interrupt lines */ irq = smmu->evtq.q.irq; @@ -3230,7 +3263,7 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu) } } -static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu) +static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu, bool resume_mode) { int ret, irq; u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN; @@ -3257,7 +3290,7 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu) if (ret < 0) dev_warn(smmu->dev, "failed to enable combined irq\n"); } else - arm_smmu_setup_unique_irqs(smmu); + arm_smmu_setup_unique_irqs(smmu, resume_mode); if (smmu->features & ARM_SMMU_FEAT_PRI) irqen_flags |= IRQ_CTRL_PRIQ_IRQEN; @@ -3282,7 +3315,7 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu) return ret; } -static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass) +static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool resume_mode) { int ret; u32 reg, enables; @@ -3392,7 +3425,7 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass) } } - ret = arm_smmu_setup_irqs(smmu); + ret = arm_smmu_setup_irqs(smmu, resume_mode); if (ret) { dev_err(smmu->dev, "failed to setup irqs\n"); return ret; @@ -3749,6 +3782,24 @@ static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t start, return devm_ioremap_resource(dev, &res); } +static int __maybe_unused 
arm_smmu_suspend(struct device *dev) +{ + /* +* The smmu is powered off and related registers are automatically +* cleared when suspend. No need to do anything. +*/ + return 0; +} + +static int __maybe_unused arm_smmu_resume(struct device *dev) +{ + struct arm_smmu_de
Re: [PATCH] iommu/amd: Fix I/O page fault logging ratelimit test
Hi Lennert,

On 7/18/2021 7:47 PM, Lennert Buytenhek wrote:

On an AMD system, I/O page faults are usually logged like this:

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..7ae426b092f2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -483,7 +483,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (dev_data && __ratelimit(&dev_data->rs)) {
 		pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
 			domain_id, address, flags);
-	} else if (printk_ratelimit()) {
+	} else if (!dev_data && printk_ratelimit()) {

This seems a bit confusing. Also, according to the following comment in include/linux/printk.h:

/*
 * Please don't use printk_ratelimit(), because it shares ratelimiting state
 * with all other unrelated printk_ratelimit() callsites. Instead use
 * printk_ratelimited() or plain old __ratelimit().
 */

we probably should move away from using printk_ratelimit() here. What about the following change instead?
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..8eb5d3519743 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -480,11 +480,12 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);

-	if (dev_data && __ratelimit(&dev_data->rs)) {
-		pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
-			domain_id, address, flags);
-	} else if (printk_ratelimit()) {
-		pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
+	if (dev_data) {
+		if (__ratelimit(&dev_data->rs))
+			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
+				domain_id, address, flags);
+	} else {
+		pr_err_ratelimited("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
 			PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
 			domain_id, address, flags);
 	}

Note also that there might be other places in this file that would need similar modification as well.

Thanks,
Suravee
Re: [PATCH] dt-bindings: More dropping redundant minItems/maxItems
Quoting Rob Herring (2021-07-13 12:34:53)
> Another round of removing redundant minItems/maxItems from new schema in
> the recent merge window.
>
> If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> same size as the list is redundant and can be dropped. Note that is DT
> schema specific behavior and not standard json-schema behavior. The tooling
> will fixup the final schema adding any unspecified minItems/maxItems.
>
> This condition is partially checked with the meta-schema already, but
> only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> An improved meta-schema is pending.
>
> Cc: Stephen Boyd
> Cc: Joerg Roedel
> Cc: Will Deacon
> Cc: Krzysztof Kozlowski
> Cc: Miquel Raynal
> Cc: Richard Weinberger
> Cc: Vignesh Raghavendra
> Cc: Alessandro Zummo
> Cc: Alexandre Belloni
> Cc: Greg Kroah-Hartman
> Cc: Sureshkumar Relli
> Cc: Brian Norris
> Cc: Kamal Dasu
> Cc: Linus Walleij
> Cc: Sebastian Siewior
> Cc: Laurent Pinchart
> Cc: linux-...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-...@lists.infradead.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Rob Herring
> ---

Reviewed-by: Stephen Boyd
Re: [PATCH] dt-bindings: arm-smmu: Fix json-schema syntax
On Tue, 13 Jul 2021 at 10:27, Krzysztof Kozlowski wrote:
>
> On Mon, 12 Jul 2021 at 16:14, Rob Herring wrote:
> >
> > On Tue, Jun 22, 2021 at 11:56 PM Krzysztof Kozlowski wrote:
> > >
> > > On Mon, 21 Jun 2021 16:00:36 +0200, Thierry Reding wrote:
> > > > Commit 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible
> > > > string") introduced a jsonschema syntax error as a result of a rebase
> > > > gone wrong. Fix it.
> > >
> > > Applied, thanks!
> > >
> > > [1/1] dt-bindings: arm-smmu: Fix json-schema syntax
> > >       commit: bf3ec9deaa33889630722c47f7bb86ba58872ea7
> >
> > Applied where? Now Linus's master is broken.
>
> To memory controller drivers tree. Pushed to soc folks some time ago:
> https://lore.kernel.org/lkml/20210625073604.13562-1-krzysztof.kozlow...@canonical.com/

Hi Rob,

The patch landed in Linus' tree in v5.14-rc2.

Best regards,
Krzysztof
Re: [Resend RFC PATCH V4 09/13] x86/Swiotlb/HV: Add Swiotlb bounce buffer remap function for HV IVM
Please split the swiotlb changes into a separate patch from the consumer. > } > + > +/* > + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM. > + */ > +unsigned long hv_map_memory(unsigned long addr, unsigned long size) > +{ > + unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE, > + sizeof(unsigned long), > +GFP_KERNEL); > + unsigned long vaddr; > + int i; > + > + if (!pfns) > + return (unsigned long)NULL; > + > + for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++) > + pfns[i] = virt_to_hvpfn((void *)addr + i * HV_HYP_PAGE_SIZE) + > + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT); > + > + vaddr = (unsigned long)vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, > + PAGE_KERNEL_IO); > + kfree(pfns); > + > + return vaddr; This seems to miss a 'select VMAP_PFN'. But more importantly I don't think this actually works. Various DMA APIs do expect a struct page backing, so how is this going to work with say dma_mmap_attrs or dma_get_sgtable_attrs? > +static unsigned long __map_memory(unsigned long addr, unsigned long size) > +{ > + if (hv_is_isolation_supported()) > + return hv_map_memory(addr, size); > + > + return addr; > +} > + > +static void __unmap_memory(unsigned long addr) > +{ > + if (hv_is_isolation_supported()) > + hv_unmap_memory(addr); > +} > + > +unsigned long set_memory_decrypted_map(unsigned long addr, unsigned long > size) > +{ > + if (__set_memory_enc_dec(addr, size / PAGE_SIZE, false)) > + return (unsigned long)NULL; > + > + return __map_memory(addr, size); > +} > + > +int set_memory_encrypted_unmap(unsigned long addr, unsigned long size) > +{ > + __unmap_memory(addr); > + return __set_memory_enc_dec(addr, size / PAGE_SIZE, true); > +} Why this obsfucation into all kinds of strange helpers? Also I think we want an ops vectors (or alternative calls) instead of the random if checks here. > + * @vstart: The virtual start address of the swiotlb memory pool. 
The > swiotlb > + * memory pool may be remapped in the memory encrypted case and > store Normall we'd call this vaddr or cpu_addr. > - set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); > - memset(vaddr, 0, bytes); > + mem->vstart = (void *)set_memory_decrypted_map((unsigned long)vaddr, > bytes); Please always pass kernel virtual addresses as pointers. And I think these APIs might need better names, e.g. arch_dma_map_decrypted and arch_dma_unmap_decrypted. Also these will need fallback versions for non-x86 architectures that currently use memory encryption. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 4/4] swiotlb: Free tbl memory in swiotlb_exit()
Although swiotlb_exit() frees the 'slots' metadata array referenced by 'io_tlb_default_mem', it leaves the underlying buffer pages allocated despite no longer being usable.

Extend swiotlb_exit() to free the buffer pages as well as the slots array.

Cc: Claire Chang
Cc: Christoph Hellwig
Cc: Robin Murphy
Cc: Konrad Rzeszutek Wilk
Tested-by: Nathan Chancellor
Tested-by: Claire Chang
Reviewed-by: Christoph Hellwig
Signed-off-by: Will Deacon
---
 kernel/dma/swiotlb.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b3c793ed9e64..87c40517e822 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -328,18 +328,27 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)

 void __init swiotlb_exit(void)
 {
-	size_t size;
 	struct io_tlb_mem *mem = &io_tlb_default_mem;
+	unsigned long tbl_vaddr;
+	size_t tbl_size, slots_size;

 	if (!mem->nslabs)
 		return;

 	pr_info("tearing down default memory pool\n");
-	size = array_size(sizeof(*mem->slots), mem->nslabs);
-	if (mem->late_alloc)
-		free_pages((unsigned long)mem->slots, get_order(size));
-	else
-		memblock_free_late(__pa(mem->slots), PAGE_ALIGN(size));
+	tbl_vaddr = (unsigned long)phys_to_virt(mem->start);
+	tbl_size = PAGE_ALIGN(mem->end - mem->start);
+	slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
+
+	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
+	if (mem->late_alloc) {
+		free_pages(tbl_vaddr, get_order(tbl_size));
+		free_pages((unsigned long)mem->slots, get_order(slots_size));
+	} else {
+		memblock_free_late(mem->start, tbl_size);
+		memblock_free_late(__pa(mem->slots), slots_size);
+	}
+
 	memset(mem, 0, sizeof(*mem));
 }
--
2.32.0.402.g57bb445576-goog
[PATCH v2 3/4] swiotlb: Emit diagnostic in swiotlb_exit()
A recent debugging session would have been made a little bit easier if we had noticed sooner that swiotlb_exit() was being called during boot.

Add a simple diagnostic message to swiotlb_exit() to complement the one from swiotlb_print_info() during initialisation.

Cc: Claire Chang
Cc: Christoph Hellwig
Cc: Robin Murphy
Link: https://lore.kernel.org/r/20210705190352.GA19461@willie-the-truck
Suggested-by: Konrad Rzeszutek Wilk
Tested-by: Nathan Chancellor
Tested-by: Claire Chang
Reviewed-by: Christoph Hellwig
Signed-off-by: Will Deacon
---
 kernel/dma/swiotlb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7948f274f9bb..b3c793ed9e64 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -334,6 +334,7 @@ void __init swiotlb_exit(void)
 	if (!mem->nslabs)
 		return;

+	pr_info("tearing down default memory pool\n");
 	size = array_size(sizeof(*mem->slots), mem->nslabs);
 	if (mem->late_alloc)
 		free_pages((unsigned long)mem->slots, get_order(size));
--
2.32.0.402.g57bb445576-goog
[PATCH v2 2/4] swiotlb: Convert io_default_tlb_mem to static allocation
Since commit 69031f500865 ("swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used"), 'struct device' may hold a copy of the global 'io_default_tlb_mem' pointer if the device is using swiotlb for DMA. A subsequent call to swiotlb_exit() will therefore leave dangling pointers behind in these device structures, resulting in KASAN splats such as: | BUG: KASAN: use-after-free in __iommu_dma_unmap_swiotlb+0x64/0xb0 | Read of size 8 at addr 8881d783 by task swapper/0/0 | | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc3-debug #1 | Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.12 12/17/2020 | Call Trace: | | dump_stack+0x9c/0xcf | print_address_description.constprop.0+0x18/0x130 | kasan_report.cold+0x7f/0x111 | __iommu_dma_unmap_swiotlb+0x64/0xb0 | nvme_pci_complete_rq+0x73/0x130 | blk_complete_reqs+0x6f/0x80 | __do_softirq+0xfc/0x3be Convert 'io_default_tlb_mem' to a static structure, so that the per-device pointers remain valid after swiotlb_exit() has been invoked. All users are updated to reference the static structure directly, using the 'nslabs' field to determine whether swiotlb has been initialised. The 'slots' array is still allocated dynamically and referenced via a pointer rather than a flexible array member. 
Cc: Claire Chang Cc: Christoph Hellwig Cc: Robin Murphy Cc: Konrad Rzeszutek Wilk Fixes: 69031f500865 ("swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used") Reported-by: Nathan Chancellor Tested-by: Nathan Chancellor Tested-by: Claire Chang Reviewed-by: Christoph Hellwig Signed-off-by: Will Deacon --- drivers/base/core.c | 2 +- drivers/xen/swiotlb-xen.c | 4 +-- include/linux/swiotlb.h | 4 +-- kernel/dma/swiotlb.c | 66 +-- 4 files changed, 41 insertions(+), 35 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index ea5b85354526..b49824001cfa 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2848,7 +2848,7 @@ void device_initialize(struct device *dev) dev->dma_coherent = dma_default_coherent; #endif #ifdef CONFIG_SWIOTLB - dev->dma_io_tlb_mem = io_tlb_default_mem; + dev->dma_io_tlb_mem = &io_tlb_default_mem; #endif } EXPORT_SYMBOL_GPL(device_initialize); diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 785ec7e8be01..f06d9b4f1e0f 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -164,7 +164,7 @@ int __ref xen_swiotlb_init(void) int rc = -ENOMEM; char *start; - if (io_tlb_default_mem != NULL) { + if (io_tlb_default_mem.nslabs) { pr_warn("swiotlb buffer already initialized\n"); return -EEXIST; } @@ -547,7 +547,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl, static int xen_swiotlb_dma_supported(struct device *hwdev, u64 mask) { - return xen_phys_to_dma(hwdev, io_tlb_default_mem->end - 1) <= mask; + return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask; } const struct dma_map_ops xen_swiotlb_dma_ops = { diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 39284ff2a6cd..b0cb2a9973f4 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -103,9 +103,9 @@ struct io_tlb_mem { phys_addr_t orig_addr; size_t alloc_size; unsigned int list; - } slots[]; + } *slots; }; -extern struct io_tlb_mem *io_tlb_default_mem; +extern 
struct io_tlb_mem io_tlb_default_mem; static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index f1a9ae7fad8f..7948f274f9bb 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -70,7 +70,7 @@ enum swiotlb_force swiotlb_force; -struct io_tlb_mem *io_tlb_default_mem; +struct io_tlb_mem io_tlb_default_mem; /* * Max segment that we can provide which (if pages are contingous) will @@ -101,7 +101,7 @@ early_param("swiotlb", setup_io_tlb_npages); unsigned int swiotlb_max_segment(void) { - return io_tlb_default_mem ? max_segment : 0; + return io_tlb_default_mem.nslabs ? max_segment : 0; } EXPORT_SYMBOL_GPL(swiotlb_max_segment); @@ -134,9 +134,9 @@ void __init swiotlb_adjust_size(unsigned long size) void swiotlb_print_info(void) { - struct io_tlb_mem *mem = io_tlb_default_mem; + struct io_tlb_mem *mem = &io_tlb_default_mem; - if (!mem) { + if (!mem->nslabs) { pr_warn("No low mem\n"); return; } @@ -163,11 +163,11 @@ static inline unsigned long nr_slots(u64 val) */ void __init swiotlb_update_mem_attributes(void) { - struct io_tlb_mem *mem = io_tlb_default_mem; + struct io_tlb_mem *mem = &io_tlb_default_mem; void *vaddr; unsigned long bytes; - if (!mem || mem->late_alloc) + if (!mem->nslabs || mem
[PATCH v2 1/4] of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS
When CONFIG_OF_ADDRESS=n, of_dma_set_restricted_buffer() returns -ENODEV and breaks the boot for sparc[64] machines. Return 0 instead, since the function is essentially a glorified NOP in this configuration.

Cc: Claire Chang
Cc: Konrad Rzeszutek Wilk
Reported-by: Guenter Roeck
Suggested-by: Robin Murphy
Tested-by: Guenter Roeck
Tested-by: Claire Chang
Reviewed-by: Christoph Hellwig
Link: https://lore.kernel.org/r/20210702030807.ga2685...@roeck-us.net
Fixes: fec9b625095f ("of: Add plumbing for restricted DMA pool")
Signed-off-by: Will Deacon
---
 drivers/of/of_private.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index 376462798f7e..f557bd22b0cf 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -173,7 +173,8 @@ static inline int of_dma_get_range(struct device_node *np,
 static inline int of_dma_set_restricted_buffer(struct device *dev,
 					       struct device_node *np)
 {
-	return -ENODEV;
+	/* Do nothing, successfully. */
+	return 0;
 }

 #endif
--
2.32.0.402.g57bb445576-goog
[PATCH v2 0/4] Fix restricted DMA vs swiotlb_exit()
Hi again, folks,

This is version two of the patch series I posted yesterday:

https://lore.kernel.org/r/20210719123054.6844-1-w...@kernel.org

The only changes since v1 are:

  * Squash patches 2 and 3, amending the commit message accordingly
  * Add Reviewed-by and Tested-by tags from Christoph and Claire (thanks!)

I'd usually leave it a bit longer between postings, but since this fixes
issues with patches in -next I thought I'd spin a new version immediately.

Cheers,

Will

Cc: Guenter Roeck
Cc: Claire Chang
Cc: Christoph Hellwig
Cc: Robin Murphy
Cc: Konrad Rzeszutek Wilk
Cc: Nathan Chancellor

--->8

Will Deacon (4):
  of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS
  swiotlb: Convert io_default_tlb_mem to static allocation
  swiotlb: Emit diagnostic in swiotlb_exit()
  swiotlb: Free tbl memory in swiotlb_exit()

 drivers/base/core.c       |  2 +-
 drivers/of/of_private.h   |  3 +-
 drivers/xen/swiotlb-xen.c |  4 +-
 include/linux/swiotlb.h   |  4 +-
 kernel/dma/swiotlb.c      | 82 +++
 5 files changed, 56 insertions(+), 39 deletions(-)

--
2.32.0.402.g57bb445576-goog
Re: [Resend RFC PATCH V4 09/13] x86/Swiotlb/HV: Add Swiotlb bounce buffer remap function for HV IVM
Hi Christoph & Konrad:

Could you review this patch and make sure this is the right way to resolve
the memory remap request for the AMD SEV-SNP vTOM case?

Thanks.

On 7/7/2021 11:46 PM, Tianyu Lan wrote:
From: Tianyu Lan

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary (e.g. the
39-bit address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf.
The accessed physical address is the original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV-SNP spec is
called the virtual top of memory (vTOM). Memory addresses below vTOM are
automatically treated as private while memory above vTOM is treated as
shared.

Introduce a set_memory_decrypted_map() function to decrypt memory and
remap it with a platform callback. Use set_memory_decrypted_map() in the
swiotlb code, store the remap address returned by the new API and use the
remap address to copy data from/to the swiotlb bounce buffer.

Signed-off-by: Tianyu Lan
---
 arch/x86/hyperv/ivm.c             | 30 ++
 arch/x86/include/asm/mshyperv.h   |  2 ++
 arch/x86/include/asm/set_memory.h |  2 ++
 arch/x86/mm/pat/set_memory.c      | 28
 include/linux/swiotlb.h           |  4
 kernel/dma/swiotlb.c              | 11 ---
 6 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 8a6f4e9e3d6c..ea33935e0c17 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -267,3 +267,33 @@ int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
		      enc ? VMBUS_PAGE_NOT_VISIBLE
			  : VMBUS_PAGE_VISIBLE_READ_WRITE);
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+unsigned long hv_map_memory(unsigned long addr, unsigned long size)
+{
+	unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
+				      sizeof(unsigned long), GFP_KERNEL);
+	unsigned long vaddr;
+	int i;
+
+	if (!pfns)
+		return (unsigned long)NULL;
+
+	for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+		pfns[i] = virt_to_hvpfn((void *)addr + i * HV_HYP_PAGE_SIZE) +
+			(ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+	vaddr = (unsigned long)vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE,
+					PAGE_KERNEL_IO);
+	kfree(pfns);
+
+	return vaddr;
+}
+
+void hv_unmap_memory(unsigned long addr)
+{
+	vunmap((void *)addr);
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index fe03e3e833ac..ba3cb9e32fdb 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -253,6 +253,8 @@
 int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
 int hv_set_mem_enc(unsigned long addr, int numpages, bool enc);
+unsigned long hv_map_memory(unsigned long addr, unsigned long size);
+void hv_unmap_memory(unsigned long addr);
 void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
 void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..7a2117931830 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -49,6 +49,8 @@
 int set_memory_decrypted(unsigned long addr, int numpages);
 int set_memory_np_noalias(unsigned long addr, int numpages);
 int set_memory_nonglobal(unsigned long addr, int numpages);
 int set_memory_global(unsigned long addr, int numpages);
+unsigned long set_memory_decrypted_map(unsigned long addr, unsigned long size);
+int set_memory_encrypted_unmap(unsigned long addr, unsigned long size);
 int set_pages_array_uc(struct page **pages, int addrinarray);
 int set_pages_array_wc(struct page **pages, int addrinarray);
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 6cc83c57383d..5d4d3963f4a2 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2039,6 +2039,34 @@ int set_memory_decrypted(unsigned long addr, int numpages)
 }
 EXPORT_SYMBOL_GPL(set_memory_decrypted);
+
+static unsigned long __map_memory(unsigned long addr, unsigned long size)
+{
+	if (hv_is_isolation_supported())
+		return hv_map_memory(addr, size);
+
+	return addr;
+}
+
+static void __unmap_memory(unsigned long addr)
+{
+	if (hv_is_isolation_supported())
+		hv_unmap_memory(addr);
+}
+
+unsigned long set_memory_decrypted_map(unsigned long addr, unsigned long size)
+{
+	if (__set_memory_enc_dec(addr, size / PAGE_SIZE, false))
+		return (unsigned long)NULL;
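The core of hv_map_memory() is the per-page frame-number arithmetic: each page's PFN is offset by shared_gpa_boundary >> PAGE_SHIFT so the resulting mapping lands in the shared region above vTOM. A self-contained sketch of just that arithmetic (generic `PAGE_SHIFT` and sample values here are illustrative; the real code uses `HV_HYP_PAGE_SHIFT` and the boundary reported by Hyper-V):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)

/* PFN of the shared (above-vTOM) alias of a private physical address:
 * take the page frame number of the original address and add the
 * boundary expressed in pages. Accessing original PA + boundary and
 * accessing (this PFN << PAGE_SHIFT) name the same shared page. */
static uint64_t shared_pfn(uint64_t paddr, uint64_t shared_gpa_boundary)
{
	return (paddr >> PAGE_SHIFT) +
	       (shared_gpa_boundary >> PAGE_SHIFT);
}
```

With a 39-bit boundary, for example, every PFN in the loop is shifted up by 2^39 / 2^12 = 2^27 pages, which is exactly what the `pfns[i] = virt_to_hvpfn(...) + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT)` line computes before handing the array to vmap_pfn().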
Re: [PATCH 4/5] iommu/vt-d: Disallow SVA if devices don't support 64-bit address
On 2021-07-20 02:38, Lu Baolu wrote:
> When the device and CPU share an address space (such as SVA), the device
> must support the same addressing capability as the CPU. The CPU does not
> consider the addressing ability of any device when managing the page
> table of a process, so the device must have enough addressing ability to
> bind the page table of the process.
>
> Signed-off-by: Lu Baolu
> ---
>  drivers/iommu/intel/iommu.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index f45c80ce2381..f3cca1dd384d 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5372,6 +5372,9 @@ static int intel_iommu_enable_sva(struct device *dev)
>  	if (!(iommu->flags & VTD_FLAG_SVM_CAPABLE))
>  		return -ENODEV;
>
> +	if (!dev->dma_mask || *dev->dma_mask != DMA_BIT_MASK(64))

Careful - VFIO doesn't set DMA masks (since it doesn't use the DMA API),
so this appears to be relying on another driver having bound previously,
otherwise the mask would still be the default 32-bit one from
pci_setup_device(). I'm not sure that's an entirely robust assumption.

Robin.

> +		return -ENODEV;
> +
>  	if (intel_iommu_enable_pasid(iommu, dev))
>  		return -ENODEV;
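The check at issue compares the device's DMA mask against DMA_BIT_MASK(64). A self-contained sketch of that test, with the kernel's DMA_BIT_MASK definition inlined (the `sva_mask_ok()` helper is illustrative, standing in for the condition in intel_iommu_enable_sva()):

```c
#include <assert.h>
#include <stdint.h>

/* DMA_BIT_MASK(n) as the kernel defines it: the widest address value
 * expressible in n bits, with the n == 64 case special-cased to avoid
 * the undefined behaviour of shifting a 64-bit value by 64. */
#define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* The patch's condition: SVA is allowed only when a DMA mask exists
 * and covers the full 64-bit address space. Robin's objection is that
 * a device no DMA-API driver has bound to still carries the 32-bit
 * default from pci_setup_device(), so this test would reject it even
 * if the hardware itself is 64-bit capable. */
static int sva_mask_ok(const uint64_t *dma_mask)
{
	return dma_mask && *dma_mask == DMA_BIT_MASK(64);
}
```

This makes the fragility concrete: the mask reflects what some driver last declared, not what the device can address, so the check gates SVA on driver-binding history rather than on hardware capability.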
Re: [PATCH 2/5] swiotlb: Point io_default_tlb_mem at static allocation
On Tue, Jul 20, 2021 at 09:51:12AM +0200, Christoph Hellwig wrote:
> I'd prefer if the next patch is merged into this one, to avoid the
> confusing state inbetween entirely.

Of course. For some reason, I half thought this patch would need to go to
stable, but the restricted DMA stuff didn't land so there's no backporting
to worry about. I'll merge 'em for v2.

Thanks for having a look!

Will
Re: [PATCH 5/5] swiotlb: Free tbl memory in swiotlb_exit()
On Tue, Jul 20, 2021 at 11:36:19AM +0800, Claire Chang wrote:
> On Mon, Jul 19, 2021 at 8:31 PM Will Deacon wrote:
> >
> > Although swiotlb_exit() frees the 'slots' metadata array referenced by
> > 'io_tlb_default_mem', it leaves the underlying buffer pages allocated
> > despite no longer being usable.
> >
> > Extend swiotlb_exit() to free the buffer pages as well as the slots
> > array.
> >
> > Cc: Claire Chang
> > Cc: Christoph Hellwig
> > Cc: Robin Murphy
> > Cc: Konrad Rzeszutek Wilk
> > Tested-by: Nathan Chancellor
>
> Tested-by: Claire Chang

Thanks, Claire!

Will
Re: [PATCH 0/5] Fix restricted DMA vs swiotlb_exit()
Looks fine except for the patch split nitpick mentioned:

Reviewed-by: Christoph Hellwig
Re: [PATCH 2/5] swiotlb: Point io_default_tlb_mem at static allocation
I'd prefer if the next patch is merged into this one, to avoid the
confusing state in between entirely.