Re: [PATCH 1/2] iommu/sva: Tighten SVA bind API with explicit flags
Hi Jean-Philippe,

On Fri, 9 Apr 2021 12:22:21 +0200, Jean-Philippe Brucker wrote:
> On Thu, Apr 08, 2021 at 10:08:55AM -0700, Jacob Pan wrote:
> > The void* drvdata parameter isn't really used in iommu_sva_bind_device()
> > API,
>
> Right, it used to be a cookie passed to the device driver in the exit_mm()
> callback, but that went away with edcc40d2ab5f ("iommu: Remove
> iommu_sva_ops::mm_exit()")
>
> > the current IDXD code "borrows" the drvdata for a VT-d private flag
> > for supervisor SVA usage.
> >
> > Supervisor/Privileged mode request is a generic feature. It should be
> > promoted from the VT-d vendor driver to the generic code.
> >
> > This patch replaces void* drvdata with a unsigned int flags parameter
> > and adjusts callers accordingly.
>
> Thanks for cleaning this up. Making flags unsigned long seems more common
> (I suggested int without thinking). But it doesn't matter much, we won't
> get to 32 flags.
>
I was just thinking that unsigned int is 32 bits on both 32- and 64-bit machines.
> > > > Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/ > > Suggested-by: Jean-Philippe Brucker > > Signed-off-by: Jacob Pan > > --- > > drivers/dma/idxd/cdev.c | 2 +- > > drivers/dma/idxd/init.c | 6 +++--- > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +- > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 ++-- > > drivers/iommu/intel/Kconfig | 1 + > > drivers/iommu/intel/svm.c | 18 ++ > > drivers/iommu/iommu.c | 9 ++--- > > drivers/misc/uacce/uacce.c | 2 +- > > include/linux/intel-iommu.h | 2 +- > > include/linux/intel-svm.h | 17 ++--- > > include/linux/iommu.h | 19 > > --- 11 files changed, 40 insertions(+), 42 deletions(-) > > > > diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c > > index 0db9b82..21ec82b 100644 > > --- a/drivers/dma/idxd/cdev.c > > +++ b/drivers/dma/idxd/cdev.c > > @@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, > > struct file *filp) filp->private_data = ctx; > > > > if (device_pasid_enabled(idxd)) { > > - sva = iommu_sva_bind_device(dev, current->mm, NULL); > > + sva = iommu_sva_bind_device(dev, current->mm, 0); > > if (IS_ERR(sva)) { > > rc = PTR_ERR(sva); > > dev_err(dev, "pasid allocation failed: %d\n", > > rc); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c > > index 085a0c3..cdc85f1 100644 > > --- a/drivers/dma/idxd/init.c > > +++ b/drivers/dma/idxd/init.c > > @@ -300,13 +300,13 @@ static struct idxd_device *idxd_alloc(struct > > pci_dev *pdev) > > static int idxd_enable_system_pasid(struct idxd_device *idxd) > > { > > - int flags; > > + unsigned int flags; > > unsigned int pasid; > > struct iommu_sva *sva; > > > > - flags = SVM_FLAG_SUPERVISOR_MODE; > > + flags = IOMMU_SVA_BIND_SUPERVISOR; > > > > - sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags); > > + sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags); > > if (IS_ERR(sva)) { > > dev_warn(&idxd->pdev->dev, > > "iommu sva bind failed: %ld\n", PTR_ERR(sva)); > > diff --git 
a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index > > bb251ca..23e287e 100644 --- > > a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ > > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -354,7 +354,7 @@ > > __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm) } > > > > struct iommu_sva * > > -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void > > *drvdata) +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, > > unsigned int flags) > > Could you add a check on flags: > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index > bb251cab61f3..145ceb2fc5da 100644 --- > a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -354,12 +354,15 @@ > __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm) } > > struct iommu_sva * > -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void > *drvdata) +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, > unsigned int flags) { > struct iommu_sva *handle; > struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > > + if (flags) > + return ERR_PTR(-EINVAL); > + yes, will do. > if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1) > return ERR_PTR(-EINVAL); > > > > > { > > struct iommu_sva *handle; > > struct iommu_domain *domain = io
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 09/04/2021 09:58, Suravee Suthikulpanit wrote: In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was reading of the counter always return zero. At the time, the suggested workaround was to add a test logic prior to initializing the PMC feature to check if the counters can be programmed and read back the same value. This has been working fine until the more recent desktop/mobile platforms start enabling power gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecesarily. Unfortunatly, there is no documentation of since which generation of hardware the original PMC HW bug was fixed. Although, it was fixed soon after the first introduction of the PMC. Base on this, we assume that the buggy platforms are less likely to be in used, and it should be relatively safe to remove this legacy logic. Thanks for explaining the 'context', Suravee. 
Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Cc: Paul Menzel Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd/init.c | 24 +--- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 648cdfd03074..247cdda5d683 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table) return 0; } -static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, - u8 fxn, u64 *value, bool is_write); - static void init_iommu_perf_ctr(struct amd_iommu *iommu) { + u64 val; struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0, save_reg = 0; if (!iommu_feature(iommu, FEATURE_PC)) return; amd_iommu_pc_present = true; - /* save the value to restore, if writable */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false)) - goto pc_false; - - /* Check if the performance counters can be written to */ - if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) || - (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) || - (val != val2)) - goto pc_false; - - /* restore */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true)) - goto pc_false; - pci_info(pdev, "IOMMU performance counters supported\n"); val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET); @@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu) iommu->max_counters = (u8) ((val >> 7) & 0xf); return; - -pc_false: - pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n"); - amd_iommu_pc_present = false; - return; } static ssize_t amd_iommu_show_cap(struct device *dev, I'll test your revert + update IOMMU patch on my Ryzen 2400G and 4700U most likely over the weekend. Very interesting! 
Please be aware that your original IOMMU patch has already reached the imminent release of Ubuntu 21.04 (Hirsute). I've taken the liberty of adding Alex Hung (lead kernel developer at Ubuntu) to the circulation list. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3] swiotlb: Make SWIOTLB_NO_FORCE perform no allocation
On 4/9/2021 12:32 PM, Konrad Rzeszutek Wilk wrote: > On Thu, Apr 08, 2021 at 08:13:07PM -0700, Florian Fainelli wrote: >> >> >> On 3/24/2021 1:42 AM, Christoph Hellwig wrote: >>> On Mon, Mar 22, 2021 at 06:53:49PM -0700, Florian Fainelli wrote: When SWIOTLB_NO_FORCE is used, there should really be no allocations of default_nslabs to occur since we are not going to use those slabs. If a platform was somehow setting swiotlb_no_force and a later call to swiotlb_init() was to be made we would still be proceeding with allocating the default SWIOTLB size (64MB), whereas if swiotlb=noforce was set on the kernel command line we would have only allocated 2KB. This would be inconsistent and the point of initializing default_nslabs to 1, was intended to allocate the minimum amount of memory possible, so simply remove that minimal allocation period. Signed-off-by: Florian Fainelli >>> >>> Looks good, >>> >>> Reviewed-by: Christoph Hellwig >>> >> >> Thanks! Konrad, can you apply this patch to your for-linus-5.13 branch >> if you are also happy with it? > > It should be now visible? Not seeing it here: https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/log/?h=devel/for-linus-5.13 -- Florian ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 4/9/21 2:00 PM, Shuah Khan wrote: On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote: In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was reading of the counter always return zero. At the time, the suggested workaround was to add a test logic prior to initializing the PMC feature to check if the counters can be programmed and read back the same value. This has been working fine until the more recent desktop/mobile platforms start enabling power gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecesarily. Unfortunatly, there is no documentation of since which generation of hardware the original PMC HW bug was fixed. Although, it was fixed soon after the first introduction of the PMC. Base on this, we assume that the buggy platforms are less likely to be in used, and it should be relatively safe to remove this legacy logic. Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Cc: Paul Menzel Signed-off-by: Suravee Suthikulpanit --- Tested-by: Shuah Khan Revert + this patch - same as my test on Ryzen 5 On AMD Ryzen 7 4700G with Radeon Graphics These look real odd to me. Let me know if I should look further. 
sudo ./perf stat -e 'amd_iommu_0/cmd_processed/, amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10

 Performance counter stats for 'system wide':

  17,761,952,514,865,374   amd_iommu_0/cmd_processed/          (33.28%)
  18,582,155,570,607,472   amd_iommu_0/cmd_processed_inv/      (33.32%)
                       0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/   (33.36%)
   5,056,087,645,262,255   amd_iommu_0/int_dte_hit/            (33.40%)
  32,831,106,446,308,888   amd_iommu_0/int_dte_mis/            (33.44%)
  13,461,819,655,591,296   amd_iommu_0/mem_dte_hit/            (33.45%)
 208,555,436,221,050,464   amd_iommu_0/mem_dte_mis/            (33.47%)
 196,824,154,635,609,888   amd_iommu_0/mem_iommu_tlb_pde_hit/  (33.46%)
 193,552,630,440,410,144   amd_iommu_0/mem_iommu_tlb_pde_mis/  (33.45%)
 176,936,647,809,098,368   amd_iommu_0/mem_iommu_tlb_pte_hit/  (33.41%)
 184,737,401,623,626,464   amd_iommu_0/mem_iommu_tlb_pte_mis/  (33.37%)
                       0   amd_iommu_0/mem_pass_excl/          (33.33%)
                       0   amd_iommu_0/mem_pass_pretrans/      (33.30%)
                       0   amd_iommu_0/mem_pass_untrans/       (33.28%)
                       0   amd_iommu_0/mem_target_abort/       (33.27%)
 245,383,212,924,004,288   amd_iommu_0/mem_trans_total/        (33.27%)
                       0   amd_iommu_0/page_tbl_read_gst/      (33.28%)
 262,267,045,917,967,264   amd_iommu_0/page_tbl_read_nst/      (33.27%)
 256,308,216,913,137,600   amd_iommu_0/page_tbl_read_tot/      (33.28%)
                       0   amd_iommu_0/smi_blk/                (33.27%)
                       0   amd_iommu_0/smi_recv/               (33.27%)
                       0   amd_iommu_0/tlb_inv/                (33.27%)
                       0   amd_iommu_0/vapic_int_guest/        (33.26%)
  38,913,544,420,579,888   amd_iommu_0/vapic_int_non_guest/    (33.27%)

      10.003967760 seconds time elapsed

thanks,
-- Shuah
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote: In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was reading of the counter always return zero. At the time, the suggested workaround was to add a test logic prior to initializing the PMC feature to check if the counters can be programmed and read back the same value. This has been working fine until the more recent desktop/mobile platforms start enabling power gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecesarily. Unfortunatly, there is no documentation of since which generation of hardware the original PMC HW bug was fixed. Although, it was fixed soon after the first introduction of the PMC. Base on this, we assume that the buggy platforms are less likely to be in used, and it should be relatively safe to remove this legacy logic. 
Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux)
Cc: Shuah Khan
Cc: Alexander Monakov
Cc: David Coe
Cc: Paul Menzel
Signed-off-by: Suravee Suthikulpanit
---

Tested-by: Shuah Khan

thanks,
-- Shuah

Results with this patch on AMD Ryzen 5 PRO 2400GE w/ Radeon Vega Graphics

sudo ./perf stat -e 'amd_iommu_0/cmd_processed/, amd_iommu_0/cmd_processed_inv/, amd_iommu_0/ign_rd_wr_mmio_1ff8h/, amd_iommu_0/int_dte_hit/, amd_iommu_0/int_dte_mis/, amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/, amd_iommu_0/mem_iommu_tlb_pte_mis/, amd_iommu_0/mem_pass_excl/, amd_iommu_0/mem_pass_pretrans/, amd_iommu_0/mem_pass_untrans/, amd_iommu_0/mem_target_abort/, amd_iommu_0/mem_trans_total/, amd_iommu_0/page_tbl_read_gst/, amd_iommu_0/page_tbl_read_nst/, amd_iommu_0/page_tbl_read_tot/, amd_iommu_0/smi_blk/, amd_iommu_0/smi_recv/, amd_iommu_0/tlb_inv/, amd_iommu_0/vapic_int_guest/, amd_iommu_0/vapic_int_non_guest/' sleep 10

 Performance counter stats for 'system wide':

     156   amd_iommu_0/cmd_processed/          (33.30%)
      80   amd_iommu_0/cmd_processed_inv/      (33.38%)
       0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/   (33.40%)
       0   amd_iommu_0/int_dte_hit/            (33.43%)
     325   amd_iommu_0/int_dte_mis/            (33.44%)
   1,951   amd_iommu_0/mem_dte_hit/            (33.45%)
   7,589   amd_iommu_0/mem_dte_mis/            (33.49%)
     325   amd_iommu_0/mem_iommu_tlb_pde_hit/  (33.45%)
   2,460   amd_iommu_0/mem_iommu_tlb_pde_mis/  (33.41%)
   2,510   amd_iommu_0/mem_iommu_tlb_pte_hit/  (33.38%)
   5,526   amd_iommu_0/mem_iommu_tlb_pte_mis/  (33.33%)
       0   amd_iommu_0/mem_pass_excl/          (33.29%)
       0   amd_iommu_0/mem_pass_pretrans/      (33.28%)
   1,556   amd_iommu_0/mem_pass_untrans/       (33.27%)
       0   amd_iommu_0/mem_target_abort/       (33.26%)
   3,112   amd_iommu_0/mem_trans_total/        (33.29%)
       0   amd_iommu_0/page_tbl_read_gst/      (33.29%)
   1,813   amd_iommu_0/page_tbl_read_nst/      (33.25%)
   2,242   amd_iommu_0/page_tbl_read_tot/      (33.27%)
       0   amd_iommu_0/smi_blk/                (33.29%)
       0   amd_iommu_0/smi_recv/               (33.28%)
       0   amd_iommu_0/tlb_inv/                (33.28%)
       0   amd_iommu_0/vapic_int_guest/        (33.25%)
       0   amd_iommu_0/vapic_int_non_guest/    (33.26%)

      10.003200316 seconds time elapsed
Re: [PATCH v2 1/3] iommu: io-pgtable: add DART pagetable format
On Fri, Apr 9, 2021 at 6:56 PM Sven Peter wrote: > On Wed, Apr 7, 2021, at 12:44, Will Deacon wrote: > > On Sun, Mar 28, 2021 at 09:40:07AM +0200, Sven Peter wrote: > > > > > + cfg->pgsize_bitmap &= SZ_16K; > > > + if (!cfg->pgsize_bitmap) > > > + return NULL; > > > > This is worrying (and again, I don't think this belongs here). How is this > > thing supposed to work if the CPU is using 4k pages? > > This SoC is just full of fun surprises! > I didn't even think about that case since I've always been using 16k pages so > far. > > I've checked again and wasn't able to find any way to configure the pagesize > of the IOMMU. There seem to be variants of this IP in older iPhones which > support a 4k pagesize but to the best of my knowledge this is hard wired > and not configurable in software. > > When booting with 4k pages I hit the BUG_ON in iova.c that ensures that the > iommu pagesize has to be <= the cpu page size. > > I see two options here and I'm not sure I like either of them: > > 1) Just don't support 4k CPU pages together with IOMMU translations and only >allow full bypass mode there. >This would however mean that PCIe (i.e. ethernet, usb ports on the Mac >mini) and possibly Thunderbolt support would not be possible since these >devices don't seem to like iommu bypass mode at all. It should be possible to do a fake bypass mode by just programming a static page table for as much address space as you can, and then use swiotlb to address any memory beyond that. This won't perform well because it requires bounce buffers for any high memory, but it can be a last resort if a dart instance cannot do normal bypass mode. > 2) I've had a brief discussion on IRC with Arnd about this [1] and he pointed >out that the dma_map_sg API doesn't make any guarantees about the returned >iovas and that it might be possible to make this work at least for devices >that go through the normal DMA API. 
> >I've then replaced the page size check with a WARN_ON in iova.c just to see >what happens. At least normal devices that go through the DMA API seem to >work with my configuration. iommu_dma_alloc took the iommu_dma_alloc_remap >path which was called with the cpu page size but then used >domain->pgsize_bitmap to increase that to 16k. So this kinda works out, but >there are other functions in dma-iommu.c that I believe rely on the fact > that >the iommu can map single cpu pages. This feels very fragile right now and >would probably require some rather invasive changes. The other second-to-last resort here would be to duplicate the code from the dma-iommu code and implement the dma-mapping API directly on top of the dart hardware instead of the iommu layer. This would probably be much faster than the swiotlb on top of a bypass or a linear map, but it's a really awful abstraction that would require adding special cases into a lot of generic code. >Any driver that tries to use the iommu API directly could be trouble >as well if they make similar assumptions. I think pretty much all drivers using the iommu API directly already depends on having a matching page size. I don't see any way to use e.g. PCI device assignment using vfio, or a GPU driver with per-process contexts when the iotlb page size is larger than the CPU's. >Is this something you would even want to support in the iommu subsytem >and is it even possible to do this in a sane way? I don't know how hard it is to do adjust the dma-iommu implementation to allow this, but as long as we can work out the DT binding to support both normal dma-iommu mode with 16KB pages and some kind of passthrough mode (emulated or not) with 4KB pages, it can be left as a possible optimization for later. Arnd ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3] swiotlb: Make SWIOTLB_NO_FORCE perform no allocation
On Thu, Apr 08, 2021 at 08:13:07PM -0700, Florian Fainelli wrote:
>
> On 3/24/2021 1:42 AM, Christoph Hellwig wrote:
> > On Mon, Mar 22, 2021 at 06:53:49PM -0700, Florian Fainelli wrote:
> >> When SWIOTLB_NO_FORCE is used, there should really be no allocations of
> >> default_nslabs to occur since we are not going to use those slabs. If a
> >> platform was somehow setting swiotlb_no_force and a later call to
> >> swiotlb_init() was to be made we would still be proceeding with
> >> allocating the default SWIOTLB size (64MB), whereas if swiotlb=noforce
> >> was set on the kernel command line we would have only allocated 2KB.
> >>
> >> This would be inconsistent and the point of initializing default_nslabs
> >> to 1, was intended to allocate the minimum amount of memory possible, so
> >> simply remove that minimal allocation period.
> >>
> >> Signed-off-by: Florian Fainelli
> >
> > Looks good,
> >
> > Reviewed-by: Christoph Hellwig
> >
>
> Thanks! Konrad, can you apply this patch to your for-linus-5.13 branch
> if you are also happy with it?

It should be now visible?

> --
> Florian
Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API
Hi Lu, On Fri, 9 Apr 2021 20:45:22 +0800, Lu Baolu wrote: > > -int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t > > max) +int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max) > > { > > int ret = 0; > > ioasid_t pasid; > > + struct mm_struct *mm; > > > > if (min == INVALID_IOASID || max == INVALID_IOASID || > > min == 0 || max < min) > > return -EINVAL; > > > > mutex_lock(&iommu_sva_lock); > > + mm = get_task_mm(current); > > How could we allocate a supervisor PASID through iommu_sva_alloc_pasid() > if we always use current->mm here? I don't think you can. But I guess the current callers of this function do not need supervisor PASID. In reply to Jean, I suggest we split this function into mm->pasid assignment and keep using ioasid_alloc() directly, then supervisor PASID is caller's bind choice. Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API
Hi Jean-Philippe, On Fri, 9 Apr 2021 12:11:47 +0200, Jean-Philippe Brucker wrote: > On Thu, Apr 08, 2021 at 10:08:56AM -0700, Jacob Pan wrote: > > diff --git a/drivers/iommu/iommu-sva-lib.c > > b/drivers/iommu/iommu-sva-lib.c index bd41405..bd99f6b 100644 > > --- a/drivers/iommu/iommu-sva-lib.c > > +++ b/drivers/iommu/iommu-sva-lib.c > > @@ -12,27 +12,33 @@ static DECLARE_IOASID_SET(iommu_sva_pasid); > > > > /** > > * iommu_sva_alloc_pasid - Allocate a PASID for the mm > > - * @mm: the mm > > * @min: minimum PASID value (inclusive) > > * @max: maximum PASID value (inclusive) > > * > > - * Try to allocate a PASID for this mm, or take a reference to the > > existing one > > - * provided it fits within the [@min, @max] range. On success the > > PASID is > > - * available in mm->pasid, and must be released with > > iommu_sva_free_pasid(). > > + * Try to allocate a PASID for the current mm, or take a reference to > > the > > + * existing one provided it fits within the [@min, @max] range. On > > success > > + * the PASID is available in the current mm->pasid, and must be > > released with > > + * iommu_sva_free_pasid(). > > * @min must be greater than 0, because 0 indicates an unused > > mm->pasid. * > > * Returns 0 on success and < 0 on error. > > */ > > -int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t > > max) +int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max) > > { > > int ret = 0; > > ioasid_t pasid; > > + struct mm_struct *mm; > > > > if (min == INVALID_IOASID || max == INVALID_IOASID || > > min == 0 || max < min) > > return -EINVAL; > > > > mutex_lock(&iommu_sva_lock); > > + mm = get_task_mm(current); > > + if (!mm) { > > + ret = -EINVAL; > > + goto out_unlock; > > + } > > I still think it would be more elegant to keep the choice of context in > iommu_sva_bind_device() and pass it down to leaf functions such as > iommu_sva_alloc_pasid(). 
The patch is trying to solve two separate I agree if iommu_sva_alloc_pasid() is a leaf function, but it is a public function, e.g. called by smmu code: /* Allocate a PASID for this mm if necessary */ ret = iommu_sva_alloc_pasid(1, (1U << master->ssid_bits) - 1); If we give mm as parameter, it will give callers the illusion that this mm doesn't have to be current->mm. Should we make it into a leaf function by splitting iommu_sva_alloc_pasid() into two parts? 1. iommu_sva_assign_pasid() //a new leaf helper function does mm->pasid assignment 2. ioasid_alloc() in iommu_sva_bind_device(), we do: 1. handle = driver ops->sva_bind(dev, mm, flags); 2. pasid = sva_get_pasid(handle); 3. iommu_sva_assign_pasid(mm, pasid) In vendor driver sva_bind(), it just use ioasid_alloc directly with custom range. e.g. arm-smmu-v3-sva.c - ret = iommu_sva_alloc_pasid(1, (1U << master->ssid_bits) - 1); + ret = ioasid_alloc(&iommu_sva_pasid, 1, (1U << master->ssid_bits); > problems: > > * We don't have a use-case for binding the mm of a remote process (and > it's supposedly difficult for device drivers to do it securely). So OK, > we remove the mm argument from iommu_sva_bind_device() and use the > current mm. But the IOMMU driver isn't going to do get_task_mm(current) > every time it needs the mm being bound, it will take it from > iommu_sva_bind_device(). Likewise iommu_sva_alloc_pasid() shouldn't need > to bother with get_task_mm(). > > * cgroup accounting for IOASIDs needs to be on the current task. Removing > the mm parameter from iommu_sva_alloc_pasid() doesn't help with that. > Sure it indicates that iommu_sva_alloc_pasid() needs a specific task > context but that's only for cgroup purpose, and I'd rather pass the > cgroup down from iommu_sva_bind_device() anyway (but am fine with > keeping it within ioasid_alloc() for now). Plus it's an internal helper, > easy for us to check that the callers are doing the right thing. 
> With the above split, we really just have one allocation function: ioasid_alloc(), so it can manage current cgroup accounting within. Would this work? > > if (mm->pasid) { > > if (mm->pasid >= min && mm->pasid <= max) > > ioasid_get(mm->pasid); > > @@ -45,22 +51,32 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, > > ioasid_t min, ioasid_t max) else > > mm->pasid = pasid; > > } > > + mmput(mm); > > +out_unlock: > > mutex_unlock(&iommu_sva_lock); > > return ret; > > } > > EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid); > > > > /** > > - * iommu_sva_free_pasid - Release the mm's PASID > > + * iommu_sva_free_pasid - Release the current mm's PASID > > * @mm: the mm > > * > > * Drop one reference to a PASID allocated with iommu_sva_alloc_pasid() > > */ > > -void iommu_sva_free_pasid(struct mm_struct *mm) > > +void iommu_sva_free_pasid(void) > > { > > + struct mm_struct *mm; > > + > > mutex_lock(&
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 4/9/21 10:37 AM, Shuah Khan wrote: On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote: In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was reading of the counter always return zero. At the time, the suggested workaround was to add a test logic prior to initializing the PMC feature to check if the counters can be programmed and read back the same value. This has been working fine until the more recent desktop/mobile platforms start enabling power gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecesarily. unnecessarily Unfortunatly, there is no documentation of since which generation Unfortunately, Rephrase suggestion: Unfortunately, it is unclear when the PMC HW bug fixed. of hardware the original PMC HW bug was fixed. Although, it was fixed soon after the first introduction of the PMC. Base on this, we assume Based that the buggy platforms are less likely to be in used, and it should in use be relatively safe to remove this legacy logic. 
Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Cc: Paul Menzel Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd/init.c | 24 +--- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 648cdfd03074..247cdda5d683 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table) return 0; } -static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, - u8 fxn, u64 *value, bool is_write); - static void init_iommu_perf_ctr(struct amd_iommu *iommu) { + u64 val; struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0, save_reg = 0; Why not leave this u64 val here? Having the pdev assignment as the first line makes it easier to read/follow. if (!iommu_feature(iommu, FEATURE_PC)) return; amd_iommu_pc_present = true; - /* save the value to restore, if writable */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false)) - goto pc_false; - - /* Check if the performance counters can be written to */ - if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) || - (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) || - (val != val2)) Aha - this is going away anyway. Please ignore my comment on 1/2 about parenthesis around (val != val2) being unnecessary. 
- goto pc_false; - - /* restore */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true)) - goto pc_false; - pci_info(pdev, "IOMMU performance counters supported\n"); val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET); @@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu) iommu->max_counters = (u8) ((val >> 7) & 0xf); return; - -pc_false: - pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n"); - amd_iommu_pc_present = false; - return; } static ssize_t amd_iommu_show_cap(struct device *dev, thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] Revert "iommu/amd: Fix performance counter initialization"
On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:

From: Paul Menzel

This reverts commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b.

The original commit tries to address an issue where PMC power-gating causes the IOMMU PMC pre-init test to fail on certain desktop/mobile platforms where the power-gating is normally enabled.

There have been several reports that the workaround still does not guarantee to work, and can add up to 100 ms (in the worst case) to the boot process on certain platforms such as the MSI B350M MORTAR with AMD Ryzen 3 2200G. Therefore, revert this commit as a prelude to removing the pre-init test.

Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux)
Cc: Shuah Khan
Cc: Alexander Monakov
Cc: David Coe
Signed-off-by: Paul Menzel
Signed-off-by: Suravee Suthikulpanit
---
Note: I have revised the commit message to add more detail and remove unnecessary information.
drivers/iommu/amd/init.c | 45 ++-- 1 file changed, 11 insertions(+), 34 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 321f5906e6ed..648cdfd03074 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -12,7 +12,6 @@ #include #include #include -#include #include #include #include @@ -257,8 +256,6 @@ static enum iommu_init_state init_state = IOMMU_START_STATE; static int amd_iommu_enable_interrupts(void); static int __init iommu_go_to_state(enum iommu_init_state state); static void init_device_table_dma(void); -static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, - u8 fxn, u64 *value, bool is_write); static bool amd_iommu_pre_enabled = true; @@ -1717,11 +1714,13 @@ static int __init init_iommu_all(struct acpi_table_header *table) return 0; } -static void __init init_iommu_perf_ctr(struct amd_iommu *iommu) +static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, + u8 fxn, u64 *value, bool is_write); + +static void init_iommu_perf_ctr(struct amd_iommu *iommu) { - int retry; struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0, save_reg, save_src; + u64 val = 0xabcd, val2 = 0, save_reg = 0; if (!iommu_feature(iommu, FEATURE_PC)) return; @@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu *iommu) amd_iommu_pc_present = true; /* save the value to restore, if writable */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) || - iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false)) - goto pc_false; - - /* -* Disable power gating by programing the performance counter -* source to 20 (i.e. counts the reads and writes from/to IOMMU -* Reserved Register [MMIO Offset 1FF8h] that are ignored.), -* which never get incremented during this init phase. -* (Note: The event is also deprecated.) 
-*/ - val = 20; - if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true)) + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false)) goto pc_false; /* Check if the performance counters can be written to */ - val = 0xabcd; - for (retry = 5; retry; retry--) { - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) || - iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) || - val2) - break; - - /* Wait about 20 msec for power gating to disable and retry. */ - msleep(20); - } - - /* restore */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) || - iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true)) + if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) || + (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) || + (val != val2)) Probably don't need parentheses around 'val != val2' goto pc_false; - if (val != val2) + /* restore */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true)) goto pc_false; pci_info(pdev, "IOMMU performance counters supported\n"); thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: dart: fix call_kern.cocci warnings
On Sun, Apr 4, 2021, at 17:26, Julia Lawall wrote:
> From: kernel test robot
>
> Function apple_dart_attach_stream called on line 519 inside
> lock on line 509 but uses GFP_KERNEL

Thanks! Fixed for v3.

Best,
Sven
Re: [PATCH v2 1/3] iommu: io-pgtable: add DART pagetable format
On Wed, Apr 7, 2021, at 12:44, Will Deacon wrote:
> On Sun, Mar 28, 2021 at 09:40:07AM +0200, Sven Peter wrote:

[...]

> > +static struct io_pgtable *
> > +apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> > +{
> > +	struct arm_lpae_io_pgtable *data;
> > +
> > +	if (cfg->ias > 36)
> > +		return NULL;
> > +	if (cfg->oas > 36)
> > +		return NULL;
> > +
> > +	if (!cfg->coherent_walk)
> > +		return NULL;
>
> This all feels like IOMMU-specific limitations leaking into the page-table
> code here; it doesn't feel so unlikely that future implementations of this
> IP might have greater addressing capabilities, for example, and so I don't
> see why the page-table code needs to police this.

That's true, this really doesn't belong here. I'll fix it for the next
version and make sure to keep iommu-specific limitations inside the
driver itself.

> > +	cfg->pgsize_bitmap &= SZ_16K;
> > +	if (!cfg->pgsize_bitmap)
> > +		return NULL;
>
> This is worrying (and again, I don't think this belongs here). How is this
> thing supposed to work if the CPU is using 4k pages?

This SoC is just full of fun surprises! I didn't even think about that
case since I've always been using 16k pages so far.

I've checked again and wasn't able to find any way to configure the
pagesize of the IOMMU. There seem to be variants of this IP in older
iPhones which support a 4k pagesize but to the best of my knowledge this
is hard wired and not configurable in software.

When booting with 4k pages I hit the BUG_ON in iova.c that ensures that
the iommu pagesize has to be <= the cpu page size.

I see two options here and I'm not sure I like either of them:

1) Just don't support 4k CPU pages together with IOMMU translations and
   only allow full bypass mode there. This would however mean that PCIe
   (i.e. ethernet, usb ports on the Mac mini) and possibly Thunderbolt
   support would not be possible since these devices don't seem to like
   iommu bypass mode at all.
2) I've had a brief discussion on IRC with Arnd about this [1] and he
   pointed out that the dma_map_sg API doesn't make any guarantees about
   the returned iovas and that it might be possible to make this work at
   least for devices that go through the normal DMA API.

   I've then replaced the page size check with a WARN_ON in iova.c just
   to see what happens. At least normal devices that go through the DMA
   API seem to work with my configuration. iommu_dma_alloc took the
   iommu_dma_alloc_remap path which was called with the cpu page size
   but then used domain->pgsize_bitmap to increase that to 16k. So this
   kinda works out, but there are other functions in dma-iommu.c that I
   believe rely on the fact that the iommu can map single cpu pages.
   This feels very fragile right now and would probably require some
   rather invasive changes.

   Any driver that tries to use the iommu API directly could be trouble
   as well if they make similar assumptions.

Is this something you would even want to support in the iommu subsystem
and is it even possible to do this in a sane way?

Best,
Sven

[1] https://freenode.irclog.whitequark.org/asahi/2021-04-07#29609786
Re: [PATCH v2 3/3] iommu: dart: Add DART iommu driver
On Wed, Apr 7, 2021, at 12:42, Will Deacon wrote:
> On Sun, Mar 28, 2021 at 09:40:09AM +0200, Sven Peter wrote:
> > Apple's new SoCs use iommus for almost all peripherals. These Device
> > Address Resolution Tables must be setup before these peripherals can
> > act as DMA masters.
> >
> > Signed-off-by: Sven Peter
> > ---
> >  MAINTAINERS                      |   1 +
> >  drivers/iommu/Kconfig            |  14 +
> >  drivers/iommu/Makefile           |   1 +
> >  drivers/iommu/apple-dart-iommu.c | 858 +++
> >  4 files changed, 874 insertions(+)
> >  create mode 100644 drivers/iommu/apple-dart-iommu.c
>
> [...]
>
> > +/* must be called with held domain->lock */
> > +static int apple_dart_attach_stream(struct apple_dart_domain *domain,
> > +				    struct apple_dart *dart, u32 sid)
> > +{
> > +	unsigned long flags;
> > +	struct apple_dart_stream *stream;
> > +	struct io_pgtable_cfg *pgtbl_cfg;
> > +	int ret;
> > +
> > +	list_for_each_entry(stream, &domain->streams, stream_head) {
> > +		if (stream->dart == dart && stream->sid == sid) {
> > +			stream->num_devices++;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	spin_lock_irqsave(&dart->lock, flags);
> > +
> > +	if (WARN_ON(dart->used_sids & BIT(sid))) {
> > +		ret = -EINVAL;
> > +		goto error;
> > +	}
> > +
> > +	stream = kzalloc(sizeof(*stream), GFP_KERNEL);
> > +	if (!stream) {
> > +		ret = -ENOMEM;
> > +		goto error;
> > +	}
>
> Just in case you missed it, a cocci bot noticed that you're using GFP_KERNEL
> to allocate while holding a spinlock here:
>
> https://lore.kernel.org/r/alpine.DEB.2.22.394.2104041724340.2958@hadrien

Thanks for the reminder! I haven't replied yet because that one was
found later when the bot picked up a (slightly earlier) version that
Marc was using to bring up pcie I believe.

I'll fix it for the next version.

Sven
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
On 4/9/21 2:58 AM, Suravee Suthikulpanit wrote:
> In early AMD desktop/mobile platforms (during 2013), when the IOMMU
> Performance Counter (PMC) support was first introduced in
> commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter
> resource management"), there was a HW bug where the counters could
> not be accessed. The result was reading of the counter always return
> zero.
>
> At the time, the suggested workaround was to add a test logic prior
> to initializing the PMC feature to check if the counters can be
> programmed and read back the same value. This has been working fine
> until the more recent desktop/mobile platforms start enabling power
> gating for the PMC, which prevents access to the counters. This
> results in the PMC support being disabled unnecesarily.

unnecessarily

> Unfortunatly, there is no documentation of since which generation

Unfortunately,

Rephrase suggestion:

Unfortunately, it is unclear when the PMC HW bug was fixed.

> of hardware the original PMC HW bug was fixed. Although, it was fixed
> soon after the first introduction of the PMC. Base on this, we assume

Based

> that the buggy platforms are less likely to be in used, and it should

in use

> be relatively safe to remove this legacy logic.
Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Cc: Paul Menzel Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd/init.c | 24 +--- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 648cdfd03074..247cdda5d683 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table) return 0; } -static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, - u8 fxn, u64 *value, bool is_write); - static void init_iommu_perf_ctr(struct amd_iommu *iommu) { + u64 val; struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0, save_reg = 0; if (!iommu_feature(iommu, FEATURE_PC)) return; amd_iommu_pc_present = true; - /* save the value to restore, if writable */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false)) - goto pc_false; - - /* Check if the performance counters can be written to */ - if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) || - (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) || - (val != val2)) - goto pc_false; - - /* restore */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true)) - goto pc_false; - pci_info(pdev, "IOMMU performance counters supported\n"); val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET); @@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu) iommu->max_counters = (u8) ((val >> 7) & 0xf); return; - -pc_false: - pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n"); - amd_iommu_pc_present = false; - return; } static ssize_t amd_iommu_show_cap(struct device *dev, I will test this patch and the revert on my two AMD systems and update you in a day or two. 
thanks,
-- Shuah
Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA
On 09.04.21 15:35, Arnd Bergmann wrote:
> On Fri, Apr 9, 2021 at 1:21 PM David Hildenbrand wrote:
>> Random drivers should not override a user configuration of core knobs
>> (e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA,
>> which depends on CMA, if possible; however, these drivers also have to
>> tolerate if DMA_CMA is not available/functioning, for example, if no CMA
>> area for DMA_CMA use has been setup via "cma=X". In the worst case, the
>> driver cannot do its job properly in some configurations.
>>
>> For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if
>> available") documents
>>     While this is no build dependency, etnaviv will only work correctly
>>     on most systems if CMA and DMA_CMA are enabled. Select both options
>>     if available to avoid users ending up with a non-working GPU due to
>>     a lacking kernel config.
>> So etnaviv really wants to have DMA_CMA, however, can deal with some
>> cases where it is not available.
>>
>> Let's introduce WANT_DMA_CMA and use it in most cases where drivers
>> select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because
>> of recursive dependency issues).
>>
>> We'll assume that any driver that selects DRM_GEM_CMA_HELPER or
>> DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible.
>>
>> With this change, distributions can disable CONFIG_CMA or
>> CONFIG_DMA_CMA, without it silently getting enabled again by random
>> drivers. Also, we'll now automatically try to enable both, CONFIG_CMA
>> and CONFIG_DMA_CMA if they are unspecified and any driver is around that
>> selects WANT_DMA_CMA -- also implicitly via DRM_GEM_CMA_HELPER or
>> DRM_KMS_CMA_HELPER.
>>
>> For example, if any driver selects WANT_DMA_CMA and we do a
>> "make olddefconfig":
>>
>> 1. With "# CONFIG_CMA is not set" and no specification of
>>    "CONFIG_DMA_CMA"
>>
>> -> CONFIG_DMA_CMA won't be part of .config
>>
>> 2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA
>>
>>     Contiguous Memory Allocator (CMA) [Y/n/?] (NEW)
>>     DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] (NEW)
>>
>> 3. With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set"
>>
>> -> CONFIG_DMA_CMA will be removed from .config
>>
>> Note: drivers/remoteproc seems to be special; commit c51e882cd711
>> ("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that
>> there is a real dependency to DMA_CMA for it to work; leave that
>> dependency in place and don't convert it to a soft dependency.
>
> I don't think this dependency is fundamentally different from the others,
> though davinci machines tend to have less memory than a lot of the other
> machines, so it's more likely to fail without CMA.

I was also unsure - and Lucas had similar thoughts. If you want, I can
send a v4 also taking care of this.

Thanks!

> Regardless of this,
>
> Reviewed-by: Arnd Bergmann

--
Thanks,
David / dhildenb
Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA
On Fri, Apr 9, 2021 at 1:21 PM David Hildenbrand wrote: > > Random drivers should not override a user configuration of core knobs > (e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA, > which depends on CMA, if possible; however, these drivers also have to > tolerate if DMA_CMA is not available/functioning, for example, if no CMA > area for DMA_CMA use has been setup via "cma=X". In the worst case, the > driver cannot do it's job properly in some configurations. > > For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if > available") documents > While this is no build dependency, etnaviv will only work correctly > on most systems if CMA and DMA_CMA are enabled. Select both options > if available to avoid users ending up with a non-working GPU due to > a lacking kernel config. > So etnaviv really wants to have DMA_CMA, however, can deal with some cases > where it is not available. > > Let's introduce WANT_DMA_CMA and use it in most cases where drivers > select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because > of recursive dependency issues). > > We'll assume that any driver that selects DRM_GEM_CMA_HELPER or > DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible. > > With this change, distributions can disable CONFIG_CMA or > CONFIG_DMA_CMA, without it silently getting enabled again by random > drivers. Also, we'll now automatically try to enabled both, CONFIG_CMA > and CONFIG_DMA_CMA if they are unspecified and any driver is around that > selects WANT_DMA_CMA -- also implicitly via DRM_GEM_CMA_HELPER or > DRM_KMS_CMA_HELPER. > > For example, if any driver selects WANT_DMA_CMA and we do a > "make olddefconfig": > > 1. With "# CONFIG_CMA is not set" and no specification of >"CONFIG_DMA_CMA" > > -> CONFIG_DMA_CMA won't be part of .config > > 2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA > > Contiguous Memory Allocator (CMA) [Y/n/?] (NEW) > DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] 
(NEW) > > 3. With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set" > > -> CONFIG_DMA_CMA will be removed from .config > > Note: drivers/remoteproc seems to be special; commit c51e882cd711 > ("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that > there is a real dependency to DMA_CMA for it to work; leave that dependency > in place and don't convert it to a soft dependency. I don't think this dependency is fundamentally different from the others, though davinci machines tend to have less memory than a lot of the other machines, so it's more likely to fail without CMA. Regardless of this, Reviewed-by: Arnd Bergmann ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
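For readers unfamiliar with the Kconfig pattern being discussed, a minimal sketch of how such a reverse soft dependency can be wired up. The hunks below are a schematic reconstruction from the commit message, not quoted from the patch:

```kconfig
# In a driver's Kconfig: express the soft dependency without
# force-enabling anything the user explicitly disabled.
config DRM_ETNAVIV
	tristate "ETNAVIV (DRM support for Vivante GPU IP cores)"
	select WANT_DMA_CMA
	# no more "select CMA" / "select DMA_CMA"

# In kernel/dma/Kconfig: an invisible helper symbol...
config WANT_DMA_CMA
	bool

# ...that only steers the *default* of the real option, so an existing
# "# CONFIG_DMA_CMA is not set" in .config still wins.
config DMA_CMA
	bool "DMA Contiguous Memory Allocator"
	depends on HAVE_DMA_CONTIGUOUS && CMA
	default WANT_DMA_CMA
```

Because `default` (unlike `select`) is only consulted when the symbol is unspecified, `make olddefconfig` offers the prompts as NEW in case 2 of the commit message, while cases 1 and 3 leave the user's choice untouched.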
Re: [PATCH v1 1/2] iommu/tegra-smmu: Defer attachment of display clients
On 08.04.2021 17:07, Dmitry Osipenko wrote:
>> Whatever happened to the idea of creating identity mappings based on the
>> obscure tegra_fb_mem (or whatever it was called) command-line option? Is
>> that command-line not universally passed to the kernel from bootloaders
>> that initialize display?
>
> This is still a good idea! The command-line isn't universally passed
> just because it's up to a user to override the cmdline and then we get a
> hang (a very slow boot in reality) on T30 since display client takes out
> the whole memory bus with the constant SMMU faults. For example I don't
> have that cmdline option in my current setups.

Thinking a bit more about the identity.. have you consulted with the
memory h/w people about whether it's always safe to enable ASID in a
middle of DMA request?
Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API
Hi,

On 2021/4/9 1:08, Jacob Pan wrote:
>  /**
>   * iommu_sva_alloc_pasid - Allocate a PASID for the mm
> - * @mm: the mm
>   * @min: minimum PASID value (inclusive)
>   * @max: maximum PASID value (inclusive)
>   *
> - * Try to allocate a PASID for this mm, or take a reference to the existing one
> - * provided it fits within the [@min, @max] range. On success the PASID is
> - * available in mm->pasid, and must be released with iommu_sva_free_pasid().
> + * Try to allocate a PASID for the current mm, or take a reference to the
> + * existing one provided it fits within the [@min, @max] range. On success
> + * the PASID is available in the current mm->pasid, and must be released with
> + * iommu_sva_free_pasid().
>   * @min must be greater than 0, because 0 indicates an unused mm->pasid.
>   *
>   * Returns 0 on success and < 0 on error.
>   */
> -int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
> +int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max)
>  {
>  	int ret = 0;
>  	ioasid_t pasid;
> +	struct mm_struct *mm;
>
>  	if (min == INVALID_IOASID || max == INVALID_IOASID ||
>  	    min == 0 || max < min)
>  		return -EINVAL;
>
>  	mutex_lock(&iommu_sva_lock);
> +	mm = get_task_mm(current);

How could we allocate a supervisor PASID through iommu_sva_alloc_pasid()
if we always use current->mm here?

> +	if (!mm) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
>  	if (mm->pasid) {
>  		if (mm->pasid >= min && mm->pasid <= max)
>  			ioasid_get(mm->pasid);
> @@ -45,22 +51,32 @@
>  		else
>  			mm->pasid = pasid;
>  	}
> +	mmput(mm);
> +out_unlock:
>  	mutex_unlock(&iommu_sva_lock);
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);

Best regards,
baolu
Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA
On Fri, Apr 9, 2021 at 1:20 PM David Hildenbrand wrote:
> Random drivers should not override a user configuration of core knobs
> (e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA,
> which depends on CMA, if possible; however, these drivers also have to
> tolerate if DMA_CMA is not available/functioning, for example, if no CMA
> area for DMA_CMA use has been setup via "cma=X". In the worst case, the
> driver cannot do its job properly in some configurations.

Looks good to me. At least a lot better than what we have.
Reviewed-by: Linus Walleij

> Let's see if this approach is better for soft dependencies (and if we
> actually have some hard dependencies in there). This is the follow-up
> of
> https://lkml.kernel.org/r/20210408092011.52763-1-da...@redhat.com
> https://lkml.kernel.org/r/20210408100523.63356-1-da...@redhat.com

You can just add these to the commit message with Link: when applying so
people can easily find the discussion from the commit.

> I was wondering if it would make sense in some drivers to warn if either
> CONFIG_DMA_CMA is not available or if DRM_CMA has not been configured
> properly - just to give people a heads up that something might more likely
> go wrong; that would, however, be future work.

I think the frameworks (DRM_*_CMA_HELPER) should pr_info something about
it so the individual drivers don't have to sanity check their entire
world.

Yours,
Linus Walleij
Re: [PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA
Am Freitag, dem 09.04.2021 um 13:20 +0200 schrieb David Hildenbrand: > Random drivers should not override a user configuration of core knobs > (e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA, > which depends on CMA, if possible; however, these drivers also have to > tolerate if DMA_CMA is not available/functioning, for example, if no CMA > area for DMA_CMA use has been setup via "cma=X". In the worst case, the > driver cannot do it's job properly in some configurations. > > For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if > available") documents > While this is no build dependency, etnaviv will only work correctly > on most systems if CMA and DMA_CMA are enabled. Select both options > if available to avoid users ending up with a non-working GPU due to > a lacking kernel config. > So etnaviv really wants to have DMA_CMA, however, can deal with some cases > where it is not available. > > Let's introduce WANT_DMA_CMA and use it in most cases where drivers > select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because > of recursive dependency issues). > > We'll assume that any driver that selects DRM_GEM_CMA_HELPER or > DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible. > > With this change, distributions can disable CONFIG_CMA or > CONFIG_DMA_CMA, without it silently getting enabled again by random > drivers. Also, we'll now automatically try to enabled both, CONFIG_CMA > and CONFIG_DMA_CMA if they are unspecified and any driver is around that > selects WANT_DMA_CMA -- also implicitly via DRM_GEM_CMA_HELPER or > DRM_KMS_CMA_HELPER. > > For example, if any driver selects WANT_DMA_CMA and we do a > "make olddefconfig": > > 1. With "# CONFIG_CMA is not set" and no specification of > "CONFIG_DMA_CMA" > > -> CONFIG_DMA_CMA won't be part of .config > > 2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA > > Contiguous Memory Allocator (CMA) [Y/n/?] 
(NEW) > DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] (NEW) > > 3. With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set" > > -> CONFIG_DMA_CMA will be removed from .config > > Note: drivers/remoteproc seems to be special; commit c51e882cd711 > ("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that > there is a real dependency to DMA_CMA for it to work; leave that dependency > in place and don't convert it to a soft dependency. Hm, to me this sounds much like the reasoning for the etnaviv dependency. There is no actual build dependency, as the allocations are done through the DMA API, but for the allocations to succeed you most likely want CMA to be enabled. But that's just an observation from the outside, I have no real clue about the remoteproc drivers. As far as the etnaviv changes are concerned: Acked-by: Lucas Stach > Cc: Maarten Lankhorst > Cc: Maxime Ripard > Cc: Thomas Zimmermann > Cc: David Airlie > Cc: Daniel Vetter > Cc: Joel Stanley > Cc: Andrew Jeffery > Cc: Lucas Stach > Cc: Russell King > Cc: Christian Gmeiner > Cc: Paul Cercueil > Cc: Linus Walleij > Cc: Christoph Hellwig > Cc: Marek Szyprowski > Cc: Robin Murphy > Cc: Andrew Morton > Cc: Mike Rapoport > Cc: Arnd Bergmann > Cc: Bartlomiej Zolnierkiewicz > Cc: Eric Anholt > Cc: Michal Simek > Cc: Masahiro Yamada > Cc: "Alexander A. Klimov" > Cc: Peter Collingbourne > Cc: Suman Anna > Cc: Jason Gunthorpe > Cc: dri-de...@lists.freedesktop.org > Cc: linux-asp...@lists.ozlabs.org > Cc: linux-arm-ker...@lists.infradead.org > Cc: etna...@lists.freedesktop.org > Cc: linux-m...@vger.kernel.org > Cc: linux-fb...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org > Signed-off-by: David Hildenbrand > --- > > Let's see if this approach is better for soft dependencies (and if we > actually have some hard dependencies in there). 
This is the follow-up > of > https://lkml.kernel.org/r/20210408092011.52763-1-da...@redhat.com > https://lkml.kernel.org/r/20210408100523.63356-1-da...@redhat.com > > I was wondering if it would make sense in some drivers to warn if either > CONFIG_DMA_CMA is not available or if DRM_CMA has not been configured > properly - just to give people a heads up that something might more likely > go wrong; that would, however, be future work. > > v2 -> v3: > - Don't use "imply" but instead use a new WANT_DMA_CMA and make the default > of CMA and DMA_CMA depend on it. > - Also adjust ingenic, mcde, tve200; these sound like soft dependencies as > well (although DMA_CMA is really desired) > > v1 -> v2: > - Fix DRM_CMA -> DMA_CMA > > --- > drivers/gpu/drm/Kconfig | 2 ++ > drivers/gpu/drm/aspeed/Kconfig | 2 -- > drivers/gpu/drm/etnaviv/Kconfig | 3 +-- > drivers/gpu/drm/ingenic/Kconfig | 1 - > drivers/gpu/drm/mcde/Kconfig| 1 - > drivers/gpu/drm/tve200/Kconfig | 1 - > drivers/video/fbdev/Kconfig | 2 +- > kernel/dma/Kconfig | 7 +++ > mm/Kconfig | 1 + > 9 fil
Re: [PATCH] iommu: exynos: remove unneeded local variable initialization
On 08.04.2021 22:16, Krzysztof Kozlowski wrote:
> The initialization of 'fault_addr' local variable is not needed as it is
> shortly after overwritten.
>
> Addresses-Coverity: Unused value
> Signed-off-by: Krzysztof Kozlowski

Acked-by: Marek Szyprowski

> ---
>  drivers/iommu/exynos-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index de324b4eedfe..8fa9a591fb96 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -407,7 +407,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
>  	struct sysmmu_drvdata *data = dev_id;
>  	const struct sysmmu_fault_info *finfo;
>  	unsigned int i, n, itype;
> -	sysmmu_iova_t fault_addr = -1;
> +	sysmmu_iova_t fault_addr;
>  	unsigned short reg_status, reg_clear;
>  	int ret = -ENOSYS;

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Re: [PATCH 1/2] iommu/sva: Tighten SVA bind API with explicit flags
Hi Jacob, On 2021/4/9 1:08, Jacob Pan wrote: The void* drvdata parameter isn't really used in iommu_sva_bind_device() API, the current IDXD code "borrows" the drvdata for a VT-d private flag for supervisor SVA usage. Supervisor/Privileged mode request is a generic feature. It should be promoted from the VT-d vendor driver to the generic code. This patch replaces void* drvdata with a unsigned int flags parameter and adjusts callers accordingly. Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/ Suggested-by: Jean-Philippe Brucker Signed-off-by: Jacob Pan --- drivers/dma/idxd/cdev.c | 2 +- drivers/dma/idxd/init.c | 6 +++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 ++-- drivers/iommu/intel/Kconfig | 1 + drivers/iommu/intel/svm.c | 18 ++ drivers/iommu/iommu.c | 9 ++--- drivers/misc/uacce/uacce.c | 2 +- include/linux/intel-iommu.h | 2 +- include/linux/intel-svm.h | 17 ++--- include/linux/iommu.h | 19 --- 11 files changed, 40 insertions(+), 42 deletions(-) diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index 0db9b82..21ec82b 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp) filp->private_data = ctx; if (device_pasid_enabled(idxd)) { - sva = iommu_sva_bind_device(dev, current->mm, NULL); + sva = iommu_sva_bind_device(dev, current->mm, 0); if (IS_ERR(sva)) { rc = PTR_ERR(sva); dev_err(dev, "pasid allocation failed: %d\n", rc); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 085a0c3..cdc85f1 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -300,13 +300,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev) static int idxd_enable_system_pasid(struct idxd_device *idxd) { - int flags; + unsigned int flags; unsigned int pasid; struct iommu_sva *sva; - flags = SVM_FLAG_SUPERVISOR_MODE; + flags = IOMMU_SVA_BIND_SUPERVISOR; With 
SVM_FLAG_SUPERVISOR_MODE removed, I guess #include in this file could be removed as well. - sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags); + sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags); if (IS_ERR(sva)) { dev_warn(&idxd->pdev->dev, "iommu sva bind failed: %ld\n", PTR_ERR(sva)); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index bb251ca..23e287e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -354,7 +354,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm) } struct iommu_sva * -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags) { struct iommu_sva *handle; struct iommu_domain *domain = iommu_get_domain_for_dev(dev); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f985817..b971d4d 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -711,7 +711,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master); int arm_smmu_master_enable_sva(struct arm_smmu_master *master); int arm_smmu_master_disable_sva(struct arm_smmu_master *master); struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, - void *drvdata); + unsigned int flags); void arm_smmu_sva_unbind(struct iommu_sva *handle); u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle); void arm_smmu_sva_notifier_synchronize(void); @@ -742,7 +742,7 @@ static inline int arm_smmu_master_disable_sva(struct arm_smmu_master *master) } static inline struct iommu_sva * -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags) { return ERR_PTR(-ENODEV); } diff --git a/drivers/iommu/intel/Kconfig 
b/drivers/iommu/intel/Kconfig index 28a3d15..5415052 100644 --- a/drivers/iommu/intel/Kconfig +++ b/drivers/iommu/intel/Kconfig @@ -41,6 +41,7 @@ config INTEL_IOMMU_SVM select PCI_PRI select MMU_NOTIFIER select IOASID + select IOMMU_SVA_LIB Why? help Shared Virtual Memory (SVM) provides a facility for devices
[PATCH v3] drivers: introduce and use WANT_DMA_CMA for soft dependencies on DMA_CMA
Random drivers should not override a user configuration of core knobs (e.g., CONFIG_DMA_CMA=n). Applicable drivers would like to use DMA_CMA, which depends on CMA, if possible; however, these drivers also have to tolerate DMA_CMA not being available/functional, for example, if no CMA area for DMA_CMA use has been set up via "cma=X". In the worst case, the driver cannot do its job properly in some configurations.

For example, commit 63f5677544b3 ("drm/etnaviv: select CMA and DMA_CMA if available") documents:

  While this is no build dependency, etnaviv will only work correctly
  on most systems if CMA and DMA_CMA are enabled. Select both options
  if available to avoid users ending up with a non-working GPU due to
  a lacking kernel config.

So etnaviv really wants DMA_CMA, but can deal with some cases where it is not available.

Let's introduce WANT_DMA_CMA and use it in most cases where drivers select CMA/DMA_CMA, or depend on DMA_CMA (in a wrong way via CMA because of recursive dependency issues). We'll assume that any driver that selects DRM_GEM_CMA_HELPER or DRM_KMS_CMA_HELPER would like to use DMA_CMA if possible.

With this change, distributions can disable CONFIG_CMA or CONFIG_DMA_CMA without it silently getting enabled again by random drivers. Also, we'll now automatically try to enable both CONFIG_CMA and CONFIG_DMA_CMA if they are unspecified and any driver that selects WANT_DMA_CMA is around -- also implicitly via DRM_GEM_CMA_HELPER or DRM_KMS_CMA_HELPER.

For example, if any driver selects WANT_DMA_CMA and we do a "make olddefconfig":

1. With "# CONFIG_CMA is not set" and no specification of "CONFIG_DMA_CMA"
   -> CONFIG_DMA_CMA won't be part of .config

2. With no specification of CONFIG_CMA or CONFIG_DMA_CMA

   Contiguous Memory Allocator (CMA) [Y/n/?] (NEW)
   DMA Contiguous Memory Allocator (DMA_CMA) [Y/n/?] (NEW)

3.
With "# CONFIG_CMA is not set" and "# CONFIG_DMA_CMA is not set" -> CONFIG_DMA_CMA will be removed from .config Note: drivers/remoteproc seems to be special; commit c51e882cd711 ("remoteproc/davinci: Update Kconfig to depend on DMA_CMA") explains that there is a real dependency to DMA_CMA for it to work; leave that dependency in place and don't convert it to a soft dependency. Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: David Airlie Cc: Daniel Vetter Cc: Joel Stanley Cc: Andrew Jeffery Cc: Lucas Stach Cc: Russell King Cc: Christian Gmeiner Cc: Paul Cercueil Cc: Linus Walleij Cc: Christoph Hellwig Cc: Marek Szyprowski Cc: Robin Murphy Cc: Andrew Morton Cc: Mike Rapoport Cc: Arnd Bergmann Cc: Bartlomiej Zolnierkiewicz Cc: Eric Anholt Cc: Michal Simek Cc: Masahiro Yamada Cc: "Alexander A. Klimov" Cc: Peter Collingbourne Cc: Suman Anna Cc: Jason Gunthorpe Cc: dri-de...@lists.freedesktop.org Cc: linux-asp...@lists.ozlabs.org Cc: linux-arm-ker...@lists.infradead.org Cc: etna...@lists.freedesktop.org Cc: linux-m...@vger.kernel.org Cc: linux-fb...@vger.kernel.org Cc: iommu@lists.linux-foundation.org Signed-off-by: David Hildenbrand --- Let's see if this approach is better for soft dependencies (and if we actually have some hard dependencies in there). This is the follow-up of https://lkml.kernel.org/r/20210408092011.52763-1-da...@redhat.com https://lkml.kernel.org/r/20210408100523.63356-1-da...@redhat.com I was wondering if it would make sense in some drivers to warn if either CONFIG_DMA_CMA is not available or if DRM_CMA has not been configured properly - just to give people a heads up that something might more likely go wrong; that would, however, be future work. v2 -> v3: - Don't use "imply" but instead use a new WANT_DMA_CMA and make the default of CMA and DMA_CMA depend on it. 
- Also adjust ingenic, mcde, tve200; these sound like soft dependencies as well (although DMA_CMA is really desired) v1 -> v2: - Fix DRM_CMA -> DMA_CMA --- drivers/gpu/drm/Kconfig | 2 ++ drivers/gpu/drm/aspeed/Kconfig | 2 -- drivers/gpu/drm/etnaviv/Kconfig | 3 +-- drivers/gpu/drm/ingenic/Kconfig | 1 - drivers/gpu/drm/mcde/Kconfig| 1 - drivers/gpu/drm/tve200/Kconfig | 1 - drivers/video/fbdev/Kconfig | 2 +- kernel/dma/Kconfig | 7 +++ mm/Kconfig | 1 + 9 files changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 85b79a7fee63..6f9989adfa93 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -201,12 +201,14 @@ config DRM_TTM_HELPER config DRM_GEM_CMA_HELPER bool depends on DRM + select WANT_DMA_CMA help Choose this if you need the GEM CMA helper functions config DRM_KMS_CMA_HELPER bool depends on DRM + select WANT_DMA_CMA select DRM_GEM_CMA_HELPER help Choose this if you need the KMS CMA helper functions diff --git a/drivers/gpu/drm/aspeed/Kconfig b/drivers/gpu/drm/aspeed/Kconfig i
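The quoted diff is cut off before the kernel/dma/Kconfig hunk that actually introduces the new symbol. Based only on the cover letter's description (a hidden WANT_DMA_CMA symbol that drivers select, which flips the defaults of CMA and DMA_CMA), that hunk plausibly looks like the sketch below -- an inference from the prose, not the actual patch text:

```
# Sketch only: inferred from the cover letter, not copied from the patch.
config WANT_DMA_CMA
	bool
	help
	  Selected by drivers that want DMA_CMA enabled if possible, but
	  can also operate without it (soft dependency).

config DMA_CMA
	bool "DMA Contiguous Memory Allocator"
	depends on HAVE_DMA_CONTIGUOUS && CMA
	default WANT_DMA_CMA
	help
	  ...
```

With `default WANT_DMA_CMA` here (and a matching default on CMA in mm/Kconfig), "make olddefconfig" offers or enables the options only when some driver selects WANT_DMA_CMA, while an explicit "# CONFIG_DMA_CMA is not set" in the old .config still wins -- matching the three cases enumerated above.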
Re: [PATCH 1/2] iommu/sva: Tighten SVA bind API with explicit flags
On Thu, Apr 08, 2021 at 10:08:55AM -0700, Jacob Pan wrote: > The void* drvdata parameter isn't really used in iommu_sva_bind_device() > API, Right, it used to be a cookie passed to the device driver in the exit_mm() callback, but that went away with edcc40d2ab5f ("iommu: Remove iommu_sva_ops::mm_exit()") > the current IDXD code "borrows" the drvdata for a VT-d private flag > for supervisor SVA usage. > > Supervisor/Privileged mode request is a generic feature. It should be > promoted from the VT-d vendor driver to the generic code. > > This patch replaces void* drvdata with a unsigned int flags parameter > and adjusts callers accordingly. Thanks for cleaning this up. Making flags unsigned long seems more common (I suggested int without thinking). But it doesn't matter much, we won't get to 32 flags. > > Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/ > Suggested-by: Jean-Philippe Brucker > Signed-off-by: Jacob Pan > --- > drivers/dma/idxd/cdev.c | 2 +- > drivers/dma/idxd/init.c | 6 +++--- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 ++-- > drivers/iommu/intel/Kconfig | 1 + > drivers/iommu/intel/svm.c | 18 ++ > drivers/iommu/iommu.c | 9 ++--- > drivers/misc/uacce/uacce.c | 2 +- > include/linux/intel-iommu.h | 2 +- > include/linux/intel-svm.h | 17 ++--- > include/linux/iommu.h | 19 --- > 11 files changed, 40 insertions(+), 42 deletions(-) > > diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c > index 0db9b82..21ec82b 100644 > --- a/drivers/dma/idxd/cdev.c > +++ b/drivers/dma/idxd/cdev.c > @@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, struct > file *filp) > filp->private_data = ctx; > > if (device_pasid_enabled(idxd)) { > - sva = iommu_sva_bind_device(dev, current->mm, NULL); > + sva = iommu_sva_bind_device(dev, current->mm, 0); > if (IS_ERR(sva)) { > rc = PTR_ERR(sva); > dev_err(dev, "pasid allocation failed: %d\n", rc); > diff --git 
a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c > index 085a0c3..cdc85f1 100644 > --- a/drivers/dma/idxd/init.c > +++ b/drivers/dma/idxd/init.c > @@ -300,13 +300,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev > *pdev) > > static int idxd_enable_system_pasid(struct idxd_device *idxd) > { > - int flags; > + unsigned int flags; > unsigned int pasid; > struct iommu_sva *sva; > > - flags = SVM_FLAG_SUPERVISOR_MODE; > + flags = IOMMU_SVA_BIND_SUPERVISOR; > > - sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags); > + sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags); > if (IS_ERR(sva)) { > dev_warn(&idxd->pdev->dev, >"iommu sva bind failed: %ld\n", PTR_ERR(sva)); > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > index bb251ca..23e287e 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > @@ -354,7 +354,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct > *mm) > } > > struct iommu_sva * > -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) > +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int > flags) Could you add a check on flags: diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index bb251cab61f3..145ceb2fc5da 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -354,12 +354,15 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm) } struct iommu_sva * -arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) +arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags) { struct iommu_sva *handle; struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + if (flags) + return ERR_PTR(-EINVAL); + if 
(smmu_domain->stage != ARM_SMMU_DOMAIN_S1) return ERR_PTR(-EINVAL); > { > struct iommu_sva *handle; > struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > index f985817..b971d4d 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > @@ -711,7 +711,7 @@ bool arm_smmu_master_sva_enabled(st
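The check suggested above follows the usual pattern for extensible flags arguments: mask off what the implementation supports and fail on any remainder, so flags added later are rejected by drivers that don't handle them rather than silently ignored. A standalone sketch of that pattern (user-space C for illustration; the helper name and flag value are hypothetical, not kernel API):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag value, mirroring IOMMU_SVA_BIND_SUPERVISOR from the
 * patch; illustration only, not the kernel definition. */
#define SVA_BIND_SUPERVISOR (1U << 0)

/*
 * Reject any flag bit the caller sets that the implementation does not
 * support. A driver like arm-smmu-v3, which supports no bind flags,
 * passes supported == 0 and thus rejects any nonzero flags value.
 */
static int check_bind_flags(unsigned int flags, unsigned int supported)
{
	if (flags & ~supported)
		return -EINVAL;
	return 0;
}
```

The same shape extends naturally if more flags are added: drivers only widen `supported` once they actually implement the new behavior.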
Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API
On Thu, Apr 08, 2021 at 10:08:56AM -0700, Jacob Pan wrote: > diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c > index bd41405..bd99f6b 100644 > --- a/drivers/iommu/iommu-sva-lib.c > +++ b/drivers/iommu/iommu-sva-lib.c > @@ -12,27 +12,33 @@ static DECLARE_IOASID_SET(iommu_sva_pasid); > > /** > * iommu_sva_alloc_pasid - Allocate a PASID for the mm > - * @mm: the mm > * @min: minimum PASID value (inclusive) > * @max: maximum PASID value (inclusive) > * > - * Try to allocate a PASID for this mm, or take a reference to the existing > one > - * provided it fits within the [@min, @max] range. On success the PASID is > - * available in mm->pasid, and must be released with iommu_sva_free_pasid(). > + * Try to allocate a PASID for the current mm, or take a reference to the > + * existing one provided it fits within the [@min, @max] range. On success > + * the PASID is available in the current mm->pasid, and must be released with > + * iommu_sva_free_pasid(). > * @min must be greater than 0, because 0 indicates an unused mm->pasid. > * > * Returns 0 on success and < 0 on error. > */ > -int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max) > +int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max) > { > int ret = 0; > ioasid_t pasid; > + struct mm_struct *mm; > > if (min == INVALID_IOASID || max == INVALID_IOASID || > min == 0 || max < min) > return -EINVAL; > > mutex_lock(&iommu_sva_lock); > + mm = get_task_mm(current); > + if (!mm) { > + ret = -EINVAL; > + goto out_unlock; > + } I still think it would be more elegant to keep the choice of context in iommu_sva_bind_device() and pass it down to leaf functions such as iommu_sva_alloc_pasid(). The patch is trying to solve two separate problems: * We don't have a use-case for binding the mm of a remote process (and it's supposedly difficult for device drivers to do it securely). So OK, we remove the mm argument from iommu_sva_bind_device() and use the current mm. 
But the IOMMU driver isn't going to do get_task_mm(current) every time it needs the mm being bound, it will take it from iommu_sva_bind_device(). Likewise iommu_sva_alloc_pasid() shouldn't need to bother with get_task_mm(). * cgroup accounting for IOASIDs needs to be on the current task. Removing the mm parameter from iommu_sva_alloc_pasid() doesn't help with that. Sure it indicates that iommu_sva_alloc_pasid() needs a specific task context but that's only for cgroup purpose, and I'd rather pass the cgroup down from iommu_sva_bind_device() anyway (but am fine with keeping it within ioasid_alloc() for now). Plus it's an internal helper, easy for us to check that the callers are doing the right thing. > if (mm->pasid) { > if (mm->pasid >= min && mm->pasid <= max) > ioasid_get(mm->pasid); > @@ -45,22 +51,32 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t > min, ioasid_t max) > else > mm->pasid = pasid; > } > + mmput(mm); > +out_unlock: > mutex_unlock(&iommu_sva_lock); > return ret; > } > EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid); > > /** > - * iommu_sva_free_pasid - Release the mm's PASID > + * iommu_sva_free_pasid - Release the current mm's PASID > * @mm: the mm > * > * Drop one reference to a PASID allocated with iommu_sva_alloc_pasid() > */ > -void iommu_sva_free_pasid(struct mm_struct *mm) > +void iommu_sva_free_pasid(void) > { > + struct mm_struct *mm; > + > mutex_lock(&iommu_sva_lock); > + mm = get_task_mm(current); > + if (!mm) > + goto out_unlock; > + More importantly, could we at least dissociate free_pasid() from the current process? Otherwise drivers can't clean up from a workqueue (as amdkfd does) or from an rcu callback. Given that iommu_sva_unbind_device() takes the SVA handle owned by whomever did bind(), there shouldn't be any security issue. 
For the cgroup problem, ioasid.c could internally keep track of the cgroup used during allocation rather than assuming the context of ioasid_put() is the same as ioasid_get() > if (ioasid_put(mm->pasid)) > mm->pasid = 0; > + mmput(mm); > +out_unlock: > mutex_unlock(&iommu_sva_lock); > } > EXPORT_SYMBOL_GPL(iommu_sva_free_pasid); > diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h > index b40990a..278b8b4 100644 > --- a/drivers/iommu/iommu-sva-lib.h > +++ b/drivers/iommu/iommu-sva-lib.h > @@ -8,8 +8,8 @@ > #include > #include > > -int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max); > -void iommu_sva_free_pasid(struct mm_struct *mm); > +int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max); > +void iommu_sva_free_pasid(void); > struct mm_struct *iommu_sva_find(ioasid_t pasid); > > #endif /* _IOMMU_SVA_LIB_H */ > diff --git a/drivers/
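The last point -- ioasid.c internally keeping track of the cgroup used during allocation rather than assuming the context of ioasid_put() matches ioasid_get() -- boils down to recording the accounting owner at alloc time and charging the release against that stored owner. A generic, standalone illustration of that design (all names hypothetical, not the kernel code):

```c
#include <assert.h>

/* Hypothetical handle: records which "owner" (e.g. a cgroup) was
 * charged at allocation, so the final put() can be accounted against
 * it even when called from a workqueue or RCU callback running in a
 * different task context. */
struct ioasid_handle {
	int refcount;
	int owner_id;		/* recorded at alloc time */
};

static int charged[8];		/* per-owner accounting, illustration only */

static struct ioasid_handle ioasid_alloc_h(int owner_id)
{
	struct ioasid_handle h = { .refcount = 1, .owner_id = owner_id };

	charged[owner_id]++;	/* charge the allocating owner */
	return h;
}

static void ioasid_get_h(struct ioasid_handle *h)
{
	h->refcount++;
}

/* Uncharges the *stored* owner on the last reference, regardless of
 * which context calls put(). Returns 1 when the handle is freed. */
static int ioasid_put_h(struct ioasid_handle *h)
{
	if (--h->refcount)
		return 0;
	charged[h->owner_id]--;
	return 1;
}

/* Scenario: owner 3 allocates; an extra reference is taken; both puts
 * may run from any context, yet owner 3 ends up fully uncharged.
 * Returns the residual charge (expected 0). */
static int demo(void)
{
	struct ioasid_handle h = ioasid_alloc_h(3);

	ioasid_get_h(&h);
	ioasid_put_h(&h);	/* not the last reference */
	if (charged[3] != 1)
		return -1;
	ioasid_put_h(&h);	/* last reference: owner 3 uncharged */
	return charged[3];
}
```

This is the property that would let drivers such as amdkfd clean up from a workqueue without mis-attributing the cgroup accounting.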
RE: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node
Hi Eric, > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: 09 April 2021 10:51 > To: Shameerali Kolothum Thodi ; > linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org; > iommu@lists.linux-foundation.org; de...@acpica.org > Cc: Linuxarm ; steven.pr...@arm.com; Guohanjun > (Hanjun Guo) ; sami.muja...@arm.com; > robin.mur...@arm.com; wanghuiqiang > Subject: Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node > > Hi Shameer, > > On 11/19/20 1:11 PM, Shameer Kolothum wrote: > > RFC v1 --> v2: > > - Added a generic interface for IOMMU drivers to retrieve all the > > RMR info associated with a given IOMMU. > > - SMMUv3 driver gets the RMR list during probe() and installs > > bypass STEs for all the SIDs in the RMR list. This is to keep > > the ongoing traffic alive(if any) during SMMUv3 reset. This is > >based on the suggestions received for v1 to take care of the > >EFI framebuffer use case. Only sanity tested for now. > > - During the probe/attach device, SMMUv3 driver reserves any > > RMR region associated with the device such that there is a unity > > mapping for them in SMMU. > > --- > > > > The series adds support to IORT RMR nodes specified in IORT > > Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory > > ranges that are used by endpoints and require a unity mapping > > in SMMU. > > > > We have faced issues with 3408iMR RAID controller cards which > > fail to boot when SMMU is enabled. This is because these controllers > > make use of host memory for various caching related purposes and when > > SMMU is enabled the iMR firmware fails to access these memory regions > > as there is no mapping for them. IORT RMR provides a way for UEFI to > > describe and report these memory regions so that the kernel can make > > a unity mapping for these in SMMU. > > > > RFC because, Patch #1 is to update the actbl2.h and should be done > > through acpica update. I have send out a pull request[1] for that. 
> > What is the state of this series? I wondered if I should consider using > it for nested SMMU to avoid handling nested binding for MSI, as > suggested by Jean. Are there any blocker? There were few issues with the revision E spec and those are now sorted with an updated E.b. The ACPICA pull request for E.b is now merged[1] and the Linux header update patch[2] is also out there(I think it is now queued for 5.13). I will soon respin this series. Thanks, Shameer 1. https://github.com/acpica/acpica/pull/682 2. https://lore.kernel.org/linux-acpi/20210406213028.718796-22-erik.kan...@intel.com/ > > Thanks > > Eric > > > > Tests: > > > > With a UEFI, that reports the RMR for the dev, > > > > [16F0h 5872 1] Type : 06 > > [16F1h 5873 2] Length : 007C > > [16F3h 5875 1] Revision : 00 > > [1038h 0056 2] Reserved : > > [1038h 0056 2] Identifier : > > [16F8h 5880 4]Mapping Count : 0001 > > [16FCh 5884 4] Mapping Offset : 0040 > > > > [1700h 5888 4]Number of RMR Descriptors : 0002 > > [1704h 5892 4]RMR Descriptor Offset : 0018 > > > > [1708h 5896 8] Base Address of RMR : E640 > > [1710h 5904 8]Length of RMR : 0010 > > [1718h 5912 4] Reserved : > > > > [171Ch 5916 8] Base Address of RMR : 27B0 > > [1724h 5924 8]Length of RMR : 00C0 > > [172Ch 5932 4] Reserved : > > > > [1730h 5936 4] Input base : > > [1734h 5940 4] ID Count : 0001 > > [1738h 5944 4] Output Base : 0003 > > [173Ch 5948 4] Output Reference : 0064 > > [1740h 5952 4]Flags (decoded below) : 0001 > >Single Mapping : 1 > > ... > > > > Without the series the RAID controller initialization fails as > > below, > > > > ... 
> > [ 12.631117] megaraid_sas :03:00.0: FW supports sync > cache: Yes > > [ 12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is > called outbound_intr_mask:0x4009 > > [ 18.776377] megaraid_sas :03:00.0: Init cmd return status FAILED > for SCSI host 0 > > [ 23.019383] megaraid_sas :03:00.0: Waiting for FW to come to > ready state > > [ 106.684281] megaraid_sas :03:00.0: FW in FAULT state, Fault > code:0x1 subcode:0x0 func:megasas_transition_to_ready > > [ 106.695186] megaraid_sas :03:00.0: System Register set: > > [ 106.889787] megaraid_sas :03:00.0: Failed to transition controller to > ready for scsi0. > > [ 106.910475] megaraid_sas :03:00.0: Failed from megasas_init_fw > 6407 > > estuary:/$ > > > > With the series, now the kernel has direct mapping for the dev as > > below, > >
Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node
Hi Shameer, On 11/19/20 1:11 PM, Shameer Kolothum wrote: > RFC v1 --> v2: > - Added a generic interface for IOMMU drivers to retrieve all the > RMR info associated with a given IOMMU. > - SMMUv3 driver gets the RMR list during probe() and installs > bypass STEs for all the SIDs in the RMR list. This is to keep > the ongoing traffic alive(if any) during SMMUv3 reset. This is >based on the suggestions received for v1 to take care of the >EFI framebuffer use case. Only sanity tested for now. > - During the probe/attach device, SMMUv3 driver reserves any > RMR region associated with the device such that there is a unity > mapping for them in SMMU. > --- > > The series adds support to IORT RMR nodes specified in IORT > Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory > ranges that are used by endpoints and require a unity mapping > in SMMU. > > We have faced issues with 3408iMR RAID controller cards which > fail to boot when SMMU is enabled. This is because these controllers > make use of host memory for various caching related purposes and when > SMMU is enabled the iMR firmware fails to access these memory regions > as there is no mapping for them. IORT RMR provides a way for UEFI to > describe and report these memory regions so that the kernel can make > a unity mapping for these in SMMU. > > RFC because, Patch #1 is to update the actbl2.h and should be done > through acpica update. I have send out a pull request[1] for that. What is the state of this series? I wondered if I should consider using it for nested SMMU to avoid handling nested binding for MSI, as suggested by Jean. Are there any blocker? 
Thanks Eric > > Tests: > > With a UEFI, that reports the RMR for the dev, > > [16F0h 5872 1] Type : 06 > [16F1h 5873 2] Length : 007C > [16F3h 5875 1] Revision : 00 > [1038h 0056 2] Reserved : > [1038h 0056 2] Identifier : > [16F8h 5880 4]Mapping Count : 0001 > [16FCh 5884 4] Mapping Offset : 0040 > > [1700h 5888 4]Number of RMR Descriptors : 0002 > [1704h 5892 4]RMR Descriptor Offset : 0018 > > [1708h 5896 8] Base Address of RMR : E640 > [1710h 5904 8]Length of RMR : 0010 > [1718h 5912 4] Reserved : > > [171Ch 5916 8] Base Address of RMR : 27B0 > [1724h 5924 8]Length of RMR : 00C0 > [172Ch 5932 4] Reserved : > > [1730h 5936 4] Input base : > [1734h 5940 4] ID Count : 0001 > [1738h 5944 4] Output Base : 0003 > [173Ch 5948 4] Output Reference : 0064 > [1740h 5952 4]Flags (decoded below) : 0001 >Single Mapping : 1 > ... > > Without the series the RAID controller initialization fails as > below, > > ... > [ 12.631117] megaraid_sas :03:00.0: FW supports sync cache: Yes > > [ 12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is > called outbound_intr_mask:0x4009 > > [ 18.776377] megaraid_sas :03:00.0: Init cmd return status FAILED for > SCSI host 0 > > [ 23.019383] megaraid_sas :03:00.0: Waiting for FW to come to ready > state > [ 106.684281] megaraid_sas :03:00.0: FW in FAULT state, Fault > code:0x1 subcode:0x0 func:megasas_transition_to_ready > > [ 106.695186] megaraid_sas :03:00.0: System Register set: > > [ 106.889787] megaraid_sas :03:00.0: Failed to transition controller to > ready for scsi0. 
> > [ 106.910475] megaraid_sas :03:00.0: Failed from megasas_init_fw 6407 > > estuary:/$ > > With the series, now the kernel has direct mapping for the dev as > below, > > estuary:/$ cat /sys/kernel/iommu_groups/0/reserved_regions > > 0x0800 0x080f msi > > 0x27b0 0x286f direct > > 0xe640 0xe64f direct > > estuary:/$ > > > [ 12.254318] megaraid_sas :03:00.0: megasas_disable_intr_fusion is > called outbound_intr_mask:0x4009 > > [ 12.739089] megaraid_sas :03:00.0: FW provided supportMaxExtLDs: 0 > max_lds: 32 > > [ 12.746628] megaraid_sas :03:00.0: controller type :
Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
On 2021/4/9 16:31, Auger Eric wrote: Hi Kunkun, On 4/9/21 6:48 AM, Kunkun Jiang wrote: Hi Eric, On 2021/4/8 20:30, Auger Eric wrote: Hi Kunkun, On 4/1/21 2:37 PM, Kunkun Jiang wrote: Hi Eric, On 2021/2/24 4:56, Eric Auger wrote: With nested stage support, soon we will need to invalidate S1 contexts and ranges tagged with an unmanaged asid, this latter being managed by the guest. So let's introduce 2 helpers that allow to invalidate with externally managed ASIDs Signed-off-by: Eric Auger --- v13 -> v14 - Actually send the NH_ASID command (reported by Xingang Wang) --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 - 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 5579ec4fccc8..4c19a1114de4 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, } /* IO_PGTABLE API */ -static void arm_smmu_tlb_inv_context(void *cookie) +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain *smmu_domain, + int ext_asid) { - struct arm_smmu_domain *smmu_domain = cookie; struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cmdq_ent cmd; @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void *cookie) * insertion to guarantee those are observed before the TLBI. Do be * careful, 007. 
*/ - if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { + if (ext_asid >= 0) { /* guest stage 1 invalidation */ + cmd.opcode = CMDQ_OP_TLBI_NH_ASID; + cmd.tlbi.asid = ext_asid; + cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid; + arm_smmu_cmdq_issue_cmd(smmu, &cmd); + arm_smmu_cmdq_issue_sync(smmu); + } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid); } else { cmd.opcode = CMDQ_OP_TLBI_S12_VMALL; @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie) arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0); } +static void arm_smmu_tlb_inv_context(void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + + __arm_smmu_tlb_inv_context(smmu_domain, -1); +} + static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd, unsigned long iova, size_t size, size_t granule, @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd, arm_smmu_cmdq_batch_submit(smmu, &cmds); } Here is the part of code in __arm_smmu_tlb_inv_range(): if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) { /* Get the leaf page size */ tg = __ffs(smmu_domain->domain.pgsize_bitmap); /* Convert page size of 12,14,16 (log2) to 1,2,3 */ cmd->tlbi.tg = (tg - 10) / 2; /* Determine what level the granule is at */ cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3)); num_pages = size >> tg; } When pSMMU supports RIL, we get the leaf page size by __ffs(smmu_domain-> domain.pgsize_bitmap). In nested mode, it is determined by host PAGE_SIZE. If the host kernel and guest kernel has different translation granule (e.g. host 16K, guest 4K), __arm_smmu_tlb_inv_range() will issue an incorrect tlbi command. Do you have any idea about this issue? 
I think this is the same issue as the one reported by Chenxiang https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c29901...@hisilicon.com/ In case RIL is not supported by the host, next version will use the smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range Thanks Eric I think they are different. In normal cases, when we want to invalidate the cache of stage 1, we should use the granule size supported by vSMMU to implement and issue an tlbi command if pSMMU supports RIL. But in the current __arm_smmu_tlb_inv_range(), it always uses the granule size supported by host. (tg = __ffs(smmu_domain->domain.pgsize_bitmap);) Let me explain more clearly. Preconditions of this issue: 1. pSMMU supports RIL 2. host and guest use different translation granule (e.g. host 16K, guest 4K) this is not clear to me. See below. Guest wants to invalidate 4K, so info->granule_size = 4K. In __arm_smmu_tlb_inv_range(), if pSMMU supports RIL and host 16K, tg = 14, tlbi.tg = 2, tlbi.ttl = 4, tlbi.scale = 0, tlbi.num = -1. It is an incorrect tlbi command. If the guest uses 4K granule, this means the pSMMU also supports 4K granule. Otherwise the corresponding CD is invalid (TG0/TG1 field desc). So in that case isn't it valid to send a RIL invalidation with tg = 12, right? Dose "tg = 12" come from the smallest pSMM
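The numbers Kunkun quotes can be reproduced by recomputing the command fields the same way the quoted __arm_smmu_tlb_inv_range() snippet does on the RIL path. A standalone recomputation (simplified; only the field math, not the kernel function) shows why host 16K + guest 4K goes wrong: 4K is not a whole 16K page, so num_pages becomes 0 (i.e. tlbi.num = -1) while tlbi.tg encodes 16K even though the guest CD uses 4K:

```c
#include <assert.h>

struct tlbi_fields {
	int tg;			/* encoded granule: 1,2,3 for 4K,16K,64K */
	int ttl;		/* level hint */
	unsigned long num_pages;
};

/* Integer log2, standing in for the kernel's ilog2()/__ffs() uses
 * (for a single page size, __ffs(pgsize_bitmap) == log2(size)). */
static int log2u(unsigned long v)
{
	int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Field computation mirroring the quoted snippet from
 * __arm_smmu_tlb_inv_range() (RIL path only). */
static struct tlbi_fields tlbi_range_fields(unsigned long leaf_pgsize,
					    unsigned long granule,
					    unsigned long size)
{
	struct tlbi_fields f;
	int tg = log2u(leaf_pgsize);	/* log2 of the leaf page size */

	f.tg = (tg - 10) / 2;		/* 12,14,16 -> 1,2,3 */
	f.ttl = 4 - ((log2u(granule) - 3) / (tg - 3));
	f.num_pages = size >> tg;
	return f;
}
```

With the host's 16K leaf size (tg = 14) and a guest invalidation of a single 4K page, this yields tlbi.tg = 2, ttl = 4 and num_pages = 0 -- exactly the bogus command described above; with a matching 4K leaf size the fields come out sane.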
[PATCH 1/2] Revert "iommu/amd: Fix performance counter initialization"
From: Paul Menzel

This reverts commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b.

The original commit tries to address an issue where PMC power-gating causes the IOMMU PMC pre-init test to fail on certain desktop/mobile platforms where power-gating is normally enabled.

There have been several reports that the workaround is still not guaranteed to work, and can add up to 100 ms (in the worst case) to the boot process on certain platforms such as the MSI B350M MORTAR with AMD Ryzen 3 2200G. Therefore, revert this commit as a prelude to removing the pre-init test.

Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) Cc: Shuah Khan Cc: Alexander Monakov Cc: David Coe Signed-off-by: Paul Menzel Signed-off-by: Suravee Suthikulpanit ---

Note: I have revised the commit message to add more detail and remove unnecessary information.

drivers/iommu/amd/init.c | 45 ++-- 1 file changed, 11 insertions(+), 34 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 321f5906e6ed..648cdfd03074 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -12,7 +12,6 @@ #include #include #include -#include #include #include #include @@ -257,8 +256,6 @@ static enum iommu_init_state init_state = IOMMU_START_STATE; static int amd_iommu_enable_interrupts(void); static int __init iommu_go_to_state(enum iommu_init_state state); static void init_device_table_dma(void); -static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, - u8 fxn, u64 *value, bool is_write); static bool amd_iommu_pre_enabled = true; @@ -1717,11 +1714,13 @@ static int __init init_iommu_all(struct acpi_table_header *table) return 0; } -static void __init init_iommu_perf_ctr(struct amd_iommu *iommu) +static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, + u8 fxn, u64 *value, bool is_write); + +static void
init_iommu_perf_ctr(struct amd_iommu *iommu) { - int retry; struct pci_dev *pdev = iommu->dev; - u64 val = 0xabcd, val2 = 0, save_reg, save_src; + u64 val = 0xabcd, val2 = 0, save_reg = 0; if (!iommu_feature(iommu, FEATURE_PC)) return; @@ -1729,39 +1728,17 @@ static void __init init_iommu_perf_ctr(struct amd_iommu *iommu) amd_iommu_pc_present = true; /* save the value to restore, if writable */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false) || - iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, false)) - goto pc_false; - - /* -* Disable power gating by programing the performance counter -* source to 20 (i.e. counts the reads and writes from/to IOMMU -* Reserved Register [MMIO Offset 1FF8h] that are ignored.), -* which never get incremented during this init phase. -* (Note: The event is also deprecated.) -*/ - val = 20; - if (iommu_pc_get_set_reg(iommu, 0, 0, 8, &val, true)) + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false)) goto pc_false; /* Check if the performance counters can be written to */ - val = 0xabcd; - for (retry = 5; retry; retry--) { - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true) || - iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false) || - val2) - break; - - /* Wait about 20 msec for power gating to disable and retry. */ - msleep(20); - } - - /* restore */ - if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true) || - iommu_pc_get_set_reg(iommu, 0, 0, 8, &save_src, true)) + if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) || + (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) || + (val != val2)) goto pc_false; - if (val != val2) + /* restore */ + if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true)) goto pc_false; pci_info(pdev, "IOMMU performance counters supported\n"); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was that reading a counter always returned zero.

At the time, the suggested workaround was to add test logic prior to initializing the PMC feature, to check whether the counters could be programmed and read back the same value. This had been working fine until more recent desktop/mobile platforms started enabling power-gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecessarily.

Unfortunately, there is no documentation of the hardware generation in which the original PMC HW bug was fixed, although it was fixed soon after the PMC's first introduction. Based on this, we assume that the buggy platforms are unlikely to still be in use, and that it should be relatively safe to remove this legacy logic.
Link: https://lore.kernel.org/linux-iommu/alpine.lnx.3.20.13.2006030935570.3...@monopod.intra.ispras.ru/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Cc: Tj (Elloe Linux)
Cc: Shuah Khan
Cc: Alexander Monakov
Cc: David Coe
Cc: Paul Menzel
Signed-off-by: Suravee Suthikulpanit
---
 drivers/iommu/amd/init.c | 24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 648cdfd03074..247cdda5d683 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1714,33 +1714,16 @@ static int __init init_iommu_all(struct acpi_table_header *table)
 	return 0;
 }
 
-static int iommu_pc_get_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
-				u8 fxn, u64 *value, bool is_write);
-
 static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 {
+	u64 val;
 	struct pci_dev *pdev = iommu->dev;
-	u64 val = 0xabcd, val2 = 0, save_reg = 0;
 
 	if (!iommu_feature(iommu, FEATURE_PC))
 		return;
 
 	amd_iommu_pc_present = true;
 
-	/* save the value to restore, if writable */
-	if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, false))
-		goto pc_false;
-
-	/* Check if the performance counters can be written to */
-	if ((iommu_pc_get_set_reg(iommu, 0, 0, 0, &val, true)) ||
-	    (iommu_pc_get_set_reg(iommu, 0, 0, 0, &val2, false)) ||
-	    (val != val2))
-		goto pc_false;
-
-	/* restore */
-	if (iommu_pc_get_set_reg(iommu, 0, 0, 0, &save_reg, true))
-		goto pc_false;
-
 	pci_info(pdev, "IOMMU performance counters supported\n");
 
 	val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
@@ -1748,11 +1731,6 @@ static void init_iommu_perf_ctr(struct amd_iommu *iommu)
 	iommu->max_counters = (u8) ((val >> 7) & 0xf);
 
 	return;
-
-pc_false:
-	pci_err(pdev, "Unable to read/write to IOMMU perf counter.\n");
-	amd_iommu_pc_present = false;
-	return;
 }
 
 static ssize_t amd_iommu_show_cap(struct device *dev,
-- 
2.17.1
[PATCH 0/2] iommu/amd: Revert and remove failing PMC test
This has prevented the PMC from working on more recent desktop/mobile platforms, where PMC power gating is normally enabled. After consulting with HW designers and the IOMMU maintainer, we have decided to remove the legacy test altogether to avoid future PMC enabling issues.

Thanks to the community for helping to test, investigate, provide data and report issues on several platforms in the field.

Regards,
Suravee

Paul Menzel (1):
  Revert "iommu/amd: Fix performance counter initialization"

Suravee Suthikulpanit (1):
  iommu/amd: Remove performance counter pre-initialization test

 drivers/iommu/amd/init.c | 49 ++--
 1 file changed, 2 insertions(+), 47 deletions(-)

-- 
2.17.1
Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
Hi Kunkun,

On 4/9/21 6:48 AM, Kunkun Jiang wrote:
> Hi Eric,
>
> On 2021/4/8 20:30, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 2021/4/1 2:37 PM, Kunkun Jiang wrote:
>>> Hi Eric,
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>> With nested stage support, soon we will need to invalidate S1 contexts
>>>> and ranges tagged with an unmanaged asid, this latter being managed by
>>>> the guest. So let's introduce 2 helpers that allow to invalidate with
>>>> externally managed ASIDs
>>>>
>>>> Signed-off-by: Eric Auger
>>>>
>>>> ---
>>>> v13 -> v14
>>>> - Actually send the NH_ASID command (reported by Xingang Wang)
>>>> ---
>>>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 -
>>>>  1 file changed, 29 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 5579ec4fccc8..4c19a1114de4 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
>>>>  }
>>>>
>>>>  /* IO_PGTABLE API */
>>>> -static void arm_smmu_tlb_inv_context(void *cookie)
>>>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain *smmu_domain,
>>>> +				       int ext_asid)
>>>>  {
>>>> -	struct arm_smmu_domain *smmu_domain = cookie;
>>>>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>  	struct arm_smmu_cmdq_ent cmd;
>>>>
>>>> @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>>>  	 * insertion to guarantee those are observed before the TLBI. Do be
>>>>  	 * careful, 007.
>>>>  	 */
>>>> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>> +	if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>>> +		cmd.opcode	= CMDQ_OP_TLBI_NH_ASID;
>>>> +		cmd.tlbi.asid	= ext_asid;
>>>> +		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
>>>> +		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>>> +		arm_smmu_cmdq_issue_sync(smmu);
>>>> +	} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>>  		arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>>>  	} else {
>>>>  		cmd.opcode	= CMDQ_OP_TLBI_S12_VMALL;
>>>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>>>  	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>>>  }
>>>>
>>>> +static void arm_smmu_tlb_inv_context(void *cookie)
>>>> +{
>>>> +	struct arm_smmu_domain *smmu_domain = cookie;
>>>> +
>>>> +	__arm_smmu_tlb_inv_context(smmu_domain, -1);
>>>> +}
>>>> +
>>>>  static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>>>  				     unsigned long iova, size_t size,
>>>>  				     size_t granule,
>>>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>>>  	arm_smmu_cmdq_batch_submit(smmu, &cmds);
>>>>  }
>>> Here is the part of code in __arm_smmu_tlb_inv_range():
>>>         if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>                 /* Get the leaf page size */
>>>                 tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>>
>>>                 /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>>                 cmd->tlbi.tg = (tg - 10) / 2;
>>>
>>>                 /* Determine what level the granule is at */
>>>                 cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>>
>>>                 num_pages = size >> tg;
>>>         }
>>> When pSMMU supports RIL, we get the leaf page size by __ffs(smmu_domain->
>>> domain.pgsize_bitmap). In nested mode, it is determined by host PAGE_SIZE.
>>> If the host kernel and guest kernel have different translation granules
>>> (e.g. host 16K, guest 4K), __arm_smmu_tlb_inv_range() will issue an
>>> incorrect tlbi command.
>>>
>>> Do you have any idea about this issue?
>> I think this is the same issue as the one reported by Chenxiang
>>
>> https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c29901...@hisilicon.com/
>>
>> In case RIL is not supported by the host, next version will use the
>> smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range
>>
>> Thanks
>>
>> Eric
> I think they are different. In normal cases, when we want to invalidate the
> cache of stage 1, we should use the granule size supported by vSMMU to
> implement and issue a tlbi command if pSMMU supports RIL.
>
> But in the current __arm_smmu_tlb_inv_range(), it always uses the granule
> size supported by the host.
> (tg = __ffs(smmu_domain->domain.pgsize_bitmap);)
>
> Let me explain more clearly.
> Preconditions of this issue:
> 1. pSMMU supports RIL
> 2. host and guest use different translation granules (e.g. host 16K,
>    guest 4K)

this is not clear to me. See below.

>
> Guest wants to invalidate 4K, so info->granule_s