Re: [PATCH 2/2] virtio_ring: Use DMA API if memory is encrypted
On 2019/10/15 3:35 PM, Christoph Hellwig wrote:
> On Fri, Oct 11, 2019 at 06:25:19PM -0700, Ram Pai wrote:
>> From: Thiago Jung Bauermann
>>
>> Normally, virtio enables the DMA API with VIRTIO_F_IOMMU_PLATFORM, which must be set by both device and guest driver. However, as a hack, when the DMA API returns physical addresses, the guest driver can use the DMA API even though the device does not set VIRTIO_F_IOMMU_PLATFORM and just uses physical addresses.
>
> Sorry, but this is a complete bullshit hack. Drivers must always use the DMA API if they do DMA, and if virtio devices use physical addresses that needs to be returned through the platform firmware interfaces for the dma setup. If you don't do that yet (which based on previous information you don't), you need to fix it, and we can then quirk old implementations that already are out in the field.
>
> In other words: we finally need to fix that virtio mess and not pile hacks on top of hacks.

I agree, the only reason for IOMMU_PLATFORM is to make sure the guest works for some old and buggy qemu when vIOMMU is enabled, which seems useless (note we don't even support vIOMMU migration in that case).

Thanks

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
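For readers following the argument, the decision being debated can be reduced to a small predicate. The sketch below is a hypothetical, simplified model (the struct and `use_dma_api()` are illustrative, not the actual virtio_ring code): the spec-clean path uses the DMA API only when VIRTIO_F_IOMMU_PLATFORM was negotiated, while the contested hack also forces it when guest memory is encrypted, even though the device still consumes bare physical addresses.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only -- not the real vring_use_dma_api() code. */
struct vdev_model {
	bool iommu_platform; /* VIRTIO_F_IOMMU_PLATFORM negotiated */
	bool mem_encrypted;  /* guest runs with encrypted memory   */
};

static bool use_dma_api(const struct vdev_model *v)
{
	/* Spec-clean path: both sides agreed on VIRTIO_F_IOMMU_PLATFORM. */
	if (v->iommu_platform)
		return true;
	/* The contested hack: force the DMA API anyway so that, e.g.,
	 * bounce buffering can handle encrypted memory, even though the
	 * device still expects raw physical addresses. */
	if (v->mem_encrypted)
		return true;
	return false;
}
```

Christoph's objection is precisely to the second branch: the DMA-address contract should come from platform firmware interfaces, not from the driver second-guessing the device.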
[PATCH] drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
Currently hyperv-iommu is implemented in an x86-specific way; for example, the apic is used. So make the HYPERV_IOMMU Kconfig option depend on X86 as a preparation for enabling Hyper-V on architectures other than x86.

Cc: Lan Tianyu
Cc: Michael Kelley
Cc: linux-hyp...@vger.kernel.org
Signed-off-by: Boqun Feng (Microsoft)
---
Without this patch, I could observe a compile error:

| drivers/iommu/hyperv-iommu.c:17:10: fatal error: asm/apic.h: No such
| file or directory
|    17 | #include <asm/apic.h>
|       |          ^~~~

after applying Michael's ARM64 on Hyper-V enablement patchset.

 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e3842eabcfdd..f1086eaed41c 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -467,7 +467,7 @@ config QCOM_IOMMU

 config HYPERV_IOMMU
 	bool "Hyper-V x2APIC IRQ Handling"
-	depends on HYPERV
+	depends on HYPERV && X86
 	select IOMMU_API
 	default HYPERV
 	help
--
2.23.0
Re: [PATCH -next] iommu/amd: fix a warning in increase_address_space
On Wed Oct 16 19, Jerry Snitselaar wrote:
> On Wed Oct 16 19, Qian Cai wrote:
>> BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated" looks suspicious. Not sure what the purpose of it is.
>>
>> *updated = increase_address_space(domain, gfp) || *updated;
>
> Looking at it again I think that isn't an issue really, it would just not lose updated being set in a previous loop iteration, but now I'm wondering about the loop itself. In the cases where it would return false, how does the evaluation of the condition for the while loop change?

I guess the mode level 6 check is really for other potential callers of increase_address_space(), none of which exist at the moment, and the condition of the while loop in alloc_pte() should fail if the mode level is 6.
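To make Jerry's first point concrete: the `||`-accumulation only exists so that a `false` return from a later iteration cannot erase a `true` recorded earlier. A minimal userspace sketch of that pattern (the loop and the stand-in for increase_address_space() are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for increase_address_space(): succeeds once, then reports
 * no further growth (e.g. the PAGE_MODE_6_LEVEL ceiling was reached). */
static bool grow_once(int *calls)
{
	return (*calls)++ == 0;
}

static bool run_loop(int iterations)
{
	bool updated = false;
	int calls = 0;

	for (int i = 0; i < iterations; i++)
		updated = grow_once(&calls) || updated; /* never clears a prior true */
	return updated;
}
```

Writing `updated = increase_address_space(...) || updated` instead of a plain assignment is exactly what keeps the earlier `true` sticky across iterations.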
Re: [PATCH -next] iommu/amd: fix a warning in increase_address_space
On Wed Oct 16 19, Qian Cai wrote:
> After commit 754265bcab78 ("iommu/amd: Fix race in increase_address_space()"), it is still possible to trigger a race condition under heavy memory pressure.
>
> [...]
>
> BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated" looks suspicious. Not sure what the purpose of it is.
>
> *updated = increase_address_space(domain, gfp) || *updated;

Looking at it again I think that isn't an issue really, it would just not lose updated being set in a previous loop iteration, but now I'm wondering about the loop itself. In the cases where it would return false, how does the evaluation of the condition for the while loop change?

> [...]
[PATCH -next] iommu/amd: fix a warning in increase_address_space
After commit 754265bcab78 ("iommu/amd: Fix race in increase_address_space()"), it is still possible to trigger a race condition under heavy memory pressure, as shown below. The race that triggers the warning is:

CPU0:                                          CPU1:
in alloc_pte():                                in increase_address_space():
while (address > PM_LEVEL_SIZE(domain->mode)) [1]
                                               spin_lock_irqsave(&domain->lock)
                                               domain->mode += 1;
                                               spin_unlock_irqrestore(&domain->lock)
in increase_address_space():
spin_lock_irqsave(&domain->lock)
if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL)) [1]

[1] domain->mode = 5

It is unclear yet whether the triggering of the warning is the root cause of the smartpqi offline, but let's fix it first by lifting the locking into the caller.

WARNING: CPU: 57 PID: 124314 at drivers/iommu/amd_iommu.c:1474 iommu_map_page+0x718/0x7e0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec flags=0x0010]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec1000 flags=0x0010]
CPU: 57 PID: 124314 Comm: oom01 Tainted: G O
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
RIP: 0010:iommu_map_page+0x718/0x7e0
Code: 88 a5 70 ff ff ff e9 5d fa ff ff 48 8b b5 70 ff ff ff 4c 89 ef e8 08 32 2f 00 41 80 fc 01 0f 87 b7 3d 00 00 41 83 e4 01 eb be <0f> 0b 48 8b b5 70 ff ff ff 4c 89 ef e8 e7 31 2f 00 eb dd 0f 0b 48
RSP: 0018:888da4816cb8 EFLAGS: 00010046
RAX: RBX: 8885fe689000 RCX: 96f4a6c4
RDX: 0007 RSI: dc00 RDI: 8885fe689124
RBP: 888da4816da8 R08: ed10bfcd120e R09: ed10bfcd120e
R10: ed10bfcd120d R11: 8885fe68906b R12:
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec1a00 flags=0x0010]
R13: 8885fe689068 R14: 8885fe689124 R15:
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec1e00 flags=0x0010]
FS: 7f29722ba700() GS:88902f88() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f27f82d8000 CR3: 00102ed9c000 CR4: 003406e0
Call Trace:
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec2000 flags=0x0010]
 map_sg+0x1ce/0x2f0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec2400 flags=0x0010]
 scsi_dma_map+0xd7/0x160
 pqi_raid_submit_scsi_cmd_with_io_request+0x1b8/0x420 [smartpqi]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec2800 flags=0x0010]
 pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec2c00 flags=0x0010]
 scsi_queue_rq+0xd19/0x1360
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec3000 flags=0x0010]
 __blk_mq_try_issue_directly+0x295/0x3f0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x address=0xffec3400 flags=0x0010]
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x address=0xffec3800 flags=0x0010]
 blk_mq_request_issue_directly+0xb5/0x100
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x address=0xffec3c00 flags=0x0010]
 blk_mq_try_issue_list_directly+0xa9/0x160
 blk_mq_sched_insert_requests+0x228/0x380
 blk_mq_flush_plug_list+0x448/0x7e0
 blk_flush_plug_list+0x1eb/0x230
 blk_finish_plug+0x43/0x5d
 shrink_node_memcg+0x9c5/0x1550
smartpqi :23:00.0: controller is offline: status code 0x14803
smartpqi :23:00.0: controller offline

Fixes: 754265bcab78 ("iommu/amd: Fix race in increase_address_space()")
Signed-off-by: Qian Cai
---
BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated" looks suspicious. Not sure what the purpose of it is.

*updated = increase_address_space(domain, gfp) || *updated;

 drivers/iommu/amd_iommu.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 2369b8af81f3..a5754068aa29 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1465,12 +1465,9 @@ static void free_pagetable(struct protection_domain *domain)
 static bool increase_address_space(struct protection_domain *domain,
 				   gfp_t gfp)
 {
-	unsigned long flags;
 	bool ret = false;
 	u64 *pte;

-	spin_lock_irqsave(&domain->lock, flags);
-
 	if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
 		/* address space already 64 bit large */
 		goto out;
@@ -1487,8 +1484,6 @@ static bool increase_address_space(struct protection_domain *domain,
 	ret = true;
 out:
-	spin_unlock_irqrestore(&domain->lock, flags);
-
 	return ret;
 }

@@ -1499,14
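The effect of the diff above, in isolation: the domain lock moves out of increase_address_space() and must be held by the caller across both the level check and the mode increment, so the check and the update become atomic with respect to other CPUs. A toy single-threaded sketch of that contract (a boolean stands in for lockdep's held-lock assertion; this is not the real kernel code):

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_MODE_6_LEVEL 6

struct toy_domain {
	int mode;
	bool lock_held; /* stand-in for spin_lock_irqsave(&domain->lock, ...) */
};

/* After the patch: the caller must already hold the domain lock. */
static bool increase_address_space(struct toy_domain *d)
{
	assert(d->lock_held); /* lockdep_assert_held()-style check */
	if (d->mode == PAGE_MODE_6_LEVEL)
		return false; /* address space already 64 bit large */
	d->mode += 1;
	return true;
}
```

Because the caller now holds the lock over the whole while-check/grow sequence in alloc_pte(), another CPU can no longer bump domain->mode between the two evaluations marked [1] in the race diagram.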
[bug report] iommu/amd: Pass gfp-flags to iommu_map_page()
Hello Joerg Roedel,

The patch b911b89b6d01: "iommu/amd: Pass gfp-flags to iommu_map_page()" from Jul 5, 2016, leads to the following static checker warning:

    drivers/iommu/amd_iommu.c:2545 amd_iommu_map()
    warn: use 'gfp' here instead of GFP_XXX?

drivers/iommu/amd_iommu.c
  2529  static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
  2530                           phys_addr_t paddr, size_t page_size, int iommu_prot,
  2531                           gfp_t gfp)
                                       ^^^
I don't know why I'm suddenly getting this warning even though the code is three years old...

  2532  {
  2533          struct protection_domain *domain = to_pdomain(dom);
  2534          int prot = 0;
  2535          int ret;
  2536
  2537          if (domain->mode == PAGE_MODE_NONE)
  2538                  return -EINVAL;
  2539
  2540          if (iommu_prot & IOMMU_READ)
  2541                  prot |= IOMMU_PROT_IR;
  2542          if (iommu_prot & IOMMU_WRITE)
  2543                  prot |= IOMMU_PROT_IW;
  2544
  2545          ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
                                                                           ^^^^^^^^^^
But it does seem like maybe gfp was intended here?

  2546
  2547          domain_flush_np_cache(domain, iova, page_size);
  2548
  2549          return ret;
  2550  }

See also: drivers/iommu/dma-iommu.c:625 iommu_dma_alloc_remap() warn: use 'gfp' here instead of GFP_XXX?

regards,
dan carpenter
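The fix the checker is hinting at is purely mechanical: forward the caller's `gfp` instead of hardcoding `GFP_KERNEL`. A userspace sketch of the buggy vs. fixed shape (the gfp_t values and the stub are invented for illustration):

```c
#include <assert.h>

typedef unsigned int gfp_t;
#define GFP_KERNEL 0x1u
#define GFP_ATOMIC 0x2u

static gfp_t observed_gfp; /* records what the allocator would see */

static int iommu_map_page_stub(gfp_t gfp)
{
	observed_gfp = gfp;
	return 0;
}

/* Buggy shape: silently ignores the gfp the caller passed in. */
static int map_hardcoded(gfp_t gfp)
{
	(void)gfp;
	return iommu_map_page_stub(GFP_KERNEL);
}

/* Fixed shape: forwards the caller's gfp. */
static int map_forwarded(gfp_t gfp)
{
	return iommu_map_page_stub(gfp);
}
```

The hardcoded variant is why an atomic-context caller ends up in a sleeping allocation: the GFP_ATOMIC it asked for never reaches the allocator.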
[linux-next:master 4470/4908] drivers/iommu/iommu.c:1857:5: sparse: sparse: symbol '__iommu_map' was not declared. Should it be static?
tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head: 78662bffde37ccbb66ac3311fa709d8960435e98
commit: 781ca2de89bae1b1d2c96df9ef33e9a324415995 [4470/4908] iommu: Add gfp parameter to iommu_ops::map
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-dirty
        git checkout 781ca2de89bae1b1d2c96df9ef33e9a324415995
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot

sparse warnings: (new ones prefixed by >>)

   drivers/iommu/iommu.c:292:5: sparse: sparse: symbol 'iommu_insert_resv_region' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                Intel Corporation
[RFC PATCH linux-next] iommu: Add gfp parameter to iommu_ops:: __iommu_map() can be static
Fixes: 781ca2de89ba ("iommu: Add gfp parameter to iommu_ops::map")
Signed-off-by: kbuild test robot
---
 iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f8853dbf1c4eb..89bfdbbfaa11b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1854,7 +1854,7 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
 	return pgsize;
 }

-int __iommu_map(struct iommu_domain *domain, unsigned long iova,
+static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
 		phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
 	const struct iommu_ops *ops = domain->ops;
Re: [PATCH] iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved
On 10/16/2019 12:51 AM, Joerg Roedel wrote:
> Hi,
>
> On Tue, Oct 15, 2019 at 09:49:32AM -0700, Yian Chen wrote:
>> VT-d RMRR (Reserved Memory Region Reporting) regions are reserved for device use only and should not be part of the allocable memory pool of the OS.
>>
>> The BIOS e820_table reports the complete memory map to the OS, including OS-usable memory ranges and BIOS-reserved memory ranges. An x86 BIOS may not be trusted to include RMRR regions as reserved-type memory in its e820 memory map; hence, validate every RMRR entry against the e820 memory map to make sure the RMRR regions will not be used by the OS for any other purpose.
>
> Are there real systems in the wild where this is a problem?

Firmware reports e820 and RMRR in separate structures. The system will not work stably if an RMRR is not in the e820 table as reserved-type memory and can be used for other purposes. In the system engineering phase, I ran into some bugs of this kind from BIOS, though not yet exactly the same as the one here. Please consider this a generic patch to avoid subsequent failures at runtime.

>> +static inline int __init
>> +arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
>> +{
>> +	u64 start = rmrr->base_address;
>> +	u64 end = rmrr->end_address + 1;
>> +
>> +	if (e820__mapped_all(start, end, E820_TYPE_RESERVED))
>> +		return 0;
>> +
>> +	pr_err(FW_BUG "No firmware reserved region can cover this RMRR [%#018Lx-%#018Lx], contact BIOS vendor for fixes\n",
>> +	       start, end - 1);
>> +	return -EFAULT;
>> +}
>
> Why -EFAULT, there is no fault involved? Using -EINVAL seems to be a better choice.

-EFAULT could be used for address-related errors. For this case, I agree, -EINVAL seems better when considering it an input problem from the firmware. I will make the change.
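The containment test that `e820__mapped_all()` performs can be sketched in userspace: a region passes only if every byte of it falls inside some reserved entry. This simplified version (ranges and names invented; the real e820 code walks the firmware table directly) repeatedly advances the start past any entry that covers it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct resv_range { uint64_t start, end; }; /* half-open: [start, end) */

/* Return true iff [s, e) is fully covered by the reserved ranges,
 * even when coverage is stitched together from adjacent entries. */
static bool mapped_all_reserved(const struct resv_range *r, int n,
				uint64_t s, uint64_t e)
{
	bool advanced = true;

	while (s < e && advanced) {
		advanced = false;
		for (int i = 0; i < n; i++) {
			if (r[i].start <= s && s < r[i].end) {
				s = r[i].end; /* skip past the covered part */
				advanced = true;
			}
		}
	}
	return s >= e;
}
```

An RMRR whose span pokes even one page outside the reserved entries fails the check, which is exactly the FW_BUG case the patch reports.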
Re: [PATCH 1/2] iommu/dmar: collect fault statistics
Hi Yuri,

On Tue, Oct 15, 2019 at 05:11:11PM +0200, Yuri Volchkov wrote:
> Currently the dmar_fault handler only prints a message in the dmesg. This
> commit introduces counters - how many faults have happened - and
> exposes them via sysfs. Each pci device will have an entry
> 'dmar_faults', reading from which will give the user 3 lines:
> remap: xxx
> read: xxx
> write: xxx

I think you should have three files instead of putting all these in a single file. See https://lore.kernel.org/r/20190621072911.ga21...@kroah.com

They should also be documented in Documentation/ABI/

I'm not sure this should be DMAR-specific. Couldn't we count similar events for other IOMMUs as well?

> This functionality is targeted for health monitoring daemons.
>
> Signed-off-by: Yuri Volchkov
> ---
>  drivers/iommu/dmar.c        | 133 +++-
>  drivers/pci/pci-sysfs.c     |  20 ++
>  include/linux/intel-iommu.h |   3 +
>  include/linux/pci.h         |  11 +++
>  4 files changed, 150 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index eecd6a421667..0749873e3e41 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1107,6 +1107,7 @@ static void free_iommu(struct intel_iommu *iommu)
>  	}
>
>  	if (iommu->irq) {
> +		destroy_workqueue(iommu->fault_wq);
>  		if (iommu->pr_irq) {
>  			free_irq(iommu->pr_irq, iommu);
>  			dmar_free_hwirq(iommu->pr_irq);
> @@ -1672,9 +1673,46 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
>  	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
>  }
>
> -static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
> -			u8 fault_reason, int pasid, u16 source_id,
> -			unsigned long long addr)
> +struct dmar_fault_info {
> +	struct work_struct work;
> +	struct intel_iommu *iommu;
> +	int type;
> +	int pasid;
> +	u16 source_id;
> +	unsigned long long addr;
> +	u8 fault_reason;
> +};
> +
> +static struct kmem_cache *dmar_fault_info_cache;
> +int __init dmar_fault_info_cache_init(void)
> +{
> +	int ret = 0;
> +
> +	dmar_fault_info_cache = kmem_cache_create("dmar_fault_info",
> +				sizeof(struct dmar_fault_info), 0,
> +				SLAB_HWCACHE_ALIGN, NULL);
> +	if (!dmar_fault_info_cache) {
> +		pr_err("Couldn't create dmar_fault_info cache\n");
> +		ret = -ENOMEM;
> +	}
> +
> +	return ret;
> +}
> +
> +static inline void *alloc_dmar_fault_info(void)
> +{
> +	return kmem_cache_alloc(dmar_fault_info_cache, GFP_ATOMIC);
> +}
> +
> +static inline void free_dmar_fault_info(void *vaddr)
> +{
> +	kmem_cache_free(dmar_fault_info_cache, vaddr);
> +}
> +
> +static int dmar_fault_dump_one(struct intel_iommu *iommu, int type,
> +			       u8 fault_reason, int pasid, u16 source_id,
> +			       unsigned long long addr)
>  {
>  	const char *reason;
>  	int fault_type;
> @@ -1695,6 +1733,57 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
>  	return 0;
>  }
>
> +static int dmar_fault_handle_one(struct dmar_fault_info *info)
> +{
> +	struct pci_dev *pdev;
> +	u8 devfn;
> +	atomic_t *pcnt;
> +
> +	devfn = PCI_DEVFN(PCI_SLOT(info->source_id), PCI_FUNC(info->source_id));
> +	pdev = pci_get_domain_bus_and_slot(info->iommu->segment,
> +					   PCI_BUS_NUM(info->source_id), devfn);

I'm sure you've considered this already, but it's not completely clear to me whether these counters should be in the pci_dev (as in this patch) or in something IOMMU-related. The pci_dev is nice because you automatically have counters for each PCI device, and most faults can be tied back to a device.

But on the other hand, it's not the PCI device actually detecting and reporting the error. It's the IOMMU reporting the fault, and while it's *likely* there's a pci_dev corresponding to the bus/device/function info in the IOMMU error registers, there's no guarantee: the device may have been hot-removed between reading the IOMMU registers and doing the pci_get_domain_bus_and_slot(), or (I suspect) faults could be caused by corrupted or malicious TLPs.

Another possible issue is that if the counts are in the pci_dev, they're lost if the device is removed, which might happen while diagnosing faulty hardware. So I tend to think this is really IOMMU error information and the IOMMU driver should handle it itself, including logging and exposing it via sysfs.

> +	if (!pdev)
> +		return -ENXIO;
> +
> +	if (info->fault_reason == INTR_REMAP)
> +		pcnt = &pdev->faults_cnt.remap;
> +	else if (info->type)
> +		pcnt = &pdev->faults_cnt.read;
> +	else
> +		pcnt = &pdev->faults_cnt.write;
> +
> +	atomic_inc(pcnt);

pci_get_domain_bus_and_slot() increments pdev's
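The counter-selection logic quoted above can be modeled without any PCI plumbing, which also shows why it would move essentially unchanged into a per-IOMMU structure if the counters leave the pci_dev, as suggested. A sketch using C11 atomics (the struct and names are illustrative, not the patch's actual code):

```c
#include <assert.h>
#include <stdatomic.h>

#define INTR_REMAP 1 /* stand-in for the DMAR fault-reason code */

struct fault_counters {
	atomic_int remap, read, write;
};

/* Pick the counter the same way dmar_fault_handle_one() does:
 * interrupt-remapping faults first, then read vs. write by type. */
static atomic_int *pick_counter(struct fault_counters *c,
				int fault_reason, int type)
{
	if (fault_reason == INTR_REMAP)
		return &c->remap;
	return type ? &c->read : &c->write;
}

static void record_fault(struct fault_counters *c, int fault_reason, int type)
{
	atomic_fetch_add(pick_counter(c, fault_reason, type), 1);
}
```

Whether `struct fault_counters` hangs off a pci_dev or off the intel_iommu instance is exactly the design question raised above; the selection and increment stay the same either way.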
[PATCH] iommu/dma: Relax locking in iommu_dma_prepare_msi()
Since commit ece6e6f0218b ("iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts"), iommu_dma_prepare_msi() should no longer have to worry about preempting itself, nor being called in atomic context at all. Thus we can downgrade the IRQ-safe locking to a simple mutex to avoid angering the new might_sleep() check in iommu_map().

Reported-by: Qian Cai
Signed-off-by: Robin Murphy
---
 drivers/iommu/dma-iommu.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f321279baf9e..4ba3c49de017 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -43,7 +44,6 @@ struct iommu_dma_cookie {
 		dma_addr_t		msi_iova;
 	};
 	struct list_head	msi_page_list;
-	spinlock_t		msi_lock;

 	/* Domain for flush queue callback; NULL if flush queue not in use */
 	struct iommu_domain	*fq_domain;
@@ -62,7 +62,6 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
 	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
 	if (cookie) {
-		spin_lock_init(&cookie->msi_lock);
 		INIT_LIST_HEAD(&cookie->msi_page_list);
 		cookie->type = type;
 	}
@@ -1150,7 +1149,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 		if (msi_page->phys == msi_addr)
 			return msi_page;

-	msi_page = kzalloc(sizeof(*msi_page), GFP_ATOMIC);
+	msi_page = kzalloc(sizeof(*msi_page), GFP_KERNEL);
 	if (!msi_page)
 		return NULL;
@@ -1180,7 +1179,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
 	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 	struct iommu_dma_cookie *cookie;
 	struct iommu_dma_msi_page *msi_page;
-	unsigned long flags;
+	static DEFINE_MUTEX(msi_prepare_lock);

 	if (!domain || !domain->iova_cookie) {
 		desc->iommu_cookie = NULL;
@@ -1190,13 +1189,12 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
 	cookie = domain->iova_cookie;

 	/*
-	 * We disable IRQs to rule out a possible inversion against
-	 * irq_desc_lock if, say, someone tries to retarget the affinity
-	 * of an MSI from within an IPI handler.
+	 * In fact we should already be serialised by irq_domain_mutex
+	 * here, but that's way subtle, so belt and braces...
 	 */
-	spin_lock_irqsave(&cookie->msi_lock, flags);
+	mutex_lock(&msi_prepare_lock);
 	msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain);
-	spin_unlock_irqrestore(&cookie->msi_lock, flags);
+	mutex_unlock(&msi_prepare_lock);

 	msi_desc_set_iommu_cookie(desc, msi_page);
--
2.21.0.dirty
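The locking change above boils down to: once the path is guaranteed to be able to sleep, a function-local static mutex is enough to serialise the lookup-or-create of the MSI page, and the IRQ-safe spinlock (plus GFP_ATOMIC allocation) is no longer needed. A rough userspace approximation with a pthread mutex (the page list is reduced to a flag; all names are invented):

```c
#include <assert.h>
#include <pthread.h>

static int msi_page_created;

/* Serialised lookup-or-create, loosely analogous to
 * iommu_dma_get_msi_page() wrapped by the new static
 * msi_prepare_lock mutex: the first caller "allocates",
 * every later caller reuses the existing page. */
static int prepare_msi_page(void)
{
	static pthread_mutex_t prepare_lock = PTHREAD_MUTEX_INITIALIZER;

	pthread_mutex_lock(&prepare_lock);
	if (!msi_page_created)
		msi_page_created = 1; /* may sleep here: mutex, not spinlock */
	pthread_mutex_unlock(&prepare_lock);
	return msi_page_created;
}
```

A function-local static mutex mirrors the patch's `static DEFINE_MUTEX(msi_prepare_lock)`: the lock's scope matches the only critical section that needs it, instead of living in the per-domain cookie.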
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
Hi Robin,

On 16/10/2019 17:09, Robin Murphy wrote:
> On 16/10/2019 17:03, Joerg Roedel wrote:
>> On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:
>>> On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
>>> The x86 one might just be a mistake.
>>>
>>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>>> index ad05484d0c80..63c4b894751d 100644
>>> --- a/drivers/iommu/amd_iommu.c
>>> +++ b/drivers/iommu/amd_iommu.c
>>> @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>>>  	if (iommu_prot & IOMMU_WRITE)
>>>  		prot |= IOMMU_PROT_IW;
>>>
>>> -	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
>>> +	ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
>>
>> Yeah, that is a bug, I spotted that too.
>>
>>> @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>>>  	if (!iova)
>>>  		goto out_free_page;
>>>
>>> -	if (iommu_map(domain, iova, msi_addr, size, prot))
>>> +	if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
>>>  		goto out_free_iova;
>>
>> Not so sure this is a bug, this code is only about setting up MSIs on ARM. It probably doesn't need to be atomic.
>
> Right, since the iommu_dma_prepare_msi() operation was broken out to happen at MSI domain setup time, I don't think the comment in there applies any more, so we can probably stop disabling irqs around the iommu_dma_get_msi_page() call.

Yes, I agree that it does not need to be atomic anymore.

Cheers,

--
Julien Grall
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
On 16/10/2019 17:11, Qian Cai wrote:
> On Wed, 2019-10-16 at 18:03 +0200, Joerg Roedel wrote:
>> On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:
>>> [...]
>>
>> Not so sure this is a bug, this code is only about setting up MSIs on ARM. It probably doesn't need to be atomic.
>
> The patch "iommu: Add gfp parameter to iommu_ops::map" does this. It could be called from an atomic context as shown in the arm64 call traces,
>
> +int iommu_map(struct iommu_domain *domain, unsigned long iova,
> +	      phys_addr_t paddr, size_t size, int prot)
> +{
> +	might_sleep();
> +	return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> +}

Also note that it's *only* the might_sleep() at issue here - none of the arm64 IOMMU drivers have been converted to look at the new gfp argument yet, so anything they actually allocate while mapping will still be GFP_ATOMIC anyway. (Carrying that flag down through the whole io-pgtable stack is on my to-do list...)

Robin.
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
On Wed, 2019-10-16 at 18:03 +0200, Joerg Roedel wrote:
> On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:
>> On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
>> The x86 one might just be a mistake.
>>
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index ad05484d0c80..63c4b894751d 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>>  	if (iommu_prot & IOMMU_WRITE)
>>  		prot |= IOMMU_PROT_IW;
>>
>> -	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
>> +	ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
>
> Yeah, that is a bug, I spotted that too.
>
>> @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>>  	if (!iova)
>>  		goto out_free_page;
>>
>> -	if (iommu_map(domain, iova, msi_addr, size, prot))
>> +	if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
>>  		goto out_free_iova;
>
> Not so sure this is a bug, this code is only about setting up MSIs on ARM. It probably doesn't need to be atomic.

The patch "iommu: Add gfp parameter to iommu_ops::map" does this. It could be called from an atomic context as shown in the arm64 call traces,

+int iommu_map(struct iommu_domain *domain, unsigned long iova,
+	      phys_addr_t paddr, size_t size, int prot)
+{
+	might_sleep();
+	return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+}
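Qian's point is that the wrapper pair bakes the context decision into the API: the sleepable entry asserts it may sleep and uses GFP_KERNEL, while the atomic entry must use GFP_ATOMIC. A userspace sketch of that split (the context flag stands in for the kernel's might_sleep() machinery; all names are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;
#define GFP_KERNEL 0x1u
#define GFP_ATOMIC 0x2u

static bool in_atomic_ctx; /* stand-in for "IRQs off / spinlock held" */

static gfp_t __iommu_map_stub(gfp_t gfp)
{
	return gfp; /* report which allocation mode the mapping would use */
}

/* Sleepable entry point: must not be reached from atomic context. */
static gfp_t iommu_map_sketch(void)
{
	assert(!in_atomic_ctx); /* might_sleep(): complain if called atomically */
	return __iommu_map_stub(GFP_KERNEL);
}

/* Atomic-safe entry point: never sleeps, pays with GFP_ATOMIC. */
static gfp_t iommu_map_atomic_sketch(void)
{
	return __iommu_map_stub(GFP_ATOMIC);
}
```

This is why the arm64 splat fires even with correct allocations underneath: the caller reached the sleepable entry from atomic context, so the assertion alone trips regardless of what gfp the driver below would use.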
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
On 16/10/2019 17:03, Joerg Roedel wrote:
> On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:
>> On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
>> The x86 one might just be a mistake.
>>
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index ad05484d0c80..63c4b894751d 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>>  	if (iommu_prot & IOMMU_WRITE)
>>  		prot |= IOMMU_PROT_IW;
>>
>> -	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
>> +	ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
>
> Yeah, that is a bug, I spotted that too.
>
>> @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>>  	if (!iova)
>>  		goto out_free_page;
>>
>> -	if (iommu_map(domain, iova, msi_addr, size, prot))
>> +	if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
>>  		goto out_free_iova;
>
> Not so sure this is a bug, this code is only about setting up MSIs on ARM. It probably doesn't need to be atomic.

Right, since the iommu_dma_prepare_msi() operation was broken out to happen at MSI domain setup time, I don't think the comment in there applies any more, so we can probably stop disabling irqs around the iommu_dma_get_msi_page() call.

Robin.
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:
> On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
> The x86 one might just be a mistake.
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index ad05484d0c80..63c4b894751d 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>  	if (iommu_prot & IOMMU_WRITE)
>  		prot |= IOMMU_PROT_IW;
>
> -	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
> +	ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);

Yeah, that is a bug, I spotted that too.

> @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>  	if (!iova)
>  		goto out_free_page;
>
> -	if (iommu_map(domain, iova, msi_addr, size, prot))
> +	if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
>  		goto out_free_iova;

Not so sure this is a bug, this code is only about setting up MSIs on ARM. It probably doesn't need to be atomic.

Regards,

	Joerg
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
> Hi Qian,
>
> thanks for the report!
>
> On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote:
>> On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote:
>>> Today's linux-next generates a lot of warnings on multiple servers during boot due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu api" [1]. Reverting the whole thing fixed them.
>>>
>>> [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/
>>
>> BTW, the previous x86 warning was from only reverting one patch, "iommu: Add gfp parameter to iommu_ops::map", which proved to be insufficient. Now, pasting the correct warning.

The x86 one might just be a mistake.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ad05484d0c80..63c4b894751d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 	if (iommu_prot & IOMMU_WRITE)
 		prot |= IOMMU_PROT_IW;

-	ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
+	ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);

 	domain_flush_np_cache(domain, iova, page_size);

The arm64 -- does it forget to do this?

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ecc08aef9b58..8dd0ef0656f4 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 	if (!iova)
 		goto out_free_page;

-	if (iommu_map(domain, iova, msi_addr, size, prot))
+	if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
 		goto out_free_iova;

 	INIT_LIST_HEAD(&msi_page->list);
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote: > BTW, the previous x86 warning was from only reverted one patch "iommu: Add gfp > parameter to iommu_ops::map" where proved to be insufficient. Now, pasting the > correct warning. Can you please test this small fix: diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 78a2cca3ac5c..e7a4464e8594 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2562,7 +2562,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, if (iommu_prot & IOMMU_WRITE) prot |= IOMMU_PROT_IW; - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); domain_flush_np_cache(domain, iova, page_size);
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
Hi Qian, thanks for the report! On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote: > On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote: > > Today's linux-next generates a lot of warnings on multiple servers during > > boot > > due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu > > api" > > [1]. Reverted the whole things fixed them. > > > > [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/ > > > > BTW, the previous x86 warning was from only reverted one patch "iommu: Add gfp > parameter to iommu_ops::map" where proved to be insufficient. Now, pasting the > correct warning. I am looking into it and may send you fixes to test. Thanks, Joerg
Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy
On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote: > Today's linux-next generates a lot of warnings on multiple servers during boot > due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu > api" > [1]. Reverted the whole things fixed them. > > [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/ > BTW, the previous x86 warning was from only reverted one patch "iommu: Add gfp parameter to iommu_ops::map" where proved to be insufficient. Now, pasting the correct warning. [ 564.365768][ T6222] BUG: sleeping function called from invalid context at mm/page_alloc.c:4692 [ 564.374447][ T6222] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6222, name: git [ 564.382969][ T6222] INFO: lockdep is turned off. [ 564.387644][ T6222] CPU: 25 PID: 6222 Comm: git Tainted: GW 5.4.0-rc3-next-20191016 #6 [ 564.397011][ T6222] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 [ 564.406291][ T6222] Call Trace: [ 564.409470][ T6222] dump_stack+0x86/0xca [ 564.413517][ T6222] ___might_sleep.cold.92+0xd2/0x122 [ 564.418694][ T6222] __might_sleep+0x73/0xe0 [ 564.422999][ T6222] __alloc_pages_nodemask+0x442/0x720 [ 564.428265][ T6222] ? __alloc_pages_slowpath+0x18d0/0x18d0 [ 564.433883][ T6222] ? arch_stack_walk+0x7f/0xf0 [ 564.438534][ T6222] ? create_object+0x4a2/0x540 [ 564.443188][ T6222] alloc_pages_current+0x9c/0x110 [ 564.448098][ T6222] __get_free_pages+0x12/0x60 [ 564.452659][ T6222] get_zeroed_page+0x16/0x20 [ 564.457137][ T6222] amd_iommu_map+0x504/0x850 [ 564.461612][ T6222] ? amd_iommu_domain_direct_map+0x60/0x60 [ 564.467312][ T6222] ? lockdep_hardirqs_on+0x16/0x2a0 [ 564.472400][ T6222] ? alloc_iova+0x189/0x210 [ 564.476790][ T6222] __iommu_map+0x1c1/0x4e0 [ 564.481090][ T6222] ? iommu_get_dma_domain+0x40/0x40 [ 564.486181][ T6222] ? alloc_iova_fast+0x258/0x3d1 [ 564.491009][ T6222] ? 
create_object+0x4a2/0x540 [ 564.495656][ T6222] __iommu_map_sg+0xa5/0x130 [ 564.500135][ T6222] iommu_map_sg_atomic+0x14/0x20 [ 564.504958][ T6222] iommu_dma_map_sg+0x2c3/0x450 [ 564.509699][ T6222] scsi_dma_map+0xd7/0x160 [ 564.514010][ T6222] pqi_raid_submit_scsi_cmd_with_io_request+0x392/0x420 [smartpqi] [ 564.521811][ T6222] ? pqi_alloc_io_request+0x127/0x140 [smartpqi] [ 564.528037][ T6222] pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi] [ 564.534264][ T6222] ? pqi_eh_device_reset_handler+0x9c0/0x9c0 [smartpqi] [ 564.541100][ T6222] ? sd_init_command+0xa25/0x1346 [sd_mod] [ 564.546802][ T6222] scsi_queue_rq+0xd19/0x1360 [ 564.551372][ T6222] __blk_mq_try_issue_directly+0x295/0x3f0 [ 564.557071][ T6222] ? blk_mq_request_bypass_insert+0xd0/0xd0 [ 564.562860][ T6222] ? debug_lockdep_rcu_enabled+0x27/0x60 [ 564.568384][ T6222] blk_mq_try_issue_directly+0xad/0x130 [ 564.573821][ T6222] ? __blk_mq_try_issue_directly+0x3f0/0x3f0 [ 564.579693][ T6222] ? blk_add_rq_to_plug+0xcd/0x110 [ 564.584693][ T6222] blk_mq_make_request+0xcee/0x1120 [ 564.589777][ T6222] ? lock_downgrade+0x3c0/0x3c0 [ 564.594517][ T6222] ? blk_mq_try_issue_directly+0x130/0x130 [ 564.600218][ T6222] ? blk_queue_enter+0x78d/0x810 [ 564.605041][ T6222] ? generic_make_request_checks+0xf30/0xf30 [ 564.610915][ T6222] ? lock_downgrade+0x3c0/0x3c0 [ 564.615655][ T6222] ? __srcu_read_unlock+0x24/0x50 [ 564.620565][ T6222] ? generic_make_request+0x150/0x650 [ 564.625833][ T6222] generic_make_request+0x196/0x650 [ 564.630921][ T6222] ? blk_queue_enter+0x810/0x810 [ 564.635747][ T6222] submit_bio+0xaa/0x270 [ 564.639873][ T6222] ? submit_bio+0xaa/0x270 [ 564.644172][ T6222] ? generic_make_request+0x650/0x650 [ 564.649437][ T6222] ? iomap_readpage+0x260/0x260 [ 564.654173][ T6222] iomap_readpages+0x154/0x3d0 [ 564.658823][ T6222] ? iomap_zero_range_actor+0x330/0x330 [ 564.664257][ T6222] ? 
__might_sleep+0x73/0xe0 [ 564.668836][ T6222] xfs_vm_readpages+0xaf/0x1f0 [xfs] [ 564.674016][ T6222] read_pages+0xe2/0x3b0 [ 564.678142][ T6222] ? read_cache_pages+0x350/0x350 [ 564.683057][ T6222] ? __page_cache_alloc+0x12c/0x230 [ 564.688148][ T6222] __do_page_cache_readahead+0x346/0x3a0 [ 564.693670][ T6222] ? read_pages+0x3b0/0x3b0 [ 564.698059][ T6222] ? lockdep_hardirqs_on+0x16/0x2a0 [ 564.703247][ T6222] ? __xfs_filemap_fault+0x167/0x4a0 [xfs] [ 564.708947][ T6222] filemap_fault+0xa13/0xe70 [ 564.713528][ T6222] __xfs_filemap_fault+0x167/0x4a0 [xfs] [ 564.719059][ T6222] ? kmemleak_alloc+0x57/0x90 [ 564.723724][ T6222] ? xfs_file_read_iter+0x3c0/0x3c0 [xfs] [ 564.729337][ T6222] ? debug_check_no_locks_freed+0x2c/0xe0 [ 564.734946][ T6222] ? lockdep_init_map+0x8b/0x2b0 [ 564.739872][ T6222] xfs_filemap_fault+0x68/0x70 [xfs] [ 564.745046][ T6222] __do_fault+0x83/0x220 [ 564.749172][ T6222] __handle_mm_fault+0xd76/0x1f40 [ 564.754084][ T6222] ? __pmd_alloc+0x280/0x280 [ 564.758559][ T6222] ? debug_lockdep_
"Convert the AMD iommu driver to the dma-iommu api" is buggy
Today's linux-next generates a lot of warnings on multiple servers during boot due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu api" [1]. Reverted the whole things fixed them. [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/ [ 257.785749][ T6184] BUG: sleeping function called from invalid context at mm/page_alloc.c:4692 [ 257.794886][ T6184] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6184, name: readelf [ 257.803574][ T6184] INFO: lockdep is turned off. [ 257.808233][ T6184] CPU: 86 PID: 6184 Comm: readelf Tainted: GW 5.4.0-rc3-next-20191016+ #7 [ 257.818035][ T6184] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 [ 257.827313][ T6184] Call Trace: [ 257.830487][ T6184] dump_stack+0x86/0xca [ 257.834530][ T6184] ___might_sleep.cold.92+0xd2/0x122 [ 257.839708][ T6184] __might_sleep+0x73/0xe0 [ 257.844011][ T6184] __alloc_pages_nodemask+0x442/0x720 [ 257.849274][ T6184] ? __alloc_pages_slowpath+0x18d0/0x18d0 [ 257.854886][ T6184] ? debug_lockdep_rcu_enabled+0x27/0x60 [ 257.860415][ T6184] ? lock_downgrade+0x3c0/0x3c0 [ 257.865156][ T6184] alloc_pages_current+0x9c/0x110 [ 257.870071][ T6184] __get_free_pages+0x12/0x60 [ 257.874634][ T6184] get_zeroed_page+0x16/0x20 [ 257.879112][ T6184] amd_iommu_map+0x504/0x850 [ 257.883588][ T6184] ? amd_iommu_domain_direct_map+0x60/0x60 [ 257.889286][ T6184] ? lockdep_hardirqs_on+0x16/0x2a0 [ 257.894373][ T6184] ? alloc_iova+0x189/0x210 [ 257.898765][ T6184] ? trace_hardirqs_on+0x3a/0x160 [ 257.903677][ T6184] iommu_map+0x1b3/0x4d0 [ 257.907802][ T6184] ? iommu_unmap+0xf0/0xf0 [ 257.912104][ T6184] ? alloc_iova_fast+0x258/0x3d1 [ 257.916929][ T6184] ? create_object+0x4a2/0x540 [ 257.921579][ T6184] iommu_map_sg+0x9d/0x120 [ 257.925882][ T6184] iommu_dma_map_sg+0x2c3/0x450 [ 257.930627][ T6184] scsi_dma_map+0xd7/0x160 [ 257.934936][ T6184] pqi_raid_submit_scsi_cmd_with_io_request+0x392/0x420 [smartpqi] [ 257.942735][ T6184] ? 
pqi_alloc_io_request+0x127/0x140 [smartpqi] [ 257.948962][ T6184] pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi] [ 257.955192][ T6184] ? pqi_eh_device_reset_handler+0x9c0/0x9c0 [smartpqi] [ 257.962029][ T6184] ? sd_init_command+0xa25/0x1346 [sd_mod] [ 257.967730][ T6184] scsi_queue_rq+0xd19/0x1360 [ 257.972298][ T6184] __blk_mq_try_issue_directly+0x295/0x3f0 [ 257.977999][ T6184] ? blk_mq_request_bypass_insert+0xd0/0xd0 [ 257.983787][ T6184] ? debug_lockdep_rcu_enabled+0x27/0x60 [ 257.989312][ T6184] blk_mq_request_issue_directly+0xb5/0x100 [ 257.995098][ T6184] ? blk_mq_flush_plug_list+0x7e0/0x7e0 [ 258.000537][ T6184] ? blk_mq_sched_insert_requests+0xd6/0x380 [ 258.006409][ T6184] ? lock_downgrade+0x3c0/0x3c0 [ 258.011147][ T6184] blk_mq_try_issue_list_directly+0xa9/0x160 [ 258.017023][ T6184] blk_mq_sched_insert_requests+0x228/0x380 [ 258.022810][ T6184] blk_mq_flush_plug_list+0x448/0x7e0 [ 258.028073][ T6184] ? blk_mq_insert_requests+0x380/0x380 [ 258.033516][ T6184] blk_flush_plug_list+0x1eb/0x230 [ 258.038515][ T6184] ? blk_insert_cloned_request+0x1b0/0x1b0 [ 258.044215][ T6184] blk_finish_plug+0x43/0x5d [ 258.048695][ T6184] read_pages+0xf6/0x3b0 [ 258.052823][ T6184] ? read_cache_pages+0x350/0x350 [ 258.057737][ T6184] ? __page_cache_alloc+0x12c/0x230 [ 258.062826][ T6184] __do_page_cache_readahead+0x346/0x3a0 [ 258.068348][ T6184] ? read_pages+0x3b0/0x3b0 [ 258.072738][ T6184] ? lockdep_hardirqs_on+0x16/0x2a0 [ 258.077928][ T6184] ? __xfs_filemap_fault+0x167/0x4a0 [xfs] [ 258.083625][ T6184] filemap_fault+0xa13/0xe70 [ 258.088201][ T6184] __xfs_filemap_fault+0x167/0x4a0 [xfs] [ 258.093731][ T6184] ? kmemleak_alloc+0x57/0x90 [ 258.098397][ T6184] ? xfs_file_read_iter+0x3c0/0x3c0 [xfs] [ 258.104009][ T6184] ? debug_check_no_locks_freed+0x2c/0xe0 [ 258.109618][ T6184] ? 
lockdep_init_map+0x8b/0x2b0 [ 258.114543][ T6184] xfs_filemap_fault+0x68/0x70 [xfs] [ 258.119720][ T6184] __do_fault+0x83/0x220 [ 258.123847][ T6184] __handle_mm_fault+0xd76/0x1f40 [ 258.128757][ T6184] ? __pmd_alloc+0x280/0x280 [ 258.133231][ T6184] ? debug_lockdep_rcu_enabled+0x27/0x60 [ 258.138755][ T6184] ? handle_mm_fault+0x178/0x4c0 [ 258.143581][ T6184] ? lockdep_hardirqs_on+0x16/0x2a0 [ 258.148674][ T6184] ? __do_page_fault+0x29c/0x640 [ 258.153501][ T6184] handle_mm_fault+0x205/0x4c0 [ 258.158151][ T6184] __do_page_fault+0x29c/0x640 [ 258.162800][ T6184] do_page_fault+0x50/0x37f [ 258.167189][ T6184] page_fault+0x34/0x40 [ 258.171228][ T6184] RIP: 0010:__clear_user+0x3b/0x70 [ 183.553150] BUG: sleeping function called from invalid context at drivers/iommu/iommu.c:1950 [ 183.562306] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1486, name: kworker/0:3 [ 183.571450] 5 locks held by kworker/0:3/1486: [ 183.576510] #0: 44ff0008000ce128 ((wq_completion)events){+.+.}, at: process_one_work+0
Re: [PATCH 1/5] iommu: Implement iommu_put_resv_regions_simple()
On Wed, Sep 18, 2019 at 04:37:38PM +0100, Will Deacon wrote: > On Thu, Aug 29, 2019 at 01:17:48PM +0200, Thierry Reding wrote: > > From: Thierry Reding > > > > Implement a generic function for removing reserved regions. This can be > > used by drivers that don't do anything fancy with these regions other > > than allocating memory for them. > > > > Signed-off-by: Thierry Reding > > --- > > drivers/iommu/iommu.c | 19 +++ > > include/linux/iommu.h | 2 ++ > > 2 files changed, 21 insertions(+) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index 0f585b614657..73a2a6b13507 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -2170,6 +2170,25 @@ void iommu_put_resv_regions(struct device *dev, > > struct list_head *list) > > ops->put_resv_regions(dev, list); > > } > > > > +/** > > + * iommu_put_resv_regions_simple - Reserved region driver helper > > + * @dev: device for which to free reserved regions > > + * @list: reserved region list for device > > + * > > + * IOMMU drivers can use this to implement their .put_resv_regions() > > callback > > + * for simple reservations. Memory allocated for each reserved region will > > be > > + * freed. If an IOMMU driver allocates additional resources per region, it > > is > > + * going to have to implement a custom callback. > > + */ > > +void iommu_put_resv_regions_simple(struct device *dev, struct list_head > > *list) > > +{ > > + struct iommu_resv_region *entry, *next; > > + > > + list_for_each_entry_safe(entry, next, list, list) > > + kfree(entry); > > +} > > +EXPORT_SYMBOL(iommu_put_resv_regions_simple); > > Can you call this directly from iommu_put_resv_regions() if the function > pointer in ops is NULL? That would save having to plumb the default callback > into a bunch of drivers. I probably could, but I don't think that necessarily improves things. The reason is that that would cause the helper to get called even if the driver doesn't support reserved regions. 
That's likely harmless because in that case the list of regions passed to it would be empty. However, I think the current way to do this, where we have to implement both hooks for ->get_resv_regions() and ->put_resv_regions() is nicely symmetric. Thierry
Re: [RFC PATCH 0/6] SMMUv3 PMCG IMP DEF event support
Hi Robin, Two significant concerns right off the bat: - It seems more common than not for silicon designers to fail to implement IIDR correctly, so it's only a matter of time before inevitably needing to bring back some firmware-level identifier abstraction (if not already - does Hi161x have PMCGs?) Maybe there's a way that we can switch to this method, and leave the door open for an easy way to support firmware-level identifier again, if ever needed. I'm not too pushed - this was secondary to just allowing the PMCG driver know the associated SMMU model. But that's the part I'm not buying - there's no clear advantage to pushing that complexity down into the PMCG driver, vs. leaving the IORT code responsible for translating an SMMU model into a PMCG model, yet the aforementioned disadvantages jump out right away. One advantage is that the next piece of quirky hw with a properly implemented IIDR does not require a new IORT model. And today, this handling is only for hi1620, and since we can use hi1620 IIDR to id it, then it seems good to remove code outside the PMCG driver specifically to handle it. But if you think it's going to be needed again, then it makes sense not to remove it. And, no, hi161x does not have any PMCGs. Hooray, I guess :) - This seems like a step in entirely the wrong direction for supporting . So to support PMCGs that reference a Named Component or Root Complex, I thought that the IORT parsing code would have to do some secondary lookup to the associated SMMU, through the Named Component or Root Complex node. What was your idea here? The associated SMMU has no relevance in that context - the reason for the Node Reference to point to a non-SMMU node is for devices that implement their own embedded TLB (e.g. AMBA DTI masters) and expose a standard PMCG interface to monitor it. It isn't reasonable to expect any old PCIe controller or on-chip-accelerator driver to expose a fake SMMU IIDR just to keep some other driver happy. 
But won't there still be an SMMU associated with the AMBA DTI masters, in your example? It's this SMMU which the PMCG driver would reference as the "parent" device, and the IORT parsing would need to do the lookup for this reference. But then, this becomes something that the DT parsing would need to handle also. Note: I do acknowledge that an overall issue is that we assume all PMCG IMP DEF events are same for a given SMMU model. That assumption does technically fail already - I know MMU-600 has different IMP-DEF events for its TCU and TBUs, however as long as we can get as far as "this is some part of an MMU-600" the driver should be able to figure out the rest (annoyingly it looks like both PMCG types expose the same PMCG_ID_REGS information, but they should be distinguishable by PMCG_CEIDn). JFYI, PMCG_CEIDn contents for hi1620 are all zero, apart from PMDEVARCH and PMDEVTYPE, which are same as arm implementation according to the spec - sigh... Interpreting the Node Reference is definitely a welcome improvement over matching table headers, but absent a truly compelling argument to the contrary, I'd rather retain the "PMCG model" abstraction in between that and the driver itself (especially since those can trivially be hung off compatibles once it comes to DT support). For DT, I would assume that we just use compatible strings would allow us to identify the PMCG model. Right, that was largely my point - DT probing can start with a PMCG model, so it's a lot more logical for ACPI probing to do the same, with the actual PMCG model determination hidden away in the ACPI code. That's the basis of the current design. I have been nagging the architects that PMCGs not having their own IIDR is an unwelcome hole in the spec, so hopefully this might get a bit easier some day. For sure. The spec reads that the PMCGs may be "independently-designed", hence no general id method. I don't get this. 
On a related matter, is there still a need to deal with scenarios of the PMCG being located within the SMMU register map? As you may remember, we did have this issue but relocated the PMCG to outside the SMMU register map in a later chip rev. MMU-600 has its TCU PMCG page 0 in the middle of its SMMU page 0 space, but given that it's an Arm IP, I expect that when the heat gets turned up for making it work, it's most likely to be under me ;) OK, so this is another reason why I thought that having a reference to the SMMU device could be useful in terms of solving that problem. Robin. Thanks again, John
[PATCH 1/3] iommu/tegra-smmu: Use non-secure register for flushing
From: Navneet Kumar Use PTB_ASID instead of SMMU_CONFIG to flush smmu. PTB_ASID can be accessed from non-secure mode, SMMU_CONFIG cannot be. Using SMMU_CONFIG could pose a problem when kernel doesn't have secure mode access enabled from boot. Signed-off-by: Navneet Kumar Reviewed-by: Dmitry Osipenko Tested-by: Dmitry Osipenko Signed-off-by: Thierry Reding --- drivers/iommu/tegra-smmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 99f85fb5a704..03e667480ec6 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -240,7 +240,7 @@ static inline void smmu_flush_tlb_group(struct tegra_smmu *smmu, static inline void smmu_flush(struct tegra_smmu *smmu) { - smmu_readl(smmu, SMMU_CONFIG); + smmu_readl(smmu, SMMU_PTB_ASID); } static int tegra_smmu_alloc_asid(struct tegra_smmu *smmu, unsigned int *idp) -- 2.23.0
[PATCH 3/3] iommu/tegra-smmu: Fix page tables in > 4 GiB memory
From: Thierry Reding Page tables that reside in physical memory beyond the 4 GiB boundary are currently not working properly. The reason is that when the physical address for page directory entries is read, it gets truncated at 32 bits and can cause crashes when passing that address to the DMA API. Fix this by first casting the PDE value to a dma_addr_t and then using the page frame number mask for the SMMU instance to mask out the invalid bits, which are typically used for mapping attributes, etc. Signed-off-by: Thierry Reding --- drivers/iommu/tegra-smmu.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 9425d01a95ac..63a147b623e6 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -159,9 +159,9 @@ static bool smmu_dma_addr_valid(struct tegra_smmu *smmu, dma_addr_t addr) return (addr & smmu->pfn_mask) == addr; } -static dma_addr_t smmu_pde_to_dma(u32 pde) +static dma_addr_t smmu_pde_to_dma(struct tegra_smmu *smmu, u32 pde) { - return pde << 12; + return (dma_addr_t)(pde & smmu->pfn_mask) << 12; } static void smmu_flush_ptc_all(struct tegra_smmu *smmu) @@ -554,6 +554,7 @@ static u32 *tegra_smmu_pte_lookup(struct tegra_smmu_as *as, unsigned long iova, dma_addr_t *dmap) { unsigned int pd_index = iova_pd_index(iova); + struct tegra_smmu *smmu = as->smmu; struct page *pt_page; u32 *pd; @@ -562,7 +563,7 @@ static u32 *tegra_smmu_pte_lookup(struct tegra_smmu_as *as, unsigned long iova, return NULL; pd = page_address(as->pd); - *dmap = smmu_pde_to_dma(pd[pd_index]); + *dmap = smmu_pde_to_dma(smmu, pd[pd_index]); return tegra_smmu_pte_offset(pt_page, iova); } @@ -604,7 +605,7 @@ static u32 *as_get_pte(struct tegra_smmu_as *as, dma_addr_t iova, } else { u32 *pd = page_address(as->pd); - *dmap = smmu_pde_to_dma(pd[pde]); + *dmap = smmu_pde_to_dma(smmu, pd[pde]); } return tegra_smmu_pte_offset(as->pts[pde], iova); @@ -629,7 +630,7 @@ static void 
tegra_smmu_pte_put_use(struct tegra_smmu_as *as, unsigned long iova) if (--as->count[pde] == 0) { struct tegra_smmu *smmu = as->smmu; u32 *pd = page_address(as->pd); - dma_addr_t pte_dma = smmu_pde_to_dma(pd[pde]); + dma_addr_t pte_dma = smmu_pde_to_dma(smmu, pd[pde]); tegra_smmu_set_pde(as, iova, 0); -- 2.23.0
[PATCH 2/3] iommu/tegra-smmu: Fix client enablement order
From: Navneet Kumar Enable clients' translation only after setting up the swgroups. Signed-off-by: Navneet Kumar Signed-off-by: Thierry Reding --- drivers/iommu/tegra-smmu.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 03e667480ec6..9425d01a95ac 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -351,6 +351,20 @@ static void tegra_smmu_enable(struct tegra_smmu *smmu, unsigned int swgroup, unsigned int i; u32 value; + group = tegra_smmu_find_swgroup(smmu, swgroup); + if (group) { + value = smmu_readl(smmu, group->reg); + value &= ~SMMU_ASID_MASK; + value |= SMMU_ASID_VALUE(asid); + value |= SMMU_ASID_ENABLE; + smmu_writel(smmu, value, group->reg); + } else { + pr_warn("%s group from swgroup %u not found\n", __func__, + swgroup); + /* No point moving ahead if group was not found */ + return; + } + for (i = 0; i < smmu->soc->num_clients; i++) { const struct tegra_mc_client *client = &smmu->soc->clients[i]; @@ -361,15 +375,6 @@ static void tegra_smmu_enable(struct tegra_smmu *smmu, unsigned int swgroup, value |= BIT(client->smmu.bit); smmu_writel(smmu, value, client->smmu.reg); } - - group = tegra_smmu_find_swgroup(smmu, swgroup); - if (group) { - value = smmu_readl(smmu, group->reg); - value &= ~SMMU_ASID_MASK; - value |= SMMU_ASID_VALUE(asid); - value |= SMMU_ASID_ENABLE; - smmu_writel(smmu, value, group->reg); - } } static void tegra_smmu_disable(struct tegra_smmu *smmu, unsigned int swgroup, -- 2.23.0
Re: [RFC PATCH 0/6] SMMUv3 PMCG IMP DEF event support
On 2019-10-16 9:47 am, John Garry wrote: On 15/10/2019 19:00, Robin Murphy wrote: Hi John, On 30/09/2019 15:33, John Garry wrote: This patchset adds IMP DEF event support for the SMMUv3 PMCG. It is marked as an RFC as the method to identify the PMCG implementation may be a quite disliked. And, in general, the series is somewhat incomplete. So the background is that the PMCG supports IMP DEF events, yet we have no method to identify the PMCG to know the IMP DEF events. A method for identifying the PMCG implementation could be using PMDEVARCH, but we cannot rely on this being set properly, as whether this is implemented is not defined in SMMUv3 spec. Another method would be perf event aliasing, but this method of event matching is based on CPU id, which would not guarantee same uniqueness as PMCG implementation. Yet another method could be to continue using ACPI OEM ID in the IORT code, but this does not scale. And it is not suitable if we ever add DT support to the PMCG driver. The method used in this series is based on matching on the parent SMMUv3 IIDR. We store this IIDR contents in the arm smmu structure as the first element, which means that we don't have to expose SMMU APIs - this is the part which may be disliked. The final two patches switch the pre-existing PMCG model identification from ACPI OEM ID to the same parent SMMUv3 IIDR matching. For now, we only consider SMMUv3' nodes being the associated node for PMCG. Hi Robin, Two significant concerns right off the bat: - It seems more common than not for silicon designers to fail to implement IIDR correctly, so it's only a matter of time before inevitably needing to bring back some firmware-level identifier abstraction (if not already - does Hi161x have PMCGs?) Maybe there's a way that we can switch to this method, and leave the door open for an easy way to support firmware-level identifier again, if ever needed. 
I'm not too pushed - this was secondary to just allowing the PMCG driver know the associated SMMU model. But that's the part I'm not buying - there's no clear advantage to pushing that complexity down into the PMCG driver, vs. leaving the IORT code responsible for translating an SMMU model into a PMCG model, yet the aforementioned disadvantages jump out right away. And, no, hi161x does not have any PMCGs. Hooray, I guess :) - This seems like a step in entirely the wrong direction for supporting . So to support PMCGs that reference a Named Component or Root Complex, I thought that the IORT parsing code would have to do some secondary lookup to the associated SMMU, through the Named Component or Root Complex node. What was your idea here? The associated SMMU has no relevance in that context - the reason for the Node Reference to point to a non-SMMU node is for devices that implement their own embedded TLB (e.g. AMBA DTI masters) and expose a standard PMCG interface to monitor it. It isn't reasonable to expect any old PCIe controller or on-chip-accelerator driver to expose a fake SMMU IIDR just to keep some other driver happy. Note: I do acknowledge that an overall issue is that we assume all PMCG IMP DEF events are same for a given SMMU model. That assumption does technically fail already - I know MMU-600 has different IMP-DEF events for its TCU and TBUs, however as long as we can get as far as "this is some part of an MMU-600" the driver should be able to figure out the rest (annoyingly it looks like both PMCG types expose the same PMCG_ID_REGS information, but they should be distinguishable by PMCG_CEIDn). Interpreting the Node Reference is definitely a welcome improvement over matching table headers, but absent a truly compelling argument to the contrary, I'd rather retain the "PMCG model" abstraction in between that and the driver itself (especially since those can trivially be hung off compatibles once it comes to DT support). 
For DT, I would assume that we just use compatible strings would allow us to identify the PMCG model. Right, that was largely my point - DT probing can start with a PMCG model, so it's a lot more logical for ACPI probing to do the same, with the actual PMCG model determination hidden away in the ACPI code. That's the basis of the current design. I have been nagging the architects that PMCGs not having their own IIDR is an unwelcome hole in the spec, so hopefully this might get a bit easier some day. On a related matter, is there still a need to deal with scenarios of the PMCG being located within the SMMU register map? As you may remember, we did have this issue but relocated the PMCG to outside the SMMU register map in a later chip rev. MMU-600 has its TCU PMCG page 0 in the middle of its SMMU page 0 space, but given that it's an Arm IP, I expect that when the heat gets turned up for making it work, it's most likely to be under me ;) Robin.
Re: [PATCH] kernel: dma: Make CMA boot parameters __ro_after_init
Hi Nathan, On Mon, Oct 14, 2019 at 7:55 AM Nathan Chancellor wrote: > > On Sat, Oct 12, 2019 at 05:59:18PM +0530, Shyam Saini wrote: > > This parameters are not changed after early boot. > > By making them __ro_after_init will reduce any attack surface in the > > kernel. > > > > Link: https://lwn.net/Articles/676145/ > > Cc: Christoph Hellwig > > Cc: Marek Szyprowski > > Cc: Robin Murphy > > Cc: Matthew Wilcox > > Cc: Christopher Lameter > > Cc: Kees Cook > > Signed-off-by: Shyam Saini > > --- > > kernel/dma/contiguous.c | 8 > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c > > index 69cfb4345388..1b689b1303cd 100644 > > --- a/kernel/dma/contiguous.c > > +++ b/kernel/dma/contiguous.c > > @@ -42,10 +42,10 @@ struct cma *dma_contiguous_default_area; > > * Users, who want to set the size of global CMA area for their system > > * should use cma= kernel parameter. > > */ > > -static const phys_addr_t size_bytes = (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M; > > -static phys_addr_t size_cmdline = -1; > > -static phys_addr_t base_cmdline; > > -static phys_addr_t limit_cmdline; > > +static const phys_addr_t __ro_after_init size_bytes = > > (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M; > > The 0day bot reported an issue with this change with clang: > > https://groups.google.com/d/msgid/clang-built-linux/201910140334.nhultlt8%25lkp%40intel.com > > kernel/dma/contiguous.c:46:36: error: 'size_cmdline' causes a section type > conflict with 'size_bytes' > static phys_addr_t __ro_after_init size_cmdline = -1; >^ > kernel/dma/contiguous.c:45:42: note: declared here > static const phys_addr_t __ro_after_init size_bytes = > (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M; > ^ > kernel/dma/contiguous.c:47:36: error: 'base_cmdline' causes a section type > conflict with 'size_bytes' > static phys_addr_t __ro_after_init base_cmdline; >^ > kernel/dma/contiguous.c:45:42: note: declared here > static const phys_addr_t __ro_after_init 
size_bytes = > (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M; > ^ > kernel/dma/contiguous.c:48:36: error: 'limit_cmdline' causes a section type > conflict with 'size_bytes' > static phys_addr_t __ro_after_init limit_cmdline; >^ > kernel/dma/contiguous.c:45:42: note: declared here > static const phys_addr_t __ro_after_init size_bytes = > (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M; > ^ > 3 errors generated. Thanks for your feedback and for reporting this error. > The errors seem kind of cryptic at first but something that is const > should automatically be in the read only section, this part of the > commit seems unnecessary. Removing that part of the change fixes the error. I had overlooked the size_bytes variable. It shouldn't be const if it is declared as __ro_after_init. I will fix and resend it.
Re: [RFC PATCH 0/6] SMMUv3 PMCG IMP DEF event support
On 15/10/2019 19:00, Robin Murphy wrote:
> Hi John,
>
> On 30/09/2019 15:33, John Garry wrote:
>> This patchset adds IMP DEF event support for the SMMUv3 PMCG. It is
>> marked as an RFC as the method to identify the PMCG implementation may
>> be quite disliked. And, in general, the series is somewhat incomplete.
>>
>> So the background is that the PMCG supports IMP DEF events, yet we
>> have no method to identify the PMCG to know the IMP DEF events.
>>
>> A method for identifying the PMCG implementation could be using
>> PMDEVARCH, but we cannot rely on this being set properly, as whether
>> this is implemented is not defined in the SMMUv3 spec.
>>
>> Another method would be perf event aliasing, but this method of event
>> matching is based on CPU id, which would not guarantee the same
>> uniqueness as the PMCG implementation.
>>
>> Yet another method could be to continue using the ACPI OEM ID in the
>> IORT code, but this does not scale. And it is not suitable if we ever
>> add DT support to the PMCG driver.
>>
>> The method used in this series is based on matching on the parent
>> SMMUv3 IIDR. We store the IIDR contents in the arm smmu structure as
>> the first element, which means that we don't have to expose SMMU APIs
>> - this is the part which may be disliked.
>>
>> The final two patches switch the pre-existing PMCG model
>> identification from ACPI OEM ID to the same parent SMMUv3 IIDR
>> matching.
>>
>> For now, we only consider SMMUv3 nodes being the associated node for
>> the PMCG.

Hi Robin,

> Two significant concerns right off the bat:
>
> - It seems more common than not for silicon designers to fail to
> implement IIDR correctly, so it's only a matter of time before
> inevitably needing to bring back some firmware-level identifier
> abstraction (if not already - does Hi161x have PMCGs?)

Maybe there's a way that we can switch to this method, and leave the
door open for an easy way to support a firmware-level identifier again,
if ever needed. I'm not too pushed - this was secondary to just allowing
the PMCG driver to know the associated SMMU model.

And, no, hi161x does not have any PMCGs.

> - This seems like a step in entirely the wrong direction for
> supporting .

So to support PMCGs that reference a Named Component or Root Complex, I
thought that the IORT parsing code would have to do some secondary
lookup to the associated SMMU, through the Named Component or Root
Complex node. What was your idea here?

Note: I do acknowledge that an overall issue is that we assume all PMCG
IMP DEF events are the same for a given SMMU model.

> Interpreting the Node Reference is definitely a welcome improvement
> over matching table headers, but absent a truly compelling argument to
> the contrary, I'd rather retain the "PMCG model" abstraction in
> between that and the driver itself (especially since those can
> trivially be hung off compatibles once it comes to DT support).

For DT, I would assume that just using compatible strings would allow us
to identify the PMCG model.

On a related matter, is there still a need to deal with scenarios of the
PMCG being located within the SMMU register map? As you may remember, we
did have this issue but relocated the PMCG to outside the SMMU register
map in a later chip rev.

Cheers,
John

> Thanks,
> Robin.

>> John Garry (6):
>>   ACPI/IORT: Set PMCG device parent
>>   iommu/arm-smmu-v3: Record IIDR in arm_smmu_device structure
>>   perf/smmuv3: Retrieve parent SMMUv3 IIDR
>>   perf/smmuv3: Support HiSilicon hip08 (hi1620) IMP DEF events
>>   perf/smmuv3: Match implementation options based on parent SMMU IIDR
>>   ACPI/IORT: Drop code to set the PMCG software-defined model
>>
>>  drivers/acpi/arm64/iort.c     | 69 ++--
>>  drivers/iommu/arm-smmu-v3.c   |  5 +++
>>  drivers/perf/arm_smmuv3_pmu.c | 84 ++-
>>  include/linux/acpi_iort.h     |  8
>>  4 files changed, 112 insertions(+), 54 deletions(-)
RE: [PATCH 2/2] virtio_ring: Use DMA API if memory is encrypted
On Tue, Oct 15, 2019 at 09:35:01AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 11, 2019 at 06:25:19PM -0700, Ram Pai wrote:
> > From: Thiago Jung Bauermann
> >
> > Normally, virtio enables DMA API with VIRTIO_F_IOMMU_PLATFORM, which must
> > be set by both device and guest driver. However, as a hack, when DMA API
> > returns physical addresses, guest driver can use the DMA API; even though
> > device does not set VIRTIO_F_IOMMU_PLATFORM and just uses physical
> > addresses.
>
> Sorry, but this is a complete bullshit hack. Driver must always use
> the DMA API if they do DMA, and if virtio devices use physical addresses
> that needs to be returned through the platform firmware interfaces for
> the dma setup. If you don't do that yet (which based on previous
> informations you don't), you need to fix it, and we can then quirk
> old implementations that already are out in the field.
>
> In other words: we finally need to fix that virtio mess and not pile
> hacks on top of hacks.

So force all virtio devices to use the DMA API, except when
VIRTIO_F_IOMMU_PLATFORM is not enabled? Any help detailing the idea will
enable us to fix this issue once and for all.

Will something like below work? It removes the prior hacks, and always
uses the DMA API; except when VIRTIO_F_IOMMU_PLATFORM is not enabled.

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8be1c4..b593d3d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -240,22 +240,10 @@ static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
 
 static bool vring_use_dma_api(struct virtio_device *vdev)
 {
-	if (!virtio_has_iommu_quirk(vdev))
-		return true;
-
-	/* Otherwise, we are left to guess. */
-	/*
-	 * In theory, it's possible to have a buggy QEMU-supposed
-	 * emulated Q35 IOMMU and Xen enabled at the same time.  On
-	 * such a configuration, virtio has never worked and will
-	 * not work without an even larger kludge.  Instead, enable
-	 * the DMA API if we're a Xen guest, which at least allows
-	 * all of the sensible Xen configurations to work correctly.
-	 */
-	if (xen_domain())
-		return true;
+	if (virtio_has_iommu_quirk(vdev))
+		return false;
 
-	return false;
+	return true;
 }
 
 size_t virtio_max_dma_size(struct virtio_device *vdev)

-- 
Ram Pai
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved
Hi,

On Tue, Oct 15, 2019 at 09:49:32AM -0700, Yian Chen wrote:
> VT-d RMRR (Reserved Memory Region Reporting) regions are reserved
> for device use only and should not be part of the allocable memory
> pool of the OS.
>
> The BIOS e820_table reports the complete memory map to the OS,
> including OS-usable memory ranges and BIOS-reserved memory ranges etc.
>
> x86 BIOS may not be trusted to include RMRR regions as reserved type
> of memory in its e820 memory map, hence validate every RMRR entry
> with the e820 memory map to make sure the RMRR regions will not be
> used by the OS for any other purposes.

Are there real systems in the wild where this is a problem?

> +static inline int __init
> +arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
> +{
> +	u64 start = rmrr->base_address;
> +	u64 end = rmrr->end_address + 1;
> +
> +	if (e820__mapped_all(start, end, E820_TYPE_RESERVED))
> +		return 0;
> +
> +	pr_err(FW_BUG "No firmware reserved region can cover this RMRR [%#018Lx-%#018Lx], contact BIOS vendor for fixes\n",
> +	       start, end - 1);
> +	return -EFAULT;
> +}

Why -EFAULT? There is no fault involved. Using -EINVAL seems to be a
better choice.
Re: [PATCH] iommu: rockchip: Free domain on .domain_free
On Wed, Oct 02, 2019 at 02:29:23PM -0300, Ezequiel Garcia wrote:
> The IOMMU domain resource lifetime is well-defined, managed
> by .domain_alloc and .domain_free.
>
> Therefore, domain-specific resources shouldn't be tied to
> the device lifetime, but instead to the domain's.
>
> Signed-off-by: Ezequiel Garcia
> ---
>  drivers/iommu/rockchip-iommu.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Applied, thanks.
Re: [PATCH v2 5/6] iommu/ipmmu-vmsa: Add helper functions for "uTLB" registers
On Tue, Oct 15, 2019 at 2:10 PM Yoshihiro Shimoda wrote:
> Since the memory mapping of the IPMMU will change in the future,
> this patch adds helper functions ipmmu_utlb_reg() and
> ipmmu_imu{asid,ctr}_write() for the "uTLB" registers. No behavior
> change.
>
> Signed-off-by: Yoshihiro Shimoda

Reviewed-by: Geert Uytterhoeven

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.
But when I'm talking to journalists I just say "programmer" or something
like that.
					-- Linus Torvalds
Re: [PATCH v2 4/6] iommu/ipmmu-vmsa: Calculate context registers' offset instead of a macro
On Tue, Oct 15, 2019 at 2:10 PM Yoshihiro Shimoda wrote:
> Since the memory mapping of the IPMMU will change in the future,
> this patch uses ipmmu_features values instead of a macro to
> calculate the context registers' offset. No behavior change.
>
> Signed-off-by: Yoshihiro Shimoda

Reviewed-by: Geert Uytterhoeven
Re: [PATCH v2 3/6] iommu/ipmmu-vmsa: Add helper functions for MMU "context" registers
On Tue, Oct 15, 2019 at 2:10 PM Yoshihiro Shimoda wrote:
> Since the memory mapping of the IPMMU will change in the future,
> this patch adds helper functions ipmmu_ctx_{reg,read,write}()
> for the MMU "context" registers. No behavior change.
>
> Signed-off-by: Yoshihiro Shimoda

Reviewed-by: Geert Uytterhoeven