Re: [bug report] iommu/vt-d: Fix unmap_pages support
Hi Dan. The updated patch has landed in mainline, as per :
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=86dc40c7ea9c22f64571e0e45f695de73a0e2644

On Mon, Nov 29, 2021 at 2:00 PM Dan Carpenter wrote:
>
> Hello Alex Williamson,
>
> The patch edad96db58d2: "iommu/vt-d: Fix unmap_pages support" from
> Nov 22, 2021, leads to the following Smatch static checker warning:
>
> drivers/iommu/intel/iommu.c:1369 dma_pte_clear_level()
> error: uninitialized symbol 'level_pfn'.
>
> drivers/iommu/intel/iommu.c
>     1330 static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
>     1331                                         struct dma_pte *pte, unsigned long pfn,
>     1332                                         unsigned long start_pfn,
>     1333                                         unsigned long last_pfn,
>     1334                                         struct page *freelist)
>     1335 {
>     1336         struct dma_pte *first_pte = NULL, *last_pte = NULL;
>     1337
>     1338         pfn = max(start_pfn, pfn);
>     1339         pte = &pte[pfn_level_offset(pfn, level)];
>     1340
>     1341         do {
>     1342                 unsigned long level_pfn;
>     1343
>     1344                 if (!dma_pte_present(pte))
>     1345                         goto next;
>                                  ^^
> If we ever hit this goto then there is going to be a bug.
>
>     1346
>     1347                 level_pfn = pfn & level_mask(level);
>     1348
>     1349                 /* If range covers entire pagetable, free it */
>     1350                 if (start_pfn <= level_pfn &&
>     1351                     last_pfn >= level_pfn + level_size(level) - 1) {
>     1352                         /* These suborbinate page tables are going away entirely. Don't
>     1353                            bother to clear them; we're just going to *free* them. */
>     1354                         if (level > 1 && !dma_pte_superpage(pte))
>     1355                                 freelist = dma_pte_list_pagetables(domain, level - 1, pte, freelist);
>     1356
>     1357                         dma_clear_pte(pte);
>     1358                         if (!first_pte)
>     1359                                 first_pte = pte;
>     1360                         last_pte = pte;
>     1361                 } else if (level > 1) {
>     1362                         /* Recurse down into a level that isn't *entirely* obsolete */
>     1363                         freelist = dma_pte_clear_level(domain, level - 1,
>     1364                                                        phys_to_virt(dma_pte_addr(pte)),
>     1365                                                        level_pfn, start_pfn, last_pfn,
>     1366                                                        freelist);
>     1367                 }
>     1368 next:
> --> 1369                 pfn = level_pfn + level_size(level);
>                          ^
>     1370         } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
>     1371
>     1372         if (first_pte)
>     1373                 domain_flush_cache(domain, first_pte,
>     1374                                    (void *)++last_pte - (void *)first_pte);
>     1375
>     1376         return freelist;
>     1377 }
>
> regards,
> dan carpenter
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Fix unmap_pages support
Thanks Alex, Baolu. The patch fixes things at my end. No kernel-flooding is
seen now (tested starting/stopping the VM > 10 times).

Thanks and Regards,
Ajay

On Thu, Nov 11, 2021 at 6:03 AM Alex Williamson wrote:
>
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap. VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
>
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments. If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
>
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level. We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level. When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level. In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done. Meanwhile we never cleared the ptes for the remainder
> of the range. When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported in the user
> report linked below.
>
> The fix for this seems relatively simple: if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
>
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg
> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargn...@gmail.com/
> Signed-off-by: Alex Williamson
> ---
>  drivers/iommu/intel/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..f6395f5425f0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
>  							       freelist);
>  		}
>  next:
> -		pfn += level_size(level);
> +		pfn = level_pfn + level_size(level);
>  	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
>
>  	if (first_pte)
Re: [PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.
Another piece of information :

The observations are the same if the current pci-device (sd/mmc controller)
is detached, and another pci-device (sound controller) is attached to the
guest. So, it looks like we can rule out any (pci-)device-specific issue.

For brevity, here are the details of the other pci-device I tried with :

###
sudo lspci -vvv

00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
	DeviceName: Onboard Audio
	Subsystem: Dell 6 Series/C200 Series Chipset Family High Definition Audio Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

wrote:
>
> Ping ..
>
> Any updates please on this?
>
> It will be great to have the fix upstreamed (properly of course).
>
> Right now, the patch contains the change as suggested, of
> explicitly/properly clearing out dma-mappings when unmap is called.
> Please let me know in whatever way I can help, including
> testing/debugging for other approaches if required.
>
> Many thanks to Alex and Lu for their continued support on the issue.
>
> P.S. :
>
> I might have missed mentioning the information about the device that
> causes flooding. Please find it below :
>
> ##
> sudo lspci -vvv
>
> 0a:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller (rev 05) (prog-if 01)
> 	Subsystem: Dell OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 17
> 	IOMMU group: 14
> 	Region 0: Memory at e2c2 (32-bit, non-prefetchable) [size=512]
> 	Capabilities: [a0] Power Management version 3
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [48] MSI: Enable- Count=1/1 Maskable+ 64bit+
> 		Address: Data:
> 		Masking: Pending:
> 	Capabilities: [80] Express (v1) Endpoint, MSI 00
> 		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
> 			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
> 		DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> 			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> 		LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
> 			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
> 		LnkCtl: ASPM L0s Enabled; RCB 64 bytes, Disabled- CommClk-
> 			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> 		LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> 			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> 	Capabilities: [100 v1] Virtual Channel
> 		Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> 		Arb:    Fixed- WRR32- WRR64- WRR128-
> 		Ctrl:   ArbSelect=Fixed
> 		Status: InProgress-
> 		VC0:    Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> 			Arb:  Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> 			Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> 			Status: NegoPending- InProgress-
> 	Capabilities: [200 v1] Advanced Error Reporting
> 		UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> 		CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> 		AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
> 			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> 		HeaderLog:
> 	Kernel driver in use: sdhci-pci
> 	Kernel modules: sdhci_pci
> ##
>
> Thanks and Regards,
> Ajay
>
> On Tue, Oct 12, 2021 at 7:
Re: [PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.
Ping ..

Any updates please on this?

It will be great to have the fix upstreamed (properly of course).

Right now, the patch contains the change as suggested, of
explicitly/properly clearing out dma-mappings when unmap is called.
Please let me know in whatever way I can help, including
testing/debugging for other approaches if required.

Many thanks to Alex and Lu for their continued support on the issue.

P.S. :

I might have missed mentioning the information about the device that
causes flooding. Please find it below :

##
sudo lspci -vvv

0a:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller (rev 05) (prog-if 01)
	Subsystem: Dell OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

wrote:
>
> Origins at :
> https://lists.linuxfoundation.org/pipermail/iommu/2021-October/thread.html
>
> === Changes from v1 => v2 ===
>
> a)
> Improved patch-description.
>
> b)
> A more root-level fix, as suggested by :
>
> 1. Alex Williamson
> 2. Lu Baolu
>
> === Issue ===
>
> Kernel-flooding is seen when an x86_64 L1 guest (Ubuntu-21) is booted in
> qemu/kvm on an x86_64 host (Ubuntu-21), with a host-pci-device attached.
>
> The following kind of logs, along with the stacktraces, cause the flood :
>
> ..
> DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
> DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
> DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
> DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
> DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
> ..
>
> === Current Behaviour, leading to the issue ===
>
> Currently, when we do a dma-unmapping, we unmap/unlink the mappings, but
> the pte-entries are not cleared.
>
> Thus, the following sequencing would flood the kernel-logs :
>
> i)
> A dma-unmapping makes the real/leaf-level pte-slot invalid, but the
> pte-content itself is not cleared.
>
> ii)
> Now, during some later dma-mapping procedure, as the pte-slot is about
> to hold a new pte-value, the intel-iommu checks if a prior
> pte-entry exists in the pte-slot. If it exists, it logs a kernel-error,
> along with a corresponding stacktrace.
>
> iii)
> Step ii) runs in abundance, and the kernel-logs run insane.
>
> === Fix ===
>
> We ensure that as part of a dma-unmapping, each (unmapped) pte-slot
> is also cleared of its value/content (at the leaf-level, where the
> real iova => pfn mapping is stored).
>
> This completes a "deep" dma-unmapping.
>
> Signed-off-by: Ajay Garg
> ---
>  drivers/iommu/intel/iommu.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..485a8ea71394 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5090,6 +5090,8 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
>  	gather->freelist = domain_unmap(dmar_domain, start_pfn,
>  					last_pfn, gather->freelist);
>
> +	dma_pte_clear_range(dmar_domain, start_pfn, last_pfn);
> +
>  	if (dmar_domain->max_addr == iova + size)
>  		dmar_domain->max_addr = iova;
>
> --
> 2.30.2
Re: Host-PCI-Device mapping
Never mind, found the answers in kvm_set_user_memory :)

On Fri, Oct 15, 2021 at 9:36 PM Ajay Garg wrote:
>
> Hello everyone.
>
> I have a x86_64 L1 guest, running on a x86_64 host, with a
> host-pci-device attached to the guest.
> The host runs with IOMMU enabled, and passthrough enabled.
>
> Following are the addresses of the bar0-region of the pci-device, as
> per the output of lspci -v :
>
> * On host (hpa) => e2c2
> * On guest (gpa) => fc078000
>
> Now, if /proc/<pid>/maps is dumped on the host, the following line of
> interest is seen :
>
> #
> 7f0b5c5f4000-7f0b5c5f5000 rw-s e2c2 00:0e 13653 anon_inode:[vfio-device]
> #
>
> The above indicates that the hva for the pci-device starts from 0x7f0b5c5f4000.
>
> Also, upon attaching gdb to the qemu process, and using a slightly
> modified qemugdb/mtree.py (that prints only the information for the
> :0a:00.0 name) to dump the memory-regions, the following is obtained :
>
> #
> (gdb) source qemu-gdb.py
> (gdb) qemu mtree
> fc078000-fc07c095 :0a:00.0 base BAR 0 (I/O) (@ 0x56540d8c8da0)
> fc078000-fc07c095 :0a:00.0 BAR 0 (I/O) (@ 0x56540d8c76b0)
> fc078000-fc07c095 :0a:00.0 BAR 0 mmaps[0] (I/O) (@ 0x56540d8c7c30)
> (gdb)
> #
>
> The above indicates that the hva for the pci-device starts from 0x56540d8c7c30.
>
> As seen, there is a discrepancy between the two results.
>
> What am I missing?
> Looking for pointers; will be grateful.
>
> Thanks and Regards,
> Ajay
Host-PCI-Device mapping
Hello everyone.

I have a x86_64 L1 guest, running on a x86_64 host, with a
host-pci-device attached to the guest.
The host runs with IOMMU enabled, and passthrough enabled.

Following are the addresses of the bar0-region of the pci-device, as
per the output of lspci -v :

* On host (hpa) => e2c2
* On guest (gpa) => fc078000

Now, if /proc/<pid>/maps is dumped on the host, the following line of
interest is seen :

#
7f0b5c5f4000-7f0b5c5f5000 rw-s e2c2 00:0e 13653 anon_inode:[vfio-device]
#

The above indicates that the hva for the pci-device starts from 0x7f0b5c5f4000.

Also, upon attaching gdb to the qemu process, and using a slightly
modified qemugdb/mtree.py (that prints only the information for the
:0a:00.0 name) to dump the memory-regions, the following is obtained :

#
(gdb) source qemu-gdb.py
(gdb) qemu mtree
fc078000-fc07c095 :0a:00.0 base BAR 0 (I/O) (@ 0x56540d8c8da0)
fc078000-fc07c095 :0a:00.0 BAR 0 (I/O) (@ 0x56540d8c76b0)
fc078000-fc07c095 :0a:00.0 BAR 0 mmaps[0] (I/O) (@ 0x56540d8c7c30)
(gdb)
#

The above indicates that the hva for the pci-device starts from 0x56540d8c7c30.

As seen, there is a discrepancy between the two results.

What am I missing?
Looking for pointers; will be grateful.

Thanks and Regards,
Ajay
Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.
Hi Alex, Lu.

Posted the v2 patch, as per
https://lists.linuxfoundation.org/pipermail/iommu/2021-October/059955.html

Kindly review, and let's continue on that thread now.

Thanks and Regards,
Ajay

On Mon, Oct 11, 2021 at 11:43 PM Ajay Garg wrote:
>
> Thanks Alex for your time.
>
> I think I may have found the issue. Right now, when doing a
> dma-unmapping, we do a "soft-unmapping" only, as the pte-values
> themselves are not cleared in the unlinked pagetable-frame.
>
> I have made the (simple) changes, and things are looking good as of
> now (almost an hour now). However, this time I will give it a day ;)
>
> If there is not a single flooding observed in the next 24 hours, I
> will float the v2 patch for review.
>
> Thanks again for your time and patience.
>
> Thanks and Regards,
> Ajay
>
> > Even this QEMU explanation doesn't make a lot of sense, vfio tracks
> > userspace mappings and will return an -EEXIST error for duplicate or
> > overlapping IOVA entries. We expect to have an entirely empty IOMMU
> > domain when a device is assigned, but it seems the only way userspace
> > can trigger duplicate PTEs would be if mappings already exist, or we
> > have a bug somewhere.
> >
> > If the most recent instance is purely on bare metal, then it seems the
> > host itself has conflicting mappings. I can only speculate with the
> > limited data presented, but I'm suspicious there's something happening
> > with RMRRs here (but that should also entirely preclude assignment).
> > dmesg, lspci -vvv, and VM configuration would be useful. Thanks,
> >
> > Alex
[PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.
Origins at :
https://lists.linuxfoundation.org/pipermail/iommu/2021-October/thread.html

=== Changes from v1 => v2 ===

a)
Improved patch-description.

b)
A more root-level fix, as suggested by :

1. Alex Williamson
2. Lu Baolu

=== Issue ===

Kernel-flooding is seen when an x86_64 L1 guest (Ubuntu-21) is booted in
qemu/kvm on an x86_64 host (Ubuntu-21), with a host-pci-device attached.

The following kind of logs, along with the stacktraces, cause the flood :

..
DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
..

=== Current Behaviour, leading to the issue ===

Currently, when we do a dma-unmapping, we unmap/unlink the mappings, but
the pte-entries are not cleared.

Thus, the following sequencing would flood the kernel-logs :

i)
A dma-unmapping makes the real/leaf-level pte-slot invalid, but the
pte-content itself is not cleared.

ii)
Now, during some later dma-mapping procedure, as the pte-slot is about
to hold a new pte-value, the intel-iommu checks if a prior
pte-entry exists in the pte-slot. If it exists, it logs a kernel-error,
along with a corresponding stacktrace.

iii)
Step ii) runs in abundance, and the kernel-logs run insane.

=== Fix ===

We ensure that as part of a dma-unmapping, each (unmapped) pte-slot
is also cleared of its value/content (at the leaf-level, where the
real iova => pfn mapping is stored).

This completes a "deep" dma-unmapping.

Signed-off-by: Ajay Garg
---
 drivers/iommu/intel/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..485a8ea71394 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5090,6 +5090,8 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	gather->freelist = domain_unmap(dmar_domain, start_pfn,
 					last_pfn, gather->freelist);

+	dma_pte_clear_range(dmar_domain, start_pfn, last_pfn);
+
 	if (dmar_domain->max_addr == iova + size)
 		dmar_domain->max_addr = iova;

--
2.30.2
Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.
Thanks Alex for your time.

I think I may have found the issue. Right now, when doing a
dma-unmapping, we do a "soft-unmapping" only, as the pte-values
themselves are not cleared in the unlinked pagetable-frame.

I have made the (simple) changes, and things are looking good as of
now (almost an hour now). However, this time I will give it a day ;)

If there is not a single flooding observed in the next 24 hours, I
will float the v2 patch for review.

Thanks again for your time and patience.

Thanks and Regards,
Ajay

> Even this QEMU explanation doesn't make a lot of sense, vfio tracks
> userspace mappings and will return an -EEXIST error for duplicate or
> overlapping IOVA entries. We expect to have an entirely empty IOMMU
> domain when a device is assigned, but it seems the only way userspace
> can trigger duplicate PTEs would be if mappings already exist, or we
> have a bug somewhere.
>
> If the most recent instance is purely on bare metal, then it seems the
> host itself has conflicting mappings. I can only speculate with the
> limited data presented, but I'm suspicious there's something happening
> with RMRRs here (but that should also entirely preclude assignment).
> dmesg, lspci -vvv, and VM configuration would be useful. Thanks,
>
> Alex
Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.
The flooding was seen today again, after I booted the host-machine in
the morning. Need to look at what the heck is going on ...

On Sun, Oct 10, 2021 at 11:45 AM Ajay Garg wrote:
>
> > I'll try and backtrack to the userspace process that is sending these
> > ioctls.
>
> The userspace process is qemu.
>
> I compiled qemu from the latest source, installed via "sudo make install"
> on the host-machine, rebooted the host-machine, and booted up the
> guest-machine on the host-machine. Now, no kernel-flooding is seen on
> the host-machine.
>
> For me, the issue is thus closed-invalid; admins may take the
> necessary action to officially mark it ;)
>
> Thanks and Regards,
> Ajay
Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.
> I'll try and backtrack to the userspace process that is sending these
> ioctls.

The userspace process is qemu.

I compiled qemu from the latest source, installed via "sudo make install"
on the host-machine, rebooted the host-machine, and booted up the
guest-machine on the host-machine. Now, no kernel-flooding is seen on
the host-machine.

For me, the issue is thus closed-invalid; admins may take the
necessary action to officially mark it ;)

Thanks and Regards,
Ajay
Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.
Thanks Alex for the reply.

Lu, Alex : I got my diagnosis regarding the host-driver wrong, my apologies.

There is no issue with the pci-device's host-driver (confirmed by
preventing the loading of the host-driver at host-bootup). Thus, nothing
needs to be fixed on the host-driver side.

Rather, it seems some dma mapping/unmapping inconsistency is happening
when kvm/qemu boots up with the pci-device attached to the guest.

I put up debug-logs in the "vfio_iommu_type1_ioctl" method in
"vfio_iommu_type1.c" (on the host-machine). When the guest boots up,
repeated DMA-mappings are observed for the same address as per the
host-machine's logs (without a corresponding DMA-unmapping first) :

##
ajay@ajay-Latitude-E6320:~$ tail -f /var/log/syslog | grep "ajay: "
Oct  7 14:12:32 ajay-Latitude-E6320 kernel: [  146.202297] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:12:32 ajay-Latitude-E6320 kernel: [  146.583179] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:12:32 ajay-Latitude-E6320 kernel: [  146.583253] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:12:36 ajay-Latitude-E6320 kernel: [  150.105584] ajay: _MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986499] ajay: _UNMAP_DMA for [0x7ffe724a9840] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986559] ajay: _MAP_DMA for [0x7ffe724a97d0] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986638] ajay: _MAP_DMA for [0x7ffe724a97d0] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  181.087359] ajay: _MAP_DMA for [0x7ffe724a97d0] status [0]
Oct  7 14:13:13 ajay-Latitude-E6320 kernel: [  187.271232] ajay: _UNMAP_DMA for [0x7fde7b7fcfa0] status [0]
Oct  7 14:13:13 ajay-Latitude-E6320 kernel: [  187.271320] ajay: _UNMAP_DMA for [0x7fde7b7fcfa0] status [0]
##

I'll try and backtrack to the userspace process that is sending these
ioctls.

Thanks and Regards,
Ajay

On Tue, Oct 5, 2021 at 4:01 AM Alex Williamson wrote:
>
> On Sat, 2 Oct 2021 22:48:24 +0530
> Ajay Garg wrote:
>
> > Thanks Lu for the reply.
> >
> > > Isn't the domain should be switched from a default domain to an
> > > unmanaged domain when the device is assigned to the guest?
> > >
> > > Even you want to re-setup the same mappings, you need to un-map all
> > > existing mappings, right?
> >
> > Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
> > need to take.
> > May be the patch could suppress the flooding till then?
>
> No, this is wrong. The pte values should not exist; it doesn't matter
> that they're the same. Is the host driver failing to remove mappings,
> and somehow they persist in the new vfio-owned domain? There's
> definitely a bug beyond logging going on here. Thanks,
>
> Alex
Fitment/Use of IOMMU in KVM world when using PCI-devices
Hi All.

I have been learning about a lot of inter-related things; kindly correct
me if I am wrong anywhere. Till now, the following have been my broad
observations :

a)
If we have IOMMU disabled on the host, things work fine in general on a
guest, but we cannot attach a pci-device (physically attached to the
host) to a guest.

b)
If we have IOMMU enabled on the host, we can attach a pci-device
(physically attached to the host) to a guest.

Going through the literature on the internet, it looks like we have two
modes supported by KVM / QEMU :

1.
Conventional shadow-mapping, which works in the most general case, for
GVA => GPA => HVA => HPA translations.

2.
EPT/NPT shadow-mapping, which works only if hardware-virtualization is
supported. As usual, the main purpose is to set up GVA => GPA => HVA =>
HPA translations.

In all the literature that mentioned the above modes, there were roles
for software-assisted MMU page-tables (at host-OS / guest-OS / kvm /
qemu). The only mention of the IOMMU was with regard to pci-devices, to
maintain security and not let guest-OSes create havoc via a pci-device.

So, is the role of the IOMMU to provide security/containment only? In
other words, if security were not a concern, would it still have been
possible to attach pci-devices to guests without needing to enable the
iommu?

Will be grateful to get pointers.

Thanks and Regards,
Ajay
Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.
Thanks Lu for the reply.

> Isn't the domain should be switched from a default domain to an
> unmanaged domain when the device is assigned to the guest?
>
> Even you want to re-setup the same mappings, you need to un-map all
> existing mappings, right?

Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
need to take.
May be the patch could suppress the flooding till then?

Thanks and Regards,
Ajay
[PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.
Taking a SD-MMC controller (over PCI) as an example, the following is an
example sequencing where the log-flooding happened :

0.
We have a host and a guest, both running latest x86_64 kernels.

1.
The host-machine is booted up (with intel_iommu=on), and the DMA-PTEs are
set up for the controller (on the host), for the first time.

2.
The SD-controller device is added to a (L1) guest on a KVM-VM (via
virt-manager).

3.
The KVM-VM is booted up.

4.
The above triggers a re-setup of DMA-PTEs on the host, for a second time.

It is observed that the new PTEs formed (on the host) are the same as the
original PTEs, and thus the following logs, accompanied by stacktraces,
overwhelm the kernel-logs :

..
DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
..

As the PTEs are the same, there is no cause for concern, and we can easily
avoid the log-flood for this non-error case.

Signed-off-by: Ajay Garg
---
 drivers/iommu/intel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..8bea8b4e3ff9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2370,7 +2370,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		 * touches the iova range
 		 */
 		tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
-		if (tmp) {
+		if (tmp && (tmp != pteval)) {
 			static int dumps = 5;
 			pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
 				iov_pfn, tmp, (unsigned long long)pteval);
--
2.30.2
Upstream-Patching for iommu (intel)
Hi All.

What is the upstream list where patches for iommu (intel) might be
posted? Is it iommu@lists.linux-foundation.org?

Thanks and Regards,
Ajay