Re: [bug report] iommu/vt-d: Fix unmap_pages support

2021-11-29 Thread Ajay Garg
Hi Dan.

The updated patch has landed in mainline, as per :
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=86dc40c7ea9c22f64571e0e45f695de73a0e2644
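
If I'm reading the mainline commit right, the merged version also avoids the
uninitialized 'level_pfn' that Smatch flags below: 'level_pfn' is now computed
at its declaration, before the dma_pte_present() check. Roughly this shape
(paraphrased, not a verbatim quote of the commit):

	do {
		unsigned long level_pfn = pfn & level_mask(level);

		if (!dma_pte_present(pte))
			goto next;
		...
next:
		pfn = level_pfn + level_size(level);
	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);

so the 'goto next' path always sees an initialized level_pfn.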



On Mon, Nov 29, 2021 at 2:00 PM Dan Carpenter  wrote:
>
> Hello Alex Williamson,
>
> The patch edad96db58d2: "iommu/vt-d: Fix unmap_pages support" from
> Nov 22, 2021, leads to the following Smatch static checker warning:
>
> drivers/iommu/intel/iommu.c:1369 dma_pte_clear_level()
> error: uninitialized symbol 'level_pfn'.
>
> drivers/iommu/intel/iommu.c
> 1330 static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
> 1331                                         struct dma_pte *pte, unsigned long pfn,
> 1332                                         unsigned long start_pfn,
> 1333                                         unsigned long last_pfn,
> 1334                                         struct page *freelist)
> 1335 {
> 1336         struct dma_pte *first_pte = NULL, *last_pte = NULL;
> 1337
> 1338         pfn = max(start_pfn, pfn);
> 1339         pte = &pte[pfn_level_offset(pfn, level)];
> 1340
> 1341         do {
> 1342                 unsigned long level_pfn;
> 1343
> 1344                 if (!dma_pte_present(pte))
> 1345                         goto next;
>                              ^^^^^^^^^
>
> If we ever hit this goto then there is going to be a bug.
>
> 1346
> 1347                 level_pfn = pfn & level_mask(level);
> 1348
> 1349                 /* If range covers entire pagetable, free it */
> 1350                 if (start_pfn <= level_pfn &&
> 1351                     last_pfn >= level_pfn + level_size(level) - 1) {
> 1352                         /* These suborbinate page tables are going away entirely. Don't
> 1353                            bother to clear them; we're just going to *free* them. */
> 1354                         if (level > 1 && !dma_pte_superpage(pte))
> 1355                                 freelist = dma_pte_list_pagetables(domain, level - 1, pte, freelist);
> 1356
> 1357                         dma_clear_pte(pte);
> 1358                         if (!first_pte)
> 1359                                 first_pte = pte;
> 1360                         last_pte = pte;
> 1361                 } else if (level > 1) {
> 1362                         /* Recurse down into a level that isn't *entirely* obsolete */
> 1363                         freelist = dma_pte_clear_level(domain, level - 1,
> 1364                                                        phys_to_virt(dma_pte_addr(pte)),
> 1365                                                        level_pfn, start_pfn, last_pfn,
> 1366                                                        freelist);
> 1367                 }
> 1368 next:
> --> 1369                 pfn = level_pfn + level_size(level);
>                          ^^^^^^^^^
>
> 1370         } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
> 1371
> 1372         if (first_pte)
> 1373                 domain_flush_cache(domain, first_pte,
> 1374                                    (void *)++last_pte - (void *)first_pte);
> 1375
> 1376         return freelist;
> 1377 }
>
> regards,
> dan carpenter


Re: [PATCH] iommu/vt-d: Fix unmap_pages support

2021-11-11 Thread Ajay Garg
Thanks Alex, Baolu.
The patch fixes things at my end. No kernel-flooding is seen now
(tested starting/stopping vm > 10 times).


Thanks and Regards,
Ajay

On Thu, Nov 11, 2021 at 6:03 AM Alex Williamson
 wrote:
>
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap.  VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
>
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments.  If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
>
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level.  We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level.  When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level.  In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done.  Meanwhile we never cleared the ptes for the remainder
> of the range.  When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> report linked below.
>
> The fix for this seems relatively simple: if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
>
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg 
> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargn...@gmail.com/
> Signed-off-by: Alex Williamson 
> ---
>  drivers/iommu/intel/iommu.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..f6395f5425f0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
>                                                         freelist);
>                 }
>  next:
> -               pfn += level_size(level);
> +               pfn = level_pfn + level_size(level);
>         } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
>
>         if (first_pte)
>
>
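
To make the arithmetic in the walkthrough above concrete, here is a
stand-alone sketch of the two calculations (my own illustration, assuming
4KB base pages, so a level-3 entry spans level_size(3) = 0x40000 pfns, the
"256K pfns" mentioned above, and level_mask(3) = ~(0x40000 - 1)):

#include <stdio.h>

int main(void)
{
        unsigned long start_pfn = 0x3fe00, last_pfn = 0x401ff;
        unsigned long level_size = 0x40000; /* level 3: 1GB / 4KB */
        unsigned long pfn = start_pfn;      /* working pfn when the level-2 recursion returns */
        unsigned long level_pfn = pfn & ~(level_size - 1);  /* 0x0, start of this level-3 entry */

        /* old code: pfn += level_size(level) */
        printf("old: next pfn = 0x%lx (> 0x%lx, loop exits, second half never cleared)\n",
               pfn + level_size, last_pfn);        /* 0x7fe00 */

        /* fixed: pfn = level_pfn + level_size(level) */
        printf("new: next pfn = 0x%lx (<= 0x%lx, loop continues into 0x40000-0x401ff)\n",
               level_pfn + level_size, last_pfn);  /* 0x40000 */
        return 0;
}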


Re: [PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.

2021-10-23 Thread Ajay Garg
Another piece of information :

The observations are the same if the current pci-device (sd/mmc
controller) is detached, and another pci-device (sound controller) is
attached to the guest.

So, it looks like we can rule out any (pci-)device-specific issue.


For completeness, here are the details of the other pci-device I tried with :

###
sudo lspci -vvv

00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset
Family High Definition Audio Controller (rev 04)
DeviceName:  Onboard Audio
Subsystem: Dell 6 Series/C200 Series Chipset Family High
Definition Audio Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- [...]

On Fri, Oct 22, 2021, Ajay Garg wrote:
>
> Ping ..
>
> Any updates on this, please?
>
> It will be great to have the fix upstreamed (properly of course).
>
> Right now, the patch contains the change as suggested, of
> explicitly/properly clearing out dma-mappings when unmap is called.
> Please let me know in whatever way I can help, including
> testing/debugging for other approaches if required.
>
>
> Many thanks to Alex and Lu for their continued support on the issue.
>
>
>
> P.S. :
>
> I might have missed mentioning the information about the device that
> causes flooding.
> Please find it below :
>
> ##
> sudo lspci -vvv
>
> 0a:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS
> SD/MMC Card Reader Controller (rev 05) (prog-if 01)
> Subsystem: Dell OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- [...]
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 17
> IOMMU group: 14
> Region 0: Memory at e2c2 (32-bit, non-prefetchable) [size=512]
> Capabilities: [a0] Power Management version 3
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [48] MSI: Enable- Count=1/1 Maskable+ 64bit+
> Address:   Data: 
> Masking:   Pending: 
> Capabilities: [80] Express (v1) Endpoint, MSI 00
> DevCap:MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 
> <64us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> SlotPowerLimit 10.000W
> DevCtl:CorrErr- NonFatalErr- FatalErr- UnsupReq-
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta:CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- 
> TransPend-
> LnkCap:Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> Latency L0s <512ns, L1 <64us
> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
> LnkCtl:ASPM L0s Enabled; RCB 64 bytes, Disabled- CommClk-
> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> LnkSta:Speed 2.5GT/s (ok), Width x1 (ok)
> TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> Capabilities: [100 v1] Virtual Channel
> Caps:LPEVC=0 RefClk=100ns PATEntryBits=1
> Arb:Fixed- WRR32- WRR64- WRR128-
> Ctrl:ArbSelect=Fixed
> Status:InProgress-
> VC0:Caps:PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> Arb:Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> Ctrl:Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> Status:NegoPending- InProgress-
> Capabilities: [200 v1] Advanced Error Reporting
> UESta:DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk:DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt:DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta:RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> CEMsk:RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> AERCap:First Error Pointer: 00, ECRCGenCap- ECRCGenEn-
> ECRCChkCap- ECRCChkEn-
> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> HeaderLog:    
> Kernel driver in use: sdhci-pci
> Kernel modules: sdhci_pci
> ##
>
>
>
> Thanks and Regards,
> Ajay
>
> On Tue, Oct 12, 2021 at 7:

Re: [PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.

2021-10-22 Thread Ajay Garg
Ping ..

Any updates on this, please?

It will be great to have the fix upstreamed (properly of course).

Right now, the patch contains the change as suggested, of
explicitly/properly clearing out dma-mappings when unmap is called.
Please let me know in whatever way I can help, including
testing/debugging for other approaches if required.


Many thanks to Alex and Lu for their continued support on the issue.



P.S. :

I might have missed mentioning the information about the device that
causes flooding.
Please find it below :

##
sudo lspci -vvv

0a:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS
SD/MMC Card Reader Controller (rev 05) (prog-if 01)
Subsystem: Dell OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- [...]

On Tue, Oct 12, 2021, Ajay Garg wrote:
>
> Origins at :
> https://lists.linuxfoundation.org/pipermail/iommu/2021-October/thread.html
>
> === Changes from v1 => v2 ===
>
> a)
> Improved patch-description.
>
> b)
> A more root-level fix, as suggested by
>
> 1.
> Alex Williamson 
>
> 2.
> Lu Baolu 
>
>
>
> === Issue ===
>
> Kernel-flooding is seen when an x86_64 L1 guest (Ubuntu-21) is booted in 
> qemu/kvm
> on a x86_64 host (Ubuntu-21), with a host-pci-device attached.
>
> The following kind of logs, along with the stacktraces, cause the flood :
>
> ..
>  DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
>  DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
>  DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
>  DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
>  DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
> ..
>
>
>
> === Current Behaviour, leading to the issue ===
>
> Currently, when we do a dma-unmapping, we unmap/unlink the mappings, but
> the pte-entries are not cleared.
>
> Thus, the following sequencing would flood the kernel-logs :
>
> i)
> A dma-unmapping makes the real/leaf-level pte-slot invalid, but the
> pte-content itself is not cleared.
>
> ii)
> Now, during some later dma-mapping procedure, as the pte-slot is about
> to hold a new pte-value, the intel-iommu checks if a prior
> pte-entry exists in the pte-slot. If it exists, it logs a kernel-error,
> along with a corresponding stacktrace.
>
> iii)
> Step ii) runs in abundance, and the kernel-logs run insane.
>
>
>
> === Fix ===
>
> We ensure that as part of a dma-unmapping, each (unmapped) pte-slot
> is also cleared of its value/content (at the leaf-level, where the
> real iova => pfn mapping is stored).
>
> This completes a "deep" dma-unmapping.
>
>
>
> Signed-off-by: Ajay Garg 
> ---
>  drivers/iommu/intel/iommu.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..485a8ea71394 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5090,6 +5090,8 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
>         gather->freelist = domain_unmap(dmar_domain, start_pfn,
>                                         last_pfn, gather->freelist);
>
> +       dma_pte_clear_range(dmar_domain, start_pfn, last_pfn);
> +
>         if (dmar_domain->max_addr == iova + size)
>                 dmar_domain->max_addr = iova;
>
> --
> 2.30.2
>


Re: Host-PCI-Device mapping

2021-10-15 Thread Ajay Garg
Never mind, found the answers in kvm_set_user_memory :)
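
(For anyone hitting the same confusion: the "@ 0x..." values printed by the
mtree gdb script appear to be the heap addresses of qemu's MemoryRegion
objects themselves, not the hva of the mapping; the /proc/<pid>/maps value is
the real hva. The gpa => hva association is what qemu hands to KVM via the
KVM_SET_USER_MEMORY_REGION ioctl, whose uapi struct, reproduced here for
illustration from my reading of the standard KVM API, pairs the two directly:

/* argument of the KVM_SET_USER_MEMORY_REGION ioctl (include/uapi/linux/kvm.h) */
struct kvm_userspace_memory_region {
        __u32 slot;
        __u32 flags;
        __u64 guest_phys_addr;  /* gpa of the slot, e.g. 0xfc078000 */
        __u64 memory_size;      /* size in bytes */
        __u64 userspace_addr;   /* hva, e.g. the 0x7f0b5c5f4000 vfio mmap */
};
)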

On Fri, Oct 15, 2021 at 9:36 PM Ajay Garg  wrote:
>
> Hello everyone.
>
> I have an x86_64 L1 guest, running on an x86_64 host, with a
> host-pci-device attached to the guest.
> The host runs with IOMMU enabled, and passthrough enabled.
>
> Following are the addresses of the bar0-region of the pci-device, as
> per the output of lspci -v :
>
> * On host (hpa) => e2c2
> * On guest (gpa) => fc078000
>
>
> Now, if /proc/<pid>/maps is dumped on the host, the following line of
> interest is seen :
>
> #
> 7f0b5c5f4000-7f0b5c5f5000 rw-s e2c2 00:0e 13653
>   anon_inode:[vfio-device]
> #
>
> Above indicates that the hva for the pci-device starts from 0x7f0b5c5f4000.
>
>
> Also, upon attaching gdb to the qemu process, and using a slightly
> modified qemugdb/mtree.py (that prints only the information for
> :0a:00.0 name) to dump the memory-regions, following is obtained :
>
> #
> (gdb) source qemu-gdb.py
> (gdb) qemu mtree
> fc078000-fc07c095 :0a:00.0 base BAR 0 (I/O) (@
> 0x56540d8c8da0)
>   fc078000-fc07c095 :0a:00.0 BAR 0 (I/O) (@
> 0x56540d8c76b0)
> fc078000-fc07c095 :0a:00.0 BAR 0 mmaps[0]
> (I/O) (@ 0x56540d8c7c30)
> (gdb)
> #
>
> Above indicates that the hva for the pci-device starts from 0x56540d8c7c30.
>
> As seen, there is a discrepancy between the two results.
>
>
> What am I missing?
> Looking for pointers, will be grateful.
>
>
> Thanks and Regards,
> Ajay


Host-PCI-Device mapping

2021-10-15 Thread Ajay Garg
Hello everyone.

I have an x86_64 L1 guest, running on an x86_64 host, with a
host-pci-device attached to the guest.
The host runs with IOMMU enabled, and passthrough enabled.

Following are the addresses of the bar0-region of the pci-device, as
per the output of lspci -v :

* On host (hpa) => e2c2
* On guest (gpa) => fc078000


Now, if /proc/<pid>/maps is dumped on the host, the following line of
interest is seen :

#
7f0b5c5f4000-7f0b5c5f5000 rw-s e2c2 00:0e 13653
  anon_inode:[vfio-device]
#

Above indicates that the hva for the pci-device starts from 0x7f0b5c5f4000.


Also, upon attaching gdb to the qemu process, and using a slightly
modified qemugdb/mtree.py (that prints only the information for
:0a:00.0 name) to dump the memory-regions, following is obtained :

#
(gdb) source qemu-gdb.py
(gdb) qemu mtree
fc078000-fc07c095 :0a:00.0 base BAR 0 (I/O) (@
0x56540d8c8da0)
  fc078000-fc07c095 :0a:00.0 BAR 0 (I/O) (@
0x56540d8c76b0)
fc078000-fc07c095 :0a:00.0 BAR 0 mmaps[0]
(I/O) (@ 0x56540d8c7c30)
(gdb)
#

Above indicates that the hva for the pci-device starts from 0x56540d8c7c30.

As seen, there is a discrepancy between the two results.


What am I missing?
Looking for pointers, will be grateful.


Thanks and Regards,
Ajay


Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.

2021-10-12 Thread Ajay Garg
Hi Alex, Lu.

Posted v2 patch, as per
https://lists.linuxfoundation.org/pipermail/iommu/2021-October/059955.html


Kindly review, and let's continue on that thread now.


Thanks and Regards,
Ajay

On Mon, Oct 11, 2021 at 11:43 PM Ajay Garg  wrote:
>
> Thanks Alex for your time.
>
> I think I may have found the issue. Right now, when doing a
> dma-unmapping, we do a "soft-unmapping" only, as the pte-values
> themselves are not cleared in the unlinked pagetable-frame.
>
> I have made the (simple) changes, and things are looking good as of
> now (almost an hour now).
> However, this time I will give it a day ;)
>
> If there is not a single flooding observed in the next 24 hours, I
> would float the v2 patch for review.
>
>
> Thanks again for your time and patience.
>
>
> Thanks and Regards,
> Ajay
>
>
> >
> > Even this QEMU explanation doesn't make a lot of sense, vfio tracks
> > userspace mappings and will return an -EEXIST error for duplicate or
> > overlapping IOVA entries.  We expect to have an entirely empty IOMMU
> > domain when a device is assigned, but it seems the only way userspace
> > can trigger duplicate PTEs would be if mappings already exist, or we
> > have a bug somewhere.
> >
> > If the most recent instance is purely on bare metal, then it seems the
> > host itself has conflicting mappings.  I can only speculate with the
> > limited data presented, but I'm suspicious there's something happening
> > with RMRRs here (but that should also entirely preclude assignment).
> > dmesg, lspci -vvv, and VM configuration would be useful.  Thanks,
> >
> > Alex
> >


[PATCH v2] iommu: intel: do deep dma-unmapping, to avoid kernel-flooding.

2021-10-12 Thread Ajay Garg
Origins at :
https://lists.linuxfoundation.org/pipermail/iommu/2021-October/thread.html

=== Changes from v1 => v2 ===

a)
Improved patch-description.

b)
A more root-level fix, as suggested by

1.
Alex Williamson 

2.
Lu Baolu 



=== Issue ===

Kernel-flooding is seen when an x86_64 L1 guest (Ubuntu-21) is booted in 
qemu/kvm
on a x86_64 host (Ubuntu-21), with a host-pci-device attached.

The following kind of logs, along with the stacktraces, cause the flood :

..
 DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
 DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
 DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
 DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
 DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
..



=== Current Behaviour, leading to the issue ===

Currently, when we do a dma-unmapping, we unmap/unlink the mappings, but
the pte-entries are not cleared.

Thus, the following sequencing would flood the kernel-logs :

i)
A dma-unmapping makes the real/leaf-level pte-slot invalid, but the 
pte-content itself is not cleared.

ii)
Now, during some later dma-mapping procedure, as the pte-slot is about
to hold a new pte-value, the intel-iommu checks if a prior 
pte-entry exists in the pte-slot. If it exists, it logs a kernel-error,
along with a corresponding stacktrace.

iii)
Step ii) runs in abundance, and the kernel-logs run insane.



=== Fix ===

We ensure that as part of a dma-unmapping, each (unmapped) pte-slot
is also cleared of its value/content (at the leaf-level, where the 
real iova => pfn mapping is stored).

This completes a "deep" dma-unmapping.



Signed-off-by: Ajay Garg 
---
 drivers/iommu/intel/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..485a8ea71394 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5090,6 +5090,8 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
        gather->freelist = domain_unmap(dmar_domain, start_pfn,
                                        last_pfn, gather->freelist);
 
+       dma_pte_clear_range(dmar_domain, start_pfn, last_pfn);
+
        if (dmar_domain->max_addr == iova + size)
                dmar_domain->max_addr = iova;
 
-- 
2.30.2
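
As a toy model of why clearing the leaf slots silences the flood, here is a
stand-alone userspace sketch (my own illustration, with no relation to the
real driver's structures; map()/shallow_unmap()/deep_unmap() are made-up
names):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NPTES 8
static uint64_t pt[NPTES];      /* toy single-level "page table" */

static void map(unsigned int idx, uint64_t pteval)
{
        if (pt[idx])            /* mirrors the already-set check in __domain_mapping() */
                printf("ERROR: PTE %u already set (to %llx not %llx)\n",
                       idx, (unsigned long long)pt[idx],
                       (unsigned long long)pteval);
        pt[idx] = pteval;
}

/* "shallow" unmap: unlink the bookkeeping only, leave the pte contents behind */
static void shallow_unmap(unsigned int idx) { (void)idx; }

/* "deep" unmap: also zero the leaf slot, as the dma_pte_clear_range() call does */
static void deep_unmap(unsigned int idx) { pt[idx] = 0; }

int main(void)
{
        map(3, 0x3f6ec003);
        shallow_unmap(3);
        map(3, 0x3f6ec003);     /* logs an error: a stale pte is still present */

        memset(pt, 0, sizeof(pt));
        map(3, 0x3f6ec003);
        deep_unmap(3);
        map(3, 0x3f6ec003);     /* quiet: the slot was cleared */
        return 0;
}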



Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.

2021-10-11 Thread Ajay Garg
Thanks Alex for your time.

I think I may have found the issue. Right now, when doing a
dma-unmapping, we do a "soft-unmapping" only, as the pte-values
themselves are not cleared in the unlinked pagetable-frame.

I have made the (simple) changes, and things are looking good as of
now (almost an hour now).
However, this time I will give it a day ;)

If there is not a single flooding observed in the next 24 hours, I
would float the v2 patch for review.


Thanks again for your time and patience.


Thanks and Regards,
Ajay


>
> Even this QEMU explanation doesn't make a lot of sense, vfio tracks
> userspace mappings and will return an -EEXIST error for duplicate or
> overlapping IOVA entries.  We expect to have an entirely empty IOMMU
> domain when a device is assigned, but it seems the only way userspace
> can trigger duplicate PTEs would be if mappings already exist, or we
> have a bug somewhere.
>
> If the most recent instance is purely on bare metal, then it seems the
> host itself has conflicting mappings.  I can only speculate with the
> limited data presented, but I'm suspicious there's something happening
> with RMRRs here (but that should also entirely preclude assignment).
> dmesg, lspci -vvv, and VM configuration would be useful.  Thanks,
>
> Alex
>


Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.

2021-10-11 Thread Ajay Garg
The flooding was seen today again, after I booted the host-machine in
the morning.
Need to look into what the heck is going on ...

On Sun, Oct 10, 2021 at 11:45 AM Ajay Garg  wrote:
>
> > I'll try and backtrack to the userspace process that is sending these 
> > ioctls.
> >
>
> The userspace process is qemu.
>
> I compiled qemu from the latest source, installed via "sudo make install"
> on host-machine, rebooted the host-machine, and booted up the
> guest-machine on the host-machine. Now, no kernel-flooding is seen on
> the host-machine.
>
> For me, the issue is thus closed-invalid; admins may take the
> necessary action to officially mark it ;)
>
>
> Thanks and Regards,
> Ajay


Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.

2021-10-10 Thread Ajay Garg
> I'll try and backtrack to the userspace process that is sending these ioctls.
>

The userspace process is qemu.

I compiled qemu from the latest source, installed via "sudo make install"
on host-machine, rebooted the host-machine, and booted up the
guest-machine on the host-machine. Now, no kernel-flooding is seen on
the host-machine.

For me, the issue is thus closed-invalid; admins may take the
necessary action to officially mark it ;)


Thanks and Regards,
Ajay


Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.

2021-10-07 Thread Ajay Garg
Thanks Alex for the reply.


Lu, Alex :

I got my diagnosis regarding the host-driver wrong, my apologies.
There is no issue with the pci-device's host-driver (confirmed by
preventing the loading of host-driver at host-bootup). Thus, nothing
to be fixed at the host-driver side.

Rather, it seems some dma mapping/unmapping inconsistency is happening
when kvm/qemu boots up with the pci-device attached to the guest.

I put debug-logs in the "vfio_iommu_type1_ioctl" method in
"vfio_iommu_type1.c" (on the host-machine).
When the guest boots up, repeated DMA-mappings are observed for the
same address as per the host-machine's logs (without a corresponding
DMA-unmapping first) :

##
ajay@ajay-Latitude-E6320:~$ tail -f /var/log/syslog | grep "ajay: "
Oct  7 14:12:32 ajay-Latitude-E6320 kernel: [  146.202297] ajay:
_MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:12:32 ajay-Latitude-E6320 kernel: [  146.583179] ajay:
_MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:12:32 ajay-Latitude-E6320 kernel: [  146.583253] ajay:
_MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:12:36 ajay-Latitude-E6320 kernel: [  150.105584] ajay:
_MAP_DMA for [0x7ffe724a8670] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986499] ajay:
_UNMAP_DMA for [0x7ffe724a9840] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986559] ajay:
_MAP_DMA for [0x7ffe724a97d0] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  180.986638] ajay:
_MAP_DMA for [0x7ffe724a97d0] status [0]
Oct  7 14:13:07 ajay-Latitude-E6320 kernel: [  181.087359] ajay:
_MAP_DMA for [0x7ffe724a97d0] status [0]
Oct  7 14:13:13 ajay-Latitude-E6320 kernel: [  187.271232] ajay:
_UNMAP_DMA for [0x7fde7b7fcfa0] status [0]
Oct  7 14:13:13 ajay-Latitude-E6320 kernel: [  187.271320] ajay:
_UNMAP_DMA for [0x7fde7b7fcfa0] status [0]

##
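
(The instrumentation behind the "ajay: " lines was presumably something like
the sketch below; this is my reconstruction, not the actual debug patch, and
the exact call sites and helpers are guesses. The logged [0x7ffe...] values
look like the userspace arg pointers passed to the ioctl:

/* hypothetical debug logging inside the switch (cmd) of
 * vfio_iommu_type1_ioctl(), drivers/vfio/vfio_iommu_type1.c */
case VFIO_IOMMU_MAP_DMA: {
        int ret = vfio_iommu_type1_map_dma(iommu, arg);

        pr_info("ajay: _MAP_DMA for [0x%lx] status [%d]\n", arg, ret);
        return ret;
}
)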


I'll try and backtrack to the userspace process that is sending these ioctls.


Thanks and Regards,
Ajay






On Tue, Oct 5, 2021 at 4:01 AM Alex Williamson
 wrote:
>
> On Sat, 2 Oct 2021 22:48:24 +0530
> Ajay Garg  wrote:
>
> > Thanks Lu for the reply.
> >
> > >
> > > Shouldn't the domain be switched from a default domain to an
> > > unmanaged domain when the device is assigned to the guest?
> > >
> > > Even if you want to re-setup the same mappings, you need to un-map all
> > > existing mappings, right?
> > >
> >
> > Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
> > need to take.
> > Maybe the patch could suppress the flooding till then?
>
> No, this is wrong.  The pte values should not exist, it doesn't matter
> that they're the same.  Is the host driver failing to remove mappings
> and somehow they persist in the new vfio owned domain?  There's
> definitely a bug beyond logging going on here.  Thanks,
>
> Alex
>


Fitment/Use of IOMMU in KVM world when using PCI-devices

2021-10-05 Thread Ajay Garg
Hi All.

I have been learning about a lot of inter-related things, kindly
correct me if I am wrong anywhere.
Till now, the following have been my broad observations :

a)
If we have IOMMU disabled on the host, things work fine in general on
a guest. But we cannot attach a pci-device (physically attached to
host) to a guest.

b)
If we have IOMMU enabled on the host, we can attach a pci-device
(physically attached to a host) to a guest.




Going through the literature on the internet, it looks like we have
two modes supported by KVM / QEMU :

1.
Conventional shadow-mapping, which works in the most general case, for
GVA => GPA => HVA => HPA translations.

2.
EPT/NPT shadow-mapping, which works only if hardware-virtualization is
supported. As usual, the main purpose is to setup GVA => GPA => HVA =>
HPA translations.


In all the literature that mentioned the above modes, the roles were
played by software-assisted MMU page-tables (at host-OS / guest-OS /
kvm / qemu).
The only mention of the IOMMU was with regard to pci-devices: to
maintain security, and to not let guest-OSes create havoc on a
pci-device.





So, is the role of the IOMMU to provide security/containment only?
In other words, if security were not a concern, would it still have
been possible to attach pci-devices to guests, without
needing to enable the iommu?


Will be grateful to get pointers.


Thanks and Regards,
Ajay


Re: [PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.

2021-10-02 Thread Ajay Garg
Thanks Lu for the reply.

>
> Shouldn't the domain be switched from a default domain to an
> unmanaged domain when the device is assigned to the guest?
>
> Even if you want to re-setup the same mappings, you need to un-map all
> existing mappings, right?
>

Hmm, I guess that's a (design) decision the KVM/QEMU/VFIO communities
need to take.
Maybe the patch could suppress the flooding till then?



Thanks and Regards,
Ajay


[PATCH] iommu: intel: remove flooding of non-error logs, when new-DMA-PTE is the same as old-DMA-PTE.

2021-10-02 Thread Ajay Garg
Taking an SD-MMC controller (over PCI) as an example, the following is an
example sequencing where the log-flooding happened :

0.
We have a host and a guest, both running latest x86_64 kernels.

1.
Host-machine is booted up (with intel_iommu=on), and the DMA-PTEs
are setup for the controller (on the host), for the first time.

2.
The SD-controller device is added to a (L1) guest on a KVM-VM
(via virt-manager).

3.
The KVM-VM is booted up.

4.
Above triggers a re-setup of DMA-PTEs on the host, for a
second time.

It is observed that the new PTEs formed (on the host) are the same
as the original PTEs, and thus the following logs, accompanied by
stacktraces, overwhelm the logs :

..
 DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
 DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
 DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
 DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
 DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
..

As the PTEs are the same, there is no cause for concern, and we can easily
avoid the logs-flood for this non-error case.

Signed-off-by: Ajay Garg 
---
 drivers/iommu/intel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..8bea8b4e3ff9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2370,7 +2370,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
                  * touches the iova range
                  */
                 tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
-                if (tmp) {
+                if (tmp && (tmp != pteval)) {
                         static int dumps = 5;
                         pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
                                 iov_pfn, tmp, (unsigned long long)pteval);
-- 
2.30.2
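
For readers unfamiliar with the idiom: cmpxchg64_local(&pte->val, 0ULL,
pteval) installs pteval only if the slot currently holds 0, and returns the
slot's previous value, so a non-zero return means the slot was already
populated. The patched condition additionally tolerates the slot already
holding the very same value. A minimal userspace analogue of that logic (my
own illustration, not kernel code):

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic uint64_t slot;   /* stands in for pte->val */

/* stores pteval only if the slot held 0; returns the old value */
static uint64_t try_set(uint64_t pteval)
{
        uint64_t expected = 0;

        atomic_compare_exchange_strong(&slot, &expected, pteval);
        return expected;        /* 0 on success, old contents on failure */
}

int main(void)
{
        uint64_t pteval = 0x3f6ec003, tmp;

        tmp = try_set(pteval);  /* slot empty: succeeds, tmp == 0 */
        tmp = try_set(pteval);  /* slot populated: fails, tmp == pteval */

        if (tmp && tmp != pteval)       /* the patched check: stays quiet here */
                printf("ERROR: PTE already set (to %llx not %llx)\n",
                       (unsigned long long)tmp, (unsigned long long)pteval);
        else
                printf("same value re-installed: no log flood\n");
        return 0;
}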



Upstream-Patching for iommu (intel)

2021-10-01 Thread Ajay Garg
Hi All.

What is the upstream list where patches for iommu (intel) should be posted?
Is it iommu@lists.linux-foundation.org?


Thanks and Regards,
Ajay