Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-25 Thread Sumit Gupta




I have worked around the issue by filtering out the request in
__clean_dcache_guest_page() if the pfn is not valid. As the patch
wasn't posted to the community, I reverted it as well.


That's papering over the real issue, and this mapping path needs
fixing as it was only ever expected to be called for CoW.

Can you please try the following patch and let me know if that fixes
the issue for good?



Hi Marc,

Thank you for the patch. This patch fixed the crash for me.
For the formal patch, please add:

Tested-by: Sumit Gupta 



diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..b62dd40a4083 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1147,7 +1147,8 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
  * We've moved a page around, probably through CoW, so let's treat it
  * just like a translation fault and clean the cache to the PoC.
  */
-   clean_dcache_guest_page(pfn, PAGE_SIZE);
+   if (!kvm_is_device_pfn(pfn))
+   clean_dcache_guest_page(pfn, PAGE_SIZE);
 handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
 return 0;
  }
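
For reference, kvm_is_device_pfn() as used in the hunk above is, in the 5.12-era
arch/arm64/kvm/mmu.c (quoted from memory, so treat it as a sketch rather than a
verbatim copy), just a pfn_valid() test. That is why the check skips exactly the
pfns that have no struct page and hence no linear-map address to clean, such as
PCIe BAR pages:

/* arch/arm64/kvm/mmu.c, 5.12-era (from memory) */
static bool kvm_is_device_pfn(kvm_pfn_t pfn)
{
	/* No struct page backing => no linear-map VA to clean. */
	return !pfn_valid(pfn);
}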


--
Without deviation from the norm, progress is not possible.




Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-24 Thread Sumit Gupta





Did that patch cause any issue, or is it just not needed on your system?
It fixes a hypothetical problem with the way ATS is implemented.
Maybe I actually observed it on an old software model, I don't
remember. Either way it's unlikely to go upstream but I'd like to know
if I should drop it from my tree.



Had to revert the same patch "mm: notify remote TLBs when dirtying a PTE" to
avoid the crash below [1]. I am not sure about the cause yet.


I noticed this issue earlier with the patch pointed to here and root-caused it
as below.
It happens after a vfio_mmap request from QEMU for the PCIe device, during the
access of the VA when the PTE access flags are updated:

kvm_mmu_notifier_change_pte() --> kvm_set_spte_hva() --> clean_dcache_guest_page()

The validation model doesn't support the FWB capability, so
__clean_dcache_guest_page() attempts a dcache flush on the PCIe BAR address
(not a valid pfn) through page_address(). That address has no page table
mapping in the kernel linear map, which leads to the exception.

I have worked around the issue by filtering out the request in
__clean_dcache_guest_page() if the pfn is not valid.
As the patch wasn't posted to the community, I reverted it as well.
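
To make the workaround concrete, it amounted to something like the sketch below,
assuming the 5.12-era __clean_dcache_guest_page() helper in
arch/arm64/include/asm/kvm_mmu.h (the existing function body is quoted from
memory and may differ slightly; the pfn_valid() filter is the relevant part):

static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
{
	/*
	 * Workaround sketch (local only, never posted): pfns without a
	 * struct page, e.g. PCIe BAR pages mapped via VFIO, have no
	 * linear-map address, so page_address() would return an unmapped
	 * VA and the cache maintenance would fault. Skip them.
	 */
	if (!pfn_valid(pfn))
		return;

	kvm_flush_dcache_to_poc(page_address(pfn_to_page(pfn)), size);
}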


Thank you Krishna for sharing the analysis.

Best Regards,
Sumit Gupta


Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-24 Thread Sumit Gupta

Hi Jean,



Hi Sumit,

On Thu, Apr 22, 2021 at 08:34:38PM +0530, Sumit Gupta wrote:

Had to revert patch "mm: notify remote TLBs when dirtying a PTE".


Did that patch cause any issue, or is it just not needed on your system?
It fixes a hypothetical problem with the way ATS is implemented. Maybe I
actually observed it on an old software model, I don't remember. Either
way it's unlikely to go upstream but I'd like to know if I should drop it
from my tree.


I tried the nested SMMUv3 patches v15 (Eric's branch:
v5.12-rc6-jean-iopf-14-2stage-v15) on top of your current sva patches
with kernel 5.12.0-rc8.
Had to revert the same patch "mm: notify remote TLBs when dirtying a PTE" to
avoid the crash below [1]. I am not sure about the cause yet.

Didn't get the crash after reverting the patch, and nested translations worked.

[1]
[   11.730943] arm-smmu-v3 905.smmuv3: ias 44-bit, oas 44-bit (features 0x8305)
[   11.833791] arm-smmu-v3 905.smmuv3: allocated 524288 entries for cmdq
[   11.979456] arm-smmu-v3 905.smmuv3: allocated 524288 entries for evtq
[   12.048895] cacheinfo: Unable to detect cache hierarchy for CPU 0
[   12.234175] loop: module loaded
[   12.279552] megasas: 07.714.04.00-rc1
[   12.408831] nvme :00:02.0: Adding to iommu group 0
[   12.488063] nvme nvme0: pci function :00:02.0
[   12.525887] nvme :00:02.0: enabling device ( -> 0002)
[   12.612159] physmap-flash 0.flash: physmap platform flash device: [mem 0x-0x03ff]
[ 1721.586943] Unable to handle kernel paging request at virtual address 617f8000
[ 1721.587263] Mem abort info:
[ 1721.587776]   ESR = 0x96000145
[ 1721.587968]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 1721.588416]   SET = 0, FnV = 0
[ 1721.588672]   EA = 0, S1PTW = 0
[ 1721.588863] Data abort info:
[ 1721.589120]   ISV = 0, ISS = 0x0145
[ 1721.589311]   CM = 1, WnR = 1
[ 1721.589568] swapper pgtable: 64k pages, 48-bit VAs, pgdp=00011128
[ 1721.589951] [617f8000] pgd=, p4d=, pud=
[ 1721.590592] Internal error: Oops: 96000145 [#1] PREEMPT SMP
[ 1721.590912] Modules linked in:
[ 1721.591232] CPU: 0 PID: 664 Comm: qemu-system-aar Not tainted 5.12.0-rc8-tegra-229886-g4786d4a20d7 #22
[ 1721.591680] pstate: a045 (NzCv daif +PAN -UAO -TCO BTYPE=--)
[ 1721.592128] pc : __flush_dcache_area+0x20/0x38
[ 1721.592511] lr : kvm_set_spte_hva+0x64/0xc8
[ 1721.592832] sp : 8000145cfc30
[ 1721.593087] x29: 8000145cfc30 x28: 95221c80
[ 1721.593599] x27: 0002 x26: a3711c88
[ 1721.594112] x25: 9333a740 x24: 01e861800f53
[ 1721.594624] x23: b832 x22: 0001
[ 1721.595136] x21: b832 x20: a1268000
[ 1721.595647] x19: 800011c95000 x18: 
[ 1721.596160] x17:  x16: 
[ 1721.596608] x15:  x14: 
[ 1721.597120] x13:  x12: 
[ 1721.597568] x11:  x10: 
[ 1721.598080] x9 :  x8 : 9333a740
[ 1721.598592] x7 : 07fd000b8320 x6 : 815bc190
[ 1721.599104] x5 : 00011b06 x4 : 
[ 1721.599552] x3 : 003f x2 : 0040
[ 1721.600064] x1 : 617f8001 x0 : 617f8000
[ 1721.600576] Call trace:
[ 1721.600768]  __flush_dcache_area+0x20/0x38
[ 1721.601216]  kvm_mmu_notifier_change_pte+0x5c/0xa8
[ 1721.601601]  __mmu_notifier_change_pte+0x60/0xa0
[ 1721.601985]  __handle_mm_fault+0x740/0xde8
[ 1721.602367]  handle_mm_fault+0xe8/0x238
[ 1721.602751]  do_page_fault+0x160/0x3a8
[ 1721.603200]  do_mem_abort+0x40/0xb0
[ 1721.603520]  el0_da+0x20/0x30
[ 1721.603967]  el0_sync_handler+0x68/0xd0
[ 1721.604416]  el0_sync+0x154/0x180
[ 1721.604864] Code: 9ac32042 8b010001 d1000443 8a23 (d50b7e20)
[ 1721.605184] ---[ end trace 7678eb97889b6fbd ]---
[ 1721.605504] Kernel panic - not syncing: Oops: Fatal exception
[ 1721.605824] Kernel Offset: disabled
[ 1721.606016] CPU features: 0x00340216,6280a018
[ 1721.606335] Memory Limit: 2909 MB
[ 1721.606656] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---



Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-24 Thread Sumit Gupta
Hi Eric,

I have validated v15 of the patch series from your branch
"v5.12-rc6-jean-iopf-14-2stage-v15" on top of Jean's current sva patches
with kernel 5.12.0-rc8.
Verified nested translations with an NVMe PCI device assigned to the guest VM.

Tested-by: Sumit Gupta 


Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-22 Thread Sumit Gupta
Hi Eric,
I have validated v14 of the patch series from the branch
"jean_sva_current_2stage_v14".
Verified nested translations with an NVMe PCI device assigned to a QEMU 5.2 guest.
Had to revert the patch "mm: notify remote TLBs when dirtying a PTE".

Tested-by: Sumit Gupta 