Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 12/11/2019 12:05, Jan Beulich wrote:
> On 11.11.2019 22:38, Sander Eikelenboom wrote:
>> When supplying "pci=nomsi" to the guest kernel, the device works fine,
>> and I don't get the "INVALID_DEV_REQUEST".
>>
>> After reverting 1b00c16bdf, the device works fine
>> and I don't get the INVALID_DEV_REQUEST,
>
> Could you give the patch below a try? That commit took care of only
> securing ourselves, but not of relaxing things again when a device
> gets handed to a guest for actual use.
>
> Jan

Hi Jan,

CC'ed Juergen, as he seems to have been dropped from the CC list at some point.

Just tested this patch: the device works fine and I don't get the
INVALID_DEV_REQUEST.

This was the last remaining issue around pci passthrough I encountered.
With all patches applied (yours and Anthony's), pci passthrough seems to
work for me again the way I was used to.

Thanks again for fixing the issues and providing the right educated guesses!

--
Sander

> AMD/IOMMU: restore DTE fields in amd_iommu_setup_domain_device()
>
> Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
> allocation") moved ourselves into a more secure default state, but
> didn't take sufficient care to also undo the effects when handing a
> previously disabled device back to a(nother) domain. Put the fields
> that may have been changed elsewhere back to their intended values
> (some fields amd_iommu_disable_domain_device() touches don't
> currently get written anywhere else, and hence don't need modifying
> here).
>
> Reported-by: Sander Eikelenboom
> Signed-off-by: Jan Beulich
>
> --- unstable.orig/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ unstable/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -114,11 +114,21 @@ static void amd_iommu_setup_domain_devic
>
>      if ( !dte->v || !dte->tv )
>      {
> +        const struct ivrs_mappings *ivrs_dev;
> +
>          /* bind DTE to domain page-tables */
>          amd_iommu_set_root_page_table(
>              dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
>              hd->arch.paging_mode, valid);
>
> +        /* Undo what amd_iommu_disable_domain_device() may have done. */
> +        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
> +        if ( dte->it_root )
> +            dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
> +        dte->iv = iommu_intremap;
> +        dte->ex = ivrs_dev->dte_allow_exclusion;
> +        dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
> +
>          if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
>               iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
>              dte->i = ats_enabled;

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 11.11.2019 22:38, Sander Eikelenboom wrote:
> When supplying "pci=nomsi" to the guest kernel, the device works fine,
> and I don't get the "INVALID_DEV_REQUEST".
>
> After reverting 1b00c16bdf, the device works fine
> and I don't get the INVALID_DEV_REQUEST,

Could you give the patch below a try? That commit took care of only
securing ourselves, but not of relaxing things again when a device
gets handed to a guest for actual use.

Jan

AMD/IOMMU: restore DTE fields in amd_iommu_setup_domain_device()

Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
allocation") moved ourselves into a more secure default state, but
didn't take sufficient care to also undo the effects when handing a
previously disabled device back to a(nother) domain. Put the fields
that may have been changed elsewhere back to their intended values
(some fields amd_iommu_disable_domain_device() touches don't
currently get written anywhere else, and hence don't need modifying
here).

Reported-by: Sander Eikelenboom
Signed-off-by: Jan Beulich

--- unstable.orig/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ unstable/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -114,11 +114,21 @@ static void amd_iommu_setup_domain_devic
 
     if ( !dte->v || !dte->tv )
     {
+        const struct ivrs_mappings *ivrs_dev;
+
         /* bind DTE to domain page-tables */
         amd_iommu_set_root_page_table(
             dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
             hd->arch.paging_mode, valid);
 
+        /* Undo what amd_iommu_disable_domain_device() may have done. */
+        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
+        if ( dte->it_root )
+            dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
+        dte->iv = iommu_intremap;
+        dte->ex = ivrs_dev->dte_allow_exclusion;
+        dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
+
         if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
              iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
             dte->i = ats_enabled;
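For readers following along, the idea of the patch above can be sketched outside of Xen roughly as follows. This is only a simplified model under stated assumptions: `struct dte_model`, `struct ivrs_model` and `dte_setup()` are hypothetical stand-ins invented for illustration, not Xen's real types or functions, and setting `v`/`tv` directly merely stands in for what amd_iommu_set_root_page_table() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the IntCtl encodings discussed in the thread. */
enum int_ctl {
    INT_CTL_ABORTED    = 0, /* IntCtl=00b: interrupt writes get aborted */
    INT_CTL_FORWARDED  = 1,
    INT_CTL_TRANSLATED = 2, /* remapped through the interrupt table */
};

/* Hypothetical, heavily reduced device table entry. */
struct dte_model {
    bool v, tv;            /* entry valid / translation valid */
    bool iv;               /* interrupt map valid */
    bool ex;               /* exclusion range allowed */
    unsigned int int_ctl;
    unsigned int sys_mgt;
    uint64_t it_root;      /* interrupt table root, 0 if none */
};

/* Hypothetical per-device firmware (IVRS) settings. */
struct ivrs_model {
    bool dte_allow_exclusion;
    unsigned int sys_mgt;  /* assumed to be already extracted from the flags */
};

/* Re-enable a device for a domain and undo the "secure default" fields. */
static void dte_setup(struct dte_model *dte, const struct ivrs_model *ivrs,
                      bool intremap)
{
    assert(dte && ivrs);

    if ( dte->v && dte->tv )
        return;                     /* already set up, nothing to restore */

    dte->v = dte->tv = true;        /* stand-in for binding the page tables */

    if ( dte->it_root )
        dte->int_ctl = INT_CTL_TRANSLATED;
    dte->iv = intremap;
    dte->ex = ivrs->dte_allow_exclusion;
    dte->sys_mgt = ivrs->sys_mgt;
}
```

The point of the sketch is the asymmetry the commit message describes: whatever the disable path cleared has to be written back on the setup path, otherwise a device with an interrupt table keeps IntCtl at its aborting default.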
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 11/11/2019 16:35, Jan Beulich wrote:
> On 31.10.2019 21:48, Sander Eikelenboom wrote:
>> - The usb3 controller malfunctioning seems indeed to be a separate issue
>>   (which seems unfortunate, because a bisect seems to become even nastier
>>   with all the intertwined pci-passthrough issues).
>>
>> Perhaps this one is then related to the only *once* occurring message:
>> (XEN) [2019-10-31 20:39:30.746] AMD-Vi: INVALID_DEV_REQUEST 0800 8a00 f8000840 00fd
>>
>> While in the guest it is endlessly repeating:
>> [ 231.385566] xhci_hcd :00:05.0: Max number of devices this xHCI host supports is 32.
>> [ 231.407351] usb usb1-port2: couldn't allocate usb_device
>
> I'm uncertain whether there's a correlation: The device the Xen
> message is about is 08:00.0; please let us know what kind of device
> that is (the hypervisor log alone doesn't allow me to guess).
>
> The specific type is described as "Posted write to the Interrupt/EOI
> range from an I/O device that has IntCtl=00b in the device’s DTE."
> This would make me guess 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs
> right after table allocation") is the culprit here, and I may need
> to hand you a debugging patch to gain some insight. But let me first
> take a look at sufficiently verbose lspci output from that system.
>
> Jan
>

Hi Jan,

When supplying "pci=nomsi" to the guest kernel, the device works fine,
and I don't get the "INVALID_DEV_REQUEST".

After reverting 1b00c16bdf, the device works fine
and I don't get the INVALID_DEV_REQUEST.

Below is the output of lspci -vvvknn from dom0 for 08:00.0:
- just after boot (device owned by pciback / dom0, not active yet)
- after the guests have started (owned by guest with a working device)

So it is enabling MSI-X interrupts, which is indeed different from the
other devices I pass through, which seem to use legacy interrupts.

This also shows in the guest with a working device in /proc/interrupts:
 98:      17846          0          0          0  xen-pirq-msi-x     xhci_hcd
 99:          0          0          0          0  xen-pirq-msi-x     xhci_hcd
100:          0          0          0          0  xen-pirq-msi-x     xhci_hcd
101:          0          0          0          0  xen-pirq-msi-x     xhci_hcd
102:          0          0          0          0  xen-pirq-msi-x     xhci_hcd

I forgot to take a snapshot of /proc/interrupts in the guest in the
malfunctioning state.

--
Sander

just after boot (device owned by pciback / dom0, not active yet):
08:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev 03) (prog-if 30 [XHCI])
        Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard [1043:8413]
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR-
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 31.10.2019 21:48, Sander Eikelenboom wrote:
> While in the guest it is endlessly repeating:
> [ 231.385566] xhci_hcd :00:05.0: Max number of devices this xHCI host supports is 32.
> [ 231.407351] usb usb1-port2: couldn't allocate usb_device

For this one, could you try "pci=nomsi" on the Linux kernel command line?

Jan
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 31.10.2019 21:48, Sander Eikelenboom wrote:
> - The usb3 controller malfunctioning seems indeed to be a separate issue
>   (which seems unfortunate, because a bisect seems to become even nastier
>   with all the intertwined pci-passthrough issues).
>
> Perhaps this one is then related to the only *once* occurring message:
> (XEN) [2019-10-31 20:39:30.746] AMD-Vi: INVALID_DEV_REQUEST 0800 8a00 f8000840 00fd
>
> While in the guest it is endlessly repeating:
> [ 231.385566] xhci_hcd :00:05.0: Max number of devices this xHCI host supports is 32.
> [ 231.407351] usb usb1-port2: couldn't allocate usb_device

I'm uncertain whether there's a correlation: The device the Xen
message is about is 08:00.0; please let us know what kind of device
that is (the hypervisor log alone doesn't allow me to guess).

The specific type is described as "Posted write to the Interrupt/EOI
range from an I/O device that has IntCtl=00b in the device’s DTE."
This would make me guess 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs
right after table allocation") is the culprit here, and I may need
to hand you a debugging patch to gain some insight. But let me first
take a look at sufficiently verbose lspci output from that system.

Jan
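The rule quoted above ("posted write to the Interrupt/EOI range from a device with IntCtl=00b") can be illustrated with a small check. This is a sketch under assumptions, not Xen or hardware code: the range bounds are taken from memory of the AMD IOMMU specification and should be verified there, the helper names are hypothetical, and the sample address assumes that the last two words of the logged event (f8000840 00fd) are the low and high halves of the faulting address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bounds of the Interrupt/EOI address range; check the AMD IOMMU
 * specification before relying on these values.
 */
#define INT_EOI_BASE  0xFDF8000000ULL
#define INT_EOI_LIMIT 0xFDF8FFFFFFULL

/* Hypothetical helper: does the address fall into the Interrupt/EOI range? */
static bool in_int_eoi_range(uint64_t addr)
{
    return addr >= INT_EOI_BASE && addr <= INT_EOI_LIMIT;
}

/*
 * Hypothetical helper: would a posted write to addr from a device whose
 * DTE carries the given 2-bit IntCtl value match the quoted description?
 */
static bool write_is_invalid_request(uint64_t addr, unsigned int int_ctl)
{
    assert(int_ctl <= 3);           /* IntCtl is a 2-bit field */
    return in_int_eoi_range(addr) && int_ctl == 0;
}
```

Under those assumptions, an address of 0xFDF8000840 with IntCtl=00b matches the description, while the same write with IntCtl=10b (translated) does not, which is consistent with the suspicion that the device's DTE was left in its aborting default state.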
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 31/10/2019 11:15, Jan Beulich wrote:
> On 30.10.2019 23:21, Sander Eikelenboom wrote:
>> Call trace seems to be the same in all cases.
>>
>> --
>> Sander
>>
>>
>> (XEN) [2019-10-30 22:07:05.748] AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
>> (XEN) [2019-10-30 22:07:05.748] [ Xen-4.13.0-rc x86_64 debug=y Not tainted ]
>> (XEN) [2019-10-30 22:07:05.748] CPU:1
>> (XEN) [2019-10-30 22:07:05.748] RIP:e008:[] iommu_map.c#update_paging_mode+0x1f2/0x3eb
>> (XEN) [2019-10-30 22:07:05.748] RFLAGS: 00010286 CONTEXT: hypervisor (d0v2)
>
> I didn't pay attention to this when writing my earlier reply: The
> likely culprit looks to be f89f555827 ("remove late (on-demand)
> construction of IOMMU page tables"). Prior to this I assume IOMMU
> page tables got constructed only after ...

OK, I tested f89f555827 and f89f555827~1, my observations:

with f89f555827~1:
- I'm NOT seeing the aquiring pcidevs_lock message.
- the usb3 controller is also working.

with f89f555827:
- I'm now seeing the aquiring pcidevs_lock messages.
- but I'm NOT seeing them *once* per booting guest, but multiple times.
- the usb3 controller is still working.

with staging:
- Seeing the aquiring pcidevs_lock messages, but only *once* per guest boot.
- the usb3 controller goes haywire in the guest.

So you seem to be right about both things:
- f89f555827 is the culprit for the aquiring pcidevs_lock messages.
  Although I get fewer of them with current staging, so some other later
  patch must have had some influence in reducing the amount.
- The usb3 controller malfunctioning seems indeed to be a separate issue
  (which seems unfortunate, because a bisect seems to become even nastier
  with all the intertwined pci-passthrough issues).

Perhaps this one is then related to the only *once* occurring message:
(XEN) [2019-10-31 20:39:30.746] AMD-Vi: INVALID_DEV_REQUEST 0800 8a00 f8000840 00fd

While in the guest it is endlessly repeating:
[ 231.385566] xhci_hcd :00:05.0: Max number of devices this xHCI host supports is 32.
[ 231.407351] usb usb1-port2: couldn't allocate usb_device

Hopefully this also gives you a hunch as to which commits to look at.

--
Sander

>> (XEN) [2019-10-30 22:07:05.748] Xen call trace:
>> (XEN) [2019-10-30 22:07:05.748]    [] R iommu_map.c#update_paging_mode+0x1f2/0x3eb
>> (XEN) [2019-10-30 22:07:05.748]    [] F amd_iommu_map_page+0x72/0x1c2
>> (XEN) [2019-10-30 22:07:05.748]    [] F iommu_map+0x98/0x17e
>> (XEN) [2019-10-30 22:07:05.748]    [] F iommu_legacy_map+0x28/0x73
>> (XEN) [2019-10-30 22:07:05.748]    [] F p2m-pt.c#p2m_pt_set_entry+0x4d3/0x844
>> (XEN) [2019-10-30 22:07:05.748]    [] F p2m_set_entry+0x91/0x128
>> (XEN) [2019-10-30 22:07:05.748]    [] F guest_physmap_add_entry+0x39f/0x5a3
>> (XEN) [2019-10-30 22:07:05.748]    [] F guest_physmap_add_page+0x12f/0x138
>> (XEN) [2019-10-30 22:07:05.748]    [] F memory.c#populate_physmap+0x2e3/0x505
>
> ... Dom0 had populated the new guest's physmap.
>
> Anyway, as odd as it may seem I guess there's little choice
> besides making populate_physmap() (and likely a few others)
> acquire the lock.
>
> Jan
>
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 30.10.2019 23:21, Sander Eikelenboom wrote:
> Call trace seems to be the same in all cases.
>
> --
> Sander
>
>
> (XEN) [2019-10-30 22:07:05.748] AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
> (XEN) [2019-10-30 22:07:05.748] [ Xen-4.13.0-rc x86_64 debug=y Not tainted ]
> (XEN) [2019-10-30 22:07:05.748] CPU:1
> (XEN) [2019-10-30 22:07:05.748] RIP:e008:[] iommu_map.c#update_paging_mode+0x1f2/0x3eb
> (XEN) [2019-10-30 22:07:05.748] RFLAGS: 00010286 CONTEXT: hypervisor (d0v2)

I didn't pay attention to this when writing my earlier reply: The
likely culprit looks to be f89f555827 ("remove late (on-demand)
construction of IOMMU page tables"). Prior to this I assume IOMMU
page tables got constructed only after ...

> (XEN) [2019-10-30 22:07:05.748] Xen call trace:
> (XEN) [2019-10-30 22:07:05.748]    [] R iommu_map.c#update_paging_mode+0x1f2/0x3eb
> (XEN) [2019-10-30 22:07:05.748]    [] F amd_iommu_map_page+0x72/0x1c2
> (XEN) [2019-10-30 22:07:05.748]    [] F iommu_map+0x98/0x17e
> (XEN) [2019-10-30 22:07:05.748]    [] F iommu_legacy_map+0x28/0x73
> (XEN) [2019-10-30 22:07:05.748]    [] F p2m-pt.c#p2m_pt_set_entry+0x4d3/0x844
> (XEN) [2019-10-30 22:07:05.748]    [] F p2m_set_entry+0x91/0x128
> (XEN) [2019-10-30 22:07:05.748]    [] F guest_physmap_add_entry+0x39f/0x5a3
> (XEN) [2019-10-30 22:07:05.748]    [] F guest_physmap_add_page+0x12f/0x138
> (XEN) [2019-10-30 22:07:05.748]    [] F memory.c#populate_physmap+0x2e3/0x505

... Dom0 had populated the new guest's physmap.

Anyway, as odd as it may seem I guess there's little choice
besides making populate_physmap() (and likely a few others)
acquire the lock.

Jan
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 31/10/2019 10:18, Jan Beulich wrote:
> On 31.10.2019 09:35, Sander Eikelenboom wrote:
>> Platform is perhaps somewhat specific (older AMD 890FX chipset) and I need the
>> bios workaround:
>> ivrs_ioapic[6]=00:14.0 iommu=on.
>
> Shouldn't matter here.
>
>> On the other hand, this has run like this for quite some time.
>>
>> I have 3 guests (HVM) for which I use PCI passthrough and
>> for each of those 3 guests I get this message *once* on start of the guest.
>> One guest has a soundcard passed through,
>> One guest has a USB2 card passed through,
>> One guest has a USB3 card passed through.
>>
>> Another observation is that both the soundcard and USB2 card
>> still seem to function despite the message.
>
> Reality is - this message is benign as long as you don't do PCI
> hot (un)plug.

I don't use any of:
  pci-attach
  pci-detach
  pci-list
  pci-assignable-add
  pci-assignable-remove
  pci-assignable-list

Only shutting down and (re)starting VMs with the devices specified in the
vm cfg file.

>> The USB3 controller goes haywire though (a lot of driver messages in the
>> guest during init).
>
> As a consequence I don't think there's a connection between this
> and the observed message.

OK, although it functions fine when I revert to the commit I referenced
below (with the same kernel etc.). If so, that would be another issue then.

CC'ed Juergen as release manager so he is aware.

>> I could try to bisect, but that would be somewhere next week before I can
>> get to that.
>>
>> At present I run with a tree with as latest commit
>> ee7170822f1fc209f33feb47b268bab35541351d,
>> which is stable for me. This predates some of the IOMMU changes and
>> Anthony's QMP work that had
>> some issues, but that would be the last known real good point for me to
>> start a bisect from.
>
> I.e. at that point you didn't observe this message, yet?

With ee7170822f1fc209f33feb47b268bab35541351d, I see neither this message
nor the "INVALID_DEV_REQUEST", even with longer uptimes.

--
Sander

> Jan
>
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 31.10.2019 09:35, Sander Eikelenboom wrote:
> Platform is perhaps somewhat specific (older AMD 890FX chipset) and I need the
> bios workaround:
> ivrs_ioapic[6]=00:14.0 iommu=on.

Shouldn't matter here.

> On the other hand, this has run like this for quite some time.
>
> I have 3 guests (HVM) for which I use PCI passthrough and
> for each of those 3 guests I get this message *once* on start of the guest.
> One guest has a soundcard passed through,
> One guest has a USB2 card passed through,
> One guest has a USB3 card passed through.
>
> Another observation is that both the soundcard and USB2 card
> still seem to function despite the message.

Reality is - this message is benign as long as you don't do PCI
hot (un)plug.

> The USB3 controller goes haywire though (a lot of driver messages in the
> guest during init).

As a consequence I don't think there's a connection between this
and the observed message.

> I could try to bisect, but that would be somewhere next week before I can get
> to that.
>
> At present I run with a tree with as latest commit
> ee7170822f1fc209f33feb47b268bab35541351d,
> which is stable for me. This predates some of the IOMMU changes and Anthony's
> QMP work that had
> some issues, but that would be the last known real good point for me to start
> a bisect from.

I.e. at that point you didn't observe this message, yet?

Jan
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 30.10.2019 23:21, Sander Eikelenboom wrote:
> Call trace seems to be the same in all cases.

Thanks much.

> (XEN) [2019-10-30 22:07:05.748] AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
> (XEN) [2019-10-30 22:07:05.748] [ Xen-4.13.0-rc x86_64 debug=y Not tainted ]
> (XEN) [2019-10-30 22:07:05.748] CPU:1
> (XEN) [2019-10-30 22:07:05.748] RIP:e008:[] iommu_map.c#update_paging_mode+0x1f2/0x3eb
> (XEN) [2019-10-30 22:07:05.748] RFLAGS: 00010286 CONTEXT: hypervisor (d0v2)
> (XEN) [2019-10-30 22:07:05.748] rax: 830523f9 rbx: 82e004905f00 rcx:
> (XEN) [2019-10-30 22:07:05.748] rdx: 0001 rsi: 000a rdi: 82d0804a0698
> (XEN) [2019-10-30 22:07:05.748] rbp: 830523f9f848 rsp: 830523f9f808 r8: 8305320a
> (XEN) [2019-10-30 22:07:05.748] r9: 0038 r10: 0002 r11: 000a
> (XEN) [2019-10-30 22:07:05.748] r12: 82e004905f00 r13: 0003 r14: 0003
> (XEN) [2019-10-30 22:07:05.748] r15: 83041fb83000 cr0: 80050033 cr4: 06e0
> (XEN) [2019-10-30 22:07:05.748] cr3: 00040a58d000 cr2: 8880604835a0
> (XEN) [2019-10-30 22:07:05.748] fsb: 7f4b8f899bc0 gsb: 88807d48 gss:
> (XEN) [2019-10-30 22:07:05.748] ds: es: fs: gs: ss: e010 cs: e008
> (XEN) [2019-10-30 22:07:05.748] Xen code around (iommu_map.c#update_paging_mode+0x1f2/0x3eb):
> (XEN) [2019-10-30 22:07:05.748] 3d 3b 7b 22 00 00 75 07 <0f> 0b e9 c2 01 00 00 48 8d 35 1a ce 13 00 48 8d
> (XEN) [2019-10-30 22:07:05.748] Xen stack trace from rsp=830523f9f808:

[...]

> (XEN) [2019-10-30 22:07:05.748] Xen call trace:
> (XEN) [2019-10-30 22:07:05.748]    [] R iommu_map.c#update_paging_mode+0x1f2/0x3eb
> (XEN) [2019-10-30 22:07:05.748]    [] F amd_iommu_map_page+0x72/0x1c2
> (XEN) [2019-10-30 22:07:05.748]    [] F iommu_map+0x98/0x17e
> (XEN) [2019-10-30 22:07:05.748]    [] F iommu_legacy_map+0x28/0x73
> (XEN) [2019-10-30 22:07:05.748]    [] F p2m-pt.c#p2m_pt_set_entry+0x4d3/0x844
> (XEN) [2019-10-30 22:07:05.748]    [] F p2m_set_entry+0x91/0x128
> (XEN) [2019-10-30 22:07:05.748]    [] F guest_physmap_add_entry+0x39f/0x5a3
> (XEN) [2019-10-30 22:07:05.748]    [] F guest_physmap_add_page+0x12f/0x138
> (XEN) [2019-10-30 22:07:05.748]    [] F memory.c#populate_physmap+0x2e3/0x505
> (XEN) [2019-10-30 22:07:05.748]    [] F do_memory_op+0x695/0x1bf7
> (XEN) [2019-10-30 22:07:05.748]    [] F pv_hypercall+0x2ca/0x537
> (XEN) [2019-10-30 22:07:05.748]    [] F lstar_enter+0x112/0x120

Now this looks to be a pretty common path, i.e. I wonder why no-one
before has noticed this message getting logged. Fixing, as it seems,
will require careful auditing of lock nesting, as the PCI devices
lock will need to be acquired on a path that's entirely unrelated to
any PCI operation; I'll try to get to this asap.

Is there anything special about the guest that triggers this?

Jan
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 30/10/2019 16:48, Jan Beulich wrote:
> On 28.10.2019 11:32, Sander Eikelenboom wrote:
>> While testing the latest xen-unstable and starting an HVM guest with
>> pci-passthrough on my AMD machine,
>> my eye caught the following messages in xl dmesg I haven't seen before:
>>
>> (XEN) [2019-10-28 10:23:16.372] AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
>
> Unfortunately this sits on the map/unmap path, and hence the
> violator is far up one of the many call chains. Therefore I'd
> like to ask that you rebuild and retry with the debugging
> patch below. In case you observe multiple different call
> trees, post them all please.
>
> Jan

Hi Jan,

Call trace seems to be the same in all cases.

--
Sander


(XEN) [2019-10-30 22:07:05.748] AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
(XEN) [2019-10-30 22:07:05.748] [ Xen-4.13.0-rc x86_64 debug=y Not tainted ]
(XEN) [2019-10-30 22:07:05.748] CPU:1
(XEN) [2019-10-30 22:07:05.748] RIP:e008:[] iommu_map.c#update_paging_mode+0x1f2/0x3eb
(XEN) [2019-10-30 22:07:05.748] RFLAGS: 00010286 CONTEXT: hypervisor (d0v2)
(XEN) [2019-10-30 22:07:05.748] rax: 830523f9 rbx: 82e004905f00 rcx:
(XEN) [2019-10-30 22:07:05.748] rdx: 0001 rsi: 000a rdi: 82d0804a0698
(XEN) [2019-10-30 22:07:05.748] rbp: 830523f9f848 rsp: 830523f9f808 r8: 8305320a
(XEN) [2019-10-30 22:07:05.748] r9: 0038 r10: 0002 r11: 000a
(XEN) [2019-10-30 22:07:05.748] r12: 82e004905f00 r13: 0003 r14: 0003
(XEN) [2019-10-30 22:07:05.748] r15: 83041fb83000 cr0: 80050033 cr4: 06e0
(XEN) [2019-10-30 22:07:05.748] cr3: 00040a58d000 cr2: 8880604835a0
(XEN) [2019-10-30 22:07:05.748] fsb: 7f4b8f899bc0 gsb: 88807d48 gss:
(XEN) [2019-10-30 22:07:05.748] ds: es: fs: gs: ss: e010 cs: e008
(XEN) [2019-10-30 22:07:05.748] Xen code around (iommu_map.c#update_paging_mode+0x1f2/0x3eb):
(XEN) [2019-10-30 22:07:05.748] 3d 3b 7b 22 00 00 75 07 <0f> 0b e9 c2 01 00 00 48 8d 35 1a ce 13 00 48 8d
(XEN) [2019-10-30 22:07:05.748] Xen stack trace from rsp=830523f9f808:
(XEN) [2019-10-30 22:07:05.748]    82e00a6bc6e0 82e00a6bc6e0 83041fb83000 83041fb83000
(XEN) [2019-10-30 22:07:05.748]    83041fb83148 000feff8 83041fb83150 830523f9f93c
(XEN) [2019-10-30 22:07:05.748]    830523f9f8c8 82d080265ded 000380240580 002482f9
(XEN) [2019-10-30 22:07:05.748]
(XEN) [2019-10-30 22:07:05.748]
(XEN) [2019-10-30 22:07:05.748]    83041fb83000 000feff8 000feff8 002482f9
(XEN) [2019-10-30 22:07:05.748]    830523f9f928 82d0802583b6 000feff8 0001
(XEN) [2019-10-30 22:07:05.748]    0003802405da 002482f9 830523f9f93c 0003
(XEN) [2019-10-30 22:07:05.748]    83041fb83000 000feff8
(XEN) [2019-10-30 22:07:05.748]    830523f9f960 82d0802586fb 48834780 0003
(XEN) [2019-10-30 22:07:05.748]    830248834780 000feff8 830523f9f9f8
(XEN) [2019-10-30 22:07:05.748]    82d08034a4a6 8038a845 820040024fc0
(XEN) [2019-10-30 22:07:05.748]    83041fb83000 002482f9 8002484b5367
(XEN) [2019-10-30 22:07:05.748]    82004002d5a0 8002484b5367
(XEN) [2019-10-30 22:07:05.748]    820040024000 002482f9 000feff8
(XEN) [2019-10-30 22:07:05.748]    0001 830248834780 830523f9fa50 82d080342e13
(XEN) [2019-10-30 22:07:05.748]    0007 83041fb83000 0023
(XEN) [2019-10-30 22:07:05.748]    002482fa 830248834780 002482f9
(XEN) [2019-10-30 22:07:05.748]    000feff8 830523f9fac8 82d080343c52 23f9fa78
(XEN) [2019-10-30 22:07:05.748]    0001 002482f9 000feff8 830523f9fa98
(XEN) [2019-10-30 22:07:05.748] Xen call trace:
(XEN) [2019-10-30 22:07:05.748]    [] R iommu_map.c#update_paging_mode+0x1f2/0x3eb
(XEN) [2019-10-30 22:07:05.748]    [] F amd_iommu_map_page+0x72/0x1c2
(XEN) [2019-10-30 22:07:05.748]    [] F iommu_map+0x98/0x17e
(XEN) [2019-10-30 22:07:05.748]    [] F iommu_legacy_map+0x28/0x73
(XEN) [2019-10-30 22:07:05.748]    [] F p2m-pt.c#p2m_pt_set_entry+0x4d3/0x844
(XEN) [2019-10-30 22:07:05.748]    [] F p2m_set_entry+0x91/0x128
(XEN) [2019-10-30 22:07:05.748]    [] F guest_physmap_add_entry+0x39f/0x5a3
(XEN)
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 28.10.2019 11:32, Sander Eikelenboom wrote:
> While testing the latest xen-unstable and starting an HVM guest with
> pci-passthrough on my AMD machine,
> my eye caught the following messages in xl dmesg I haven't seen before:
>
> (XEN) [2019-10-28 10:23:16.372] AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

Unfortunately this sits on the map/unmap path, and hence the
violator is far up one of the many call chains. Therefore I'd
like to ask that you rebuild and retry with the debugging
patch below. In case you observe multiple different call
trees, post them all please.

Jan

--- unstable.orig/xen/drivers/passthrough/amd/iommu_map.c
+++ unstable/xen/drivers/passthrough/amd/iommu_map.c
@@ -331,9 +331,12 @@ static int update_paging_mode(struct dom
         hd->arch.paging_mode = level;
         hd->arch.root_table = new_root;
 
-        if ( !pcidevs_locked() )
+        if ( iommu_debug && !pcidevs_locked() )
+        {
             AMD_IOMMU_DEBUG("%s Try to access pdev_list "
                             "without aquiring pcidevs_lock.\n", __func__);
+            dump_execution_state();
+        }
 
         /* Update device table entries using new root table and paging mode */
         for_each_pdev( d, pdev )
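The debugging approach in the patch above (complain, and identify the caller, whenever shared data is walked without the expected lock) can be shown in isolation. The following is a minimal single-threaded sketch, assuming nothing about Xen's actual lock implementation: `model_lock()`, `model_unlock()`, `walk_device_list()` and the `violations` counter are hypothetical names made up for this illustration, and printing the caller's name stands in for dump_execution_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; not Xen's pcidevs_lock implementation. */
static bool lock_held;
static unsigned int violations;

static void model_lock(void)
{
    assert(!lock_held);     /* the model does not support recursion */
    lock_held = true;
}

static void model_unlock(void)
{
    assert(lock_held);
    lock_held = false;
}

/*
 * Walk the (imaginary) device list. With debugging enabled, report and
 * count every caller that got here without holding the lock.
 */
static void walk_device_list(const char *caller, bool debug)
{
    if ( debug && !lock_held )
    {
        fprintf(stderr, "%s: device list accessed without the lock\n", caller);
        violations++;
    }

    /* ... iterate over the device list here ... */
}
```

Gating the report on a debug flag, as the patch does with iommu_debug, keeps the extra output away from normal boots while still letting a tester capture the offending call chain.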
Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
On 28.10.2019 11:32, Sander Eikelenboom wrote:
> Hi Jan / Andrew,
>
> While testing the latest xen-unstable and starting an HVM guest with
> pci-passthrough on my AMD machine,
> my eye caught the following messages in xl dmesg I haven't seen before:
>
> (XEN) [2019-10-28 10:23:16.372] AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.
> (XEN) [2019-10-28 10:24:08.136] AMD-Vi: INVALID_DEV_REQUEST 0800 8a00 f8000240 00fd
>
> Probably something from the AMD iommu rework that got committed lately?

Not very likely at least for the former; I'll have to look at the
latter in some more detail (and at both to see whether I can spot
a sensible solution).

Jan