Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-11-12 Thread Sander Eikelenboom
On 12/11/2019 12:05, Jan Beulich wrote:
> On 11.11.2019 22:38, Sander Eikelenboom wrote:
>> When supplying "pci=nomsi" to the guest kernel, the device works fine,
>> and I don't get the "INVALID_DEV_REQUEST".
>>
>> After reverting 1b00c16bdf, the device works fine 
>> and I don't get the INVALID_DEV_REQUEST.
> 
> Could you give the patch below a try? That commit took care of only
> securing ourselves, but not of relaxing things again when a device
> gets handed to a guest for actual use.
> 
> Jan

Hi Jan,

CC'ed Juergen, as he seems to have been dropped from the CC list at some point.

Just tested this patch: 
the device works fine and I don't get the INVALID_DEV_REQUEST.

This was the last remaining PCI passthrough issue I encountered.
With all patches applied (yours and Anthony's), PCI passthrough
seems to work for me again as I was used to.

Thanks again for fixing the issues and providing the right educated guesses!

--
Sander


> AMD/IOMMU: restore DTE fields in amd_iommu_setup_domain_device()
> 
> Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
> allocation") moved ourselves into a more secure default state, but
> didn't take sufficient care to also undo the effects when handing a
> previously disabled device back to a(nother) domain. Put the fields
> that may have been changed elsewhere back to their intended values
> (some fields amd_iommu_disable_domain_device() touches don't
> currently get written anywhere else, and hence don't need modifying
> here).
> 
> Reported-by: Sander Eikelenboom 
> Signed-off-by: Jan Beulich 
> 
> --- unstable.orig/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ unstable/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -114,11 +114,21 @@ static void amd_iommu_setup_domain_devic
>  
>      if ( !dte->v || !dte->tv )
>      {
> +        const struct ivrs_mappings *ivrs_dev;
> +
>          /* bind DTE to domain page-tables */
>          amd_iommu_set_root_page_table(
>              dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
>              hd->arch.paging_mode, valid);
>  
> +        /* Undo what amd_iommu_disable_domain_device() may have done. */
> +        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
> +        if ( dte->it_root )
> +            dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
> +        dte->iv = iommu_intremap;
> +        dte->ex = ivrs_dev->dte_allow_exclusion;
> +        dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
> +
>          if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
>               iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
>              dte->i = ats_enabled;
> 


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-11-12 Thread Jan Beulich
On 11.11.2019 22:38, Sander Eikelenboom wrote:
> When supplying "pci=nomsi" to the guest kernel, the device works fine,
> and I don't get the "INVALID_DEV_REQUEST".
> 
> After reverting 1b00c16bdf, the device works fine 
> and I don't get the INVALID_DEV_REQUEST.

Could you give the patch below a try? That commit took care of only
securing ourselves, but not of relaxing things again when a device
gets handed to a guest for actual use.

Jan

AMD/IOMMU: restore DTE fields in amd_iommu_setup_domain_device()

Commit 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs right after table
allocation") moved ourselves into a more secure default state, but
didn't take sufficient care to also undo the effects when handing a
previously disabled device back to a(nother) domain. Put the fields
that may have been changed elsewhere back to their intended values
(some fields amd_iommu_disable_domain_device() touches don't
currently get written anywhere else, and hence don't need modifying
here).

Reported-by: Sander Eikelenboom 
Signed-off-by: Jan Beulich 

--- unstable.orig/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ unstable/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -114,11 +114,21 @@ static void amd_iommu_setup_domain_devic
 
     if ( !dte->v || !dte->tv )
     {
+        const struct ivrs_mappings *ivrs_dev;
+
         /* bind DTE to domain page-tables */
         amd_iommu_set_root_page_table(
             dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
             hd->arch.paging_mode, valid);
 
+        /* Undo what amd_iommu_disable_domain_device() may have done. */
+        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
+        if ( dte->it_root )
+            dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
+        dte->iv = iommu_intremap;
+        dte->ex = ivrs_dev->dte_allow_exclusion;
+        dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
+
         if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
              iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
             dte->i = ats_enabled;
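
The round trip the commit message describes can be sketched with a toy
model. This is only an illustration under stated assumptions, not the Xen
code: struct toy_dte, struct toy_ivrs, toy_disable() and toy_setup() are
hypothetical names, the IntCtl encodings are assumed, and the set of fields
the disable path clears is a simplification. The field names mirror the
ones the patch touches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed IntCtl encodings, for this toy model only. */
enum toy_int_ctl {
    TOY_INT_CTL_ABORTED    = 0, /* interrupt writes from the device are rejected */
    TOY_INT_CTL_TRANSLATED = 2  /* interrupts go through the remapping table */
};

/* Hypothetical, heavily reduced device table entry. */
struct toy_dte {
    bool v, tv;            /* entry valid / translation valid */
    bool iv;               /* interrupt map valid */
    bool ex;               /* exclusion range allowed */
    unsigned int int_ctl;  /* interrupt control */
    unsigned int sys_mgt;  /* system management message handling */
    uint64_t it_root;      /* interrupt remapping table root, 0 if none */
};

/* Hypothetical per-device firmware settings. */
struct toy_ivrs {
    bool dte_allow_exclusion;
    unsigned int sys_mgt;
};

/* Assumed effect of disabling a device: fall back to a restrictive state. */
static void toy_disable(struct toy_dte *dte)
{
    dte->int_ctl = TOY_INT_CTL_ABORTED;
    dte->iv = true;
    dte->tv = false;
    dte->ex = false;
    dte->sys_mgt = 0;
}

/* Hand the device to a domain again, restoring the fields changed above. */
static void toy_setup(struct toy_dte *dte, const struct toy_ivrs *ivrs,
                      bool intremap)
{
    if (!dte->v || !dte->tv) {
        dte->v = dte->tv = true;
        if (dte->it_root)
            dte->int_ctl = TOY_INT_CTL_TRANSLATED;
        dte->iv = intremap;
        dte->ex = ivrs->dte_allow_exclusion;
        dte->sys_mgt = ivrs->sys_mgt;
    }
}
```

Without the restore step in toy_setup(), int_ctl would stay at the aborted
value after a disable, which matches the symptom reported in this thread:
MSI/MSI-X writes from the device being rejected.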



Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-11-11 Thread Sander Eikelenboom
On 11/11/2019 16:35, Jan Beulich wrote:
> On 31.10.2019 21:48, Sander Eikelenboom wrote:
>> - The usb3 controller malfunctioning seems indeed to be a separate issue
>>   (which seems unfortunate, because a bisect seems to become even nastier
>>   with all the intertwined pci-passthrough issues).
>>   
>>   Perhaps this one is then related to the message occurring only *once*:
>>   (XEN) [2019-10-31 20:39:30.746] AMD-Vi: INVALID_DEV_REQUEST 0800 8a00 f8000840 00fd
>>  
>>   While in the guest it is endlessly repeating:
>>   [  231.385566] xhci_hcd :00:05.0: Max number of devices this xHCI host supports is 32.
>>   [  231.407351] usb usb1-port2: couldn't allocate usb_device
> 
> I'm uncertain whether there's a correlation: The device the Xen
> message is about is 08:00.0; please let us know what kind of device
> that is (the hypervisor log alone doesn't allow me to guess).
> 
> The specific type is described as "Posted write to the Interrupt/EOI
> range from an I/O device that has IntCtl=00b in the device’s DTE."
> This would make me guess 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs
> right after table allocation") is the culprit here, and I may need
> to hand you a debugging patch to gain some insight. But let me first
> take a look at sufficiently verbose lspci output from that system.
> 
> Jan
> 

Hi Jan,

When supplying "pci=nomsi" to the guest kernel, the device works fine,
and I don't get the "INVALID_DEV_REQUEST".

After reverting 1b00c16bdf, the device works fine 
and I don't get the INVALID_DEV_REQUEST.

Below is the output of lspci -vvvknn from dom0 for 08:00.0:
- just after boot (device owned by pciback / dom0, not active yet)
- after the guests have started (owned by guest with a working device)

So it is enabling MSI-X interrupts, which is indeed different from the other
devices I pass through, which seem to use legacy interrupts.
This also shows in the guest with a working device in /proc/interrupts:
 98:  17846  0  0  0  xen-pirq-msi-x  xhci_hcd
 99:  0  0  0  0  xen-pirq-msi-x  xhci_hcd
100:  0  0  0  0  xen-pirq-msi-x  xhci_hcd
101:  0  0  0  0  xen-pirq-msi-x  xhci_hcd
102:  0  0  0  0  xen-pirq-msi-x  xhci_hcd

I forgot to take a snapshot of /proc/interrupts in the guest in the 
malfunctioning state.

--
Sander


just after boot (device owned by pciback / dom0, not active yet):
08:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev 03) (prog-if 30 [XHCI])
Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard [1043:8413]
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- [...]

Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-11-11 Thread Jan Beulich
On 31.10.2019 21:48, Sander Eikelenboom wrote:
>   While in the guest it is endlessly repeating:
>   [  231.385566] xhci_hcd :00:05.0: Max number of devices this 
> xHCI host supports is 32.
>   [  231.407351] usb usb1-port2: couldn't allocate usb_device

For this one, could you try "pci=nomsi" on the Linux kernel command
line?

Jan


Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-11-11 Thread Jan Beulich
On 31.10.2019 21:48, Sander Eikelenboom wrote:
> - The usb3 controller malfunctioning seems indeed to be a separate issue
>   (which seems unfortunate, because a bisect seems to become even nastier
>   with all the intertwined pci-passthrough issues).
>   
>   Perhaps this one is then related to the message occurring only *once*:
>   (XEN) [2019-10-31 20:39:30.746] AMD-Vi: INVALID_DEV_REQUEST 0800 8a00 f8000840 00fd
>  
>   While in the guest it is endlessly repeating:
>   [  231.385566] xhci_hcd :00:05.0: Max number of devices this xHCI host supports is 32.
>   [  231.407351] usb usb1-port2: couldn't allocate usb_device

I'm uncertain whether there's a correlation: The device the Xen
message is about is 08:00.0; please let us know what kind of device
that is (the hypervisor log alone doesn't allow me to guess).

The specific type is described as "Posted write to the Interrupt/EOI
range from an I/O device that has IntCtl=00b in the device’s DTE."
This would make me guess 1b00c16bdf ("AMD/IOMMU: pre-fill all DTEs
right after table allocation") is the culprit here, and I may need
to hand you a debugging patch to gain some insight. But let me first
take a look at sufficiently verbose lspci output from that system.
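
For readers following along, the step from the first field of the logged
event (0800) to the device 08:00.0 is the usual PCI bus/device/function
split of a 16-bit device ID. A minimal sketch, assuming that field is the
requester ID; struct toy_bdf, decode_device_id() and format_bdf() are
hypothetical helpers, not Xen functions:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type holding the three parts of a device ID. */
struct toy_bdf {
    unsigned int bus; /* bits 15:8 */
    unsigned int dev; /* bits 7:3  */
    unsigned int fn;  /* bits 2:0  */
};

/* Split a 16-bit PCI device ID into bus, device and function. */
static struct toy_bdf decode_device_id(uint16_t id)
{
    struct toy_bdf r = {
        .bus = (id >> 8) & 0xff,
        .dev = (id >> 3) & 0x1f,
        .fn  = id & 0x7,
    };
    return r;
}

/* Format the ID in the "bb:dd.f" notation that lspci uses. */
static int format_bdf(uint16_t id, char *buf, size_t len)
{
    struct toy_bdf r = decode_device_id(id);
    return snprintf(buf, len, "%02x:%02x.%x", r.bus, r.dev, r.fn);
}
```

Passing 0x0800 to format_bdf() yields "08:00.0", the device named above.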

Jan


Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-10-31 Thread Sander Eikelenboom
On 31/10/2019 11:15, Jan Beulich wrote:
> On 30.10.2019 23:21, Sander Eikelenboom wrote:
>> Call trace seems to be the same in all cases.
>>
>> --
>> Sander
>>
>>
>> (XEN) [2019-10-30 22:07:05.748] AMD-Vi: update_paging_mode Try to access 
>> pdev_list without aquiring pcidevs_lock.
>> (XEN) [2019-10-30 22:07:05.748] [ Xen-4.13.0-rc  x86_64  debug=y   Not 
>> tainted ]
>> (XEN) [2019-10-30 22:07:05.748] CPU:1
>> (XEN) [2019-10-30 22:07:05.748] RIP:e008:[] 
>> iommu_map.c#update_paging_mode+0x1f2/0x3eb
>> (XEN) [2019-10-30 22:07:05.748] RFLAGS: 00010286   CONTEXT: 
>> hypervisor (d0v2)
> 
> I didn't pay attention to this when writing my earlier reply: The
> likely culprit looks to be f89f555827 ("remove late (on-demand)
> construction of IOMMU page tables"). Prior to this I assume IOMMU
> page tables got constructed only after ...

OK, I tested f89f555827 and f89f555827~1, my observations:

with f89f555827~1:
- I'm NOT seeing the aquiring pcidevs_lock message
- the usb3 controller is also working.

with f89f555827:
- I'm now seeing the aquiring pcidevs_lock messages.
- but I'm NOT seeing them *once* per booting guest, but multiple times.
- the usb3 controller is still working.

with staging:
- Seeing the aquiring pcidevs_lock messages, but only *once* per guest 
boot.
- the usb3 controller goes haywire in the guest.

So you seem to be right about both things:
- f89f555827 is the culprit for the aquiring pcidevs_lock messages.
  I get fewer of them with current staging, though, so some later
  patch must have reduced their number.

- The usb3 controller malfunctioning seems indeed to be a separate issue
  (which seems unfortunate, because a bisect seems to become even nastier
  with all the intertwined pci-passthrough issues).

  Perhaps this one is then related to the message occurring only *once*:
  (XEN) [2019-10-31 20:39:30.746] AMD-Vi: INVALID_DEV_REQUEST 0800 8a00 f8000840 00fd

  While in the guest it is endlessly repeating:
  [  231.385566] xhci_hcd :00:05.0: Max number of devices this xHCI host supports is 32.
  [  231.407351] usb usb1-port2: couldn't allocate usb_device

  Hopefully this also gives you a hunch as to which commits to look at.

--
Sander

>> (XEN) [2019-10-30 22:07:05.748] Xen call trace:
>> (XEN) [2019-10-30 22:07:05.748][] R 
>> iommu_map.c#update_paging_mode+0x1f2/0x3eb
>> (XEN) [2019-10-30 22:07:05.748][] F 
>> amd_iommu_map_page+0x72/0x1c2
>> (XEN) [2019-10-30 22:07:05.748][] F 
>> iommu_map+0x98/0x17e
>> (XEN) [2019-10-30 22:07:05.748][] F 
>> iommu_legacy_map+0x28/0x73
>> (XEN) [2019-10-30 22:07:05.748][] F 
>> p2m-pt.c#p2m_pt_set_entry+0x4d3/0x844
>> (XEN) [2019-10-30 22:07:05.748][] F 
>> p2m_set_entry+0x91/0x128
>> (XEN) [2019-10-30 22:07:05.748][] F 
>> guest_physmap_add_entry+0x39f/0x5a3
>> (XEN) [2019-10-30 22:07:05.748][] F 
>> guest_physmap_add_page+0x12f/0x138
>> (XEN) [2019-10-30 22:07:05.748][] F 
>> memory.c#populate_physmap+0x2e3/0x505
> 
> ... Dom0 had populated the new guest's physmap.
> 
> Anyway, as odd as it may seem I guess there's little choice
> besides making populate_physmap() (and likely a few others)
> acquire the lock.
> 
> Jan
> 



Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-10-31 Thread Jan Beulich
On 30.10.2019 23:21, Sander Eikelenboom wrote:
> Call trace seems to be the same in all cases.
> 
> --
> Sander
> 
> 
> (XEN) [2019-10-30 22:07:05.748] AMD-Vi: update_paging_mode Try to access 
> pdev_list without aquiring pcidevs_lock.
> (XEN) [2019-10-30 22:07:05.748] [ Xen-4.13.0-rc  x86_64  debug=y   Not 
> tainted ]
> (XEN) [2019-10-30 22:07:05.748] CPU:1
> (XEN) [2019-10-30 22:07:05.748] RIP:e008:[] 
> iommu_map.c#update_paging_mode+0x1f2/0x3eb
> (XEN) [2019-10-30 22:07:05.748] RFLAGS: 00010286   CONTEXT: 
> hypervisor (d0v2)

I didn't pay attention to this when writing my earlier reply: The
likely culprit looks to be f89f555827 ("remove late (on-demand)
construction of IOMMU page tables"). Prior to this I assume IOMMU
page tables got constructed only after ...

> (XEN) [2019-10-30 22:07:05.748] Xen call trace:
> (XEN) [2019-10-30 22:07:05.748][] R 
> iommu_map.c#update_paging_mode+0x1f2/0x3eb
> (XEN) [2019-10-30 22:07:05.748][] F 
> amd_iommu_map_page+0x72/0x1c2
> (XEN) [2019-10-30 22:07:05.748][] F iommu_map+0x98/0x17e
> (XEN) [2019-10-30 22:07:05.748][] F 
> iommu_legacy_map+0x28/0x73
> (XEN) [2019-10-30 22:07:05.748][] F 
> p2m-pt.c#p2m_pt_set_entry+0x4d3/0x844
> (XEN) [2019-10-30 22:07:05.748][] F 
> p2m_set_entry+0x91/0x128
> (XEN) [2019-10-30 22:07:05.748][] F 
> guest_physmap_add_entry+0x39f/0x5a3
> (XEN) [2019-10-30 22:07:05.748][] F 
> guest_physmap_add_page+0x12f/0x138
> (XEN) [2019-10-30 22:07:05.748][] F 
> memory.c#populate_physmap+0x2e3/0x505

... Dom0 had populated the new guest's physmap.

Anyway, as odd as it may seem I guess there's little choice
besides making populate_physmap() (and likely a few others)
acquire the lock.
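
The shape of the problem can be sketched in a few lines. This is a
single-threaded toy under stated assumptions, not how pcidevs_lock is
implemented: every toy_* name is hypothetical, and a real lock would need
atomic operations and owner tracking. It only shows why the warning fires
on a path that never took the lock, and why taking it in the caller
silences it.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy lock state, standing in for a device-list lock. */
static bool toy_lock_held;
static unsigned int toy_warnings;

static void toy_lock(void)   { toy_lock_held = true; }
static void toy_unlock(void) { toy_lock_held = false; }
static bool toy_locked(void) { return toy_lock_held; }

/* Deep helper that walks the device list and expects the lock to be held. */
static void toy_update_paging_mode(void)
{
    if (!toy_locked()) {
        toy_warnings++;
        fprintf(stderr, "%s: device list accessed without the lock\n",
                __func__);
    }
    /* ... the device list would be walked here ... */
}

/* Caller far up the chain that never takes the lock: triggers the warning. */
static void toy_populate_unlocked(void)
{
    toy_update_paging_mode();
}

/* Same caller after the suggested change: lock held around the call. */
static void toy_populate_locked(void)
{
    toy_lock();
    toy_update_paging_mode();
    toy_unlock();
}
```

Calling toy_populate_unlocked() bumps toy_warnings, while
toy_populate_locked() does not. The cost of the second form, as noted in
this thread, is that a memory-population path now takes a lock that
otherwise belongs to PCI device management, so lock nesting has to be
audited.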

Jan


Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-10-31 Thread Sander Eikelenboom
On 31/10/2019 10:18, Jan Beulich wrote:
> On 31.10.2019 09:35, Sander Eikelenboom wrote:
>> Platform is perhaps somewhat specific (older AMD 890FX chipset) and I need the
>> BIOS workaround:
>> ivrs_ioapic[6]=00:14.0 iommu=on.
> 
> Shouldn't matter here.
> 
>> On the other hand, this has run like this for quite some time.
>>
>> I have 3 guests (HVM) for which I use PCI passthrough, and
>> for each of those 3 guests I get this message *once* on start of the guest.
>>  One guest has a soundcard passed through,
>>  One guest has a USB2 card passed through,
>>  One guest has a USB3 card passed through.
>>
>> Another observation is that both the soundcard and USB2 card
>> still seem to function despite the message.
> 
> Reality is - this message is benign as long as you don't do PCI
> hot (un)plug.

I don't use any of:
 pci-attach
 pci-detach
 pci-list
 pci-assignable-add
 pci-assignable-remove
 pci-assignable-list

Only shutting down and (re)starting VMs with the devices specified in
the vm cfg file.

>> The USB3 controller goes haywire though (a lot of driver messages in the 
>> guest during init).
> 
> As a consequence I don't think there's a connection between this
> and the observed message.

OK, although it functions fine when reverting to the commit I reference
below (with the same kernel etc.). If so, that would be another issue then.

CC'ed Juergen as release manager so he is aware.

>> I could try to bisect, but it would be sometime next week before I can
>> get to that.
>>
>> At present I run a tree whose latest commit is
>> ee7170822f1fc209f33feb47b268bab35541351d,
>> which is stable for me. This predates some of the IOMMU changes and
>> Anthony's QMP work that had some issues, but that would be the last
>> known really good point for me to start a bisect from.
> 
> I.e. at that point you didn't observe this message, yet?

With ee7170822f1fc209f33feb47b268bab35541351d I see neither this message nor
the "INVALID_DEV_REQUEST", even with longer uptimes.

--
Sander

> Jan
> 



Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-10-31 Thread Jan Beulich
On 31.10.2019 09:35, Sander Eikelenboom wrote:
> Platform is perhaps somewhat specific (older AMD 890FX chipset) and I need the
> BIOS workaround:
> ivrs_ioapic[6]=00:14.0 iommu=on.

Shouldn't matter here.

> On the other hand, this has run like this for quite some time.
> 
> I have 3 guests (HVM) for which I use PCI passthrough, and
> for each of those 3 guests I get this message *once* on start of the guest.
>   One guest has a soundcard passed through,
>   One guest has a USB2 card passed through,
>   One guest has a USB3 card passed through.
> 
> Another observation is that both the soundcard and USB2 card
> still seem to function despite the message.

Reality is - this message is benign as long as you don't do PCI
hot (un)plug.

> The USB3 controller goes haywire though (a lot of driver messages in the 
> guest during init).

As a consequence I don't think there's a connection between this
and the observed message.

> I could try to bisect, but it would be sometime next week before I can
> get to that.
> 
> At present I run a tree whose latest commit is
> ee7170822f1fc209f33feb47b268bab35541351d,
> which is stable for me. This predates some of the IOMMU changes and
> Anthony's QMP work that had some issues, but that would be the last
> known really good point for me to start a bisect from.

I.e. at that point you didn't observe this message, yet?

Jan


Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-10-31 Thread Jan Beulich
On 30.10.2019 23:21, Sander Eikelenboom wrote:
> Call trace seems to be the same in all cases.

Thanks much.

> (XEN) [2019-10-30 22:07:05.748] AMD-Vi: update_paging_mode Try to access 
> pdev_list without aquiring pcidevs_lock.
> (XEN) [2019-10-30 22:07:05.748] [ Xen-4.13.0-rc  x86_64  debug=y   Not 
> tainted ]
> (XEN) [2019-10-30 22:07:05.748] CPU:1
> (XEN) [2019-10-30 22:07:05.748] RIP:e008:[] 
> iommu_map.c#update_paging_mode+0x1f2/0x3eb
> (XEN) [2019-10-30 22:07:05.748] RFLAGS: 00010286   CONTEXT: 
> hypervisor (d0v2)
> (XEN) [2019-10-30 22:07:05.748] rax: 830523f9   rbx: 82e004905f00 
>   rcx: 
> (XEN) [2019-10-30 22:07:05.748] rdx: 0001   rsi: 000a 
>   rdi: 82d0804a0698
> (XEN) [2019-10-30 22:07:05.748] rbp: 830523f9f848   rsp: 830523f9f808 
>   r8:  8305320a
> (XEN) [2019-10-30 22:07:05.748] r9:  0038   r10: 0002 
>   r11: 000a
> (XEN) [2019-10-30 22:07:05.748] r12: 82e004905f00   r13: 0003 
>   r14: 0003
> (XEN) [2019-10-30 22:07:05.748] r15: 83041fb83000   cr0: 80050033 
>   cr4: 06e0
> (XEN) [2019-10-30 22:07:05.748] cr3: 00040a58d000   cr2: 8880604835a0
> (XEN) [2019-10-30 22:07:05.748] fsb: 7f4b8f899bc0   gsb: 88807d48 
>   gss: 
> (XEN) [2019-10-30 22:07:05.748] ds:    es:    fs:    gs:    
> ss: e010   cs: e008
> (XEN) [2019-10-30 22:07:05.748] Xen code around  
> (iommu_map.c#update_paging_mode+0x1f2/0x3eb):
> (XEN) [2019-10-30 22:07:05.748]  3d 3b 7b 22 00 00 75 07 <0f> 0b e9 c2 01 00 
> 00 48 8d 35 1a ce 13 00 48 8d
> (XEN) [2019-10-30 22:07:05.748] Xen stack trace from rsp=830523f9f808:
[...]
> (XEN) [2019-10-30 22:07:05.748] Xen call trace:
> (XEN) [2019-10-30 22:07:05.748][] R 
> iommu_map.c#update_paging_mode+0x1f2/0x3eb
> (XEN) [2019-10-30 22:07:05.748][] F 
> amd_iommu_map_page+0x72/0x1c2
> (XEN) [2019-10-30 22:07:05.748][] F iommu_map+0x98/0x17e
> (XEN) [2019-10-30 22:07:05.748][] F 
> iommu_legacy_map+0x28/0x73
> (XEN) [2019-10-30 22:07:05.748][] F 
> p2m-pt.c#p2m_pt_set_entry+0x4d3/0x844
> (XEN) [2019-10-30 22:07:05.748][] F 
> p2m_set_entry+0x91/0x128
> (XEN) [2019-10-30 22:07:05.748][] F 
> guest_physmap_add_entry+0x39f/0x5a3
> (XEN) [2019-10-30 22:07:05.748][] F 
> guest_physmap_add_page+0x12f/0x138
> (XEN) [2019-10-30 22:07:05.748][] F 
> memory.c#populate_physmap+0x2e3/0x505
> (XEN) [2019-10-30 22:07:05.748][] F 
> do_memory_op+0x695/0x1bf7
> (XEN) [2019-10-30 22:07:05.748][] F 
> pv_hypercall+0x2ca/0x537
> (XEN) [2019-10-30 22:07:05.748][] F 
> lstar_enter+0x112/0x120

Now this looks to be a pretty common path, i.e. I wonder why no-one
before has noticed this message getting logged. Fixing, as it seems,
will require careful auditing of lock nesting, as the PCI devices
lock will need to be acquired on a path that's entirely unrelated to
any PCI operation; I'll try to get to this asap. Is there anything
special about the guest that triggers this?

Jan


Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-10-30 Thread Sander Eikelenboom
On 30/10/2019 16:48, Jan Beulich wrote:
> On 28.10.2019 11:32, Sander Eikelenboom wrote:
>> While testing the latest xen-unstable and starting an HVM guest with
>> pci-passthrough on my AMD machine,
>> my eye caught the following messages in xl dmesg that I haven't seen before:
>>
>> (XEN) [2019-10-28 10:23:16.372] AMD-Vi: update_paging_mode Try to access 
>> pdev_list without aquiring pcidevs_lock.
> 
> Unfortunately this sits on the map/unmap path, and hence the
> violator is far up one of the many call chains. Therefore I'd
> like to ask that you rebuild and retry with the debugging
> patch below. In case you observe multiple different call
> trees, post them all please.
> 
> Jan

Hi Jan,

Call trace seems to be the same in all cases.

--
Sander


(XEN) [2019-10-30 22:07:05.748] AMD-Vi: update_paging_mode Try to access 
pdev_list without aquiring pcidevs_lock.
(XEN) [2019-10-30 22:07:05.748] [ Xen-4.13.0-rc  x86_64  debug=y   Not 
tainted ]
(XEN) [2019-10-30 22:07:05.748] CPU:1
(XEN) [2019-10-30 22:07:05.748] RIP:e008:[] 
iommu_map.c#update_paging_mode+0x1f2/0x3eb
(XEN) [2019-10-30 22:07:05.748] RFLAGS: 00010286   CONTEXT: hypervisor 
(d0v2)
(XEN) [2019-10-30 22:07:05.748] rax: 830523f9   rbx: 82e004905f00   
rcx: 
(XEN) [2019-10-30 22:07:05.748] rdx: 0001   rsi: 000a   
rdi: 82d0804a0698
(XEN) [2019-10-30 22:07:05.748] rbp: 830523f9f848   rsp: 830523f9f808   
r8:  8305320a
(XEN) [2019-10-30 22:07:05.748] r9:  0038   r10: 0002   
r11: 000a
(XEN) [2019-10-30 22:07:05.748] r12: 82e004905f00   r13: 0003   
r14: 0003
(XEN) [2019-10-30 22:07:05.748] r15: 83041fb83000   cr0: 80050033   
cr4: 06e0
(XEN) [2019-10-30 22:07:05.748] cr3: 00040a58d000   cr2: 8880604835a0
(XEN) [2019-10-30 22:07:05.748] fsb: 7f4b8f899bc0   gsb: 88807d48   
gss: 
(XEN) [2019-10-30 22:07:05.748] ds:    es:    fs:    gs:    ss: 
e010   cs: e008
(XEN) [2019-10-30 22:07:05.748] Xen code around  
(iommu_map.c#update_paging_mode+0x1f2/0x3eb):
(XEN) [2019-10-30 22:07:05.748]  3d 3b 7b 22 00 00 75 07 <0f> 0b e9 c2 01 00 00 
48 8d 35 1a ce 13 00 48 8d
(XEN) [2019-10-30 22:07:05.748] Xen stack trace from rsp=830523f9f808:
(XEN) [2019-10-30 22:07:05.748]82e00a6bc6e0 82e00a6bc6e0 
83041fb83000 83041fb83000
(XEN) [2019-10-30 22:07:05.748]83041fb83148 000feff8 
83041fb83150 830523f9f93c
(XEN) [2019-10-30 22:07:05.748]830523f9f8c8 82d080265ded 
000380240580 002482f9
(XEN) [2019-10-30 22:07:05.748]  
 
(XEN) [2019-10-30 22:07:05.748]  
 
(XEN) [2019-10-30 22:07:05.748]83041fb83000 000feff8 
000feff8 002482f9
(XEN) [2019-10-30 22:07:05.748]830523f9f928 82d0802583b6 
000feff8 0001
(XEN) [2019-10-30 22:07:05.748]0003802405da 002482f9 
830523f9f93c 0003
(XEN) [2019-10-30 22:07:05.748]83041fb83000 000feff8 
 
(XEN) [2019-10-30 22:07:05.748]830523f9f960 82d0802586fb 
48834780 0003
(XEN) [2019-10-30 22:07:05.748] 830248834780 
000feff8 830523f9f9f8
(XEN) [2019-10-30 22:07:05.748]82d08034a4a6 8038a845 
 820040024fc0
(XEN) [2019-10-30 22:07:05.748]83041fb83000 002482f9 
 8002484b5367
(XEN) [2019-10-30 22:07:05.748]82004002d5a0 8002484b5367 
 
(XEN) [2019-10-30 22:07:05.748]820040024000  
002482f9 000feff8
(XEN) [2019-10-30 22:07:05.748]0001 830248834780 
830523f9fa50 82d080342e13
(XEN) [2019-10-30 22:07:05.748] 0007 
83041fb83000 0023
(XEN) [2019-10-30 22:07:05.748]002482fa 830248834780 
 002482f9
(XEN) [2019-10-30 22:07:05.748]000feff8 830523f9fac8 
82d080343c52 23f9fa78
(XEN) [2019-10-30 22:07:05.748]0001 002482f9 
000feff8 830523f9fa98
(XEN) [2019-10-30 22:07:05.748] Xen call trace:
(XEN) [2019-10-30 22:07:05.748][] R 
iommu_map.c#update_paging_mode+0x1f2/0x3eb
(XEN) [2019-10-30 22:07:05.748][] F 
amd_iommu_map_page+0x72/0x1c2
(XEN) [2019-10-30 22:07:05.748][] F iommu_map+0x98/0x17e
(XEN) [2019-10-30 22:07:05.748][] F 
iommu_legacy_map+0x28/0x73
(XEN) [2019-10-30 22:07:05.748][] F 
p2m-pt.c#p2m_pt_set_entry+0x4d3/0x844
(XEN) [2019-10-30 22:07:05.748][] F 
p2m_set_entry+0x91/0x128
(XEN) [2019-10-30 22:07:05.748][] F 
guest_physmap_add_entry+0x39f/0x5a3
(XEN) 

Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-10-30 Thread Jan Beulich
On 28.10.2019 11:32, Sander Eikelenboom wrote:
> While testing the latest xen-unstable and starting an HVM guest with
> pci-passthrough on my AMD machine,
> my eye caught the following messages in xl dmesg that I haven't seen before:
> 
> (XEN) [2019-10-28 10:23:16.372] AMD-Vi: update_paging_mode Try to access 
> pdev_list without aquiring pcidevs_lock.

Unfortunately this sits on the map/unmap path, and hence the
violator is far up one of the many call chains. Therefore I'd
like to ask that you rebuild and retry with the debugging
patch below. In case you observe multiple different call
trees, post them all please.

Jan

--- unstable.orig/xen/drivers/passthrough/amd/iommu_map.c
+++ unstable/xen/drivers/passthrough/amd/iommu_map.c
@@ -331,9 +331,12 @@ static int update_paging_mode(struct dom
         hd->arch.paging_mode = level;
         hd->arch.root_table = new_root;
 
-        if ( !pcidevs_locked() )
+        if ( iommu_debug && !pcidevs_locked() )
+        {
             AMD_IOMMU_DEBUG("%s Try to access pdev_list "
                             "without aquiring pcidevs_lock.\n", __func__);
+            dump_execution_state();
+        }
 
         /* Update device table entries using new root table and paging mode */
         for_each_pdev( d, pdev )


Re: [Xen-devel] Xen-unstable: AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock.

2019-10-28 Thread Jan Beulich
On 28.10.2019 11:32, Sander Eikelenboom wrote:
> Hi Jan / Andrew,
> 
> While testing the latest xen-unstable and starting an HVM guest with
> pci-passthrough on my AMD machine,
> my eye caught the following messages in xl dmesg that I haven't seen before:
> 
> (XEN) [2019-10-28 10:23:16.372] AMD-Vi: update_paging_mode Try to access 
> pdev_list without aquiring pcidevs_lock.
> (XEN) [2019-10-28 10:24:08.136] AMD-Vi: INVALID_DEV_REQUEST 0800 8a00 
> f8000240 00fd
> 
> Probably something from the AMD IOMMU rework that got committed lately?

Not very likely at least for the former; I'll have to look at the
latter in some more detail (and at both to see whether I can spot
a sensible solution).

Jan
