Re: [vfio-users] [PATCH] PCI: Mark Intel bridge on SuperMicro Atom C3xxx motherboards to avoid bus reset

2019-05-29 Thread Bjorn Helgaas
[+cc Alex]

On Fri, May 24, 2019 at 05:31:18PM +0200, Maik Broemme wrote:
> The Intel PCI bridge on SuperMicro Atom C3xxx motherboards does not
> successfully complete a bus reset when used with certain child devices.
> After the reset, config accesses to the child may fail. Assigning
> such a device via VFIO fails immediately with:
> 
>   vfio-pci :01:00.0: Failed to return from FLR
>   vfio-pci :01:00.0: timed out waiting for pending transaction;
>   performing function level reset anyway

I guess these messages are from v4.13 or earlier, since the "Failed to
return from FLR" text was removed by 821cdad5c46c ("PCI: Wait up to 60
seconds for device to become ready after FLR"), which appeared in
v4.14.

I suppose a current kernel would fail similarly, but could you try it?
I think a current kernel would give more informative messages like:

  not ready XXms after FLR, giving up
  not ready XXms after bus reset, giving up

I don't understand the connection here: the messages you quote are
related to FLR, but the quirk isn't related to FLR.  The quirk
prevents a secondary bus reset.  So is it the case that we try FLR
first, it fails, then we try a secondary bus reset (does this succeed?
you don't mention an error from it), and the device remains
unresponsive and VFIO assignment fails?

And with the quirk, I assume we still try FLR, and it still fails.
But we *don't* try a secondary bus reset, and the device magically
works?  That's confusing to me.

> Device will disappear from PCI device list:
> 
>   !!! Unknown header type 7f
>   Kernel driver in use: vfio-pci
>   Kernel modules: ddbridge
> 
> The attached patch will mark the root port as incapable of doing a
> bus level reset. After that all my tested devices survive a VFIO
> assignment and several VM reboot cycles.
> 
> Signed-off-by: Maik Broemme 
> ---
>  drivers/pci/quirks.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 0f16acc323c6..86cd42872708 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3433,6 +3433,13 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
>   */
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
>  
> +/*
> + * Root port on some SuperMicro Atom C3xxx motherboards do not successfully
> + * complete a bus reset when used with certain child devices. After the
> + * reset, config accesses to the child may fail.
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x19a4, quirk_no_bus_reset);
> +
>  static void quirk_no_pm_reset(struct pci_dev *dev)
>  {
>   /*
> -- 
> 2.21.0
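
For readers following along, here is a rough Python model of what a
no-bus-reset quirk amounts to: a fixup sets a flag on matching devices,
and the bus reset path refuses any bus whose bridge carries that flag.
This is only a sketch, not kernel code; the class names, the flag value
and the helper names are made up for illustration, and only the
8086:19a4 ID comes from the patch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stand-in for the kernel's no-bus-reset device flag; the bit value is arbitrary.
NO_BUS_RESET = 1 << 0


@dataclass
class Bus:
    bridge: Optional["Device"] = None  # bridge leading to this bus (None for a root bus)
    devices: List["Device"] = field(default_factory=list)


@dataclass
class Device:
    vendor: int
    device: int
    flags: int = 0
    subordinate: Optional[Bus] = None  # secondary bus, if this device is a bridge


# (vendor, device) pairs the quirk applies to; 0x8086:0x19a4 is the ID in the patch.
NO_BUS_RESET_IDS = {(0x8086, 0x19A4)}


def apply_quirks(dev: Device) -> None:
    """Header-fixup stand-in: flag matching devices as unable to do a bus reset."""
    if (dev.vendor, dev.device) in NO_BUS_RESET_IDS:
        dev.flags |= NO_BUS_RESET


def bus_resettable(bus: Bus) -> bool:
    """Refuse a bus whose bridge, or any bridge below it, carries the flag."""
    if bus.bridge is not None and bus.bridge.flags & NO_BUS_RESET:
        return False
    return all(
        d.subordinate is None or bus_resettable(d.subordinate)
        for d in bus.devices
    )
```

Note that the flag only removes the bus reset option below the flagged
bridge; it says nothing about FLR on the child device.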

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] [PATCH] PCI: Mark Intel bridge on SuperMicro Atom C3xxx motherboards to avoid bus reset

2019-05-29 Thread Alex Williamson
On Wed, 29 May 2019 17:03:07 -0500
Bjorn Helgaas  wrote:

> [+cc Alex]
> 
> On Fri, May 24, 2019 at 05:31:18PM +0200, Maik Broemme wrote:
> > The Intel PCI bridge on SuperMicro Atom C3xxx motherboards does not
> > successfully complete a bus reset when used with certain child devices.
> > After the reset, config accesses to the child may fail. Assigning
> > such a device via VFIO fails immediately with:
> > 
> >   vfio-pci :01:00.0: Failed to return from FLR
> >   vfio-pci :01:00.0: timed out waiting for pending transaction;
> >   performing function level reset anyway  
> 
> I guess these messages are from v4.13 or earlier, since the "Failed to
> return from FLR" text was removed by 821cdad5c46c ("PCI: Wait up to 60
> seconds for device to become ready after FLR"), which appeared in
> v4.14.
> 
> I suppose a current kernel would fail similarly, but could you try it?
> I think a current kernel would give more informative messages like:
> 
>   not ready XXms after FLR, giving up
>   not ready XXms after bus reset, giving up
> 
> I don't understand the connection here: the messages you quote are
> related to FLR, but the quirk isn't related to FLR.  The quirk
> prevents a secondary bus reset.  So is it the case that we try FLR
> first, it fails, then we try a secondary bus reset (does this succeed?
> you don't mention an error from it), and the device remains
> unresponsive and VFIO assignment fails?
> 
> And with the quirk, I assume we still try FLR, and it still fails.
> But we *don't* try a secondary bus reset, and the device magically
> works?  That's confusing to me.

As a counterpoint, I found a system with this root port in our test
environment.  It's not ideal, as this root port has a PCIe-to-PCI bridge
downstream of it with a Matrox graphics device downstream of that.  I can't
use vfio-pci to reset this hierarchy, but I can use setpci, e.g.:

# lspci -nnvs 00:09.0
00:09.0 PCI bridge [0604]: Intel Corporation Device [8086:19a4] (rev 11) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 26
        Memory at 108000 (64-bit, non-prefetchable) [size=128K]
        Bus: primary=00, secondary=03, subordinate=04, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: 8400-848f [size=9M]
        Prefetchable memory behind bridge: 8200-83ff [size=32M]
# lspci -nnvs 03:00.0
03:00.0 PCI bridge [0604]: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge [104c:8231] (rev 03) (prog-if 00 [Normal decode])
        Flags: fast devsel
        Bus: primary=00, secondary=00, subordinate=00, sec-latency=0
        I/O behind bridge: -0fff [size=4K]
        Memory behind bridge: -000f [size=1M]
        Prefetchable memory behind bridge: -000f [size=1M]

(resources are reset from previous experiments)

# setpci -s 00:09.0 3e.w=40:40
# lspci -nnvs 03:00.0
03:00.0 PCI bridge [0604]: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge [104c:8231] (rev ff) (prog-if ff)
        !!! Unknown header type 7f

(bus in reset, config space unavailable, EXPECTED)

# setpci -s 00:09.0 3e.w=0:40
[root@intel-harrisonville-01 devices]# lspci -nnvs 03:00.0
03:00.0 PCI bridge [0604]: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge [104c:8231] (rev 03) (prog-if 00 [Normal decode])
        Flags: fast devsel
        Bus: primary=00, secondary=00, subordinate=00, sec-latency=0
        I/O behind bridge: -0fff [size=4K]
        Memory behind bridge: -000f [size=1M]
        Prefetchable memory behind bridge: -000f [size=1M]

(bus out of reset, downstream config space is available again)
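
For anyone decoding the setpci arguments above: 3e.w is the 16-bit
Bridge Control register and 0x40 is its Secondary Bus Reset bit, and
setpci's value:mask form only changes the bits set in the mask.  A
small Python sketch of that arithmetic follows; the helper names and
the starting register value are made up for illustration.

```python
PCI_BRIDGE_CONTROL = 0x3E        # Bridge Control register offset (type 1 header)
PCI_BRIDGE_CTL_BUS_RESET = 0x40  # Secondary Bus Reset bit


def masked_write(old: int, value: int, mask: int) -> int:
    """Return the 16-bit register value after a setpci-style `value:mask` write."""
    return (old & ~mask & 0xFFFF) | (value & mask)


def in_reset(bridge_control: int) -> bool:
    """True while the Secondary Bus Reset bit is asserted."""
    return bool(bridge_control & PCI_BRIDGE_CTL_BUS_RESET)


# Hypothetical starting value; a real bridge will have other bits set.
start = 0x0003
asserted = masked_write(start, 0x40, 0x40)    # like 3e.w=40:40
released = masked_write(asserted, 0x00, 0x40)  # like 3e.w=0:40
```

Only the reset bit changes in either direction; the other Bridge
Control bits keep whatever value they had before.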

I'm also confused about the description of this device:

On Fri, 24 May 2019 20:41:13 +0200 Maik Broemme  wrote:
> Also I've tried a PCI-E switch from PLX technology, sold by MikroTik, the
> RouterBoard RB14eU. It exports 4 Mini PCI ports in one PCI-E port and
> I tried it with one card and multiple cards.
> 
> All these devices start to work once I enabled the bus reset quirk. The
> RB14eU even allows assigning the individual Mini PCI-E ports to
> different VMs and survives independent resets behind the PLX bridge.

To me this describes a topology like:

[RP]---[US]-+-[DS]--[EP]
            +-[DS]--[EP]
            +-[DS]--[EP]
            \-[DS]--[EP]

(RootPort/UpstreamSwitch/DownstreamSwitch/EndPoint)

We can only assign endpoints to VMs through vfio, therefore if we
need to reset the EP via a bus reset, that reset would occur at the
downstream switch point, not the root port.  It doesn't make sense that
a quirk at the RP would resolve anything about this use case.
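
To make the topology argument concrete, here is a minimal Python
sketch of the same tree.  The node names are invented, and the model
assumes one endpoint per downstream port, as in the diagram; it only
illustrates which bridge's secondary bus an endpoint sits on.

```python
from typing import Dict, List, Optional

# child -> parent, mirroring the diagram: one root port, one upstream switch
# port, four downstream switch ports, one endpoint behind each.
PARENT: Dict[str, Optional[str]] = {"RP": None, "US": "RP"}
for i in range(4):
    PARENT[f"DS{i}"] = "US"
    PARENT[f"EP{i}"] = f"DS{i}"


def reset_bridge_for(dev: str) -> Optional[str]:
    """Bridge whose secondary bus reset would reset `dev`: its immediate parent."""
    return PARENT[dev]


def affected_by_bus_reset(bridge: str) -> List[str]:
    """Every device below `bridge`, i.e. everything a reset there would hit."""
    hit = []
    for dev in PARENT:
        node = PARENT[dev]
        while node is not None:
            if node == bridge:
                hit.append(dev)
                break
            node = PARENT[node]
    return hit
```

In this model a reset for EP2 is issued at DS2 and touches only EP2,
while a reset at RP would take down the whole switch, which is why a
flag on the RP should not matter for per-endpoint resets.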

Also, per the Intel datasheet, this is not the only root port in this
processor and presumably they'd all work the same way, so handling one
ID as a special case seems wrong regardless.  Thanks,

Alex


Re: [vfio-users] Question about integrated GPU passthrough and initialization

2019-05-29 Thread Micah Morton
Ah my bad. Just realized I was using my own copy of SeaBIOS that I had
built. When I use the copy from qemu-3.0.0/pc-bios/bios-256k.bin I see
the i915 driver finding the OpRegion:
[0.269341] in i915_driver_init_hw
[0.269374] [drm] Memory usable by graphics device = 4096M
[0.269585] in intel_opregion_setup
[0.269600] graphic opregion physical addr: 0x7fffe000

Still working on getting the screen to light up

On Wed, May 29, 2019 at 9:44 AM Alex Williamson
 wrote:
>
> On Wed, 29 May 2019 09:25:59 -0700
> Micah Morton  wrote:
>
> > So as I mentioned, the ChromeOS firmware writes the location of the
> > OpRegion to the ASLS PCI config register
> > (https://github.com/coreboot/coreboot/blob/master/src/drivers/intel/gma/opregion.c#L88).
> > The i915 driver then gets the address for the OpRegion from that
> > register here: 
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/i915/intel_opregion.c#L910.
> > This all works for Chrome OS, but when we run a VM with SeaBIOS the
> > ASLS PCI config register doesn't get written with the location of the
> > OpRegion:
> > [0.263640] in i915_driver_init_hw (I added this)
> > ...
> > [0.263922] in intel_opregion_setup (and this)
> > [0.263954] graphic opregion physical addr: 0x0 <-- This is
> > supposed to point to the OpRegion, not be zero.
> > [0.263954] ACPI OpRegion not supported!
> > ...
> > [0.267727] Failed to find VBIOS tables (VBT)
> >
> > I'm also not sure if the OpRegion is actually in VM memory or not. Do
> > you think I need to find a way to put the OpRegion in VM memory as we
> > have seen coreboot (Chrome OS firmware) do above? Or should using
> > "x-igd-opregion=on" somehow ensure that the OpRegion makes it into VM
> > memory? Clearly I at least need to find a way to set that ASLS PCI
> > config register in the VM or modify the i915 driver that runs in the
> > guest so it can find the OpRegion.
>
> In QEMU, vfio_pci_igd_opregion_init() adds the opregion to a fw_cfg
> file "etc/igd-opregion" and makes the (virtual) ASLS register
> writable.  Then in SeaBIOS, any Intel vendor ID, PCI class VGA device
> will trigger the intel_igd_setup() function, which looks for the fw_cfg
> file, allocates space for it, and writes the GPA back to the ASLS
> register.  That's at least how it's supposed to work, which again
> reminds me for the umpteenth time that x-igd-opregion only works with
> SeaBIOS as OVMF has rejected this support in favor of an option ROM
> based solution, which Intel never provided.  I think you're using
> SeaBIOS though, so as long as that's not an ancient version it
> should do the little dance here.  The ASLS is writable though, we don't
> do any write-once tricks, so something could blindly stomp on it.  You
> might enable logging in SeaBIOS, it will emit some spew for the
> OpRegion support.  You could also enable tracing to see the write of
> the ASLS into QEMU.  Thanks,
>
> Alex



Re: [vfio-users] Question about integrated GPU passthrough and initialization

2019-05-29 Thread Alex Williamson
On Wed, 29 May 2019 09:25:59 -0700
Micah Morton  wrote:

> So as I mentioned, the ChromeOS firmware writes the location of the
> OpRegion to the ASLS PCI config register
> (https://github.com/coreboot/coreboot/blob/master/src/drivers/intel/gma/opregion.c#L88).
> The i915 driver then gets the address for the OpRegion from that
> register here: 
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/i915/intel_opregion.c#L910.
> This all works for Chrome OS, but when we run a VM with SeaBIOS the
> ASLS PCI config register doesn't get written with the location of the
> OpRegion:
> [0.263640] in i915_driver_init_hw (I added this)
> ...
> [0.263922] in intel_opregion_setup (and this)
> [0.263954] graphic opregion physical addr: 0x0 <-- This is
> supposed to point to the OpRegion, not be zero.
> [0.263954] ACPI OpRegion not supported!
> ...
> [0.267727] Failed to find VBIOS tables (VBT)
> 
> I'm also not sure if the OpRegion is actually in VM memory or not. Do
> you think I need to find a way to put the OpRegion in VM memory as we
> have seen coreboot (Chrome OS firmware) do above? Or should using
> "x-igd-opregion=on" somehow ensure that the OpRegion makes it into VM
> memory? Clearly I at least need to find a way to set that ASLS PCI
> config register in the VM or modify the i915 driver that runs in the
> guest so it can find the OpRegion.

In QEMU, vfio_pci_igd_opregion_init() adds the opregion to a fw_cfg
file "etc/igd-opregion" and makes the (virtual) ASLS register
writable.  Then in SeaBIOS, any Intel vendor ID, PCI class VGA device
will trigger the intel_igd_setup() function, which looks for the fw_cfg
file, allocates space for it, and writes the GPA back to the ASLS
register.  That's at least how it's supposed to work, which again
reminds me for the umpteenth time that x-igd-opregion only works with
SeaBIOS as OVMF has rejected this support in favor of an option ROM
based solution, which Intel never provided.  I think you're using
SeaBIOS though, so as long as that's not an ancient version it
should do the little dance here.  The ASLS is writable though, we don't
do any write-once tricks, so something could blindly stomp on it.  You
might enable logging in SeaBIOS, it will emit some spew for the
OpRegion support.  You could also enable tracing to see the write of
the ASLS into QEMU.  Thanks,

Alex
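
The handshake described above can be sketched as a toy Python model.
This is not QEMU or SeaBIOS code: the class, the allocator and the
function names are invented, and only the fw_cfg file name and the
ASLS register (config offset 0xFC) correspond to the real thing.

```python
ASLS_OFFSET = 0xFC                  # ASLS register offset in IGD PCI config space
OPREGION_FILE = "etc/igd-opregion"  # fw_cfg file name from the message above


class GuestModel:
    """Toy model: fw_cfg files, guest RAM, and the IGD's virtual config space."""

    def __init__(self, fw_cfg: dict):
        self.fw_cfg = fw_cfg
        self.ram = {}                  # guest physical address -> bytes
        self.config = {ASLS_OFFSET: 0}
        self.next_free = 0x7FFF0000    # arbitrary allocation cursor for the sketch

    def alloc(self, size: int) -> int:
        """Hand out page-aligned guest memory (4 KiB granularity)."""
        gpa = self.next_free
        self.next_free += (size + 0xFFF) & ~0xFFF
        return gpa


def firmware_igd_setup(guest: GuestModel) -> bool:
    """Firmware side: copy the fw_cfg file into guest RAM, publish its GPA in ASLS."""
    blob = guest.fw_cfg.get(OPREGION_FILE)
    if blob is None:
        return False  # ASLS stays 0, so the driver finds no OpRegion
    gpa = guest.alloc(len(blob))
    guest.ram[gpa] = blob
    guest.config[ASLS_OFFSET] = gpa
    return True


def driver_find_opregion(guest: GuestModel):
    """Driver side: read ASLS; 0 means no OpRegion was published."""
    gpa = guest.config[ASLS_OFFSET]
    return guest.ram.get(gpa) if gpa else None
```

If the firmware step never runs, or something later overwrites ASLS
with 0, the driver side comes back empty, which matches the
"physical addr: 0x0" symptom in the log.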



Re: [vfio-users] Question about integrated GPU passthrough and initialization

2019-05-29 Thread Micah Morton
So as I mentioned, the ChromeOS firmware writes the location of the
OpRegion to the ASLS PCI config register
(https://github.com/coreboot/coreboot/blob/master/src/drivers/intel/gma/opregion.c#L88).
The i915 driver then gets the address for the OpRegion from that
register here: 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/i915/intel_opregion.c#L910.
This all works for Chrome OS, but when we run a VM with SeaBIOS the
ASLS PCI config register doesn't get written with the location of the
OpRegion:
[0.263640] in i915_driver_init_hw (I added this)
...
[0.263922] in intel_opregion_setup (and this)
[0.263954] graphic opregion physical addr: 0x0 <-- This is
supposed to point to the OpRegion, not be zero.
[0.263954] ACPI OpRegion not supported!
...
[0.267727] Failed to find VBIOS tables (VBT)

I'm also not sure if the OpRegion is actually in VM memory or not. Do
you think I need to find a way to put the OpRegion in VM memory as we
have seen coreboot (Chrome OS firmware) do above? Or should using
"x-igd-opregion=on" somehow ensure that the OpRegion makes it into VM
memory? Clearly I at least need to find a way to set that ASLS PCI
config register in the VM or modify the i915 driver that runs in the
guest so it can find the OpRegion.

On Tue, May 28, 2019 at 1:43 PM Micah Morton  wrote:
>
> Hey Alex,
>
> I'm seeing the firmware get a hold of the VBT
> (https://github.com/coreboot/coreboot/blob/master/src/drivers/intel/gma/opregion.c#L253)
> and write the location of the OpRegion to the ASLS PCI register
> (https://github.com/coreboot/coreboot/blob/master/src/drivers/intel/gma/opregion.c#L88).
> To sanity check, I booted and Chrome OS firmware put the OpRegion at
> 0x7aa9b520 (I can do `mem rm 0x7aa9b520 16` in the host and see it
> print out "IntelGraphicsMem"). So like you said, OpRegion is
> definitely there and used in the host.
>
> This line 
> (https://github.com/coreboot/coreboot/blob/master/src/drivers/intel/gma/opregion.c#L312)
> seems to imply that the VBT is being included in the OpRegion, so not
> sure what's going wrong. I am right in the middle of debugging this so
> I'll follow up on here if I have further specific questions.
>
> Thanks!
>
> On Tue, May 28, 2019 at 1:23 PM Alex Williamson
>  wrote:
> >
> > On Tue, 28 May 2019 09:35:16 -0700
> > Micah Morton  wrote:
> >
> > > Ah ok thanks!
> > >
> > > > The qemu command line I was using is here: `qemu-system-x86_64
> > > -chardev stdio,id=seabios -device
> > > isa-debugcon,iobase=0x402,chardev=seabios -m 2G -smp 2 -M pc -vga none
> > > -usbdevice tablet -cpu host,-invpcid,-tsc-deadline,check -drive
> > > 'file=/path/to/image.bin,index=0,media=disk,cache=unsafe,format=raw'
> > > -enable-kvm -device
> > > vfio-pci,x-igd-opregion=on,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2,rombar=0
> > > -device 'virtio-net,netdev=eth0' -netdev
> > > 'user,id=eth0,net=10.0.2.0/27,hostfwd=tcp:127.0.0.1:9222-:22'`
> > >
> > > It didn't work, but now at least I know why:
> > > [0.316117] i915 :00:02.0: No more image in the PCI ROM
> > > [0.316261] [drm] Failed to find VBIOS tables (VBT)
> > >
> > > If I can expose the VBT to the VM maybe it will work :)
> >
> > Hmm, looking at i915 it seems it didn't find this VBT thing in the
> > OpRegion so tried to look at the ROM, which comments indicate would
> > only be the VBT location on an older device.  QEMU should fail if
> > x-igd-opregion=on is specified but the host kernel didn't provide an
> > OpRegion at all, so we've at least done some minimal sanity checking at
> > the host kernel before exposing it, but maybe the OpRegion is missing
> > some things on this chrome device vs a standard pc?  Maybe Chrome OS
> > uses a modified i915 driver that doesn't depend on it so the firmware
> > guys stripped it?  You could write a minimal vfio driver to dump
> > the opregion data if you want to parse it by hand.  Thanks,
> >
> > Alex
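
If the OpRegion data does get dumped for hand parsing, a first sanity
check could look like the Python sketch below.  It assumes the dump is
already available as a bytes object (how it is obtained is out of
scope here), checks for the 16-byte "IntelGraphicsMem" string
mentioned above, and searches for a "$VBT" marker, which is assumed
here to be how a VBT header starts.  The function name is made up.

```python
OPREGION_SIGNATURE = b"IntelGraphicsMem"  # 16-byte string seen at the OpRegion address
VBT_MAGIC = b"$VBT"                       # assumed marker at the start of a VBT header


def inspect_opregion(blob: bytes) -> dict:
    """Report whether a dump looks like an OpRegion and where a VBT marker sits."""
    return {
        "has_signature": blob[:16] == OPREGION_SIGNATURE,
        "vbt_offset": blob.find(VBT_MAGIC),  # -1 when no VBT marker is present
    }
```

A result with the signature present but vbt_offset of -1 would point
at an OpRegion that was exposed without a VBT inside it.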
