Re: AMD GPU problems under Xen

2022-11-29 Thread Marek Marczykowski-Górecki
On Tue, Nov 29, 2022 at 09:32:54AM -0500, Alex Deucher wrote:
> On Mon, Nov 28, 2022 at 8:59 PM Demi Marie Obenour wrote:
> >
> > On Mon, Nov 28, 2022 at 11:18:00AM -0500, Alex Deucher wrote:
> > > On Mon, Nov 28, 2022 at 2:18 AM Demi Marie Obenour wrote:
> > > >
> > > > Dear Christian:
> > > >
> > > > What is the status of the AMDGPU work for Xen dom0?  That was mentioned in
> > > > https://lore.kernel.org/dri-devel/b2dec9b3-03a7-e7ac-306e-1da024af8...@amd.com/
> > > > and there have been bug reports to Qubes OS about problems with AMDGPU
> > > > under Xen (such as https://github.com/QubesOS/qubes-issues/issues/7648).
> > >
> > > I would say it's a work in progress.  It depends on what GPU you have
> > > and what type of Xen setup you are using (PV vs PVH, etc.).
> >
> > The current situation is:
> >
> > - dom0 is PV.
> > - VMs with assigned PCI devices are HVM and use a Linux-based stubdomain;
> >   QEMU does not run in dom0.
> > - Everything else is PVH.
> >
> > In the future, I believe the goal is to move away from PV and HVM in
> > favor of PVH, though HVM support will remain for compatibility with
> > guests (such as Windows) that need emulated devices.
> >
> > > In general, your best bet currently is dGPU add-in boards because they
> > > are largely self-contained.
> >
> > The main problem is that for the trusted GUI to work, there needs to
> > be at least one GPU attached to a trusted VM, such as the host or a
> > dedicated GUI VM.  That VM will typically not be running graphics-
> > intensive workloads, so the compute power of a dGPU is largely wasted.
> > SR-IOV support would help with that, but the only GPU vendor with open
> > source SR-IOV support is Intel and it is still not upstream.  I am also
> > not certain if the support extends to Arc dGPUs.
> 
> Can you elaborate on this?  Why wouldn't you just want to pass-through
> a dGPU to a domU to use directly in the guest?

You can do that, but if that's the only GPU in the system, you'll lose
the graphical interface for the other guests.
But yes, simple pass-through of a dGPU is enough in some setups.

> Are you sure?  I didn't think intel's GVT solution was actually
> SR-IOV.  I think GVT is just a paravirtualized solution.

Yes, it's a paravirtualized solution, with device emulation done in the
dom0 kernel. Besides being a rather unusual approach in the Xen world
(emulators, aka IOREQ servers, usually live in userspace), this puts a
rather complex piece of code that interacts with untrusted data
(instructions from guests) in almost the most privileged system
component, without any ability to sandbox it. We consider it too risky
for Qubes OS, especially since the kernel patches were never accepted
upstream and the Xen support is not maintained anymore.

The SR-IOV approach Demi is talking about is a newer development,
supported since Alder Lake (technically, the IGD in Tiger Lake presents
the SR-IOV capability too, but it's officially supported only since ADL).
The driver for managing it is in the process of being upstreamed. Some
links here:
https://github.com/intel/linux-intel-lts/issues/33
(I have not tried it yet.)

>  That aside,
> we are working on enabling virtio gpu with our GPUs on xen in addition
> to domU passthrough.

That's an interesting development. Please note that Linux recently (as
of 6.1) gained support for using grant tables with virtio. This allows
having backends without full access to the guest's memory. The work is
done in a generic way, so a driver using the proper APIs (including DMA)
should work in such a setup out of the box. Please try not to break it :)

> >
> > > APUs and platforms with integrated dGPUs
> > > are a bit more complicated as they tend to have more platform
> > > dependencies like ACPI tables and methods in order for the driver to
> > > be able to initialize the hardware properly.
> >
> > Is Xen dom0/domU support for such GPUs being worked on?  Is there an
> > estimate as to when the needed support will be available upstream?  This
> > is mostly directed at Christian and other people who work for hardware
> > vendors.
> 
> Yes, there are some minor fixes in the driver required which we'll be
> sending out soon and we had to add some ACPI tables to the whitelist
> in xen, but unfortunately the ACPI tables are AMD platform specific so
> there has been pushback from the xen maintainers on accepting them
> because they are not an official part of the ACPI spec.

Can the driver work without them? Such a dependency, as you noted above,
makes things rather complicated for pass-through (specific ACPI tables
can probably be made available to the guest, but the guest usually
wouldn't see all the resources they talk about anyway).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



[PATCH] drm/amdgpu: do not use passthrough mode in Xen dom0

2022-04-27 Thread Marek Marczykowski-Górecki
While technically Xen dom0 is a virtual machine too, it does have
access to most of the hardware so it doesn't need to be considered a
"passthrough". Commit b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for
APUs in passthrough") changed how FB is accessed based on passthrough
mode. This breaks amdgpu in Xen dom0 with a message like this:

[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3

While the reason for this failure is unclear, the passthrough mode is
not really necessary in Xen dom0 anyway. So, to unbreak booting affected
kernels, disable passthrough mode in this case.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1985
Fixes: b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough")
Signed-off-by: Marek Marczykowski-Górecki 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index a025f080aa6a..5e3756643da3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -24,6 +24,7 @@
 #include 
 
 #include 
+#include 
 
 #include "amdgpu.h"
 #include "amdgpu_ras.h"
@@ -710,7 +711,8 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev)
 		adev->virt.caps |= AMDGPU_SRIOV_CAPS_ENABLE_IOV;
 
 	if (!reg) {
-		if (is_virtual_machine())	/* passthrough mode exclus sriov mod */
+		/* passthrough mode exclus sriov mod */
+		if (is_virtual_machine() && !xen_initial_domain())
 			adev->virt.caps |= AMDGPU_PASSTHROUGH_MODE;
 	}
 
-- 
2.35.1