On 1/23/25 14:58, Alex Bennée wrote:
> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
>
>> On 1/22/25 20:00, Alex Bennée wrote:
>>> Dmitry Osipenko <dmitry.osipe...@collabora.com> writes:
>>>
>>>> This patchset adds DRM native context support to VirtIO-GPU on QEMU.
>>>>
>>>> Contrary to the Virgl and Venus contexts, which mediate high-level
>>>> GFX APIs, DRM native context [1] mediates the lower-level kernel
>>>> driver UAPI, which results in lower CPU overhead and less/simpler
>>>> code needed to support it. A DRM context consists of host and guest
>>>> parts that have to be implemented for each GPU driver. On the guest
>>>> side, a DRM context presents the virtual GPU as a real/native host
>>>> GPU device to GL/VK applications.
>>>>
>>>> [1] https://www.youtube.com/watch?v=9sFP_yddLLQ
>>>>
>>>> Today there are four known DRM native context drivers in the wild:
>>>>
>>>> - Freedreno (Qualcomm SoC GPUs), completely upstreamed
>>>> - AMDGPU, mostly merged upstream
>>>
>>> I tried my AMD system today with:
>>>
>>> Host:
>>> AArch64 AVA system
>>> Trixie
>>> virglrenderer @ v1.1.0/99557f5aa130930d11f04ffeb07f3a9aa5963182
>>> -display sdl,gl=on (gtk,gl=on also came up but handled window
>>> resizing poorly)
>>>
>>> KVM Guest:
>>> AArch64
>>> Trixie
>>> mesa @ main/d27748a76f7dd9236bfcf9ef172dc13b8c0e170f
>>> -Dvulkan-drivers=virtio,amd -Dgallium-drivers=virgl,radeonsi
>>> -Damdgpu-virtio=true
>>>
>>> However, when I ran vulkaninfo --summary, KVM faulted with:
>>>
>>> debian-trixie login: error: kvm run failed Bad address
>>> PC=0000ffffb9aa1eb0 X00=0000ffffba0450a4 X01=0000aaaaf7f32400
>>> X02=000000000000013c X03=0000ffffba045098 X04=0000aaaaf7f3253c
>>> X05=0000ffffba0451d4 X06=00000000c0016900 X07=000000000000000e
>>> X08=0000000000000014 X09=00000000000000ff X10=0000aaaaf7f32500
>>> X11=0000aaaaf7e4d028 X12=0000aaaaf7edbcb0 X13=0000000000000001
>>> X14=000000000000000c X15=0000000000007718 X16=0000ffffb93601f0
>>> X17=0000ffffb9aa1dc0 X18=00000000000076f0 X19=0000aaaaf7f31330
>>> X20=0000aaaaf7f323f0 X21=0000aaaaf7f235e0 X22=000000000000004c
>>> X23=0000aaaaf7f2b5e0 X24=0000aaaaf7ee0cb0 X25=00000000000000ff
>>> X26=0000000000000076 X27=0000ffffcd2b18a8 X28=0000aaaaf7ee0cb0
>>> X29=0000ffffcd2b0bd0 X30=0000ffffb86c8b98 SP=0000ffffcd2b0bd0
>>> PSTATE=20001000 --C- EL0t
>>> QEMU 9.2.50 monitor - type 'help' for more information
>>> (qemu) quit
>>>
>>> Which looks very much like the PFN locking failure. However, booting
>>> up with venus=on instead works. Could there be any differences in the
>>> way device memory is mapped in the two cases?
>>
>> Memory mapping works exactly the same for nctx and venus. Are you on a
>> 6.13 host kernel?
>
> Yes - with the Altra PCI workaround patches on both host and guest
> kernel.
>
> Is there any way to trace the sharing of device memory on the host so I
> can verify it's an attempt at device access? The PC looks like it's in
> user-space, but once this fails the guest is suspended, so I can't poke
> around in its environment.
I add printk's to the kernel in such cases; there is likely no better way
to find out why it fails.

Do your ARM VM and host both use a 4k page size? You can check with
"getconf PAGESIZE" on each side.

If it's a page refcounting bug on ARM/KVM, then applying [1] to the host
kernel will make it work and we will know where the problem is. Please
try.

[1] https://patchwork.kernel.org/project/kvm/patch/20220815095423.11131-1-dmitry.osipe...@collabora.com/
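For example, a debug helper along these lines can be dropped into the
host-side mmap/fault path of the driver (exactly where to hook it is a
guess here, not something the trace above pins down):

  #include <linux/mm.h>
  #include <linux/printk.h>

  /* Dump how a VMA for the exported GPU memory is set up, so we can
   * see whether the guest mapping ends up as VM_PFNMAP (the case the
   * KVM refcounting patch [1] cares about) or as a regular mapping. */
  static void dbg_dump_vma(const char *tag, struct vm_area_struct *vma)
  {
          pr_info("%s: vma [%lx-%lx] flags=%lx PFNMAP=%d MIXEDMAP=%d IO=%d\n",
                  tag, vma->vm_start, vma->vm_end, vma->vm_flags,
                  !!(vma->vm_flags & VM_PFNMAP),
                  !!(vma->vm_flags & VM_MIXEDMAP),
                  !!(vma->vm_flags & VM_IO));
  }

Comparing the printed flags between the venus=on and nctx runs should
tell us whether the two really map device memory the same way.

--
Best regards,
Dmitry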