Good morning Alex,

Just one observation:

On Sun, 04 Feb 2024 13:06, Alex Bennée <alex.ben...@linaro.org> wrote:
Hi,

I'm trying to get an understanding of the blob allocation and mapping
flow for virtio-gpu for Vulkan and Rutabaga. Having gotten all the
various libraries setup I'm still seeing failures when running a TCG
guest (buildroot + latest glm, mesa, vkmark) with:

 ./qemu-system-aarch64 \
            -M virt -cpu cortex-a76 \
            -m 8192 \
            -object memory-backend-memfd,id=mem,size=8G,share=on \
            -serial mon:stdio \
            -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
            -append "console=ttyAMA0" \
            -device virtio-gpu-gl,context_init=true,blob=true,hostmem=4G \
            -display sdl,gl=on \
            -d guest_errors,trace:virtio_gpu_cmd_res\*,trace:virtio_gpu_virgl_process_command \
            -D debug.log

which shows up as detected in dmesg but is not visible to vulkaninfo:

 [    0.644879] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
 [    0.648643] virtio-pci 0000:00:02.0: enabling device (0000 -> 0002)
 [    0.672391] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
 [    0.678071] Serial: AMBA driver
 [    0.682122] [drm] pci: virtio-gpu-pci detected at 0000:00:02.0
 [    0.683249] [drm] Host memory window: 0x8000000000 +0x100000000
 [    0.683420] [drm] features: +virgl +edid +resource_blob +host_visible
 [    0.683470] [drm] features: +context_init
 [    0.695695] [drm] number of scanouts: 1
 [    0.695837] [drm] number of cap sets: 3
 [    0.716173] [drm] cap set 0: id 1, max-version 1, max-size 308
 [    0.716499] [drm] cap set 1: id 2, max-version 2, max-size 1384
 [    0.716686] [drm] cap set 2: id 4, max-version 0, max-size 160
 [    0.726001] [drm] Initialized virtio_gpu 0.1.0 0 for 0000:00:02.0 on minor 0
 virgl_resource_create: err=0, res=2
 virgl_renderer_resource_attach_iov: 0x55b843c17a80/2
 virgl_resource_attach_iov: pipe_resource: 0x55b8434da8f0
 vrend_pipe_resource_attach_iov: 0x43

 ...

 # vulkaninfo --summary
 WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Failed to CreateInstance in ICD 1.  Skipping ICD.

A common problem I have when testing different Mesa builds is forgetting to declare the intended driver each time. I could be getting errors like yours, but if I set the VK_ICD_FILENAMES env var to the correct driver manifest (the installed icd.d/virtio-*.json file from my Mesa build) the device is properly recognised. This might be unrelated to your error, but it still helps to define it explicitly each time.
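For example (the manifest path below is just an illustration of where my build installs it; adjust for your own prefix and architecture):

```shell
# Point the Vulkan loader at the venus ICD manifest from the Mesa build.
# Example path only - use the icd.d/virtio_icd.*.json your build installs.
export VK_ICD_FILENAMES=$HOME/mesa-install/share/vulkan/icd.d/virtio_icd.aarch64.json
vulkaninfo --summary
```

Newer Vulkan loaders call this VK_DRIVER_FILES; VK_ICD_FILENAMES is still accepted as the deprecated alias.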



 error: XDG_RUNTIME_DIR is invalid or not set in the environment.
 ==========
 VULKANINFO
 ==========

 Vulkan Instance Version: 1.3.262


 Instance Extensions: count = 12
 -------------------------------
 VK_EXT_debug_report                    : extension revision 10
 VK_EXT_debug_utils                     : extension revision 2
 VK_KHR_device_group_creation           : extension revision 1
 VK_KHR_external_fence_capabilities     : extension revision 1
 VK_KHR_external_memory_capabilities    : extension revision 1
 VK_KHR_external_semaphore_capabilities : extension revision 1
 VK_KHR_get_physical_device_properties2 : extension revision 2
 VK_KHR_get_surface_capabilities2       : extension revision 1
 VK_KHR_portability_enumeration         : extension revision 1
 VK_KHR_surface                         : extension revision 25
 VK_KHR_surface_protected_capabilities  : extension revision 1
 VK_LUNARG_direct_driver_loading        : extension revision 1

 Instance Layers:
 ----------------

 Devices:
 ========
 GPU0:
         apiVersion         = 1.3.267
         driverVersion      = 0.0.1
         vendorID           = 0x10005
         deviceID           = 0x0000
         deviceType         = PHYSICAL_DEVICE_TYPE_CPU
         deviceName         = llvmpipe (LLVM 15.0.3, 128 bits)
         driverID           = DRIVER_ID_MESA_LLVMPIPE
         driverName         = llvmpipe
         driverInfo         = Mesa 23.3.2 (LLVM 15.0.3)
         conformanceVersion = 1.3.1.1
         deviceUUID         = 6d657361-3233-2e33-2e32-000000000000
         driverUUID         = 6c6c766d-7069-7065-5555-494400000000

With an older and more hacked-up set of the blob patches I can get
vulkaninfo to work, but I see multiple GPUs and vkmark falls over when
mapping stuff:

 # vulkaninfo --summary
 render_state_create_resource: res_id = 5
 vkr_context_add_resource: res_id = 5
 vkr_context_import_resource_internal: res_id = 5
 virgl_resource_create: err=0, res=5
 render_state_create_resource: res_id = 6
 vkr_context_add_resource: res_id = 6
 vkr_context_import_resource_internal: res_id = 6
 virgl_resource_create: err=0, res=6
 error: XDG_RUNTIME_DIR is invalid or not set in the environment.
 ==========
 VULKANINFO
 ==========

 Vulkan Instance Version: 1.3.262


 Instance Extensions: count = 12
 -------------------------------
 VK_EXT_debug_report                    : extension revision 10
 VK_EXT_debug_utils                     : extension revision 2
 VK_KHR_device_group_creation           : extension revision 1
 VK_KHR_external_fence_capabilities     : extension revision 1
 VK_KHR_external_memory_capabilities    : extension revision 1
 VK_KHR_external_semaphore_capabilities : extension revision 1
 VK_KHR_get_physical_device_properties2 : extension revision 2
 VK_KHR_get_surface_capabilities2       : extension revision 1
 VK_KHR_portability_enumeration         : extension revision 1
 VK_KHR_surface                         : extension revision 25
 VK_KHR_surface_protected_capabilities  : extension revision 1
 VK_LUNARG_direct_driver_loading        : extension revision 1

 Instance Layers:
 ----------------

 Devices:
 ========
 GPU0:
         apiVersion         = 1.3.230
         driverVersion      = 23.3.4
         vendorID           = 0x8086
         deviceID           = 0xa780
         deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
         deviceName         = Virtio-GPU Venus (Intel(R) Graphics (RPL-S))
         driverID           = DRIVER_ID_MESA_VENUS
         driverName         = venus
         driverInfo         = Mesa 23.3.4
         conformanceVersion = 1.3.0.0
         deviceUUID         = 29d2e940-a1a0-3054-0f9a-9f7dec52a084
         driverUUID         = b11fafe9-8706-9ab8-0f16-8b272cf893ca
 GPU1:
         apiVersion         = 1.2.0
         driverVersion      = 23.3.4
         vendorID           = 0x10005
         deviceID           = 0x0000
         deviceType         = PHYSICAL_DEVICE_TYPE_CPU
         deviceName         = Virtio-GPU Venus (llvmpipe (LLVM 15.0.6, 256 bits))
         driverID           = DRIVER_ID_MESA_VENUS
         driverName         = venus
         driverInfo         = Mesa 23.3.4
         conformanceVersion = 1.3.0.0
         deviceUUID         = 5fb5c03f-c537-f0fe-a7e6-9cd5866acb8d
         driverUUID         = b11fafe9-8706-9ab8-0f16-8b272cf893ca
 GPU2:
         apiVersion         = 1.3.267
         driverVersion      = 0.0.1
         vendorID           = 0x10005
         deviceID           = 0x0000
         deviceType         = PHYSICAL_DEVICE_TYPE_CPU
         deviceName         = llvmpipe (LLVM 15.0.3, 128 bits)
         driverID           = DRIVER_ID_MESA_LLVMPIPE
         driverName         = llvmpipe
         driverInfo         = Mesa 23.3.2 (LLVM 15.0.3)
         conformanceVersion = 1.3.1.1
         deviceUUID         = 6d657361-3233-2e33-2e32-000000000000
         driverUUID         = 6c6c766d-7069-7065-5555-494400000000
 render_state_destroy_resource: res_id = 5
 vkr_context_remove_resource: res_id = 5
 virgl_resource_destroy_func: res=5
 render_state_destroy_resource: res_id = 6
 vkr_context_remove_resource: res_id = 6
 virgl_resource_destroy_func: res=6

Running vkmark gives:

 # vkmark --winsys kms
 render_state_create_resource: res_id = 7
 vkr_context_add_resource: res_id = 7
 vkr_context_import_resource_internal: res_id = 7
 virgl_resource_create: err=0, res=7
 render_state_create_resource: res_id = 8
 vkr_context_add_resource: res_id = 8
 vkr_context_import_resource_internal: res_id = 8
 virgl_resource_create: err=0, res=8
 virgl_resource_create: err=0, res=9
 virgl_renderer_resource_attach_iov: 0x55615acf7f40/9
 virgl_resource_attach_iov: pipe_resource: 0x55615acf7db0
 vrend_pipe_resource_attach_iov: 0x43

This bit does nothing, as VREND_STORAGE_HOST_SYSTEM_MEMORY isn't set.

 virgl_resource_create: err=0, res=10
 virgl_renderer_resource_attach_iov: 0x55615ae569a0/10
 virgl_resource_attach_iov: pipe_resource: 0x55615a99ce20
 vrend_pipe_resource_attach_iov: 0x43
 Warning: KMSWindowSystem: Using VK_IMAGE_TILING_OPTIMAL for dmabuf with invalid modifier, but this is not guaranteed to work.
 vkr_dispatch_vkAllocateMemory: mem_index_type:0
 virgl_render_server[2889817]: vkr: failed to import resource: invalid res_id 9
 virgl_render_server[2889817]: vkr: vkAllocateMemory resulted in CS error
 virgl_render_server[2889817]: vkr: ring_submit_cmd: vn_dispatch_command failed
 MESA-VIRTIO: debug: vn_ring_submit abort on fatal
 render_state_destroy_resource: res_id = 7
 vkr_context_remove_resource: res_id = 7
 render_state_destroy_resource: res_id = 8
 vkr_context_remove_resource: res_id = 8
 virgl_resource_destroy_func: res=7
 virgl_resource_destroy_func: res=8
 virgl_resource_destroy_func: res=10
 virgl_resource_destroy_func: res=9
 Aborted

The debug printfs are scattered throughout QEMU and virglrenderer from my
attempts to work out what is going on. While the eventual aim is to
enable this stack on an ARM64 platform with Xen, I wanted to make sure I
understand the blob resource flow first.

As I understand it, we need blob resources to hold the 3D data for
rendering in a form visible to the underlying GPU that will be doing the
work. While these blobs can be copied back and forth, the most efficient
approach is to allocate HOST VISIBLE blobs that are appropriately placed
and aligned for the host GPU. For Vulkan there is also the command
stream, which needs to be translated from Vulkan's portable shader IR
(SPIR-V, as emitted by Mesa in the guest) to the underlying shader
commands on the host system (via Mesa on the host).

I can see the host visible region is created as a large chunk of the PCI
BAR space and sits outside of the guest's system RAM.

The guest creates a unique resource ID by submitting a
VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB command.

Q1: is the actual allocation done here? I assume this happens somewhere
in the depths of virglrenderer, or is it in the kernel DRM subsystem?

Q2: should the memory for this resource be visible to the host
userspace at this point? Where is the mapping into userspace done?

The guest submits a VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB command with the
unique resource ID, requesting that it be mapped at an offset into the
host visible memory region.

Q3: Is all the mapping for host and guest handled by QEMU's memory_region
code?

Q4: Are there any differences between cards with VRAM and those with a
unified memory architecture (e.g. using system RAM)?

Finally, should we expect to see any other resource commands
(RESOURCE_CREATE_2D/3D, TRANSFER_TO_HOST, ATTACH_BACKING) if we have
host visible blobs properly allocated and working?

--
Alex Bennée
Virtualisation Tech Lead @ Linaro
