On Tue, Oct 14, 2025 at 03:16:46PM +0200, Cédric Le Goater wrote:

> > +    /* check cache */
>
> It would be good to add an assert to check the index value. More important
> we need to fix an ugly "index out-of-bounds" bug that can occur when booting
> a VM with a vGPU :
>
>   -device vfio-pci-nohotplug,host=0000:27:00.4,display=on,ramfb=true ...
>
> The interesting part is :
>
>   Thread 1 (Thread 0x7ffff6891ec0 (LWP 11372) "qemu-kvm"):
>   #0  0x000055555581b83d in vfio_region_setup (obj=0x5555588c0b70,
> vbasedev=0x5555588c1630, region=0x555558a9c040, index=9, name=0x555555de94ba
> <str.68.llvm> "display") at ../hw/vfio/region.c:199
>   #1  0x00005555558208a4 in vfio_display_region_update (opaque=<optimized
> out>) at ../hw/vfio/display.c:449
>   #2  0x00005555556bdd6c in graphic_hw_update (con=0x555558acf830) at
> ../ui/console.c:143
>   #3  vnc_refresh (dcl=0x7fffec048050) at ../ui/vnc.c:3262
>   #4  0x00005555556a15cb in dpy_refresh (s=0x555558acf980) at
> ../ui/console.c:880
>   #5  gui_update (opaque=0x555558acf980) at ../ui/console.c:90
>   (gdb) p vbasedev->num_regions
>   $9 = 9
>
> Index 9 is beyond the maximum valid index of the reginfo array :/
>
> We didn't take into account the ioctl VFIO_DEVICE_QUERY_GFX_PLANE
> which can return region index 9 which is beyond the maximum valid
> index of the reginfo array :/

My apologies - we hit the exact same issue internally, but with a much older
codebase, so I did not realise this could be an upstream problem as well!

We put this down to a bug in the NVIDIA driver - surely it shouldn't be
reporting fewer regions than are actually in use. So we applied what we
thought was a gross hack: bounds-check the index, and bypass the region
cache whenever the index is at or beyond num_regions.

To put it another way, the header file says:

        __u32   num_regions;    /* Max region index + 1 */

If it's not actually the max region index + 1, what are the expected
semantics of this field, or of region indices more generally? We could not
find any clear documentation on the topic other than this comment.

regards
john
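
For illustration, the boundary check described above could look roughly like
the sketch below. Everything in it is a hypothetical stand-in (struct device,
struct region_info, query_region(), get_region_info()); it is not the QEMU
code nor the internal patch, and the real query would be an ioctl rather than
a calloc(). It only shows the idea: indices below num_regions go through the
cache, anything at or beyond it is queried directly and never touches the
reginfo array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the QEMU or kernel definitions. */
struct region_info {
    uint32_t index;
    uint64_t size;
};

struct device {
    uint32_t num_regions;          /* as reported by the device */
    struct region_info **reginfo;  /* cache with num_regions slots */
    unsigned int queries;          /* number of successful queries */
};

/* Stand-in for the real region query (an ioctl in practice). */
static struct region_info *query_region(struct device *dev, uint32_t index)
{
    struct region_info *info = calloc(1, sizeof(*info));

    if (info) {
        info->index = index;
        info->size = 4096;         /* arbitrary placeholder value */
        dev->queries++;
    }
    return info;
}

/*
 * Look up region info for 'index'.
 *
 * On return, *cached is true if the result is owned by the cache (the
 * caller must not free it), or false if the index was outside the cache
 * and the caller owns the result.
 */
static struct region_info *get_region_info(struct device *dev,
                                           uint32_t index, bool *cached)
{
    assert(dev != NULL && cached != NULL);

    /* Out-of-range index: bypass the cache instead of overrunning it. */
    if (index >= dev->num_regions) {
        *cached = false;
        return query_region(dev, index);
    }

    *cached = true;
    if (dev->reginfo[index] == NULL) {
        dev->reginfo[index] = query_region(dev, index);
    }
    return dev->reginfo[index];
}
```

With num_regions = 9, a lookup of index 3 is served from the cache on the
second call, while a lookup of index 9 always results in a fresh query whose
result the caller has to free.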
