On Tue, Oct 14, 2025, at 7:32 AM, John Levon wrote: > On Tue, Oct 14, 2025 at 03:16:46PM +0200, Cédric Le Goater wrote: > >> > + /* check cache */ >> >> It would be good to add an assert to check the index value. More important >> we need to fix an ugly "index out-of-bounds" bug that can occur when booting >> a VM with a vGPU : >> >> -device vfio-pci-nohotplug,host=0000:27:00.4,display=on,ramfb=true ... >> >> The interesting part is : >> >> Thread 1 (Thread 0x7ffff6891ec0 (LWP 11372) "qemu-kvm"): >> #0 0x000055555581b83d in vfio_region_setup (obj=0x5555588c0b70, >> vbasedev=0x5555588c1630, region=0x555558a9c040, index=9, name=0x555555de94ba >> <str.68.llvm> "display") at ../hw/vfio/region.c:199 >> #1 0x00005555558208a4 in vfio_display_region_update (opaque=<optimized >> out>) at ../hw/vfio/display.c:449 >> #2 0x00005555556bdd6c in graphic_hw_update (con=0x555558acf830) at >> ../ui/console.c:143 >> #3 vnc_refresh (dcl=0x7fffec048050) at ../ui/vnc.c:3262 >> #4 0x00005555556a15cb in dpy_refresh (s=0x555558acf980) at >> ../ui/console.c:880 >> #5 gui_update (opaque=0x555558acf980) at ../ui/console.c:90 >> (gdb) p vbasedev->num_regions >> $9 = 9 >> >> Index 9 is beyond the maximum valid index of the reginfo array :/ >> >> We didn't take into account the ioctl VFIO_DEVICE_QUERY_GFX_PLANE >> which can return region index 9 which is beyond the maximum valid >> index of the reginfo array :/ > > My apologies - we hit the exact same issue internally, but with a much older > codebase, so I did not realise this could be an upstream problem as well! > > We put this down to a bug in the nvidia driver - surely it shouldn't be > reporting fewer regions than are actually in use. So we applied what we > thought > to be a gross hack of boundary checking, and not using the region cache in > case > it's beyond num_regions. > > To put it another way, the header file says: > > 217 __u32 num_regions; /* Max region index + 1 */ > > If it's not actually the max region index + 1, what are the expected semantics > of this field, or of region indices more generally? We could not find any > clear > documentation on the topic other than this comment.
'9' only defines the end of the fixed, pre-defined region indexes for vfio-pci, ie. VFIO_PCI_NUM_REGIONS. Beyond that, we support device specific regions. The GFX region is one such device specific region. We enumerate these regions based on vfio_device_info.num_regions and use the capabilities feature of the vfio_region_info to introspect the region type provided. There is no fixed limit to the number of regions a device may expose, nor is vfio_device_info.num_regions necessarily a static value. We're currently discussing a uAPI for generating special mappings to a region that could dynamically increase the reported regions. Thanks, Alex
