On Tue, Oct 14, 2025, at 7:32 AM, John Levon wrote:
> On Tue, Oct 14, 2025 at 03:16:46PM +0200, Cédric Le Goater wrote:
>
>> > +    /* check cache */
>> 
>> It would be good to add an assert to check the index value. More important
>> we need to fix an ugly "index out-of-bounds" bug that can occur when booting
>> a VM with a vGPU :
>> 
>>   -device vfio-pci-nohotplug,host=0000:27:00.4,display=on,ramfb=true ...
>> 
>> The interesting part is :
>> 
>>   Thread 1 (Thread 0x7ffff6891ec0 (LWP 11372) "qemu-kvm"):
>>   #0  0x000055555581b83d in vfio_region_setup (obj=0x5555588c0b70, 
>> vbasedev=0x5555588c1630, region=0x555558a9c040, index=9, name=0x555555de94ba 
>> <str.68.llvm> "display") at ../hw/vfio/region.c:199
>>   #1  0x00005555558208a4 in vfio_display_region_update (opaque=<optimized 
>> out>) at ../hw/vfio/display.c:449
>>   #2  0x00005555556bdd6c in graphic_hw_update (con=0x555558acf830) at 
>> ../ui/console.c:143
>>   #3  vnc_refresh (dcl=0x7fffec048050) at ../ui/vnc.c:3262
>>   #4  0x00005555556a15cb in dpy_refresh (s=0x555558acf980) at 
>> ../ui/console.c:880
>>   #5  gui_update (opaque=0x555558acf980) at ../ui/console.c:90
>>   (gdb) p vbasedev->num_regions
>>   $9 = 9
>> 
>> Index 9 is beyond the maximum valid index of the reginfo array :/
>> 
>> We didn't take into account the ioctl VFIO_DEVICE_QUERY_GFX_PLANE
>> which can return region index 9 which is beyond the maximum valid
>> index of the reginfo array :/
>
> My apologies - we hit the exact same issue internally, but with a much older
> codebase, so I did not realise this could be an upstream problem as well!
>
> We put this down to a bug in the nvidia driver - surely it shouldn't be
> reporting fewer regions than are actually in use. So we applied what we 
> thought
> to be a gross hack of boundary checking, and not using the region cache in 
> case
> it's beyond num_regions.
>
> To put it another way, the header file says:
>
>    217         __u32   num_regions;    /* Max region index + 1 */
>
> If it's not actually the max region index + 1, what are the expected semantics
> of this field, or of region indices more generally? We could not find any 
> clear
> documentation on the topic other than this comment.

'9' only defines the end of the fixed, pre-defined region indexes for vfio-pci, 
ie. VFIO_PCI_NUM_REGIONS.  Beyond that, we support device specific regions.  
The GFX region is one such device specific region.

We enumerate these regions based on vfio_device_info.num_regions and use the 
capabilities feature of the vfio_region_info to introspect the region type 
provided.

There is no fixed limit to the number of regions a device may expose, nor is 
vfio_device_info.num_regions necessarily a static value.  We're currently 
discussing a uAPI for generating special mappings to a region that could 
dynamically increase the reported regions.  Thanks,

Alex

Reply via email to