Thanks, Alex.

>> +config VGPU_VFIO
>> +    tristate
>> +    depends on VGPU
>> +    default n
>> +
>
> This is a little bit convoluted, it seems like everything added in this
> patch is vfio agnostic, it doesn't necessarily care what the consumer
> is.  That makes me think we should only be adding CONFIG_VGPU here and
> it should not depend on CONFIG_VFIO or be enabling CONFIG_VGPU_VFIO.
> The middle config entry is also redundant to the first, just move the
> default line up to the first and remove the rest.

CONFIG_VGPU doesn't directly depend on VFIO. CONFIG_VGPU_VFIO directly depends on VFIO. But devices created by the VGPU core module need a driver to manage them, and CONFIG_VGPU_VFIO is the driver that will manage vgpu devices. So I think CONFIG_VGPU_VFIO should be enabled by CONFIG_VGPU.

This would look like:
menuconfig VGPU
    tristate "VGPU driver framework"
    select VGPU_VFIO
    default n
    help
        VGPU provides a framework to virtualize GPU without SR-IOV cap
        See Documentation/vgpu.txt for more details.

        If you don't know what to do here, say N.

config VGPU_VFIO
    tristate
    depends on VGPU
    depends on VFIO
    default n



>> +int vgpu_register_device(struct pci_dev *dev, const struct gpu_device_ops *ops)
>
> Why do we care that it's a pci_dev?  It seems like there's only a very
> small portion of the API that cares about pci_devs in order to describe
> BARs, which could be switched based on the device type.  Otherwise we
> could operate on a struct device here.
>

GPUs are PCI devices, hence I used pci_dev. I agree with you; I'll change it to operate on struct device and add checks in vgpu_vfio.c where the config space and BARs are populated.
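
As a rough sketch, the registration API could take a struct device, and vgpu_vfio.c could check the device type only where PCI-specific information is needed (the dev_is_pci()/to_pci_dev() placement below is my assumption, not code from the patch):

int vgpu_register_device(struct device *dev, const struct gpu_device_ops *ops);

/* in vgpu_vfio.c, only where config space and BARs are populated */
if (dev_is_pci(gpu_dev->dev)) {
        struct pci_dev *pdev = to_pci_dev(gpu_dev->dev);
        /* read config space / BAR sizes from pdev */
}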

>> +static void vgpu_device_free(struct vgpu_device *vgpu_dev)
>> +{
>> +      if (vgpu_dev) {
>> +              mutex_lock(&vgpu.vgpu_devices_lock);
>> +              list_del(&vgpu_dev->list);
>> +              mutex_unlock(&vgpu.vgpu_devices_lock);
>> +              kfree(vgpu_dev);
>> +      }
>
> Why aren't we using the kref to remove and free the vgpu when the last
> reference is released?

vgpu_device_free() is called from 2 places:
1. from create_vgpu_device(), when device_register() fails.
2. from vgpu_device_release(), which is set as the release function of the device registered by device_register():
vgpu_dev->dev.release = vgpu_device_release;
This release function is called from the device_unregister() kernel function, which uses the kref to invoke the release function. I don't think I need to do it again.
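
For clarity, the flow described above as a sketch (names taken from the patch; the ordering comments are mine):

vgpu_dev->dev.release = vgpu_device_release;     /* case 2: release callback */

ret = device_register(&vgpu_dev->dev);
if (ret)
        vgpu_device_free(vgpu_dev);              /* case 1: registration failed */

/* on teardown, device_unregister() drops the last kref and the driver
 * core calls vgpu_device_release(), which frees the vgpu_device */
device_unregister(&vgpu_dev->dev);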


>> +struct vgpu_device *vgpu_drv_get_vgpu_device(uuid_le uuid, int instance)
>> +{
>> +      struct vgpu_device *vdev = NULL;
>> +
>> +      mutex_lock(&vgpu.vgpu_devices_lock);
>> +      list_for_each_entry(vdev, &vgpu.vgpu_devices_list, list) {
>> +              if ((uuid_le_cmp(vdev->uuid, uuid) == 0) &&
>> +                  (vdev->vgpu_instance == instance)) {
>> +                      mutex_unlock(&vgpu.vgpu_devices_lock);
>> +                      return vdev;
>
> We're not taking any sort of reference to the vgpu, what prevents races
> with it being removed?  A common exit path would be easy to achieve
> here too.
>

I'll add a reference count.
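
Something along these lines, a sketch assuming a 'struct kref kref' member is added to struct vgpu_device, with a single unlock path:

struct vgpu_device *vgpu_drv_get_vgpu_device(uuid_le uuid, int instance)
{
        struct vgpu_device *vdev, *found = NULL;

        mutex_lock(&vgpu.vgpu_devices_lock);
        list_for_each_entry(vdev, &vgpu.vgpu_devices_list, list) {
                if ((uuid_le_cmp(vdev->uuid, uuid) == 0) &&
                    (vdev->vgpu_instance == instance)) {
                        kref_get(&vdev->kref);  /* hold a reference for the caller */
                        found = vdev;
                        break;
                }
        }
        mutex_unlock(&vgpu.vgpu_devices_lock);

        return found;
}

The caller would drop the reference with a matching put helper once it is done with the device.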


>> +create_attr_error:
>> +      if (gpu_dev->ops->vgpu_destroy) {
>> +              int ret = 0;
>> +              ret = gpu_dev->ops->vgpu_destroy(gpu_dev->dev,
>> +                                               vgpu_dev->uuid,
>> +                                               vgpu_dev->vgpu_instance);
>
> Unnecessary initialization and we don't do anything with the result.
> Below indicates lack of vgpu_destroy indicates the vendor doesn't
> support unplug, but doesn't that break our error cleanup path here?
>

Comment about vgpu_destroy:
If the VM is running and vgpu_destroy is called, that means the vGPU is being hot-unplugged. Return an error if the VM is running and the graphics driver doesn't support vgpu hotplug.

It's the GPU driver's responsibility to check whether the VM is running and return accordingly. This is the vgpu creation path; the vgpu device would be hot-plugged into the VM on vgpu_start.
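
On the vendor side this could look roughly like the following (hypothetical vendor-driver code; all vendor_* names are placeholders, not part of this patch):

static int vendor_vgpu_destroy(struct pci_dev *dev, uuid_le uuid,
                               uint32_t instance)
{
        /* vGPU is attached to a running VM and hot-unplug isn't supported */
        if (vendor_vm_is_running(uuid, instance) &&
            !vendor_supports_hotunplug(dev))
                return -EOPNOTSUPP;

        vendor_teardown_vgpu(dev, uuid, instance);
        return 0;
}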

>> +              retval = gpu_dev->ops->vgpu_destroy(gpu_dev->dev,
>> +                                                  vgpu_dev->uuid,
>> +                                                  vgpu_dev->vgpu_instance);
>> + /* if vendor driver doesn't return success that means vendor driver doesn't
>> +       * support hot-unplug */
>> +              if (retval)
>> +                      return;
>
> Should we return an error code then?  Inconsistent comment style.
>

destroy_vgpu_device() is called from
- release_vgpubus_dev(), which is a release function of vgpu_class and has return type void.
- vgpu_unregister_device(), which again returns void.

Even if an error code were returned from here, it would not be used.

I'll change the comment style in next patch update.



>> + * @write:                    Write emulation callback
>> + *                            @vdev: vgpu device structure
>> + *                            @buf: write buffer
>> + *                            @count: number bytes to be written
>> +                            @address_space: specifies for which address space
>> + *                            the request is: pci_config_space, IO register
>> + *                            space or MMIO space.
>> + *                            @pos: offset from base address.
>> +                            Retuns number on bytes written on success or error.
>
> How do these support multiple MMIO spaces or IO port spaces?  GPUs, and
> therefore I assume vGPUs, often have more than one MMIO space, how
> does the enum above tell us which one?  We could simply make this be a
> region index.
>

address_space should be one of these values; yes, address_space is of type
'enum vgpu_emul_space_e'. I will change uint32_t to vgpu_emul_space_e:
enum vgpu_emul_space_e {
        vgpu_emul_space_config = 0, /*!< PCI configuration space */
        vgpu_emul_space_io = 1,     /*!< I/O register space */
        vgpu_emul_space_mmio = 2    /*!< Memory-mapped I/O space */
};

If you look at vgpu_dev_bar_rw() in the 2nd patch, in vgpu_vfio.c:
int bar_index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);

@pos: offset from base address is calculated as:
pos = vdev->bar_info[bar_index].start + offset

The GPU driver can find the BAR index from 'pos'.
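
For illustration, a hypothetical helper in the GPU driver (the start/size fields follow the bar_info structure referenced above; exact field names are an assumption):

static int vendor_pos_to_bar_index(struct vgpu_device *vdev, loff_t pos)
{
        int i;

        for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; i++) {
                if (pos >= vdev->bar_info[i].start &&
                    pos < vdev->bar_info[i].start + vdev->bar_info[i].size)
                        return i;
        }
        return -EINVAL;
}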

>> + * @vgpu_set_irqs:            Called to send about interrupts configuration
>> + *                            information that qemu set.
>> + *                            @vdev: vgpu device structure
>> +                            @flags, index, start, count and *data : same as
>> + *                            that of struct vfio_irq_set of
>> + *                            VFIO_DEVICE_SET_IRQS API.
>
> How do we learn about the supported interrupt types?  Should this be
> called vgpu_vfio_set_irqs if it's following the vfio API?
>

The GPU driver provides the config space of the vgpu device.
QEMU learns about the supported interrupt types by reading this config space of the vgpu device and checking its PCI capabilities.
Yes, this follows the vfio API; I will change the function name.

>> + * @vgpu_bar_info:            Called to get BAR size and flags of vGPU device.
>> + *                            @vdev: vgpu device structure
>> + *                            @bar_index: BAR index
>> + *                            @bar_info: output, returns size and flags of
>> + *                            requested BAR
>> + *                            Returns integer: success (0) or error (< 0)
>
> This is called bar_info, but the bar_index is actually the vfio region
> index and things like the config region info is being overloaded
> through it.  We already have a structure defined for getting a generic
> region index, why not use it?  Maybe this should just be
> vgpu_vfio_get_region_info.
>

Ok. Will do.


>> + * @validate_map_request:     Validate remap pfn request
>> + *                            @vdev: vgpu device structure
>> + *                            @virtaddr: target user address to start at
>> + *                            @pfn: physical address of kernel memory, GPU
>> + *                            driver can change if required.
>> + *                            @size: size of map area, GPU driver can change
>> + *                            the size of map area if desired.
>> + *                            @prot: page protection flags for this mapping,
>> + *                            GPU driver can change, if required.
>> + *                            Returns integer: success (0) or error (< 0)
>
> Was not at all clear to me what this did until I got to patch 2, this
> is actually providing the fault handling for mmap'ing a vGPU mmio BAR.
> Needs a better name or better description.
>

If, say, the VMM mmaps the whole BAR1 of the GPU, say 128MB, a fault occurs when BAR1 is first accessed, and the size is calculated as:
req_size = vma->vm_end - virtaddr
Since the GPU is shared by multiple vGPUs, the GPU driver might not want to remap the whole BAR1 for a single vGPU device and would prefer to map, say, one page at a time, so it returns PAGE_SIZE. This is used by remap_pfn_range(). On the next access to BAR1 outside that page we will again get a fault(). As the name says, this call lets the GPU driver validate the size and prot of the map area; the GPU driver can change the size and prot for this map area.
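
A simplified sketch of that fault path (function and field names are approximate, the initial pfn computation and error handling are elided, and the fault handler signature is assumed from kernels of this vintage):

static int vgpu_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        struct vgpu_device *vdev = vma->vm_private_data;
        unsigned long virtaddr = (unsigned long)vmf->virtual_address;
        unsigned long req_size = vma->vm_end - virtaddr;
        unsigned long pfn = 0;          /* initial pfn lookup elided in this sketch */
        pgprot_t prot = vma->vm_page_prot;

        /* GPU driver may shrink req_size (e.g. to PAGE_SIZE) and adjust
         * pfn/prot before the mapping is established */
        if (vdev->gpu_dev->ops->validate_map_request(vdev, virtaddr, &pfn,
                                                     &req_size, &prot))
                return VM_FAULT_SIGBUS;

        if (remap_pfn_range(vma, virtaddr, pfn, req_size, prot))
                return VM_FAULT_SIGBUS;

        return VM_FAULT_NOPAGE;
}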


>> +      ssize_t (*write)(struct vgpu_device *vdev, char *buf, size_t count,
>> +                       uint32_t address_space, loff_t pos);
>
> Aren't these really 'enum vgpu_emul_space_e', not uint32_t?
>

Yes, I'll change it to enum vgpu_emul_space_e.
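
i.e. the prototype would become (sketch):

ssize_t (*write)(struct vgpu_device *vdev, char *buf, size_t count,
                 enum vgpu_emul_space_e address_space, loff_t pos);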

Thanks,
Kirti.


