Re: [libvirt] [Qemu-devel] [PATCH 1/2] vfio/mdev: add version field as mandatory attribute for mdev device

2019-04-23 Thread Neo Jia
On Tue, Apr 23, 2019 at 11:39:39AM +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 19, 2019 at 04:35:04AM -0400, Yan Zhao wrote:
> > device version attribute in mdev sysfs is used by user space software
> > (e.g. libvirt) to query device compatibility for live migration of VFIO
> > mdev devices. This attribute is mandatory if an mdev device supports live
> > migration.
> > 
> > It consists of two parts: common part and vendor proprietary part.
> > common part: 32 bit. lower 16 bits is vendor id and higher 16 bits
> >  identifies device type. e.g., for pci device, it is
> >  "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
> > vendor proprietary part: this part varies in length. The vendor driver can
> >  specify any string to identify a device.
> > 
> > When reading this attribute, it should show device version string of the
> > device of type . If a device does not support live migration, it
> > should return errno.
> > When writing a string to this attribute, it returns errno in the
> > incompatibility case or the written string length in the compatibility case.
> > If a device does not support live migration, it always returns errno.
> > 
> > For user space software to use:
> > 1.
> > Before starting live migration, user space software first reads source side
> > mdev device's version. e.g.
> > "#cat \
> > /sys/bus/pci/devices/\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type/version"
> > 00028086-193b-i915-GVTg_V5_4
> > 
> > 2.
> > Then, user space software writes the version string returned on the source
> > side to the device version attribute on the target side, and checks the
> > return value.
> > If a negative errno is returned on the target side, then the mdev devices on
> > the source and target sides are not compatible;
> > If a positive number is returned and it equals the length of the written
> > string, then the two mdev devices on the source and target sides are compatible.
> > e.g.
> > (a) compatibility case
> > "# echo 00028086-193b-i915-GVTg_V5_4 >
> > /sys/bus/pci/devices/\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
> > 
> > (b) incompatibility case
> > "#echo 00028086-193b-i915-GVTg_V5_1 >
> > /sys/bus/pci/devices/\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
> > -bash: echo: write error: Invalid argument
> 
> What you have written here seems to imply that each mdev type is able to
> support many different versions at the same time. Writing a version into
> this sysfs file then chooses which of the many versions to actually use.
> 
> This is good as it allows for live migration across driver software upgrades.
> 
> A mgmt application may well want to know what versions are supported for an
> mdev type *before* starting a migration. A mgmt app can query all the 100's
> of hosts it knows and thus figure out which are valid to use as the target
> of a migration.
> 
> IOW, we want to avoid ever hitting the incompatibility case in the
> first place, by only choosing to migrate to a host that we know is going
> to be compatible.
> 
> This would need some kind of way to report the full list of supported
> versions against the mdev supported types on the host.

What would be the typical scenario / use case for the mgmt layer to query the
version information? Do they expect this to get done completely offline, as long
as the vendor driver is installed on each host?
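
(For reference, a minimal sketch of the write-based probe described in the patch
above, as a mgmt application might run it against one candidate target host; the
sysfs paths are abbreviated placeholders and the version string would differ per
device:)

# on the source host: read the version of the mdev to be migrated
# (the leading 32-bit part, e.g. 00028086, is the PCI vendor id 0x8086 combined
# with VFIO_DEVICE_FLAGS_PCI << 16, per the description above)
src_ver=$(cat /sys/bus/pci/devices/$SRC_PARENT/$SRC_MDEV_UUID/mdev_type/version)

# on the candidate target host: writing the source version only succeeds if the
# target mdev is compatible, so the write's exit status is the answer
if echo "$src_ver" > /sys/bus/pci/devices/$TGT_PARENT/$TGT_MDEV_UUID/mdev_type/version; then
    echo "compatible"
else
    echo "incompatible"
fi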

Thanks,
Neo

> 
> 



Re: [libvirt] [Qemu-devel] summary of current vfio mdev upstreaming status

2016-09-29 Thread Neo Jia
On Thu, Sep 29, 2016 at 05:05:47PM +0800, Xiao Guangrong wrote:
> 
> 
> On 09/29/2016 04:55 PM, Jike Song wrote:
> > Hi all,
> > 
> > In order to have a clear understanding about the VFIO mdev upstreaming
> > status, I'd like to summarize it. Please share your opinions on this,
> > and correct my misunderstandings.
> > 
> > The whole vfio mdev series can be logically divided into several parts,
> > they work together to provide the mdev support.
> 
> I think what Jike wants to suggest is to partially push/develop the
> mdev. As Jike listed, there are some parts that can be independent, and they
> have mostly been agreed upon.
> 
> Such a development plan can make the discussion much more efficient in the
> community. It also makes it possible for Intel, NVIDIA, and IBM to focus
> on different parts and co-develop them.

Hi Guangrong,

JFYI, we are preparing v8 patches to accommodate most of the comments we have
discussed so far, and we will also include several things that we have decided
on for sysfs.

I definitely would like to see more interactive discussions, especially on the
sysfs class front, from Intel folks.

Regarding the patch development and given the current status, especially where
we are and what we have been through, I am very confident that we should be able
to fully handle this ourselves, but thanks for offering help anyway!

We should be able to react as fast as possible based on the public mailing list
discussions, so again I don't think that part is an issue.

Thanks,
Neo

> 
> The maintainer can hold these development patches in a local branch before
> pushing the full-functionality version upstream.
> 
> Thanks!
> 
> 



Re: [libvirt] summary of current vfio mdev upstreaming status

2016-09-29 Thread Neo Jia
On Thu, Sep 29, 2016 at 04:55:39PM +0800, Jike Song wrote:
> Hi all,
> 
> In order to have a clear understanding about the VFIO mdev upstreaming
> status, I'd like to summarize it. Please share your opinions on this,
> and correct my misunderstandings.
> 
> The whole vfio mdev series can be logically divided into several parts,
> they work together to provide the mdev support.

Hi Jike,

Thanks for summarizing this, but I will defer to Kirti to comment on the actual
upstream status of her patches; a couple of things to note though:

1) iommu type1 patches have been extensively reviewed by Alex already, and we
have one action item left to implement, which is already queued up in the v8
patchset.

2) regarding the sysfs interface and libvirt discussion, I would like to hear
what kind of attributes Intel folks have so far, as Daniel is asking about
adding a class "gpu" which will pull in several attributes as mandatory.

Thanks,
Neo

> 
> 
> 
> PART 1: mdev core driver
> 
>   [task]
>   -   the mdev bus/device support
>   -   the utilities of mdev lifecycle management
>   -   the physical device register/unregister interfaces
> 
>   [status]
>   -   basically agreed by community
> 
> 
> PART 2: vfio bus driver for mdev
> 
>   [task]
>   -   interfaces with vendor drivers
>   -   the vfio bus implementation
> 
>   [status]
> 
>   -   basically agreed by community
> 
> 
> PART 3: iommu support for mdev
> 
>   [task]
>   -   iommu support for mdev
> 
>   [status]
>   -   Kirti's v7 implementation, not yet fully reviewed
> 
> 
> PART 4: sysfs interfaces for mdev
> 
>   [task]
>   -   define the hierarchy of minimal sysfs directories/files
>   -   check the validity from vendor drivers, init/de-init 
> them
>   [status]
>   -   interfaces are in discussion
> 
> 
> PART 6: Documentation
> 
>   [task]
>   -   clearly document the architecture and interfaces
>   -   coding example for vendor drivers
> 
>   [status]
>   -   N/A
> 
> 
> What I'm curious about here is 'PART 4', which is needed by other parts to
> perform further steps; is it possible to accelerate the process somehow? :-)
> 
> 
> --
> Thanks,
> Jike



Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-29 Thread Neo Jia
On Thu, Sep 29, 2016 at 09:03:40AM +0100, Daniel P. Berrange wrote:
> On Wed, Sep 28, 2016 at 12:22:35PM -0700, Neo Jia wrote:
> > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> > > > 
> > > > > >>>>> My concern is that a type id seems arbitrary but we're 
> > > > > >>>>> specifying that
> > > > > >>>>> it be unique.  We already have something unique, the name.  So 
> > > > > >>>>> why try
> > > > > >>>>> to make the type id unique as well?  A vendor can accidentally 
> > > > > >>>>> create
> > > > > >>>>> their vendor driver so that a given name means something very
> > > > > >>>>> specific.  On the other hand they need to be extremely 
> > > > > >>>>> deliberate to
> > > > > >>>>> coordinate that a type id means a unique thing across all their 
> > > > > >>>>> product
> > > > > >>>>> lines.
> > > > > >>>>>   
> > > > > >>>>
> > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > >>>> mdev_supported_types. You can't have 2 directories in with same 
> > > > > >>>> name.
> > > > > >>>
> > > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > > >>> currently running on?  Let's say I have a Tesla P100 on my system 
> > > > > >>> and
> > > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on 
> > > > > >>> that
> > > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've 
> > > > > >>> based
> > > > > >>> our XML on the wrong attribute.  If the new device does not 
> > > > > >>> support
> > > > > >>> "GRID-M60-0B" then we should generate an error, not simply 
> > > > > >>> initialize
> > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > >>> 
> > > > > >>
> > > > > >> If there are 2 M60 in the system then you would find '11' type 
> > > > > >> directory
> > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type 
> > > > > >> would
> > > > > >> not be there in its mdev_supported_types, it will have different 
> > > > > >> types.
> > > > > >>
> > > > > >> For example, if you replace M60 with P100, but XML is not updated. 
> > > > > >> XML
> > > > > >> have type '11'. When libvirt would try to create mdev device, 
> > > > > >> libvirt
> > > > > >> would have to find 'create' file in sysfs in following directory 
> > > > > >> format:
> > > > > >>
> > > > > >>  --- mdev_supported_types
> > > > > >>  |-- 11
> > > > > >>  |   |-- create
> > > > > >>
> > > > > >> but now for P100, '11' directory is not there, so libvirt should 
> > > > > >> throw
> > > > > >> error on not able to find '11' directory.  
> > > > > > 
> > > > > > This really seems like an accident waiting to happen.  What happens
> > > > > > when the user replaces their M60 with an Intel XYZ device that 
> > > > > > happens
> > > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed 
> > > > > > to
> > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and 
> > > > > > removing
> > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and 
> > > > > > the
> > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > Seeing type-id='11' is meaningless.
> > > > > >  
> > > > > 
> > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > define (see my previous reply below) it could be "11" or 
> > > > > "GRID-M60-0B".
> > > > > If 2 vendors used same string we can't control that. right?
> > > > > 
> > > > > 
> > > > > >>>> Lets remove 'id' from type id in XML if that is the concern. 
> > > > > >>>> Supported
> > > > > >>>> types is going to be defined by vendor driver, so let vendor 
> > > > > >>>> driver
> > > > > >>>> decide what to use for directory name and same should be used in 
> > > > > >>>> device
> > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > >>>>
> > > > > >>>> 
> > > > > >>>>   my-vgpu
> > > > > >>>>   pci__86_00_0
> > > > > >>>>   
> > > > > >>>> 

Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-28 Thread Neo Jia
On Wed, Sep 28, 2016 at 04:31:25PM -0400, Laine Stump wrote:
> On 09/28/2016 03:59 PM, Neo Jia wrote:
> > On Wed, Sep 28, 2016 at 07:45:38PM +, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > Sent: Thursday, September 29, 2016 3:23 AM
> > > > 
> > > > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > > > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> > > > > > 
> > > > > > > > > > > > My concern is that a type id seems arbitrary but we're 
> > > > > > > > > > > > specifying that
> > > > > > > > > > > > it be unique.  We already have something unique, the 
> > > > > > > > > > > > name.  So why try
> > > > > > > > > > > > to make the type id unique as well?  A vendor can 
> > > > > > > > > > > > accidentally create
> > > > > > > > > > > > their vendor driver so that a given name means 
> > > > > > > > > > > > something very
> > > > > > > > > > > > specific.  On the other hand they need to be extremely 
> > > > > > > > > > > > deliberate to
> > > > > > > > > > > > coordinate that a type id means a unique thing across 
> > > > > > > > > > > > all their product
> > > > > > > > > > > > lines.
> > > > > > > > > > > > 
> > > > > > > > > > > Let me clarify, type id should be unique in the list of
> > > > > > > > > > > mdev_supported_types. You can't have 2 directories in 
> > > > > > > > > > > with same name.
> > > > > > > > > > Of course, but does that mean it's only unique to the 
> > > > > > > > > > machine I'm
> > > > > > > > > > currently running on?  Let's say I have a Tesla P100 on my 
> > > > > > > > > > system and
> > > > > > > > > > type-id 11 is named "GRID-M60-0B".  At some point in the 
> > > > > > > > > > future I
> > > > > > > > > > replace the Tesla P100 with a Q1000 (made up).  Is type-id 
> > > > > > > > > > 11 on that
> > > > > > > > > > new card still going to be a "GRID-M60-0B"?  If not then 
> > > > > > > > > > we've based
> > > > > > > > > > our XML on the wrong attribute.  If the new device does not 
> > > > > > > > > > support
> > > > > > > > > > "GRID-M60-0B" then we should generate an error, not simply 
> > > > > > > > > > initialize
> > > > > > > > > > whatever type-id 11 happens to be on this new card.
> > > > > > > > > > 
> > > > > > > > > If there are 2 M60 in the system then you would find '11' 
> > > > > > > > > type directory
> > > > > > > > > in mdev_supported_types of both M60. If you have P100, '11' 
> > > > > > > > > type would
> > > > > > > > > not be there in its mdev_supported_types, it will have 
> > > > > > > > > different types.
> > > > > > > > > 
> > > > > > > > > For example, if you replace M60 with P100, but XML is not 
> > > > > > > > > updated. XML
> > > > > > > > > have type '11'. When libvirt would try to create mdev device, 
> > > > > > > > > libvirt
> > > > > > > > > would have to find 'create' file in sysfs in following 
> > > > > > > > > directory format:
> > > > > > > > > 
> > > > > > > > >   --- mdev_supported_types
> > > > > > > > >   |-- 11
> > > > > > > > >   |   |-- create
> > > > > > > > > 
> > > > > > > > > but now for P100, '11' directory is 

Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-28 Thread Neo Jia
On Wed, Sep 28, 2016 at 01:55:47PM -0600, Alex Williamson wrote:
> On Wed, 28 Sep 2016 12:22:35 -0700
> Neo Jia <c...@nvidia.com> wrote:
> 
> > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:  
> > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> > > >   
> > > > > >>>>> My concern is that a type id seems arbitrary but we're 
> > > > > >>>>> specifying that
> > > > > >>>>> it be unique.  We already have something unique, the name.  So 
> > > > > >>>>> why try
> > > > > >>>>> to make the type id unique as well?  A vendor can accidentally 
> > > > > >>>>> create
> > > > > >>>>> their vendor driver so that a given name means something very
> > > > > >>>>> specific.  On the other hand they need to be extremely 
> > > > > >>>>> deliberate to
> > > > > >>>>> coordinate that a type id means a unique thing across all their 
> > > > > >>>>> product
> > > > > >>>>> lines.
> > > > > >>>>> 
> > > > > >>>>
> > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > >>>> mdev_supported_types. You can't have 2 directories in with same 
> > > > > >>>> name.  
> > > > > >>>
> > > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > > >>> currently running on?  Let's say I have a Tesla P100 on my system 
> > > > > >>> and
> > > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on 
> > > > > >>> that
> > > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've 
> > > > > >>> based
> > > > > >>> our XML on the wrong attribute.  If the new device does not 
> > > > > >>> support
> > > > > >>> "GRID-M60-0B" then we should generate an error, not simply 
> > > > > >>> initialize
> > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > >>>   
> > > > > >>
> > > > > >> If there are 2 M60 in the system then you would find '11' type 
> > > > > >> directory
> > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type 
> > > > > >> would
> > > > > >> not be there in its mdev_supported_types, it will have different 
> > > > > >> types.
> > > > > >>
> > > > > >> For example, if you replace M60 with P100, but XML is not updated. 
> > > > > >> XML
> > > > > >> have type '11'. When libvirt would try to create mdev device, 
> > > > > >> libvirt
> > > > > >> would have to find 'create' file in sysfs in following directory 
> > > > > >> format:
> > > > > >>
> > > > > >>  --- mdev_supported_types
> > > > > >>  |-- 11
> > > > > >>  |   |-- create
> > > > > >>
> > > > > >> but now for P100, '11' directory is not there, so libvirt should 
> > > > > >> throw
> > > > > >> error on not able to find '11' directory.
> > > > > > 
> > > > > > This really seems like an accident waiting to happen.  What happens
> > > > > > when the user replaces their M60 with an Intel XYZ device that 
> > > > > > happens
> > > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed 
> > > > > > to
> > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and 
> > > > > > removing
> > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and 
> > > > > > the
> > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > Seeing type-id='11' is meaningless.
> > > > > >
> > > > > 
> > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > define (see my previous reply below) it could be "11" or 
> > > > > "GRID-M60-0B".
> > > > > If 2 vendors used same string we can't control that. right?
> > > > > 
> > > > >   
> > > > > >>>> Lets remove 'id' from type id in XML if that is the concern. 
> > > > > >>>> Supported
> > > > > >>>> types is going to be defined by vendor driver, so let vendor 
> > > > > >>>> driver
> > > > > >>>> decide what to use for directory name and same should be used in 
> > > > > >>>> device
> > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > >>>>
> > > > > >>>> 
> > > > > >>>>   my-vgpu
> > > > > >>>>   pci__86_00_0
> > > > > >>>>   
> > > > > >>>> 

Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-28 Thread Neo Jia
On Wed, Sep 28, 2016 at 07:45:38PM +, Tian, Kevin wrote:
> > From: Neo Jia [mailto:c...@nvidia.com]
> > Sent: Thursday, September 29, 2016 3:23 AM
> > 
> > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> > > >
> > > > > >>>>> My concern is that a type id seems arbitrary but we're 
> > > > > >>>>> specifying that
> > > > > >>>>> it be unique.  We already have something unique, the name.  So 
> > > > > >>>>> why try
> > > > > >>>>> to make the type id unique as well?  A vendor can accidentally 
> > > > > >>>>> create
> > > > > >>>>> their vendor driver so that a given name means something very
> > > > > >>>>> specific.  On the other hand they need to be extremely 
> > > > > >>>>> deliberate to
> > > > > >>>>> coordinate that a type id means a unique thing across all their 
> > > > > >>>>> product
> > > > > >>>>> lines.
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > >>>> mdev_supported_types. You can't have 2 directories in with same 
> > > > > >>>> name.
> > > > > >>>
> > > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > > >>> currently running on?  Let's say I have a Tesla P100 on my system 
> > > > > >>> and
> > > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on 
> > > > > >>> that
> > > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've 
> > > > > >>> based
> > > > > >>> our XML on the wrong attribute.  If the new device does not 
> > > > > >>> support
> > > > > >>> "GRID-M60-0B" then we should generate an error, not simply 
> > > > > >>> initialize
> > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > >>>
> > > > > >>
> > > > > >> If there are 2 M60 in the system then you would find '11' type 
> > > > > >> directory
> > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type 
> > > > > >> would
> > > > > >> not be there in its mdev_supported_types, it will have different 
> > > > > >> types.
> > > > > >>
> > > > > >> For example, if you replace M60 with P100, but XML is not updated. 
> > > > > >> XML
> > > > > >> have type '11'. When libvirt would try to create mdev device, 
> > > > > >> libvirt
> > > > > >> would have to find 'create' file in sysfs in following directory 
> > > > > >> format:
> > > > > >>
> > > > > >>  --- mdev_supported_types
> > > > > >>  |-- 11
> > > > > >>  |   |-- create
> > > > > >>
> > > > > >> but now for P100, '11' directory is not there, so libvirt should 
> > > > > >> throw
> > > > > >> error on not able to find '11' directory.
> > > > > >
> > > > > > This really seems like an accident waiting to happen.  What happens
> > > > > > when the user replaces their M60 with an Intel XYZ device that 
> > > > > > happens
> > > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed 
> > > > > > to
> > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and 
> > > > > > removing
> > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and 
> > > > > > the
> > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > Seeing type-id='11' is meaningless.
> > > > > >
> > > > >
> > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > define (see my previous reply below) it could be "11" or 
> > > > > "GRID-M60-0B".
> > > > > If 2 vendors used same string we can't control that. right?
> > > > >
> > > > >
> > > > > >>>> Lets remove 'id' from type id in XML if that is the concern. 
> > > > > >>>> Supported
> > > > > >>>> types is going to be defined by vendor driver, so let vendor 
> > > > > >>>> driver
> > > > > >>>> decide what to use for directory name and same should be used in 
> > > > > >>>> device
> > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > >>>>
> > > > > >>>> 
> > > > > >>>>   my-vgpu
> > > > > >>>>   pci__86_00_0
> > > > > >>>>   
> > > > > >>>> 

Re: [libvirt] [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration

2016-09-28 Thread Neo Jia
On Tue, Sep 20, 2016 at 10:47:53AM +0100, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
> > 
> > Hi libvirt experts,
> > 
> > Thanks for valuable input on v1 version of RFC.
> > 
> > Quick brief, VFIO based mediated device framework provides a way to
> > virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> > and IBM's channel IO. This framework reuses VFIO APIs for all the
> > functionalities for mediated devices which are currently being used for
> > pass through devices. This framework introduces a set of new sysfs files
> > for device creation and its life cycle management.
> > 
> > Here is the summary of discussion on v1:
> > 1. Discover mediated device:
> > As part of physical device initialization process, vendor driver will
> > register their physical devices, which will be used to create virtual
> > device (mediated device, aka mdev) to the mediated framework.
> > 
> > Vendor driver should specify mdev_supported_types in directory format.
> > This format is class based, for example, display class directory format
> > should be as below. We need to define such set for each class of devices
> > which would be supported by mediated device framework.
> > 
> >  --- mdev_destroy
> >  --- mdev_supported_types
> >  |-- 11
> >  |   |-- create
> >  |   |-- name
> >  |   |-- fb_length
> >  |   |-- resolution
> >  |   |-- heads
> >  |   |-- max_instances
> >  |   |-- params
> >  |   |-- requires_group
> >  |-- 12
> >  |   |-- create
> >  |   |-- name
> >  |   |-- fb_length
> >  |   |-- resolution
> >  |   |-- heads
> >  |   |-- max_instances
> >  |   |-- params
> >  |   |-- requires_group
> >  |-- 13
> >  |-- create
> >  |-- name
> >  |-- fb_length
> >  |-- resolution
> >  |-- heads
> >  |-- max_instances
> >  |-- params
> >  |-- requires_group
> > 
> > 
> > In the above example directory '11' represents a type id of mdev device.
> > 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> > 'requires_group' would be Read-Only files that vendor would provide to
> > describe about that type.
> > 
> > 'create':
> > Write-only file. Mandatory.
> > Accepts string to create mediated device.
> > 
> > 'name':
> > Read-Only file. Mandatory.
> > Returns string, the name of that type id.
> 
> Presumably this is a human-targeted title/description of
> the device.
> 
> > 
> > 'fb_length':
> > Read-only file. Mandatory.
> > Returns {K,M,G}, size of framebuffer.
> > 
> > 'resolution':
> > Read-Only file. Mandatory.
> > Returns 'hres x vres' format. Maximum supported resolution.
> > 
> > 'heads':
> > Read-Only file. Mandatory.
> > Returns integer. Number of maximum heads supported.
> 
> None of these should be mandatory as that makes the mdev
> useless for non-GPU devices.
> 
> I'd expect to see a 'class' or 'type' attribute in the
> directory which tells you what kind of mdev it is. A
> valid 'class' value would be 'gpu'. The fb_length,
> resolution, and heads parameters would only be mandatory
> when class==gpu.
> 

Hi Daniel,

Here you are proposing to add a class named "gpu", which would make all those
GPU-related attributes mandatory, so that libvirt can better parse/present a
particular mdev configuration?

I am just wondering if there is another option where we make all those
attributes that an mdev device can have optional but still meaningful to
libvirt, so libvirt can still parse / recognize them under a class "mdev".

In general, I am just trying to understand the requirement from libvirt, and see
how we can meet this requirement for both Intel and NVIDIA, since Intel is also
moving to the type-based interface although they don't have the "class" concept
yet.
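
(Just to make the two options concrete, here is a purely hypothetical view of the
"class" proposal; the "class" file and its value are assumptions for illustration,
not something defined in the current sysfs proposal:)

# hypothetical layout if a per-type "class" file were added; fb_length /
# resolution / heads would then be mandatory only when class reads "gpu"
ls /sys/bus/pci/devices/.../mdev_supported_types/11
class  create  fb_length  heads  max_instance  name  resolution
cat /sys/bus/pci/devices/.../mdev_supported_types/11/class
gpu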

Thanks,
Neo

> > 'max_instance':
> > Read-Only file. Mandatory.
> > Returns integer.  Returns maximum mdev device could be created
> > at the moment when this file is read. This count would be updated by
> > vendor driver. Before creating mdev device of this type, check if
> > max_instance is > 0.
> > 
> > 'params'
> > Write-Only file. Optional.
> > String input. Libvirt would pass the string given in XML file to
> > this file and then create mdev device. Set empty string to clear params.
> > For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> > limiter for performance benchmarking, then create device of type 11. The
> > device created would have that parameter set by vendor driver.
> 
> Nope, libvirt will explicitly *NEVER* allow arbitrary opaque
> passthrough of vendor specific data in this way.
> 
> > The parent device would look like:
> > 
> >
> >  pci__86_00_0
> >  
> >0
> >134
> >0
> >0
> >
> >  
> >  
> >
> >GRID M60-0B
> >512M
> >

Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-28 Thread Neo Jia
On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > On Thu, 22 Sep 2016 09:41:20 +0530
> > Kirti Wankhede  wrote:
> > 
> > > > My concern is that a type id seems arbitrary but we're specifying 
> > > > that
> > > > it be unique.  We already have something unique, the name.  So why 
> > > > try
> > > > to make the type id unique as well?  A vendor can accidentally 
> > > > create
> > > > their vendor driver so that a given name means something very
> > > > specific.  On the other hand they need to be extremely deliberate to
> > > > coordinate that a type id means a unique thing across all their 
> > > > product
> > > > lines.
> > > >   
> > > 
> > >  Let me clarify, type id should be unique in the list of
> > >  mdev_supported_types. You can't have 2 directories in with same 
> > >  name.
> > > >>>
> > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > >>> our XML on the wrong attribute.  If the new device does not support
> > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > >>> whatever type-id 11 happens to be on this new card.
> > > >>> 
> > > >>
> > > >> If there are 2 M60 in the system then you would find '11' type 
> > > >> directory
> > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > >> not be there in its mdev_supported_types, it will have different types.
> > > >>
> > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > >> would have to find 'create' file in sysfs in following directory 
> > > >> format:
> > > >>
> > > >>  --- mdev_supported_types
> > > >>  |-- 11
> > > >>  |   |-- create
> > > >>
> > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > >> error on not able to find '11' directory.  
> > > > 
> > > > This really seems like an accident waiting to happen.  What happens
> > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > yet another arbitrary requirement that we have some sort of globally
> > > > unique type-id database make a lot of sense?  The same issue applies
> > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > name is the primary index for the mdev device, I know what it is.
> > > > Seeing type-id='11' is meaningless.
> > > >  
> > > 
> > > Let me clarify again, type '11' is a string that vendor driver would
> > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > If 2 vendors used same string we can't control that. right?
> > > 
> > > 
> > >  Lets remove 'id' from type id in XML if that is the concern. 
> > >  Supported
> > >  types is going to be defined by vendor driver, so let vendor driver
> > >  decide what to use for directory name and same should be used in 
> > >  device
> > >  xml file, it could be '11' or "GRID M60-0B":
> > > 
> > >  
> > >    my-vgpu
> > >    pci__86_00_0
> > >    
> > >  

Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-07 Thread Neo Jia
On Wed, Sep 07, 2016 at 07:27:19PM +0100, Daniel P. Berrange wrote:
> On Wed, Sep 07, 2016 at 11:17:39AM -0700, Neo Jia wrote:
> > On Wed, Sep 07, 2016 at 10:44:56AM -0600, Alex Williamson wrote:
> > > On Wed, 7 Sep 2016 21:45:31 +0530
> > > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> > > 
> > > > To hot-plug mdev device to a domain in which there is already a mdev
> > > > device assigned, mdev device should be created with same group number as
> > > > the existing devices are and then hot-plug it. If there is no mdev
> > > > device in that domain, then group number should be a unique number.
> > > > 
> > > > This simplifies the mdev grouping and also provide flexibility for
> > > > vendor driver implementation.
> > > 
> > > The 'start' operation for NVIDIA mdev devices allocate peer-to-peer
> > > resources between mdev devices.  Does this not represent some degree of
> > > an isolation hole between those devices?  Will peer-to-peer DMA between
> > > devices honor the guest IOVA when mdev devices are placed into separate
> > > address spaces, such as possible with vIOMMU?
> > 
> > Hi Alex,
> > 
> > In reality, the p2p operation will only work under the same translation domain.
> > 
> > As we are discussing the multiple mdev per VM use cases, I think we probably
> > should not just limit it to the p2p operation.
> > 
> > So, in general, the NVIDIA vGPU device model's requirement is to know/register
> > all mdevs per VM before opening any of those mdev devices.
> 
> It concerns me that if we bake this rule into the sysfs interface,
> then it feels like we're making life very hard for future support
> for hotplug / unplug of mdevs to running VMs.

Hi Daniel,

I don't think the grouping will stop anybody from supporting hotplug / unplug,
at least from a syntax point of view.

> 
> Conversely, if we can solve the hotplug/unplug problem, then we
> potentially would not need this grouping concept.

I think Kirti has also mentioned hotplug support in her proposal; do you
mind commenting on that thread so I can check whether I have missed anything?

Thanks,
Neo

> 
> I'd hate us to do all this complex work to group multiple mdevs per
> VM only to throw it away later when hotplug support is made to
> work.
> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-07 Thread Neo Jia
On Wed, Sep 07, 2016 at 10:44:56AM -0600, Alex Williamson wrote:
> On Wed, 7 Sep 2016 21:45:31 +0530
> Kirti Wankhede  wrote:
> 
> > To hot-plug mdev device to a domain in which there is already a mdev
> > device assigned, mdev device should be created with same group number as
> > the existing devices are and then hot-plug it. If there is no mdev
> > device in that domain, then group number should be a unique number.
> > 
> > This simplifies the mdev grouping and also provide flexibility for
> > vendor driver implementation.
> 
> The 'start' operation for NVIDIA mdev devices allocate peer-to-peer
> resources between mdev devices.  Does this not represent some degree of
> an isolation hole between those devices?  Will peer-to-peer DMA between
> devices honor the guest IOVA when mdev devices are placed into separate
> address spaces, such as possible with vIOMMU?

Hi Alex,

In reality, the p2p operation will only work under the same translation domain.

As we are discussing the multiple mdev per VM use cases, I think we probably
should not just limit it to the p2p operation.

So, in general, the NVIDIA vGPU device model's requirement is to know/register
all mdevs per VM before opening any of those mdev devices.

> 
> I don't particularly like the iommu group solution either, which is why
> in my latest proposal I've given the vendor driver a way to indicate
> this grouping is required so more flexible mdev devices aren't
> restricted by this.  But the limited knowledge I have of the hardware
> configuration which imposes this restriction on NVIDIA devices seems to
> suggest that iommu grouping of these sets is appropriate.  The vfio-core
> infrastructure is almost entirely built for managing vfio group, which
> are just a direct mapping of iommu groups.  So the complexity of iommu
> groups is already handled.  Adding a new layer of grouping into mdev
> seems like it's increasing the complexity further, not decreasing it.

I really appreciate your thoughts on this issue, and your consideration of how
the NVIDIA vGPU device model works, but so far I still feel we are borrowing a
very meaningful concept, "iommu group", to solve a device model issue, which I
actually hope can be worked around by a more independent piece of logic, and
that is why Kirti is proposing the "mdev group".

Let's see if we can address your concerns / questions in Kirti's reply.

Thanks,
Neo

> Thanks,
> 
> Alex



Re: [libvirt] [Qemu-devel] [RFC] libvirt vGPU QEMU integration

2016-08-21 Thread Neo Jia
On Fri, Aug 19, 2016 at 03:22:48PM -0400, Laine Stump wrote:
> On 08/18/2016 12:41 PM, Neo Jia wrote:
> > Hi libvirt experts,
> > 
> > I am starting this email thread to discuss the potential solution / 
> > proposal of
> > integrating vGPU support into libvirt for QEMU.
> 
> Thanks for the detailed description. This is very helpful.
> 
> 
> > 
> > Some quick background, NVIDIA is implementing a VFIO based mediated device
> > framework to allow people to virtualize their devices without SR-IOV, for
> > example NVIDIA vGPU, and Intel KVMGT. Within this framework, we are reusing 
> > the
> > VFIO API to process the memory / interrupt as what QEMU does today with 
> > passthru
> > device.
> > 
> > The difference here is that we are introducing a set of new sysfs file for
> > virtual device discovery and life cycle management due to its virtual 
> > nature.
> > 
> > Here is the summary of the sysfs, when they will be created and how they 
> > should
> > be used:
> > 
> > 1. Discover mediated device
> > 
> > As part of physical device initialization process, vendor driver will 
> > register
> > their physical devices, which will be used to create virtual device 
> > (mediated
> > device, aka mdev) to the mediated framework.
> 
> 
> We've discussed this question offline, but I just want to make sure I
> understood correctly - all initialization of the physical device on the host
> is already handled "elsewhere", so libvirt doesn't need to be concerned with
> any physical device lifecycle or configuration (setting up the number or
> types of vGPUs), correct? 

Hi Laine,

Yes, that is right, at least for NVIDIA vGPU.

> Do you think this would also be the case for other
> vendors using the same APIs? I guess this all comes down to whether or not
> the setup of the physical device is defined within the bounds of the common
> infrastructure/API, or if it's something that's assumed to have just
> magically happened somewhere else.

I would assume that is the case for other vendors as well, although this common
infrastructure doesn't put any restrictions on the physical device setup or
initialization, so a vendor actually has the option to defer some of it until
the point when the virtual device gets created.

But if we just look at it from the API level that gets exposed to libvirt, it is
the vendor driver's responsibility to ensure that the virtual device will be
available in a reasonable amount of time after the "online" sysfs file is set to
1. Where to hide the HW setup is not enforced in this common API.

In the NVIDIA case, once our kernel driver registers the physical devices that
it owns with the "common infrastructure", all the physical devices are already
fully initialized and ready for virtual device creation.
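
(Purely as an illustration of that contract, one way a caller could wait is
sketched below; how availability is actually signalled after the write is not
defined by the common API, so the polling condition here is only a stand-in:)

mdev=/sys/bus/mdev/devices/$mdev_UUID
echo 1 > $mdev/online
# give the vendor driver a bounded amount of time to finish bringing it up
for i in $(seq 1 30); do
    [ "$(cat $mdev/online)" = "1" ] && break
    sleep 1
done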

> 
> 
> > 
> > Then, the sysfs file "mdev_supported_types" will be available under the 
> > physical
> > device sysfs, and it will indicate the supported mdev and configuration for 
> > this
> > particular physical device, and the content may change dynamically based on 
> > the
> > system's current configurations, so libvirt needs to query this file every 
> > time
> > before create a mdev.
> 
> I had originally thought that libvirt would be setting up and managing a
> pool of virtual devices, similar to what we currently do with SRIOV VFs. But
> from this it sounds like the management of this pool is completely handled
> by your drivers (especially since the contents of the pool can apparently
> completely change at any instant). In one way that makes life easier for
> libvirt, because it doesn't need to manage anything.

The pool (vGPU type availability) will only be subject to change when virtual
devices get created or destroyed, as for now we don't support heterogeneous vGPU
types on the same physical GPU. Even if we add such support in the future,
the point of change is still the same.

> 
> On the other hand, it makes thing less predictable. For example, when
> libvirt defines a domain, it queries the host system to see what types of
> devices are legal in guests on this host, and expects those devices to be
> available at a later time. As I understand it (and I may be completely
> wrong), when no vGPUs are running on the hardware, there is a choice of
> several different models of vGPU (like the example you give below), but when
> the first vGPU is started up, that triggers the host driver to restrict the
> available models. If that's the case, then a particular vGPU could be
> "available" when a domain is defined, but not an option by the time the
> domain is started. That's not a show stopper, but

Re: [libvirt] [RFC] libvirt vGPU QEMU integration

2016-08-21 Thread Neo Jia
On Fri, Aug 19, 2016 at 02:42:27PM +0200, Michal Privoznik wrote:
> On 18.08.2016 18:41, Neo Jia wrote:
> > Hi libvirt experts,
> 
> Hi, welcome to the list.
> 
> > 
> > I am starting this email thread to discuss the potential solution / 
> > proposal of
> > integrating vGPU support into libvirt for QEMU.
> > 
> > Some quick background, NVIDIA is implementing a VFIO based mediated device
> > framework to allow people to virtualize their devices without SR-IOV, for
> > example NVIDIA vGPU, and Intel KVMGT. Within this framework, we are reusing 
> > the
> > VFIO API to process the memory / interrupt as what QEMU does today with 
> > passthru
> > device.
> 
> So as far as I understand, this is solely NVIDIA's API and other vendors
> (e.g. Intel) will use their own or is this a standard that others will
> comply to?

Hi Michal,

Based on the initial vGPU VFIO design discussion thread on the QEMU mailing
list, I believe this is what NVIDIA, Intel, and even other companies will comply
with.

(People from related parties are CC'ed in this email, such as Intel and IBM.)

As you know, I can't speak for Intel, so I would like to defer this question to
them, but above is my understanding based on the QEMU/KVM community discussions.

> 
> > 
> > The difference here is that we are introducing a set of new sysfs file for
> > virtual device discovery and life cycle management due to its virtual 
> > nature.
> > 
> > Here is the summary of the sysfs, when they will be created and how they 
> > should
> > be used:
> > 
> > 1. Discover mediated device
> > 
> > As part of physical device initialization process, vendor driver will 
> > register
> > their physical devices, which will be used to create virtual device 
> > (mediated
> > device, aka mdev) to the mediated framework.
> > 
> > Then, the sysfs file "mdev_supported_types" will be available under the 
> > physical
> > device sysfs, and it will indicate the supported mdev and configuration for 
> > this 
> > particular physical device, and the content may change dynamically based on 
> > the
> > system's current configurations, so libvirt needs to query this file every 
> > time
> > before create a mdev.
> 
> Ah, that was gonna be my question. Because in the example below, you
> used "echo '...vgpu_type_id=20...' > /sys/bus/.../mdev_create". And I
> was wondering where does the number 20 come from. Now what I am
> wondering about is how libvirt should expose these to users. Moreover,
> how it should let users to chose.
> We have a node device driver where I guess we could expose possible
> options and then require some explicit value in the domain XML (but what
> value would that be? I don't think taking vgpu_type_id-s as they are
> would be a great idea).

Right, the vgpu_type_id is just a handle for a given type of vGPU device in the
NVIDIA case.  How about exposing the "vgpu_type", which is a meaningful name
for vGPU end users?

Also, when you are saying "let users to chose", does this mean exposing some
virsh command to allow users to dump their potential virtual devices and pick
one?

> 
> > 
> > Note: different vendors might have their own specific configuration sysfs as
> > well, if they don't have pre-defined types.
> > 
> > For example, we have a NVIDIA Tesla M60 on 86:00.0 here registered, and 
> > here is
> > NVIDIA specific configuration on an idle system.
> > 
> > For example, to query the "mdev_supported_types" on this Tesla M60:
> > 
> > cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
> > # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, framebuffer,
> > max_resolution
> > 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
> > 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
> > 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
> > 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
> > 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
> > 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
> > 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
> > 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
> > 
> > 2. Create/destroy mediated device
> > 
> > Two sysfs files are available under the physical device sysfs path : 
> > mdev_create
> > and mdev_destroy
> > 
> > Th

[libvirt] [RFC] libvirt vGPU QEMU integration

2016-08-18 Thread Neo Jia
Hi libvirt experts,

I am starting this email thread to discuss the potential solution / proposal of
integrating vGPU support into libvirt for QEMU.

Some quick background: NVIDIA is implementing a VFIO based mediated device
framework to allow people to virtualize their devices without SR-IOV, for
example NVIDIA vGPU and Intel KVMGT. Within this framework, we are reusing the
VFIO API to process the memory / interrupts just as QEMU does today with a
passthrough device.

The difference here is that we are introducing a set of new sysfs files for
virtual device discovery and life cycle management due to its virtual nature.

Here is the summary of the sysfs files, when they will be created, and how they
should be used:

1. Discover mediated device

As part of the physical device initialization process, the vendor driver will
register its physical devices, which will be used to create virtual devices
(mediated devices, aka mdev), with the mediated framework.

Then, the sysfs file "mdev_supported_types" will be available under the physical
device sysfs, and it will indicate the supported mdev types and configuration for
this particular physical device. The content may change dynamically based on the
system's current configuration, so libvirt needs to query this file every time
before creating an mdev.

Note: different vendors might have their own specific configuration sysfs as
well, if they don't have pre-defined types.

For example, we have an NVIDIA Tesla M60 on 86:00.0 registered here, and below is
the NVIDIA-specific configuration on an idle system.

For example, to query the "mdev_supported_types" on this Tesla M60:

cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
# vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, framebuffer,
max_resolution
11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160

2. Create/destroy mediated device

Two sysfs files are available under the physical device sysfs path: mdev_create
and mdev_destroy.

The syntax of creating a mdev is:

echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_create

The syntax of destroying a mdev is:

echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_destroy

The $mdev_UUID is a unique identifier for this mdev device to be created, and it
is unique per system.

For NVIDIA vGPU, we require a vGPU type identifier (shown as vgpu_type_id in
above Tesla M60 output), and a VM UUID to be passed as
"vendor_specific_argument_list".

If there are no vendor-specific arguments required, either "$mdev_UUID" or
"$mdev_UUID:" will be acceptable as input syntax for the above two commands.

To create an M60-4Q device, libvirt needs to do:

echo "$mdev_UUID:vgpu_type_id=20,vm_uuid=$VM_UUID" >
/sys/bus/pci/devices/\:86\:00.0/mdev_create

Then, you will see a virtual device show up at:

/sys/bus/mdev/devices/$mdev_UUID/

For NVIDIA, to create multiple virtual devices per VM, they all have to be
created upfront before bringing any of them online.
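
(For illustration, a minimal sketch of creating two mdev devices upfront for one
VM; the parent device path, UUIDs and vgpu_type_id are placeholders:)

parent=/sys/bus/pci/devices/$PARENT_BDF
VM_UUID=$(uuidgen)
for mdev_UUID in $(uuidgen) $(uuidgen); do
    # a failed write here returns an error code; see the note on error
    # reporting below
    echo "$mdev_UUID:vgpu_type_id=11,vm_uuid=$VM_UUID" > $parent/mdev_create
done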

Regarding error reporting and detection: on failure, a write() to sysfs using an
fd returns an error code, and a write to the sysfs file through the command
prompt shows the string corresponding to the error code.

3. Start/stop mediated device

Under the virtual device sysfs, you will see a new "online" sysfs file.

You can do cat /sys/bus/mdev/devices/$mdev_UUID/online to get the current status
of this virtual device (0 or 1), and to start or stop a virtual device you can
do:

echo "1|0" > /sys/bus/mdev/devices/$mdev_UUID/online

libvirt needs to query the current state before changing state.

Note: if you have multiple devices, you need to write to the "online" file
individually.

For NVIDIA, if there are multiple mdevs per VM, libvirt needs to bring all of
them "online" before starting QEMU.

4. Launch QEMU/VM

Pass the mdev sysfs path to QEMU as a vfio-pci device:

-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$mdev_UUID,id=vgpu0

5. Shutdown sequence 

libvirt needs to shut down QEMU, bring the virtual device offline, and then
destroy the virtual device.
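
(A minimal sketch of that order, reusing the destroy syntax from section 2;
paths are placeholders:)

# after the QEMU process has exited
echo 0 > /sys/bus/mdev/devices/$mdev_UUID/online
echo "$mdev_UUID" > /sys/bus/pci/devices/$PARENT_BDF/mdev_destroy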

6. VM Reset

No change or requirement for libvirt, as this will be handled via the VFIO reset
API and the QEMU process will keep running as before.

7. Hot-plug

It is optional for vendors to support hot-plug.

The same syntax is used to create a virtual device for hot-plug.
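
(For illustration, assuming the mdev has been created and brought online as in
sections 2 and 3, the hot-plug itself could then be done through the QEMU
monitor; the device id is a placeholder:)

(qemu) device_add vfio-pci,sysfsdev=/sys/bus/mdev/devices/$mdev_UUID,id=vgpu1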

For hot-unplug, after executing the QEMU monitor "device_del" command, libvirt needs
to write to "destroy" sysfs to