[libvirt] [PATCH v2 1/1] Add attribute single_usage_restriction for mdev type-id

2018-10-08 Thread Kirti Wankhede
Generally a single mdev device instance, a share of a physical device, is
assigned to a user space application or a VM. There are cases when multiple
mdev device instances of the same or different types are required by a user
space application or VM. For example, in the case of vGPU, multiple mdev
devices of the type which represents the whole GPU can be assigned to one
instance of an application or VM.

Not all types of mdev devices support assigning multiple mdev devices to a
user space application. In that case the vendor driver can fail the open()
call of the mdev device. But there is no way for the user space application
to know beforehand which configuration the vendor driver supports.

To expose the supported configuration, the vendor driver should add a
'single_usage_restriction' attribute to the <type-id> directory. Returning Y
for this attribute indicates that the vendor driver restricts use to a single
mdev device of that particular <type-id> per user space application.
Returning N indicates that multiple mdev devices of that particular <type-id>
can be assigned to one user space application.

The user space application should check whether the 'single_usage_restriction'
attribute is present in the <type-id> directory of all mdev devices which are
going to be used. If all of them read N, the user space application can
proceed with multiple mdev devices.

This is an optional and read-only attribute.
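
For illustration only, a minimal userspace sketch of the check described
above (the type-id directory paths below are made up, and an absent
attribute is treated as "no restriction", as discussed later in the thread):

/* Illustrative only: check the proposed 'single_usage_restriction' attribute
 * for each mdev type a VM/application wants to use. */
#include <stdio.h>
#include <stdbool.h>
#include <errno.h>

static bool type_allows_multiple(const char *type_dir)
{
        char path[512], val = 'N';
        FILE *f;

        snprintf(path, sizeof(path), "%s/single_usage_restriction", type_dir);
        f = fopen(path, "r");
        if (!f)
                return errno == ENOENT;  /* attribute absent: no restriction */
        if (fread(&val, 1, 1, f) != 1)
                val = 'Y';               /* be conservative on read failure */
        fclose(f);
        return val == 'N';
}

int main(void)
{
        const char *types[] = {          /* hypothetical type-id directories */
                "/sys/class/mdev_bus/0000:86:00.0/mdev_supported_types/vendor-63",
                "/sys/class/mdev_bus/0000:86:00.0/mdev_supported_types/vendor-64",
        };
        for (unsigned i = 0; i < sizeof(types) / sizeof(types[0]); i++)
                if (!type_allows_multiple(types[i])) {
                        fprintf(stderr, "%s restricts single usage\n", types[i]);
                        return 1;
                }
        printf("all types allow multiple mdev devices per application\n");
        return 0;
}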

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 Documentation/ABI/testing/sysfs-bus-vfio-mdev | 16 
 1 file changed, 16 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-vfio-mdev 
b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
index 452dbe39270e..3aca352a70e5 100644
--- a/Documentation/ABI/testing/sysfs-bus-vfio-mdev
+++ b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
@@ -85,6 +85,22 @@ Users:
a particular <type-id> that can help in understanding the
features provided by that type of mediated device.
 
+What:  /sys/.../mdev_supported_types/<type-id>/single_usage_restriction
+Date:   October 2018
+Contact:Kirti Wankhede 
+Description:
+   Reading this attribute will return Y or N. Returning Y indicates
+   that the vendor driver restricts use to a single mdev device of this
+   particular <type-id> per user space application.
+   Returning N indicates that multiple mdev devices of this particular
+   <type-id> can be assigned to one user space application.
+   This is an optional and read-only attribute.
+Users:
+   User space applications should check whether the 'single_usage_restriction'
+   attribute is present in the <type-id> directory of all mdev devices
+   which are going to be used. If all of them read N, the user space
+   application can proceed with multiple mdev devices.
+
 What:   /sys/...///
 Date:   October 2016
 Contact:Kirti Wankhede 
-- 
2.7.0



Re: [libvirt] [RFC PATCH v1 1/1] Add attribute multiple_mdev_support for mdev type-id

2018-09-27 Thread Kirti Wankhede


On 9/27/2018 9:29 PM, Alex Williamson wrote:
> On Thu, 27 Sep 2018 06:48:27 +
> "Tian, Kevin"  wrote:
> 
>>> From: Kirti Wankhede
>>> Sent: Thursday, September 27, 2018 2:22 PM
>>>
>>> Generally a single instance of mdev device, a share of physical device, is
>>> assigned to user space application or a VM. There are cases when multiple
>>> instances of mdev devices of same or different types are required by User
>>> space application or VM. For example in case of vGPU, multiple mdev
>>> devices
>>> of type which represents whole GPU can be assigned to one instance of
>>> application or VM.
>>>
>>> All types of mdev devices may not support assigning multiple mdev devices
>>> to a user space application. In that case vendor driver can fail open()
>>> call of mdev device. But there is no way to know User space application
>>> about the configuration supported by vendor driver.
>>>
>>> To expose supported configuration, vendor driver should add
>>> 'multiple_mdev_support' attribute to type-id directory if vendor driver
>>> supports assigning multiple mdev devices of a particular type-id to one
>>> instance of user space application or a VM.
>>>
>>> User space application should read if 'multiple_mdev_support' attibute is
>>> present in type-id directory of all mdev devices which are going to be
>>> used. If all read 1 then user space application can proceed with multiple
>>> mdev devices.
>>>
>>> This is optional and readonly attribute.  
>>
>> I didn't get what is the exact problem from above description. Isn't it the
>> basic point behind mdev concept that parent driver can create multiple
>> mdev instances on a physical device? VFIO usually doesn't care whether
>> multiple devices (pci or mdev) are assigned to same process/VM or not.
>> Can you give some example of actual problem behind this change?
> 
> The situation here is that mdev imposes no restrictions regarding how
> mdev devices can be used by the user.  Multiple mdevs, regardless of
> type, can be combined and used in any way the user sees fit, at least
> that's the theory.  However, in practice, certain vendor drivers impose
> a limitation that prevents multiple mdev devices from being used in the
> same VM.  This is done by returning an error when the user attempts to
> open those additional devices.  Now we're in the very predictable
> situation that we want to consider that some of the vendor's mdev types
> might be combined in the same userspace instance, while others still
> impose a restriction, and how can we optionally expose such per mdev
> type restrictions to the userspace so we have something better than
> "try it and see if it works".
> 
> The below proposal assumes that a single instance per VM is the norm
> and that we add something to the API to indicate multiple instances are
> supported.  However, that's really never been the position of the mdev
> core, where there's no concept of limiting devices per user instance;
> it's a vendor driver restriction.  So I think the question is then why
> should we burden vendor drivers who do not impose a restriction with an
> additional field?  Does the below extension provide a better backwards
> compatibility scenario?

A vendor driver which does not want to impose the restriction doesn't need
to provide this attribute. In that case, the behavior remains the same as it
is now, i.e. multiple mdev instances can be used by one instance of an
application.


> With the current code, I think it's reasonable for userspace to assume
> there are no mdev limits per userspace instance, those that exist are
> vendor specific.  The below proposal is for an optional field, what
> does the management tool assume when it's not supplied?  We have both
> cases currently, mdev devices that allow multiple instances per VM and
> those that do not, so I don't see that the user is more informed with
> this addition and no attempt is made here to synchronously update all
> drivers to expose this new attribute.
> 

When this attribute is not provided, the behavior remains the same as it
is now. But if this attribute is provided, then the management tool should
read it to verify that such a combination is supported by the vendor
driver.

> Does it make more sense then to add an attribute to expose the
> restriction rather than the lack of restriction.  Perhaps something
> like "singleton_usage_restriction".

I would prefer to expose what is supported rather than what is restricted.

>  Also if the type is meant to be a
> boolean, I think we have Y/N support for that in sysfs rather than
> using integers

[libvirt] [RFC PATCH v1 1/1] Add attribute multiple_mdev_support for mdev type-id

2018-09-27 Thread Kirti Wankhede
Generally a single mdev device instance, a share of a physical device, is
assigned to a user space application or a VM. There are cases when multiple
mdev device instances of the same or different types are required by a user
space application or VM. For example, in the case of vGPU, multiple mdev
devices of the type which represents the whole GPU can be assigned to one
instance of an application or VM.

Not all types of mdev devices support assigning multiple mdev devices to a
user space application. In that case the vendor driver can fail the open()
call of the mdev device. But there is no way for the user space application
to know beforehand which configuration the vendor driver supports.

To expose the supported configuration, the vendor driver should add a
'multiple_mdev_support' attribute to the type-id directory if the vendor
driver supports assigning multiple mdev devices of a particular type-id to
one instance of a user space application or a VM.

The user space application should check whether the 'multiple_mdev_support'
attribute is present in the type-id directory of all mdev devices which are
going to be used. If all of them read 1, the user space application can
proceed with multiple mdev devices.

This is an optional and read-only attribute.

Signed-off-by: Neo Jia 
Signed-off-by: Kirti Wankhede 
---
 Documentation/ABI/testing/sysfs-bus-vfio-mdev | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-vfio-mdev 
b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
index 452dbe39270e..69e1291479ce 100644
--- a/Documentation/ABI/testing/sysfs-bus-vfio-mdev
+++ b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
@@ -85,6 +85,19 @@ Users:
a particular <type-id> that can help in understanding the
features provided by that type of mediated device.
 
+What:   /sys/.../mdev_supported_types/<type-id>/multiple_mdev_support
+Date:   September 2018
+Contact:Kirti Wankhede 
+Description:
+   Reading this attribute will return 0 or 1. Returning 1 indicates
+   that the vendor driver supports assigning multiple mdev devices of a
+   particular <type-id> to one user space application. This is an
+   optional and read-only attribute.
+Users:
+   Userspace applications interested in knowing whether multiple mdev
+   devices of a particular <type-id> can be assigned to one
+   instance of an application.
+
 What:   /sys/...///
 Date:   October 2016
 Contact:Kirti Wankhede 
-- 
2.7.0



Re: [libvirt] Matching the type of mediated devices in the migration

2018-08-06 Thread Kirti Wankhede



On 8/3/2018 11:26 PM, Alex Williamson wrote:
> On Fri, 3 Aug 2018 12:07:58 +
> "Wang, Zhi A"  wrote:
> 
>> Hi:
>>
>> Thanks for unfolding your idea. The picture is clearer to me now. I didn't 
>> realize that you also want to support cross hardware migration. Well, I 
>> thought for a while, the cross hardware migration might be not popular in 
>> vGPU case but could be quite popular in other mdev cases.
> 
> Exactly, we need to think beyond the implementation for a specific
> vendor or class of device.
>  
>> Let me continue my summary:
>>
>> Mdev dev type has already included a parent driver name/a group 
>> name/physical device version/configuration type. For example i915-GVTg_V5_4. 
>> The driver name and the group name could already distinguish the vendor and 
>> the product between different mdevs, e.g. between Intel and Nvidia, between 
>> vGPU or vOther.
> 
> Note that there are only two identifiers here, a vendor driver and a
> type.  We included the vendor driver to avoid namespace collisions
> between vendors.  The type itself should be considered opaque regardless
> of how a specific vendor makes use of it.
> 
>> Each device provides a collection of the version of device state of data 
>> stream in a preferred order in a mdev type, as newer version of device state 
>> might contains more information which might help on performances. 
>>
>> Let's say a new device N and an old device O, they both support mdev_type M.
>>
>> For example:
>> Device N is newer and supports the versions of device state: [ 6.3  6.2 .6.1 
>> ] in mdev type M
>> Device O is older and supports the versions of device state: [ 5.3 5.2 5.1 ] 
>> in mdev type M
>>
>> - Version scheme of device state in backwards compatibility case: Migrate a 
>> VM from a VM with device O to a VM with device N, the mdev type is M.
>>
>> Device N: [ 6.3 6.2 6.1 5.3 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M
>> Version used in migration: 5.3
>> The new device directly supports mdev_type M with the preferred version on 
>> Device O. Good, best situation.
>>
>> Device N: [ 6.3 6.2 6.1 5.2 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M
>> Version used in migration: 5.2
>> The new device supports mdev_type M, but not the preferred version. After 
>> the migration, the vendor driver might have to disable some features which 
>> is not mentioned in 5.2 device state. But this totally depends on the vendor 
>> driver. If user wish to achieve the best experience, he should update the 
>> vendor driver in device N, which supports the preferred version on device O.
>>
>> Device N: [ 6.3 6.2 6.1 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M
>> Version used in migration: None
>> No version is matched. Migration would fail. User should update the vendor 
>> driver on device N and device O.
>>
>> - Version scheme of device state in forwards compatibility case: Migrate a 
>> VM from a VM with N to a VM with device O, the mdev type is M.
>>
>> Device N: [ 6.3 6.2 .6.1 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M, but the user updates the vendor driver on 
>> device O. Now device O could support [ 5.3 5.2 5.1 6.1 ] (As an old device, 
>> the Device O still prefers version 5.3)
>> Version used in migration: 6.1
>> As the new device states is going to migrate to an old device, the vendor 
>> driver on old device might have to specially dealing with the new version of 
>> device state. It depends on the vendor driver. 
>>
>> - QEMU has to figure out and choose the version of device states before 
>> reading device state from the region. (Perhaps we can put the option of 
>> selection in the control part of the region as well)
>> - Libvirt will check if there is any match of the version in the collection 
>> in device O and device N before migration.
>> - Each mdev_type has its own collection of versions. (Device can support 
>> different versions in different types)
>> - Better the collection is not a range, better they could be a collection of 
>> the version strings. (The vendor driver might drop some versions during the 
>> upgrade since they are not ideal)
> 
> I believe that QEMU has always avoided trying to negotiate a migration
> version.  We can only negotiate if the target is online and since a
> save/restore is essentially an offline migration, there's no
> opportunity for negotiation.  Therefore I think we need to assume the
> source version is fixed.  If we need to expose an older migration
> interface, I think we'd need to consider instantiating the mdev with
> that specification or configuring it via attributes before usage, just
> like QEMU does with specifying a machine type version.
> 
> Providing an explicit list of compatible versions also seems like it
> could quickly get out of hand, imagine a driver with regular releases
> that maintains compatibility for years.  The list could get
> unmanageable.
> 
> To be honest, I'm pretty dubious whether vendors will actually implement
> cross version migration, or really consider migration compatibility at
> all, 
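
As an illustration of the version-matching step in the quoted proposal, here
is a minimal sketch that picks the first device-state version both sides
advertise for a given mdev type, using the hypothetical version lists from
the thread (no such interface exists today; the source list is assumed to be
ordered most-preferred first):

/* Illustration only: choose a common device-state version between the source
 * and target device for one mdev type.  The lists are the example values
 * from the thread, not anything a driver exposes today. */
#include <stdio.h>
#include <string.h>

static const char *pick_common_version(const char *src[], int nsrc,
                                       const char *dst[], int ndst)
{
        for (int i = 0; i < nsrc; i++)       /* src ordered most-preferred first */
                for (int j = 0; j < ndst; j++)
                        if (strcmp(src[i], dst[j]) == 0)
                                return src[i];
        return NULL;                         /* no match: refuse the migration */
}

int main(void)
{
        const char *devN[] = { "6.3", "6.2", "6.1", "5.3" };  /* newer device */
        const char *devO[] = { "5.3", "5.2", "5.1" };         /* older device */
        /* backwards-compatible case from the thread: source O, target N */
        const char *v = pick_common_version(devO, 3, devN, 4);

        printf("version used for migration: %s\n", v ? v : "none");
        return 0;
}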

Re: [libvirt] [RFC PATCH 0/2] Add new mdev type for aggregated resources

2018-06-22 Thread Kirti Wankhede



On 6/22/2018 1:12 PM, Zhenyu Wang wrote:
> On 2018.06.21 09:00:28 -0600, Alex Williamson wrote:
>> On Thu, 21 Jun 2018 19:57:38 +0530
>> Kirti Wankhede  wrote:
>>
>>> On 6/20/2018 1:10 PM, Zhenyu Wang wrote:
>>>> Current mdev device create interface depends on fixed mdev type, which get 
>>>> uuid
>>>> from user to create instance of mdev device. If user wants to use 
>>>> customized
>>>> number of resource for mdev device, then only can create new mdev type for 
>>>> that
>>>> which may not be flexible.
>>>>
>>>> To allow to create user defined resources for mdev, this RFC trys
>>>> to extend mdev create interface by adding new "instances=xxx" parameter
>>>> following uuid, for target mdev type if aggregation is supported, it can
>>>> create new mdev device which contains resources combined by number of
>>>> instances, e.g
>>>>
>>>> echo ",instances=10" > create  
>>>
>>> This seems orthogonal to the way mdev types are meant to be used. Vendor
>>> driver can provide the possible types to provide flexibility to the users.
>>
>> I think the goal here is to define how we create a type that supports
>> arbitrary combinations of resources where the total number of those
>> resources my be sufficiently large that the parent driver enumerating
>> them all is not feasible.
>>
>>> Secondly, not always all resources defined for a particular mdev type
>>> can be multiplied, for example, a mdev type for vGPU that supports 2
>>> heads, that can't be multiplied to use with 20 heads.
> 
> yeah, for vGPU we actually can have flexible config for memory size but
> currently fixed by type definition. Although like for display config, we
> are just sticking with 1 head config even for aggregate type.
> 

An mdev type is a set of parameters. If aggregation applies to only one
parameter, how can the user know which parameter can be aggregated?

>>
>> Not all types need to define themselves this way, aiui this is an
>> optional extension.  Userspace can determine if this feature is
>> available with the new attribute and if they're not aware of the new
>> attribute, we operate in a backwards compatible mode where 'echo $UUID >
>> create' consumes one instance of that type.  Mdev, like vfio, is device
>> agnostic, so while a vGPU example may have scaling issues that need to
>> be ironed out to implement this, those don't immediately negate the
>> value of this proposal.  Thanks,
>>
> 
> yep, backward compatible is always ensured by this, so for no
> aggregation attrib type, still keep current behavior, driver should
> refuse "instances=" if set by user. One thing I'm concern is that
> looks currently without changing mdev create ops interface, can't
> carry that option for driver, or should we do with more general parameter?
> 

I think, if aggregation is not supported, then the vendor driver should fail
'create' with instances > 1. That means every vendor driver would need a
change to handle this case.

Another way could be that the structure mdev_parent_ops has another function
pointer, 'create_with_instances', which takes instances as an argument.
Vendor drivers which support aggregation would have to provide this callback,
and existing vendor drivers would need no change. Then in the mdev core:

if (instances > 1) {
        if (parent->ops->create_with_instances)
                ret = parent->ops->create_with_instances(kobj, mdev,
                                                         instances);
        else
                return -EINVAL;
} else {
        ret = parent->ops->create(kobj, mdev);
}
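
For illustration, a sketch of what the proposed callback could look like;
note that 'create_with_instances' is only a suggestion from this thread and
is not part of the existing mdev_parent_ops:

/* Sketch of the proposed extension, not the current kernel API: the new
 * callback is optional, so existing vendor drivers need no change. */
#include <linux/kobject.h>
#include <linux/mdev.h>

struct mdev_parent_ops_proposed {
        int (*create)(struct kobject *kobj, struct mdev_device *mdev);
        int (*create_with_instances)(struct kobject *kobj,
                                     struct mdev_device *mdev,
                                     unsigned int instances);
        /* ... remaining callbacks as in the existing mdev_parent_ops ... */
};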

Thanks,
Kirti



> thanks
> 
>>
>>>> VM manager e.g libvirt can check mdev type with "aggregation" attribute
>>>> which can support this setting. And new sysfs attribute "instances" is
>>>> created for each mdev device to show allocated number. Default number
>>>> of 1 or no "instances" file can be used for compatibility check.
>>>>
>>>> This RFC trys to create new KVMGT type with minimal vGPU resources which
>>>> can be combined with "instances=x" setting to allocate for user wanted 
>>>> resources.
>>>>
>>>> Zhenyu Wang (2):
>>>>   vfio/mdev: Add new instances parameters for mdev create
>>>>   drm/i915/gvt: Add new aggregation type
>>>>
>>>>  drivers/gpu/drm/i915/gvt/gvt.c   | 26 ---
>>>>  drivers/gpu/drm/i915/gvt/gvt.h   | 14 +---
>>>>  drivers/gpu/drm/i915/gvt/kvmgt.c |  9 +++--
>>>>  drivers/gpu/drm/i915/gvt/vgpu.c  | 56 
>>>>  drivers/s390/cio/vfio_ccw_ops.c  |  3 +-
>>>>  drivers/vfio/mdev/mdev_core.c| 11 ---
>>>>  drivers/vfio/mdev/mdev_private.h |  6 +++-
>>>>  drivers/vfio/mdev/mdev_sysfs.c   | 42 
>>>>  include/linux/mdev.h |  3 +-
>>>>  samples/vfio-mdev/mbochs.c   |  3 +-
>>>>  samples/vfio-mdev/mdpy.c |  3 +-
>>>>  samples/vfio-mdev/mtty.c |  3 +-
>>>>  12 files changed, 141 insertions(+), 38 deletions(-)
>>>>   
>>
> 



Re: [libvirt] [RFC PATCH 0/2] Add new mdev type for aggregated resources

2018-06-21 Thread Kirti Wankhede



On 6/20/2018 1:10 PM, Zhenyu Wang wrote:
> Current mdev device create interface depends on fixed mdev type, which get 
> uuid
> from user to create instance of mdev device. If user wants to use customized
> number of resource for mdev device, then only can create new mdev type for 
> that
> which may not be flexible.
> 
> To allow to create user defined resources for mdev, this RFC trys
> to extend mdev create interface by adding new "instances=xxx" parameter
> following uuid, for target mdev type if aggregation is supported, it can
> create new mdev device which contains resources combined by number of
> instances, e.g
> 
> echo ",instances=10" > create

This seems orthogonal to the way mdev types are meant to be used. The vendor
driver can provide the possible types to give flexibility to the users.

Secondly, not all resources defined for a particular mdev type can be
multiplied; for example, an mdev type for vGPU that supports 2 heads can't
simply be multiplied to get one with 20 heads.

Thanks,
Kirti

> 
> VM manager e.g libvirt can check mdev type with "aggregation" attribute
> which can support this setting. And new sysfs attribute "instances" is
> created for each mdev device to show allocated number. Default number
> of 1 or no "instances" file can be used for compatibility check.
> 
> This RFC trys to create new KVMGT type with minimal vGPU resources which
> can be combined with "instances=x" setting to allocate for user wanted 
> resources.
> 
> Zhenyu Wang (2):
>   vfio/mdev: Add new instances parameters for mdev create
>   drm/i915/gvt: Add new aggregation type
> 
>  drivers/gpu/drm/i915/gvt/gvt.c   | 26 ---
>  drivers/gpu/drm/i915/gvt/gvt.h   | 14 +---
>  drivers/gpu/drm/i915/gvt/kvmgt.c |  9 +++--
>  drivers/gpu/drm/i915/gvt/vgpu.c  | 56 
>  drivers/s390/cio/vfio_ccw_ops.c  |  3 +-
>  drivers/vfio/mdev/mdev_core.c| 11 ---
>  drivers/vfio/mdev/mdev_private.h |  6 +++-
>  drivers/vfio/mdev/mdev_sysfs.c   | 42 
>  include/linux/mdev.h |  3 +-
>  samples/vfio-mdev/mbochs.c   |  3 +-
>  samples/vfio-mdev/mdpy.c |  3 +-
>  samples/vfio-mdev/mtty.c |  3 +-
>  12 files changed, 141 insertions(+), 38 deletions(-)
> 



Re: [libvirt] [PATCH 0/3] sample: vfio mdev display devices.

2018-04-26 Thread Kirti Wankhede


On 4/26/2018 1:22 AM, Dr. David Alan Gilbert wrote:
> * Alex Williamson (alex.william...@redhat.com) wrote:
>> On Wed, 25 Apr 2018 21:00:39 +0530
>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>
>>> On 4/25/2018 4:29 AM, Alex Williamson wrote:
>>>> On Wed, 25 Apr 2018 01:20:08 +0530
>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>   
>>>>> On 4/24/2018 3:10 AM, Alex Williamson wrote:  
>>>>>> On Wed, 18 Apr 2018 12:31:53 -0600
>>>>>> Alex Williamson <alex.william...@redhat.com> wrote:
>>>>>> 
>>>>>>> On Mon,  9 Apr 2018 12:35:10 +0200
>>>>>>> Gerd Hoffmann <kra...@redhat.com> wrote:
>>>>>>>
>>>>>>>> This little series adds three drivers, for demo-ing and testing vfio
>>>>>>>> display interface code.  There is one mdev device for each interface
>>>>>>>> type (mdpy.ko for region and mbochs.ko for dmabuf).  
>>>>>>>
>>>>>>> Erik Skultety brought up a good question today regarding how libvirt is
>>>>>>> meant to handle these different flavors of display interfaces and
>>>>>>> knowing whether a given mdev device has display support at all.  It
>>>>>>> seems that we cannot simply use the default display=auto because
>>>>>>> libvirt needs to specifically configure gl support for a dmabuf type
>>>>>>> interface versus not having such a requirement for a region interface,
>>>>>>> perhaps even removing the emulated graphics in some cases (though I
>>>>>>> don't think we have boot graphics through either solution yet).
>>>>>>> Additionally, GVT-g seems to need the x-igd-opregion support
>>>>>>> enabled(?), which is a non-starter for libvirt as it's an experimental
>>>>>>> option!
>>>>>>>
>>>>>>> Currently the only way to determine display support is through the
>>>>>>> VFIO_DEVICE_QUERY_GFX_PLANE ioctl, but for libvirt to probe that on
>>>>>>> their own they'd need to get to the point where they could open the
>>>>>>> vfio device and perform the ioctl.  That means opening a vfio
>>>>>>> container, adding the group, setting the iommu type, and getting the
>>>>>>> device.  I was initially a bit appalled at asking libvirt to do that,
>>>>>>> but the alternative is to put this information in sysfs, but doing that
>>>>>>> we risk that we need to describe every nuance of the mdev device
>>>>>>> through sysfs and it becomes a dumping ground for every possible
>>>>>>> feature an mdev device might have.
>> ...
>>>>>>> So I was ready to return and suggest that maybe libvirt should probe
>>>>>>> the device to know about these ancillary configuration details, but
>>>>>>> then I remembered that both mdev vGPU vendors had external dependencies
>>>>>>> to even allow probing the device.  KVMGT will fail to open the device
>>>>>>> if it's not associated with an instance of KVM and NVIDIA vGPU, I
>>>>>>> believe, will fail if the vGPU manager process cannot find the QEMU
>>>>>>> instance to extract the VM UUID.  (Both of these were bad ideas)
>>>>>>
>>>>>> Here's another proposal that's really growing on me:
>>>>>>
>>>>>>  * Fix the vendor drivers!  Allow devices to be opened and probed
>>>>>>without these external dependencies.
>>>>>>  * Libvirt uses the existing vfio API to open the device and probe the
>>>>>>necessary ioctls, if it can't probe the device, the feature is
>>>>>>unavailable, ie. display=off, no migration.
>>>>>> 
>>>>>
>>>>> I'm trying to think simpler mechanism using sysfs that could work for
>>>>> any feature and knowing source-destination migration compatibility check
>>>>> by libvirt before initiating migration.
>>>>>
>>>>> I have another proposal:
>>>>> * Add a ioctl VFIO_DEVICE_PROBE_FEATURES
>>>>> struct vfio_device_features {
>>>>> __u32 argsz;
>>>>> __u32 features;
>>>>> }
>>>>>
>>>>> Defin

Re: [libvirt] [PATCH 0/3] sample: vfio mdev display devices.

2018-04-25 Thread Kirti Wankhede


On 4/25/2018 4:29 AM, Alex Williamson wrote:
> On Wed, 25 Apr 2018 01:20:08 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 4/24/2018 3:10 AM, Alex Williamson wrote:
>>> On Wed, 18 Apr 2018 12:31:53 -0600
>>> Alex Williamson <alex.william...@redhat.com> wrote:
>>>   
>>>> On Mon,  9 Apr 2018 12:35:10 +0200
>>>> Gerd Hoffmann <kra...@redhat.com> wrote:
>>>>  
>>>>> This little series adds three drivers, for demo-ing and testing vfio
>>>>> display interface code.  There is one mdev device for each interface
>>>>> type (mdpy.ko for region and mbochs.ko for dmabuf).
>>>>
>>>> Erik Skultety brought up a good question today regarding how libvirt is
>>>> meant to handle these different flavors of display interfaces and
>>>> knowing whether a given mdev device has display support at all.  It
>>>> seems that we cannot simply use the default display=auto because
>>>> libvirt needs to specifically configure gl support for a dmabuf type
>>>> interface versus not having such a requirement for a region interface,
>>>> perhaps even removing the emulated graphics in some cases (though I
>>>> don't think we have boot graphics through either solution yet).
>>>> Additionally, GVT-g seems to need the x-igd-opregion support
>>>> enabled(?), which is a non-starter for libvirt as it's an experimental
>>>> option!
>>>>
>>>> Currently the only way to determine display support is through the
>>>> VFIO_DEVICE_QUERY_GFX_PLANE ioctl, but for libvirt to probe that on
>>>> their own they'd need to get to the point where they could open the
>>>> vfio device and perform the ioctl.  That means opening a vfio
>>>> container, adding the group, setting the iommu type, and getting the
>>>> device.  I was initially a bit appalled at asking libvirt to do that,
>>>> but the alternative is to put this information in sysfs, but doing that
>>>> we risk that we need to describe every nuance of the mdev device
>>>> through sysfs and it becomes a dumping ground for every possible
>>>> feature an mdev device might have.
>>>>  
>>
>> One or two sysfs file for each feature shouldn't be that much of over
>> head? In kernel, other subsystem modules expose capability through
>> sysfs, like PCI subsystem adds 'boot_vga' file for VGA device which
>> returns 0/1 depending on if its boot VGA device. Similarly
>> 'd3cold_allowed', 'msi_bus'...
> 
> Obviously we could add sysfs files, but unlike properties that the PCI
> core exposes about struct pci_dev fields, the idea of a vfio_device is
> much more abstract.  Each bus driver creates its own device
> representation, so we have a top level vfio_device referencing through
> an opaque pointer a vfio_pci_device, vfio_platform_device, or
> mdev_device, and each mdev vendor driver creates its own private data
> structure below the mdev_device.  So it's not quite a simple as one new
> attribute "show" function to handle all devices of that bus_type.  We
> need a consistent implementation in each bus driver and vendor driver
> or we need to figure out how to percolate the information up to the
> vfio core.  Your idea below seems to take the percolate approach.
>  
>>>> So I was ready to return and suggest that maybe libvirt should probe
>>>> the device to know about these ancillary configuration details, but
>>>> then I remembered that both mdev vGPU vendors had external dependencies
>>>> to even allow probing the device.  KVMGT will fail to open the device
>>>> if it's not associated with an instance of KVM and NVIDIA vGPU, I
>>>> believe, will fail if the vGPU manager process cannot find the QEMU
>>>> instance to extract the VM UUID.  (Both of these were bad ideas)  
>>>
>>> Here's another proposal that's really growing on me:
>>>
>>>  * Fix the vendor drivers!  Allow devices to be opened and probed
>>>without these external dependencies.
>>>  * Libvirt uses the existing vfio API to open the device and probe the
>>>necessary ioctls, if it can't probe the device, the feature is
>>>unavailable, ie. display=off, no migration.
>>>   
>>
>> I'm trying to think simpler mechanism using sysfs that could work for
>> any feature and knowing source-destination migration compatibility check
>> by libvirt before initiating migration.
>>
>> I have another proposal:
>> * Add a ioctl VFIO_DEVI

Re: [libvirt] [PATCH 0/3] sample: vfio mdev display devices.

2018-04-24 Thread Kirti Wankhede


On 4/24/2018 3:10 AM, Alex Williamson wrote:
> On Wed, 18 Apr 2018 12:31:53 -0600
> Alex Williamson  wrote:
> 
>> On Mon,  9 Apr 2018 12:35:10 +0200
>> Gerd Hoffmann  wrote:
>>
>>> This little series adds three drivers, for demo-ing and testing vfio
>>> display interface code.  There is one mdev device for each interface
>>> type (mdpy.ko for region and mbochs.ko for dmabuf).  
>>
>> Erik Skultety brought up a good question today regarding how libvirt is
>> meant to handle these different flavors of display interfaces and
>> knowing whether a given mdev device has display support at all.  It
>> seems that we cannot simply use the default display=auto because
>> libvirt needs to specifically configure gl support for a dmabuf type
>> interface versus not having such a requirement for a region interface,
>> perhaps even removing the emulated graphics in some cases (though I
>> don't think we have boot graphics through either solution yet).
>> Additionally, GVT-g seems to need the x-igd-opregion support
>> enabled(?), which is a non-starter for libvirt as it's an experimental
>> option!
>>
>> Currently the only way to determine display support is through the
>> VFIO_DEVICE_QUERY_GFX_PLANE ioctl, but for libvirt to probe that on
>> their own they'd need to get to the point where they could open the
>> vfio device and perform the ioctl.  That means opening a vfio
>> container, adding the group, setting the iommu type, and getting the
>> device.  I was initially a bit appalled at asking libvirt to do that,
>> but the alternative is to put this information in sysfs, but doing that
>> we risk that we need to describe every nuance of the mdev device
>> through sysfs and it becomes a dumping ground for every possible
>> feature an mdev device might have.
>>

One or two sysfs files for each feature shouldn't be that much overhead?
In the kernel, other subsystem modules expose capabilities through sysfs;
for example, the PCI subsystem adds a 'boot_vga' file for VGA devices which
returns 0/1 depending on whether it is the boot VGA device. Similarly
'd3cold_allowed', 'msi_bus'...

>> So I was ready to return and suggest that maybe libvirt should probe
>> the device to know about these ancillary configuration details, but
>> then I remembered that both mdev vGPU vendors had external dependencies
>> to even allow probing the device.  KVMGT will fail to open the device
>> if it's not associated with an instance of KVM and NVIDIA vGPU, I
>> believe, will fail if the vGPU manager process cannot find the QEMU
>> instance to extract the VM UUID.  (Both of these were bad ideas)
> 
> Here's another proposal that's really growing on me:
> 
>  * Fix the vendor drivers!  Allow devices to be opened and probed
>without these external dependencies.
>  * Libvirt uses the existing vfio API to open the device and probe the
>necessary ioctls, if it can't probe the device, the feature is
>unavailable, ie. display=off, no migration.
> 

I'm trying to think of a simpler mechanism using sysfs that could work for
any feature and allow a source-destination migration compatibility check by
libvirt before initiating migration.

I have another proposal:
* Add an ioctl VFIO_DEVICE_PROBE_FEATURES

struct vfio_device_features {
        __u32 argsz;
        __u32 features;
};

Define a bit for each feature:
#define VFIO_DEVICE_FEATURE_DISPLAY_REGION  (1 << 0)
#define VFIO_DEVICE_FEATURE_DISPLAY_DMABUF  (1 << 1)
#define VFIO_DEVICE_FEATURE_MIGRATION   (1 << 2)

* The vendor driver returns a bitmask of supported features during the
initialization phase.

* In the vfio core module, trap this ioctl for each device in
vfio_device_fops_unl_ioctl(), check the features bitmask returned by the
vendor driver and add a sysfs file if the feature is supported by that
device. This sysfs file would return 0/1.
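
For illustration only, a sketch of how userspace might call the proposed
ioctl; VFIO_DEVICE_PROBE_FEATURES does not exist in the VFIO UAPI, so the
request number below is a made-up placeholder:

/* Sketch only: the struct and feature bits mirror the proposal above. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

struct vfio_device_features {
        uint32_t argsz;
        uint32_t features;
};

#define VFIO_DEVICE_FEATURE_DISPLAY_REGION  (1 << 0)
#define VFIO_DEVICE_FEATURE_DISPLAY_DMABUF  (1 << 1)
#define VFIO_DEVICE_FEATURE_MIGRATION       (1 << 2)

#define VFIO_DEVICE_PROBE_FEATURES  _IO(';', 200)   /* placeholder number */

/* device_fd is an already-opened VFIO device fd; returns 0 and fills
 * *features on success, -1 if the (proposed) ioctl is not supported. */
int probe_features(int device_fd, uint32_t *features)
{
        struct vfio_device_features feat = { .argsz = sizeof(feat) };

        if (ioctl(device_fd, VFIO_DEVICE_PROBE_FEATURES, &feat) < 0)
                return -1;
        *features = feat.features;
        return 0;
}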

For migration, this bit will only indicate whether the host driver supports
the migration feature.

For a source and destination compatibility check, libvirt would need more
data/variables to check, like:
* whether the same type of 'mdev_type' device is create-able at the
   destination, i.e. if ('mdev_type'->available_instances > 0)

* whether host_driver_version at the source and destination are compatible.
Host drivers from the same release branch should be mostly compatible, but if
there are major changes in structures or APIs, host drivers from different
branches might not be compatible. For example, if the source and destination
are from different branches and one of the structures has changed, then data
collected at the source might not be compatible with the structures at the
destination, and typecasting it to the changed structures would mess up the
migrated data during restoration.

* whether guest_driver_version is compatible with the host driver at the
destination. For mdev devices, the guest driver communicates with the host
driver in some form. If there are changes in the structures/APIs of such
communication, the guest driver at the source might not be compatible with
the host driver at the destination.

'available_instances' sysfs already exist, later 

Re: [libvirt] [RFC]Add new mdev interface for QoS

2017-08-08 Thread Kirti Wankhede


On 8/7/2017 1:11 PM, Gao, Ping A wrote:
> 
> On 2017/8/4 5:11, Alex Williamson wrote:
>> On Thu, 3 Aug 2017 20:26:14 +0800
>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>
>>> On 2017/8/3 0:58, Alex Williamson wrote:
>>>> On Wed, 2 Aug 2017 21:16:28 +0530
>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>  
>>>>> On 8/2/2017 6:29 PM, Gao, Ping A wrote:  
>>>>>> On 2017/8/2 18:19, Kirti Wankhede wrote:
>>>>>>> On 8/2/2017 3:56 AM, Alex Williamson wrote:
>>>>>>>> On Tue, 1 Aug 2017 13:54:27 +0800
>>>>>>>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>>>>>>>
>>>>>>>>> On 2017/7/28 0:00, Gao, Ping A wrote:
>>>>>>>>>> On 2017/7/27 0:43, Alex Williamson wrote:  
>>>>>>>>>>> [cc +libvir-list]
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 26 Jul 2017 21:16:59 +0800
>>>>>>>>>>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>>>>>>>>>>  
>>>>>>>>>>>> The vfio-mdev provide the capability to let different guest share 
>>>>>>>>>>>> the
>>>>>>>>>>>> same physical device through mediate sharing, as result it bring a
>>>>>>>>>>>> requirement about how to control the device sharing, we need a QoS
>>>>>>>>>>>> related interface for mdev to management virtual device resource.
>>>>>>>>>>>>
>>>>>>>>>>>> E.g. In practical use, vGPUs assigned to different quests almost 
>>>>>>>>>>>> has
>>>>>>>>>>>> different performance requirements, some guests may need higher 
>>>>>>>>>>>> priority
>>>>>>>>>>>> for real time usage, some other may need more portion of the GPU
>>>>>>>>>>>> resource to get higher 3D performance, corresponding we can define 
>>>>>>>>>>>> some
>>>>>>>>>>>> interfaces like weight/cap for overall budget control, priority for
>>>>>>>>>>>> single submission control.
>>>>>>>>>>>>
>>>>>>>>>>>> So I suggest to add some common attributes which are vendor 
>>>>>>>>>>>> agnostic in
>>>>>>>>>>>> mdev core sysfs for QoS purpose.  
>>>>>>>>>>> I think what you're asking for is just some standardization of a QoS
>>>>>>>>>>> attribute_group which a vendor can optionally include within the
>>>>>>>>>>> existing mdev_parent_ops.mdev_attr_groups.  The mdev core will
>>>>>>>>>>> transparently enable this, but it really only provides the standard,
>>>>>>>>>>> all of the support code is left for the vendor.  I'm fine with that,
>>>>>>>>>>> but of course the trouble with and sort of standardization is 
>>>>>>>>>>> arriving
>>>>>>>>>>> at an agreed upon standard.  Are there QoS knobs that are generic
>>>>>>>>>>> across any mdev device type?  Are there others that are more 
>>>>>>>>>>> specific
>>>>>>>>>>> to vGPU?  Are there existing examples of this that we can steal 
>>>>>>>>>>> their
>>>>>>>>>>> specification?  
>>>>>>>>>> Yes, you are right, standardization QoS knobs are exactly what I 
>>>>>>>>>> wanted.
>>>>>>>>>> Only when it become a part of the mdev framework and libvirt, then 
>>>>>>>>>> QoS
>>>>>>>>>> such critical feature can be leveraged by cloud usage. HW vendor only
>>>>>>>>>> need to focus on the implementation of the corresponding QoS 
>>>>>>>>>> algorithm
>>>>>>>>>> in their back-end driver.
>>>>>>>>>>
>>>>>>>>>> Vfio-mdev framework provide the 

Re: [libvirt] [RFC]Add new mdev interface for QoS

2017-08-02 Thread Kirti Wankhede


On 8/2/2017 6:29 PM, Gao, Ping A wrote:
> 
> On 2017/8/2 18:19, Kirti Wankhede wrote:
>>
>> On 8/2/2017 3:56 AM, Alex Williamson wrote:
>>> On Tue, 1 Aug 2017 13:54:27 +0800
>>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>>
>>>> On 2017/7/28 0:00, Gao, Ping A wrote:
>>>>> On 2017/7/27 0:43, Alex Williamson wrote:  
>>>>>> [cc +libvir-list]
>>>>>>
>>>>>> On Wed, 26 Jul 2017 21:16:59 +0800
>>>>>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>>>>>  
>>>>>>> The vfio-mdev provide the capability to let different guest share the
>>>>>>> same physical device through mediate sharing, as result it bring a
>>>>>>> requirement about how to control the device sharing, we need a QoS
>>>>>>> related interface for mdev to management virtual device resource.
>>>>>>>
>>>>>>> E.g. In practical use, vGPUs assigned to different quests almost has
>>>>>>> different performance requirements, some guests may need higher priority
>>>>>>> for real time usage, some other may need more portion of the GPU
>>>>>>> resource to get higher 3D performance, corresponding we can define some
>>>>>>> interfaces like weight/cap for overall budget control, priority for
>>>>>>> single submission control.
>>>>>>>
>>>>>>> So I suggest to add some common attributes which are vendor agnostic in
>>>>>>> mdev core sysfs for QoS purpose.  
>>>>>> I think what you're asking for is just some standardization of a QoS
>>>>>> attribute_group which a vendor can optionally include within the
>>>>>> existing mdev_parent_ops.mdev_attr_groups.  The mdev core will
>>>>>> transparently enable this, but it really only provides the standard,
>>>>>> all of the support code is left for the vendor.  I'm fine with that,
>>>>>> but of course the trouble with and sort of standardization is arriving
>>>>>> at an agreed upon standard.  Are there QoS knobs that are generic
>>>>>> across any mdev device type?  Are there others that are more specific
>>>>>> to vGPU?  Are there existing examples of this that we can steal their
>>>>>> specification?  
>>>>> Yes, you are right, standardization QoS knobs are exactly what I wanted.
>>>>> Only when it become a part of the mdev framework and libvirt, then QoS
>>>>> such critical feature can be leveraged by cloud usage. HW vendor only
>>>>> need to focus on the implementation of the corresponding QoS algorithm
>>>>> in their back-end driver.
>>>>>
>>>>> Vfio-mdev framework provide the capability to share the device that lack
>>>>> of HW virtualization support to guests, no matter the device type,
>>>>> mediated sharing actually is a time sharing multiplex method, from this
>>>>> point of view, QoS can be take as a generic way about how to control the
>>>>> time assignment for virtual mdev device that occupy HW. As result we can
>>>>> define QoS knob generic across any device type by this way. Even if HW
>>>>> has build in with some kind of QoS support, I think it's not a problem
>>>>> for back-end driver to convert mdev standard QoS definition to their
>>>>> specification to reach the same performance expectation. Seems there are
>>>>> no examples for us to follow, we need define it from scratch.
>>>>>
>>>>> I proposal universal QoS control interfaces like below:
>>>>>
>>>>> Cap: The cap limits the maximum percentage of time a mdev device can own
>>>>> physical device. e.g. cap=60, means mdev device cannot take over 60% of
>>>>> total physical resource.
>>>>>
>>>>> Weight: The weight define proportional control of the mdev device
>>>>> resource between guests, it’s orthogonal with Cap, to target load
>>>>> balancing. E.g. if guest 1 should take double mdev device resource
>>>>> compare with guest 2, need set weight ratio to 2:1.
>>>>>
>>>>> Priority: The guest who has higher priority will get execution first,
>>>>> target to some real time usage and speeding interactive response.
>>>>>
>>>>> Above QoS 

Re: [libvirt] [RFC]Add new mdev interface for QoS

2017-08-02 Thread Kirti Wankhede


On 8/2/2017 3:56 AM, Alex Williamson wrote:
> On Tue, 1 Aug 2017 13:54:27 +0800
> "Gao, Ping A"  wrote:
> 
>> On 2017/7/28 0:00, Gao, Ping A wrote:
>>> On 2017/7/27 0:43, Alex Williamson wrote:  
 [cc +libvir-list]

 On Wed, 26 Jul 2017 21:16:59 +0800
 "Gao, Ping A"  wrote:
  
> The vfio-mdev provide the capability to let different guest share the
> same physical device through mediate sharing, as result it bring a
> requirement about how to control the device sharing, we need a QoS
> related interface for mdev to management virtual device resource.
>
> E.g. In practical use, vGPUs assigned to different quests almost has
> different performance requirements, some guests may need higher priority
> for real time usage, some other may need more portion of the GPU
> resource to get higher 3D performance, corresponding we can define some
> interfaces like weight/cap for overall budget control, priority for
> single submission control.
>
> So I suggest to add some common attributes which are vendor agnostic in
> mdev core sysfs for QoS purpose.  
 I think what you're asking for is just some standardization of a QoS
 attribute_group which a vendor can optionally include within the
 existing mdev_parent_ops.mdev_attr_groups.  The mdev core will
 transparently enable this, but it really only provides the standard,
 all of the support code is left for the vendor.  I'm fine with that,
 but of course the trouble with and sort of standardization is arriving
 at an agreed upon standard.  Are there QoS knobs that are generic
 across any mdev device type?  Are there others that are more specific
 to vGPU?  Are there existing examples of this that we can steal their
 specification?  
>>> Yes, you are right, standardization QoS knobs are exactly what I wanted.
>>> Only when it become a part of the mdev framework and libvirt, then QoS
>>> such critical feature can be leveraged by cloud usage. HW vendor only
>>> need to focus on the implementation of the corresponding QoS algorithm
>>> in their back-end driver.
>>>
>>> Vfio-mdev framework provide the capability to share the device that lack
>>> of HW virtualization support to guests, no matter the device type,
>>> mediated sharing actually is a time sharing multiplex method, from this
>>> point of view, QoS can be take as a generic way about how to control the
>>> time assignment for virtual mdev device that occupy HW. As result we can
>>> define QoS knob generic across any device type by this way. Even if HW
>>> has build in with some kind of QoS support, I think it's not a problem
>>> for back-end driver to convert mdev standard QoS definition to their
>>> specification to reach the same performance expectation. Seems there are
>>> no examples for us to follow, we need define it from scratch.
>>>
>>> I proposal universal QoS control interfaces like below:
>>>
>>> Cap: The cap limits the maximum percentage of time a mdev device can own
>>> physical device. e.g. cap=60, means mdev device cannot take over 60% of
>>> total physical resource.
>>>
>>> Weight: The weight define proportional control of the mdev device
>>> resource between guests, it’s orthogonal with Cap, to target load
>>> balancing. E.g. if guest 1 should take double mdev device resource
>>> compare with guest 2, need set weight ratio to 2:1.
>>>
>>> Priority: The guest who has higher priority will get execution first,
>>> target to some real time usage and speeding interactive response.
>>>
>>> Above QoS interfaces cover both overall budget control and single
>>> submission control. I will sent out detail design later once get aligned.  
>>
>> Hi Alex,
>> Any comments about the interface mentioned above?
> 
> Not really.
> 
> Kirti, are there any QoS knobs that would be interesting
> for NVIDIA devices?
> 

We have different types of vGPU for different QoS factors.

When an mdev device is created, its resources are allocated irrespective
of which VM/userspace app is going to use that mdev device. Any parameter we
add here should be tied to a particular mdev device and not to the guest/app
that is going to use it. 'Cap' and 'Priority' are along those lines. Not all
mdev devices might need/use these parameters, so these can be made optional
interfaces.
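
As an illustration of such an optional, per-mdev knob, here is a minimal
vendor-driver sketch exposing a 'cap' attribute through
mdev_parent_ops.mdev_attr_groups (the my_vgpu structure and the drvdata
lookup are placeholders for vendor-private state; names are made up):

/* Sketch only: an optional per-mdev QoS attribute ("cap", 0-100 percent). */
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/sysfs.h>

struct my_vgpu {                        /* placeholder vendor-private state */
        unsigned int cap;               /* max % of physical engine time */
};

static struct my_vgpu *my_vgpu_from_dev(struct device *dev)
{
        return dev_get_drvdata(dev);    /* placeholder: vendor-specific lookup */
}

static ssize_t cap_show(struct device *dev, struct device_attribute *attr,
                        char *buf)
{
        return sprintf(buf, "%u\n", my_vgpu_from_dev(dev)->cap);
}

static ssize_t cap_store(struct device *dev, struct device_attribute *attr,
                         const char *buf, size_t count)
{
        unsigned int val;

        if (kstrtouint(buf, 0, &val) || val > 100)
                return -EINVAL;
        my_vgpu_from_dev(dev)->cap = val;
        return count;
}
static DEVICE_ATTR_RW(cap);

static struct attribute *my_qos_attrs[] = {
        &dev_attr_cap.attr,
        NULL,
};

static const struct attribute_group my_qos_group = {
        .name  = "qos",                 /* optional, vendor-chosen group name */
        .attrs = my_qos_attrs,
};

/* Listed in mdev_parent_ops.mdev_attr_groups by the vendor driver. */
static const struct attribute_group *my_mdev_groups[] = {
        &my_qos_group,
        NULL,
};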

In the above proposal, I'm not sure how 'Weight' would work for mdev
devices on the same physical device.

In the above example, "if guest 1 should take double mdev device
resource compare with guest 2"; but what if guest 2 never boots? How
will you calculate resources?

If libvirt or another toolstack decides to do smart allocation based on the
type name, without taking the physical host device as input, guest 1 and
guest 2 might get mdev devices created on different physical devices. Would
the weighting still matter then?

Thanks,
Kirti


> Implementing libvirt support at the same time might be an interesting
> exercise if we don't 

Re: [libvirt] [RFC PATCH v2 REBASE 17/18] docs: Document the new hostdev and address type 'mdev'

2017-02-22 Thread Kirti Wankhede


On 2/20/2017 7:58 PM, Erik Skultety wrote:
> Signed-off-by: Erik Skultety 
> ---
>  docs/formatdomain.html.in | 48 
> +++
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index b69bd4c..13cb767 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -3277,8 +3277,20 @@
>  attributes: iobase and irq.
>  Since 1.2.1
>
> +  mdev
> +  Mediated devices' addresses have so far only one mandatory 
> attribute
> +uuid (since 3.1.0) which
> +uniquely identifies a mediated device under the syfs file system.
> +  
>  
>  
> +
> +Note: Due to nature of mediated devices, being only software devices
> +defining an allocation of resources on the physical parent device, 
> the
> +address type mdev is supposed to be used to identify a
> +device on the host only, rather than identifying it in the guest.
> +
> +
>  Controllers
>  
>  
> @@ -3774,6 +3786,19 @@
>/devices
>...
>  
> +or:
> +
> +
> +  ...
> +  devices
> +hostdev mode='subsystem' type='mdev' model='vfio-pci'
> +source
> +  address type='mdev' uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'
> +/source
> +/hostdev
> +  /devices
> +  ...
> +

Hi Erik,

Changes look good so far. I did some basic testing with these patches.
I used the above example snippet of the hostdev element and it throws the
error below. Add a '/' before '>' and after the uuid in the address line.

error: (domain_definition):74: Opening and ending tag mismatch: address
line 73 and source
  
---^
Failed. Try again? [y,n,i,f,?]:

Thanks,
Kirti

>  
>hostdev
>The hostdev element is the main container for 
> describing
> @@ -3818,12 +3843,23 @@
>  type passes all LUNs presented by a single HBA to
>  the guest.
>
> +  mdev
> +  For mediated devices (Since 3.1.0)
> +  the model attribute specifies the device API which
> +  determines how the host's vfio driver will expose the device to the
> +  guest. Currently, only vfio-pci model is supported.
> +  The model also has implications on the guest's address type, i.e.
> +  for vfio-pci device API any address type other than 
> PCI
> +  will result in an error.
> +  
>  
>  
> -  Note: The managed attribute is only used with PCI 
> devices
> -  and is ignored by all the other device types, thus setting
> -  managed explicitly with other than PCI device has the 
> same
> -  effect as omitting it.
> +  Note: The managed attribute is only used with PCI and 
> is
> +  ignored by all the other device types, thus setting
> +  managed explicitly with other than a PCI device has 
> the
> +  same effect as omitting it. Similarly, model 
> attribute is
> +  only supported by mediated devices and ignored by all other device
> +  types.
>  
>
>source
> @@ -3888,6 +3924,10 @@
>  is the vhost_scsi wwpn (16 hexadecimal digits with a prefix of
>  "naa.") established in the host configfs.
>
> +  mdev
> +  Mediated devices (Since 3.1.0) are
> +  described by the address element.
> +  
>  
>
>vendor, product
> 




Re: [libvirt] [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration

2016-10-06 Thread Kirti Wankhede
Ping..

Pulling the questions at the top.

>> Will libvirt report 'description' RO attribute, its output would be
>> string, so that user could be able to see the configuration of that
>> profile?
>>
>
> Daniel,
> Waiting for your input on this.
>
>> We can have 'class' as optional attribute. So Intel don't have to
>> provide 'class' attribute and they don't have to specify mandatory
>> attributes of that class. We would provide 'class' attribute and provide
>> mandatory attributes.
>

Thanks,
Kirti


On 10/3/2016 1:50 PM, Kirti Wankhede wrote:
> 
> 
> On 9/30/2016 10:49 AM, Kirti Wankhede wrote:
>>
> ...
> 
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> Here you are proposing to add a class named "gpu", which will make all 
>>>>>>> those gpu
>>>>>>> related attributes mandatory, which libvirt can allow user to better
>>>>>>> parse/present a particular mdev configuration?
>>>>>>>
>>>>>>> I am just wondering if there is another option that we just make all 
>>>>>>> those
>>>>>>> attributes that a mdev device can have as optional but still meaningful 
>>>>>>> to
>>>>>>> libvirt, so libvirt can still parse / recognize them as an class "mdev".
>>>>>>
>>>>>> 'mdev' isn't a class - mdev is the name of the kernel module. The class
>>>>>> refers to the broad capability of the device. class would be things
>>>>>> like "gpu", "nic", "fpga" or other such things. The point of the class
>>>>>> is to identify which other attributes will be considered mandatory.
>>>>>>
>>>>>>
>>>>>
>>>>> Thanks Daniel. This class definition makes sense to me.
>>>>>
>>>>> However I'm not sure whether we should define such common mandatory 
>>>>> attributes
>>>>> of a 'gpu' class now. Intel will go with a 2's power sharing of type 
>>>>> definition... actual
>>>>> type name to be finalized, but an example looks like below:
>>>>>
>>>>> [GVTG-SKL-x2]: available instances (2)
>>>>> [GVTG-SKL-x4]: available instances (4)
>>>>> [GVTG-SKL-x8]: available instances (8)
>>>>> ...
>>>>>
>>>>> User can create different types of vGPUs simultaneously. A GVTG-SKL-x2 
>>>>> type
>>>>> vGPU will get half of the physical GPU resource, while a GVTG-SKL-x4 type 
>>>>> will
>>>>> get a quarter. However it's unclear to me how we want to enumerate those
>>>>> resources into resolution or heads. I feel it'd be more reasonable for us 
>>>>> to push
>>>>> initial libvirt mdev support w/o vgpu specific class definition, until we 
>>>>> see
>>>>> a clear value of doing so (at that time we then follow Daniel's guideline 
>>>>> to define
>>>>> mandatory attributes common to all GPU vendors).
>>>>
>>>> Libvirt won't report arbitrary vendor define attributes. So if we are not
>>>> going to define a gpu class & associated attributes, then there will be
>>>> no reporting of the 'heads', 'resolution', 'fb_length' data described
>>>> above.
>>>>
>>>
>>> yes, that's my point. I think nvidia may put them into the 'description' 
>>> attribute
>>> just for descriptive purpose for now.
>>
>>
>> Will libvirt report 'description' RO attribute, its output would be
>> string, so that user could be able to see the configuration of that
>> profile?
>>
> 
> Daniel,
> Waiting for your input on this.
> 
>> We can have 'class' as optional attribute. So Intel don't have to
>> provide 'class' attribute and they don't have to specify mandatory
>> attributes of that class. We would provide 'class' attribute and provide
>> mandatory attributes.
> 
> 
> Thanks,
> Kirti
> 



Re: [libvirt] [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration

2016-10-03 Thread Kirti Wankhede


On 9/30/2016 10:49 AM, Kirti Wankhede wrote:
> 
...

>>>>>> Hi Daniel,
>>>>>>
>>>>>> Here you are proposing to add a class named "gpu", which will make all 
>>>>>> those gpu
>>>>>> related attributes mandatory, which libvirt can allow user to better
>>>>>> parse/present a particular mdev configuration?
>>>>>>
>>>>>> I am just wondering if there is another option that we just make all 
>>>>>> those
>>>>>> attributes that a mdev device can have as optional but still meaningful 
>>>>>> to
>>>>>> libvirt, so libvirt can still parse / recognize them as an class "mdev".
>>>>>
>>>>> 'mdev' isn't a class - mdev is the name of the kernel module. The class
>>>>> refers to the broad capability of the device. class would be things
>>>>> like "gpu", "nic", "fpga" or other such things. The point of the class
>>>>> is to identify which other attributes will be considered mandatory.
>>>>>
>>>>>
>>>>
>>>> Thanks Daniel. This class definition makes sense to me.
>>>>
>>>> However I'm not sure whether we should define such common mandatory 
>>>> attributes
>>>> of a 'gpu' class now. Intel will go with a 2's power sharing of type 
>>>> definition... actual
>>>> type name to be finalized, but an example looks like below:
>>>>
>>>> [GVTG-SKL-x2]: available instances (2)
>>>> [GVTG-SKL-x4]: available instances (4)
>>>> [GVTG-SKL-x8]: available instances (8)
>>>> ...
>>>>
>>>> User can create different types of vGPUs simultaneously. A GVTG-SKL-x2 type
>>>> vGPU will get half of the physical GPU resource, while a GVTG-SKL-x4 type 
>>>> will
>>>> get a quarter. However it's unclear to me how we want to enumerate those
>>>> resources into resolution or heads. I feel it'd be more reasonable for us 
>>>> to push
>>>> initial libvirt mdev support w/o vgpu specific class definition, until we 
>>>> see
>>>> a clear value of doing so (at that time we then follow Daniel's guideline 
>>>> to define
>>>> mandatory attributes common to all GPU vendors).
>>>
>>> Libvirt won't report arbitrary vendor define attributes. So if we are not
>>> going to define a gpu class & associated attributes, then there will be
>>> no reporting of the 'heads', 'resolution', 'fb_length' data described
>>> above.
>>>
>>
>> yes, that's my point. I think nvidia may put them into the 'description' 
>> attribute
>> just for descriptive purpose for now.
> 
> 
> Will libvirt report the 'description' RO attribute (its output would be a
> string) so that the user is able to see the configuration of that
> profile?
>

Daniel,
Waiting for your input on this.

> We can have 'class' as an optional attribute. Then Intel doesn't have to
> provide a 'class' attribute and doesn't have to specify the mandatory
> attributes of that class. We would provide the 'class' attribute along
> with its mandatory attributes.


Thanks,
Kirti



Re: [libvirt] [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration

2016-09-29 Thread Kirti Wankhede


On 9/29/2016 8:12 PM, Tian, Kevin wrote:
>> From: Daniel P. Berrange [mailto:berra...@redhat.com]
>> Sent: Thursday, September 29, 2016 10:39 PM
>>
>> On Thu, Sep 29, 2016 at 02:35:48PM +, Tian, Kevin wrote:
>>>> From: Daniel P. Berrange [mailto:berra...@redhat.com]
>>>> Sent: Thursday, September 29, 2016 4:06 PM
>>>>
>>>> On Wed, Sep 28, 2016 at 12:48:33PM -0700, Neo Jia wrote:
>>>>> On Tue, Sep 20, 2016 at 10:47:53AM +0100, Daniel P. Berrange wrote:
>>>>>> On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
>>>>>>>
>>>>>>> Hi libvirt experts,
>>>>>>>
>>>>>>> Thanks for valuable input on v1 version of RFC.
>>>>>>>
>>>>>>> Quick brief, VFIO based mediated device framework provides a way to
>>>>>>> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
>>>>>>> and IBM's channel IO. This framework reuses VFIO APIs for all the
>>>>>>> functionalities for mediated devices which are currently being used for
>>>>>>> pass through devices. This framework introduces a set of new sysfs files
>>>>>>> for device creation and its life cycle management.
>>>>>>>
>>>>>>> Here is the summary of discussion on v1:
>>>>>>> 1. Discover mediated device:
>>>>>>> As part of physical device initialization process, vendor driver will
>>>>>>> register their physical devices, which will be used to create virtual
>>>>>>> device (mediated device, aka mdev) to the mediated framework.
>>>>>>>
>>>>>>> Vendor driver should specify mdev_supported_types in directory format.
>>>>>>> This format is class based, for example, display class directory format
>>>>>>> should be as below. We need to define such set for each class of devices
>>>>>>> which would be supported by mediated device framework.
>>>>>>>
>>>>>>>  --- mdev_destroy
>>>>>>>  --- mdev_supported_types
>>>>>>>  |-- 11
>>>>>>>  |   |-- create
>>>>>>>  |   |-- name
>>>>>>>  |   |-- fb_length
>>>>>>>  |   |-- resolution
>>>>>>>  |   |-- heads
>>>>>>>  |   |-- max_instances
>>>>>>>  |   |-- params
>>>>>>>  |   |-- requires_group
>>>>>>>  |-- 12
>>>>>>>  |   |-- create
>>>>>>>  |   |-- name
>>>>>>>  |   |-- fb_length
>>>>>>>  |   |-- resolution
>>>>>>>  |   |-- heads
>>>>>>>  |   |-- max_instances
>>>>>>>  |   |-- params
>>>>>>>  |   |-- requires_group
>>>>>>>  |-- 13
>>>>>>>  |-- create
>>>>>>>  |-- name
>>>>>>>  |-- fb_length
>>>>>>>  |-- resolution
>>>>>>>  |-- heads
>>>>>>>  |-- max_instances
>>>>>>>  |-- params
>>>>>>>  |-- requires_group
>>>>>>>
>>>>>>>
>>>>>>> In the above example directory '11' represents a type id of mdev device.
>>>>>>> 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
>>>>>>> 'requires_group' would be Read-Only files that vendor would provide to
>>>>>>> describe about that type.
>>>>>>>
>>>>>>> 'create':
>>>>>>> Write-only file. Mandatory.
>>>>>>> Accepts string to create mediated device.
>>>>>>>
>>>>>>> 'name':
>>>>>>> Read-Only file. Mandatory.
>>>>>>> Returns string, the name of that type id.
>>>>>>
>>>>>> Presumably this is a human-targetted title/description of
>>>>>> the device.
>>>>>>
>>>>>>>
>>>>>>> 'fb_length':
>>>>>>> Read-only file. Mandatory.
>>>>>>> Returns {K,M,G}, size of framebuffer.
>>>>>>>
>&g

Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-29 Thread Kirti Wankhede


 7. Hot-plug

 It is same syntax to create a virtual device for hot-plug.  
>>>
>>> How do groups work with hotplug?  Can a device be creating into an
>>> existing, running group?  Can a device be removed from an existing,
>>> running group?  Thanks,
>>>   
>>
>> Group is for vendor driver to decide when to commit resources for a
>> group of devices to support multiple devices in a domain. It will be
>> upto vendor driver how it manage it.
>
> Then it's not sufficiently specified for libvirt to use.  Thanks,
> 

 Can you clarify what is required for libvirt to use?
>>>
>>> Everything we're discussing above.  We can't simply say that group
>>> management is defined by the vendor driver, that means that libvirt
>>> needs to know how to uniquely behave for each vendor driver.  We want
>>> to create a consistent framework where libvirt doesn't care what the
>>> vendor driver is.  Therefore we need to make group manipulation fully
>>> specified, regardless of the vendor driver.  Thanks,
>>>  
>>
>> There is no communication between different vendor drivers. So if
>> libvirt creates a group with one device from one vendor and one device
>> from another vendor, both vendor driver would think they have a single
>> device and accordingly commit resources for that single device from its
>> own driver. Ideally nothing would fail. But that is not the main aim of
>> grouping, right?
>>
>> To make this independent of vendor driver, we had initial proposal of
>> having same UUID and different instance numbers. But that was also not
>> acceptable.
> 
> We're defining the infrastructure, we get to define the rules, but we
> need to define all the rules so that libvirt can actually implement to
> them.  We can't say "groups are this nebulous thing and each vendor
> will manage them however they want", nobody wants to care that NVIDIA
> did it one way and Intel did it another and we need to support both.
> That's a design failure.  Therefore we \must\ make the group definition
> we need well specified, any hopefully generic enough that others can
> use it.  If they cannot then we'd need to add a new type of group.
> Therefore we need to \specify\ things like for what set of mdev should
> a group be created?  Do I need a group for a single device?  How do we
> introspect a group to know which mdevs it contains?  Can we dynamically
> add or remove mdev devices from a group?  Can we attach multiple
> groups to the same user?  Can we change the association of an mdev
> device without removing it and recreating it? etc, etc, etc.  We don't
> need vendor coordination, we need a specification of exactly how this
> type of group is meant to work. Thanks,
> 

We decided to handle "multiple vGPUs per VM" through internal design
changes. With that we don't need grouping in the mdev framework. Hopefully this
helps to accelerate the discussion and settle the sysfs interface soon.

Thanks,
Kirti





Re: [libvirt] [Qemu-devel] summary of current vfio mdev upstreaming status

2016-09-29 Thread Kirti Wankhede


On 9/29/2016 3:16 PM, Xiao Guangrong wrote:
> 
> 
> On 09/29/2016 05:36 PM, Neo Jia wrote:
>> On Thu, Sep 29, 2016 at 05:05:47PM +0800, Xiao Guangrong wrote:
>>>
>>>
>>> On 09/29/2016 04:55 PM, Jike Song wrote:
 Hi all,

 In order to have a clear understanding about the VFIO mdev upstreaming
 status, I'd like to summarize it. Please share your opinions on this,
 and correct my misunderstandings.

 The whole vfio mdev series can be logically divided into several parts,
 they work together to provide the mdev support.
>>>
>>> I think what Jike want to suggest is how about partially push/develop
>>> the
>>> mdev. As jike listed, there are some parts can be independent and
>>> they have
>>> mostly been agreed.
>>>
>>> Such development plan can make the discussion be much efficient in the
>>> community. Also it make the possibility that Intel, Nvdia, IBM can focus
>>> on different parts and co-develop it.
>>
>> Hi Guangrong,
>>
>> JFYI. we are preparing v8 patches to accommodate most comments we have
>> discussed
>> so far and we will also include several things that we have decided on
>> sysfs.
>>
>> I definitely would like to see more interactive discussions especially
>> on the
>> sysfs class front from intel folks.
>>
>> Regarding the patch development and given the current status,
>> especially where
>> we are and what we have been through, I am very confident that we
>> should be able
>> to fully handle this ourselves, but thanks for offering help anyway!
>>
>> We should be able to react as fast as possible based on the public
>> mailing list
>> discussions, so again I don't think that part is an issue.
> 
> We understand the progress goes okay. However, splitting such big patchset
> into small parts is a better way to push large code to upstream - it is
> better for review and validation and we can quickly iterate the code if
> there are new issues exposed during the review of new version.
> 
> Particularly, based on the current situation that the sysfs interfaces are
> far from the way to be decided, it is definitely a good idea to push the
> basic infrastructure first, later let's focal on the ABI part - sysfs
> interface design.
> 

Yes, we will have the v8 series out for review soon, even if the sysfs
interface discussion is not settled, so that the other pieces can be reviewed
and tested. In this patch set we will include the basic set of sysfs
interfaces that we have all agreed on so far, such as 'create' and 'remove'.

Thanks,
Kirti



Re: [libvirt] summary of current vfio mdev upstreaming status

2016-09-29 Thread Kirti Wankhede


On 9/29/2016 2:47 PM, Neo Jia wrote:
> On Thu, Sep 29, 2016 at 04:55:39PM +0800, Jike Song wrote:
>> Hi all,
>>
>> In order to have a clear understanding about the VFIO mdev upstreaming
>> status, I'd like to summarize it. Please share your opinions on this,
>> and correct my misunderstandings.
>>
>> The whole vfio mdev series can be logically divided into several parts,
>> they work together to provide the mdev support.
> 

Thanks, Jike, for summarizing. We already have a separate patch for each of
these logical parts. I have maintained the patch sequence in incremental
dependency order.

> Hi Jike,
> 
> Thanks for summarizing this, but I will defer to Kirti to comment on the 
> actual
> upstream status of her patches, couples things to note though:
> 
> 1) iommu type1 patches have been extensively reviewed by Alex already and we
> have one action item left to implement which is already queued up in v8 
> patchset.
> 

That's right Neo.

> 2) regarding the sysfs interface and libvirt discussion, I would like to hear
> what kind of attributes Intel folks are having so far as Daniel is
> asking about adding a class "gpu" which will pull several attributes as 
> mandatory.
> 
> Thanks,
> Neo
> 
>>
>>
>>
>> PART 1: mdev core driver
>>
>>  [task]
>>  -   the mdev bus/device support
>>  -   the utilities of mdev lifecycle management
>>  -   the physical device register/unregister interfaces
>>
>>  [status]
>>  -   basically agreed by community
>>
>>
>> PART 2: vfio bus driver for mdev
>>
>>  [task]
>>  -   interfaces with vendor drivers
>>  -   the vfio bus implementation
>>
>>  [status]
>>
>>  -   basically agreed by community
>>

I'm working on the v8 version of the above patches based on the previous
discussions.

>>
>> PART 3: iommu support for mdev
>>
>>  [task]
>>  -   iommu support for mdev
>>
>>  [status]
>>  -   Kirti's v7 implementation, not yet fully reviewed
>>
>>
>> PART 4: sysfs interfaces for mdev
>>
>>  [task]
>>  -   define the hierarchy of minimal sysfs directories/files
>>  -   check the validity from vendor drivers, init/de-init 
>> them
>>  [status]
>>  -   interfaces are in discussion
>>
>>

From a coding perspective, this is part of the mdev core module; I don't
think we can completely separate this part from the mdev core module while
coding it. Yes, this interface is still under discussion and we need to
settle on it soon.

>> PART 6: Documentation
>>
>>  [task]
>>  -   clearly document the architecture and interfaces
>>  -   coding example for vendor drivers
>>
>>  [status]
>>  -   N/A
>>

I have tried to keep the document in sync with the changes in the above
patches from v6 onward and will continue to update it for each version
accordingly.

I sent out a patch with the sample driver a few hours back against the v7
patchset. Henceforth I'll keep updating the sample driver as the mdev
modules change and add it to my patch series.

Thanks,
Kirti

>>
>> What I'm curious here is 'PART 4', which is needed by other parts to
>> perform further steps, is it possible to accelerate the process somehow? :-)
>>
>>
>> --
>> Thanks,
>> Jike
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



[libvirt] [PATCH 1/1] Add simple sample driver for mediated device framework

2016-09-29 Thread Kirti Wankhede
This sample driver creates an mdev device that simulates a serial port on a PCI card.

Signed-off-by: Kirti Wankhede <kwankh...@nvidia.com>
Signed-off-by: Neo Jia <c...@nvidia.com>
Change-Id: I857f8f12f8b275f2498dfe8c628a5cdc7193b1b2
---
 Documentation/mdev/Makefile   |   14 +
 Documentation/mdev/mtty.c | 1202 +
 Documentation/{ => mdev}/vfio-mediated-device.txt |   61 ++
 3 files changed, 1277 insertions(+)
 create mode 100644 Documentation/mdev/Makefile
 create mode 100644 Documentation/mdev/mtty.c
 rename Documentation/{ => mdev}/vfio-mediated-device.txt (78%)

diff --git a/Documentation/mdev/Makefile b/Documentation/mdev/Makefile
new file mode 100644
index ..ff6f8a324c85
--- /dev/null
+++ b/Documentation/mdev/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for mtty.c file
+#
+KDIR:=/lib/modules/$(shell uname -r)/build
+
+obj-m:=mtty.o
+
+default:
+   $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+
+clean:
+   @rm -rf .*.cmd *.mod.c *.o *.ko .tmp*
+   @rm -rf Module.* Modules.* modules.* .tmp_versions
+
diff --git a/Documentation/mdev/mtty.c b/Documentation/mdev/mtty.c
new file mode 100644
index ..ce29d54b4275
--- /dev/null
+++ b/Documentation/mdev/mtty.c
@@ -0,0 +1,1202 @@
+/*
+ * Mediated virtual PCI serial host device driver
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ * Author: Neo Jia <c...@nvidia.com>
+ *     Kirti Wankhede <kwankh...@nvidia.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Sample driver that creates mdev device that simulates serial port over PCI
+ * card.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/cdev.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/uuid.h>
+#include <linux/vfio.h>
+#include <linux/iommu.h>
+#include <linux/sysfs.h>
+#include <linux/ctype.h>
+#include <linux/file.h>
+#include <linux/mdev.h>
+#include <linux/pci.h>
+#include <linux/serial.h>
+#include <uapi/linux/serial_reg.h>
+/*
+ * #defines
+ */
+
+#define VERSION_STRING  "0.1"
+#define DRIVER_AUTHOR   "NVIDIA Corporation"
+
+#define MTTY_CLASS_NAME "mtty"
+
+#define MTTY_NAME   "mtty"
+
+#define MTTY_CONFIG_SPACE_SIZE  0xff
+#define MTTY_IO_BAR_SIZE0x8
+#define MTTY_MMIO_BAR_SIZE  0x10
+
+#define STORE_LE16(addr, val)   (*(u16 *)addr = val)
+#define STORE_LE32(addr, val)   (*(u32 *)addr = val)
+
+#define MAX_FIFO_SIZE   16
+
+#define CIRCULAR_BUF_INC_IDX(idx)(idx = (idx + 1) & (MAX_FIFO_SIZE - 1))
+
+#define MTTY_VFIO_PCI_OFFSET_SHIFT   40
+
+#define MTTY_VFIO_PCI_OFFSET_TO_INDEX(off)   (off >> MTTY_VFIO_PCI_OFFSET_SHIFT)
+#define MTTY_VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << MTTY_VFIO_PCI_OFFSET_SHIFT)
+#define MTTY_VFIO_PCI_OFFSET_MASK    (((u64)(1) << MTTY_VFIO_PCI_OFFSET_SHIFT) - 1)
+
+
+/*
+ * Global Structures
+ */
+
+struct mtty_dev {
+dev_t   vd_devt;
+struct class*vd_class;
+struct cdev vd_cdev;
+struct idr  vd_idr;
+struct device   dev;
+} mtty_dev;
+
+struct mdev_region_info {
+u64 start;
+u64 phys_start;
+u32 size;
+u64 vfio_offset;
+};
+
+#if defined(DEBUG_REGS)
+const char *wr_reg[] = {
+"TX",
+"IER",
+"FCR",
+"LCR",
+"MCR",
+"LSR",
+"MSR",
+"SCR"
+};
+
+const char *rd_reg[] = {
+"RX",
+"IER",
+"IIR",
+"LCR",
+"MCR",
+"LSR",
+"MSR",
+"SCR"
+};
+#endif
+
+// loop back buffer
+struct rxtx {
+u8 fifo[MAX_FIFO_SIZE];
+u8 head, tail;
+u8 count;
+};
+
+struct serial_port {
+u8 uart_reg[8]; /* 8 registers */
+struct rxtx rxtx;   /* loop back buffer */
+bool dlab;
+bool overrun;
+u16 divisor;
+u8 fcr; /* FIFO control register */
+u8 max_fifo_size;
+u8 intr_trigger_level;  /* interrupt trigger level */
+};
+
+/* State of each mdev device */
+struct mdev_state {
+int irq_fd;
+struct file *intx_file;
+struct file *msi_file;
+int irq_index;
+u8 *vconfig;
+struct mutex ops_lock;
+struct mdev_device *mdev;
+struct mdev_region_info region_info[VFIO_PCI_NUM_REGIONS];
+u32 bar_mask[VFIO_PCI_NUM_REGIONS];
+struct list_head next;
+struct serial_port s[2];
+struct mutex rxtx_lock;
+};
+
+struct mutex mdev_list_lock;
+struct list_head mdev_devices_list;
+
+static struct file_operations vd_fops = {
+.owner  = THIS_MODULE,
+};
+
+/* function prototypes */
+
+static int mtty_dev_mdev_trigger_interrupt(uuid_le uuid);
+
+/* Helper functions */
+static struct mdev_state *find_mdev_state_by_uuid(uuid_le uuid)
+{
+struct mdev_state *mds;
+
+list_for_each_entry(mds, &mdev_devices_list, next) {
+if (uuid_l

[libvirt] [PATCH 0/1] Sample driver for mediated device framework

2016-09-29 Thread Kirti Wankhede
This sample driver is with reference to v7 patch of Mediated device support:
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03798.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03799.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03800.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03811.html
https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03802.html

We are working on the v8 version of this patchset. This sample is meant to give
a feel for how the mediated device framework works.

This is a simple sample driver that creates a device simulating a serial
port on a PCI card. vfio-mediated-device.txt is updated with details of how it
works with the mdev framework.

We will have the v8 version of the mediated device framework soon, and this
sample driver will be updated accordingly. In the meantime it should help
people get more insight into this framework and how it is supposed to work.

Kirti Wankhede (1):
  Add simple sample driver for mediated device framework

 Documentation/mdev/Makefile   |   14 +
 Documentation/mdev/mtty.c | 1202 +
 Documentation/{ => mdev}/vfio-mediated-device.txt |   61 ++
 3 files changed, 1277 insertions(+)
 create mode 100644 Documentation/mdev/Makefile
 create mode 100644 Documentation/mdev/mtty.c
 rename Documentation/{ => mdev}/vfio-mediated-device.txt (78%)

-- 
2.7.0



Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-23 Thread Kirti Wankhede


On 9/23/2016 12:55 AM, Tian, Kevin wrote:
>> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
>> Sent: Wednesday, September 21, 2016 12:23 AM
>>>
>>>>>  I have
>>>>> a hard time believing that a given vendor can even allocate unique type
>>>>> ids for their own devices.  Unique type id across vendors is not
>>>>> practical.  So which attribute are we actually using to identify which
>>>>> type of mdev device we need and why would we ever specify additional
>>>>> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
>>>>> M60-0B" has a fixed setup of those attributes?
>>>>>
>>>>
>>>> Specifying attributes here is not our requirement. Yes we have fixed set
>>>> of attributes for "GRID-M60-0B" and on.
>>>> We are defining the attributed here for "display" class for all other
>>>> vendor of gpu can use.
>>>>
> 
> Hi, Kirti, 
> 
> We decide to go with above type-based interface for KVMGT, with fixed setup 
> of attributes too. If both Intel and NVIDIA solutions use such fixed manner,
> can we go with a proposal which doesn't include 'class' extension for now?
> Later if there is a concrete usage requiring such class-specific attribute 
> setting,
> then it's easy to extend following discussion in this thread. I'm thinking 
> how we
> can converge the discussion here into something simple enough (and extensible)
> to accelerate upstreaming of both Intel/NVIDIA solutions...
> 

Hi Kevin,

We have a fixed set of attributes which are GPU/graphics specific, like
framebuffer length, resolution, number of heads, and maximum supported
instances. If you are going with a fixed set of attributes, what do the vGPU
types on KVMGT look like from an attributes point of view? Are your attributes
also graphics-specific?

Thanks,
Kirti



Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-21 Thread Kirti Wankhede



> My concern is that a type id seems arbitrary but we're specifying that
> it be unique.  We already have something unique, the name.  So why try
> to make the type id unique as well?  A vendor can accidentally create
> their vendor driver so that a given name means something very
> specific.  On the other hand they need to be extremely deliberate to
> coordinate that a type id means a unique thing across all their product
> lines.
> 

 Let me clarify, type id should be unique in the list of
 mdev_supported_types. You can't have 2 directories in with same name.  
>>>
>>> Of course, but does that mean it's only unique to the machine I'm
>>> currently running on?  Let's say I have a Tesla P100 on my system and
>>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
>>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
>>> new card still going to be a "GRID-M60-0B"?  If not then we've based
>>> our XML on the wrong attribute.  If the new device does not support
>>> "GRID-M60-0B" then we should generate an error, not simply initialize
>>> whatever type-id 11 happens to be on this new card.
>>>   
>>
>> If there are 2 M60 in the system then you would find '11' type directory
>> in mdev_supported_types of both M60. If you have P100, '11' type would
>> not be there in its mdev_supported_types, it will have different types.
>>
>> For example, if you replace M60 with P100, but XML is not updated. XML
>> have type '11'. When libvirt would try to create mdev device, libvirt
>> would have to find 'create' file in sysfs in following directory format:
>>
>>  --- mdev_supported_types
>>  |-- 11
>>  |   |-- create
>>
>> but now for P100, '11' directory is not there, so libvirt should throw
>> error on not able to find '11' directory.
> 
> This really seems like an accident waiting to happen.  What happens
> when the user replaces their M60 with an Intel XYZ device that happens
> to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> know that the XML used to refer to a GRID-M60-0B and now it's an
> INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> yet another arbitrary requirement that we have some sort of globally
> unique type-id database make a lot of sense?  The same issue applies
> for simple debug-ability, if I'm reviewing the XML for a domain and the
> name is the primary index for the mdev device, I know what it is.
> Seeing type-id='11' is meaningless.
>

Let me clarify again: type '11' is a string that the vendor driver would
define (see my previous reply below); it could be "11" or "GRID-M60-0B".
If two vendors use the same string, we can't control that, right?


 Lets remove 'id' from type id in XML if that is the concern. Supported
 types is going to be defined by vendor driver, so let vendor driver
 decide what to use for directory name and same should be used in device
 xml file, it could be '11' or "GRID M60-0B":

 
   my-vgpu
   pci__86_00_0
   
 

Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-21 Thread Kirti Wankhede


On 9/20/2016 10:20 PM, Alex Williamson wrote:
> On Tue, 20 Sep 2016 21:53:16 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 9/20/2016 8:13 PM, Alex Williamson wrote:
>>> On Tue, 20 Sep 2016 19:51:58 +0530
>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>   
>>>> On 9/20/2016 3:06 AM, Alex Williamson wrote:  
>>>>> On Tue, 20 Sep 2016 02:05:52 +0530
>>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>> 
>>>>>> Hi libvirt experts,
>>>>>>
>>>>>>
>>>>>> 'create':
>>>>>> Write-only file. Mandatory.
>>>>>> Accepts string to create mediated device.
>>>>>>
>>>>>> 'name':
>>>>>> Read-Only file. Mandatory.
>>>>>> Returns string, the name of that type id.
>>>>>>
>>>>>> 'fb_length':
>>>>>> Read-only file. Mandatory.
>>>>>> Returns {K,M,G}, size of framebuffer.
>>>>>
>>>>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
>>>>> just one user of mediated devices.
>>>>> 
>>>>
>>>> As per our discussion in BOF, we would define class and its attributes.
>>>> As I mentioned above these are attributes of "display" class.  
>>>
>>> Just as I didn't know here, how does libvirt know the class of a given
>>> type id?
>>>
>>
>> Yes, we need some way to identify the class as Daniel mentioned in his
>> suggestion. Add a file 'class' in mdev_supported_types that would return
>> the class.
>>
>>
>>>>>  Can all parameters be changed dynamically?
>>>>
>>>> Parameters here would be extra parameters which are not listed above in
>>>> mandatory list. If such parameters are required to set, these parameters
>>>> should be set per mdev device before VM is booted.
>>>>  
>>>>>  Do parameter changes apply to existing devices or only future devices?   
>>>>>  
>>>>
>>>> No, it should be applied at the time when mdev device is created or
>>>> after mdev device is created but before VM is booted. It will not be
>>>> applicable to existing devices.
>>>>  
>>>>>  This is a poorly specified
>>>>> interface.  I'd prefer this be done via module options on the vendor
>>>>> driver.
>>>>> 
>>>>
>>>> Module option is not suitable here, in that case such parameters would
>>>> be applied to all mdev device created for vendor driver and that is not
>>>> what we want to.  
>>>
>>> Then use attributes on the mdev device itself, this is not an
>>> acceptable interface.
>>>   
>>>>>>
>>>>>> With above example, vGPU device XML would look like:
>>>>>>
>>>>>>
>>>>>>  my-vgpu
>>>>>>  pci__86_00_0
>>>>>>  
>>>>>>
>>>>>>1
>>>>>>'frame_rate_limiter=0'
>>>>>>  
>>>>>>
>>>>>>
>>>>>> 'type id' is mandatory.
>>>>>
>>>>> I was under the impression that the vendor supplied "name" was the one
>>>>> unique identifier.  We're certainly not going to create a registrar to
>>>>> hand out type ids to each vendor, so what makes type id unique?
>>>>
>>>> Type id is unique, 'name' is the name (string) of device that would be
>>>> displayed in guest OS for that type of mdev device.  
>>>
>>> We were told at the BoF that name was the unique string which would be
>>> consistent and you again indicate below that "GRID-M60-0B" has a fixed
>>> set of attributes, so why do we care what type id is associated with
>>> that name?
>>>
>>>>>  I have
>>>>> a hard time believing that a given vendor can even allocate unique type
>>>>> ids for their own devices.  Unique type id across vendors is not
>>>>> practical.  So which attribute are we actually using to identify which
>>>>> type of mdev device we need and why would we ever specify additional
>>>>> attributes like fb_length?  Does

Re: [libvirt] [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration

2016-09-20 Thread Kirti Wankhede


On 9/20/2016 10:06 PM, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 10:01:18PM +0530, Kirti Wankhede wrote:
>>
>>
>> On 9/20/2016 8:44 PM, Daniel P. Berrange wrote:
>>> On Tue, Sep 20, 2016 at 05:05:43PM +0200, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 20/09/2016 16:58, Daniel P. Berrange wrote:
>>>>>>> As I've said in my earlier reply - libvirt will *NOT* support passing
>>>>>>> arbitrary vendor specific parameters as a blob via the XML. Everything
>>>>>>> that appears in the XML must be *fully* specified and explicitly
>>>>>>> represented in the XML  as a distinct attribute or element.
>>>>>>
>>>>>> Are generic key/value attributes (e.g. a  element) acceptable?
>>>>>
>>>>> Only if libvirt has a known list of valid attribute key names upfront.
>>>>> We don't want to just blindly expose arbitary vendor specific keys exposed
>>>>> by the kernel. Libvirt's job is to ensure the XML representation is vendor
>>>>> portable
>>>>
>>
>> In key/value attributes (taking example from proposed xml file)
>>
>> <resolution>2560x1600</resolution>
>>
>> 'Key' (i.e. 'resolution') should be known upfront, not the value, right?
> 
> Yes, the actual value is not important - only its structured.
> ie, libvirt would check that it is in the format '$WIDTHx$HEIGHT'
> and reject it if not.
> 

In this particular example, does libvirt check whether the values are
integers? Or could the value be 2560x1600 or 4096x4096, with both being valid?

Does libvirt accept a string value?
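
For illustration only, a minimal C check of the '$WIDTHx$HEIGHT' structure
(not necessarily how libvirt implements it) could be:

#include <stdbool.h>
#include <stdio.h>

/* Accept strings such as "2560x1600" or "4096x4096", i.e. <int>x<int>,
 * and reject anything else; the value itself is not interpreted. */
static bool is_valid_resolution(const char *str)
{
    unsigned int w, h;
    char trailing;

    /* sscanf() returns 2 only when both integers parse and nothing follows */
    if (sscanf(str, "%ux%u%c", &w, &h, &trailing) != 2)
        return false;
    return w > 0 && h > 0;
}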

Kirti



Re: [libvirt] [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration

2016-09-20 Thread Kirti Wankhede


On 9/20/2016 8:44 PM, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 05:05:43PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 20/09/2016 16:58, Daniel P. Berrange wrote:
> As I've said in my earlier reply - libvirt will *NOT* support passing
> arbitrary vendor specific parameters as a blob via the XML. Everything
> that appears in the XML must be *fully* specified and explicitly
> represented in the XML  as a distinct attribute or element.

 Are generic key/value attributes (e.g. a  element) acceptable?
>>>
>>> Only if libvirt has a known list of valid attribute key names upfront.
>>> We don't want to just blindly expose arbitary vendor specific keys exposed
>>> by the kernel. Libvirt's job is to ensure the XML representation is vendor
>>> portable
>>

For key/value attributes (taking an example from the proposed XML file):

<resolution>2560x1600</resolution>

the 'key' (i.e. 'resolution') should be known upfront, not the value, right?

Kirti


>> Ok, then I guess vendor-specific mdev parameters are out completely.  Or
>> could they be added under a separate namespace, like QEMU passthrough?
> 
> The QEMU XML namespace that allows passthrough is intended either as a
> short term hack to let people experiment with new QEMU features, before
> they get explicitly represented in the main XML namespace, or in a few
> cases, for things we never want to support officially.  Any VM which
> uses the separate namespace is tainted, which means if theres a bug
> report we'll require the reported to remove whatever config caused
> the tainting and then reproduce the problem.
> 
> If the vendor specific mdev parameters are things that apps will often
> need to be able to set, then allowing it via a separate namespace is
> just providing a backdoor that everyone needs to use, defeating the
> point of using a separate namespace.
> 
> We don't want to knowingly design a system which defacto requires
> the app to use the separate namespace most/all of the time.
> 
> 
> Regards,
> Daniel
> 



Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-20 Thread Kirti Wankhede


On 9/20/2016 8:13 PM, Alex Williamson wrote:
> On Tue, 20 Sep 2016 19:51:58 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 9/20/2016 3:06 AM, Alex Williamson wrote:
>>> On Tue, 20 Sep 2016 02:05:52 +0530
>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>   
>>>> Hi libvirt experts,
>>>>
>>>>
>>>> 'create':
>>>> Write-only file. Mandatory.
>>>> Accepts string to create mediated device.
>>>>
>>>> 'name':
>>>> Read-Only file. Mandatory.
>>>> Returns string, the name of that type id.
>>>>
>>>> 'fb_length':
>>>> Read-only file. Mandatory.
>>>> Returns {K,M,G}, size of framebuffer.  
>>>
>>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
>>> just one user of mediated devices.
>>>   
>>
>> As per our discussion in BOF, we would define class and its attributes.
>> As I mentioned above these are attributes of "display" class.
> 
> Just as I didn't know here, how does libvirt know the class of a given
> type id?
>  

Yes, we need some way to identify the class, as Daniel mentioned in his
suggestion: add a 'class' file in mdev_supported_types that would return
the class.


>>>  Can all parameters be changed dynamically?  
>>
>> Parameters here would be extra parameters which are not listed above in
>> mandatory list. If such parameters are required to set, these parameters
>> should be set per mdev device before VM is booted.
>>
>>>  Do parameter changes apply to existing devices or only future devices?  
>>
>> No, it should be applied at the time when mdev device is created or
>> after mdev device is created but before VM is booted. It will not be
>> applicable to existing devices.
>>
>>>  This is a poorly specified
>>> interface.  I'd prefer this be done via module options on the vendor
>>> driver.
>>>   
>>
>> Module option is not suitable here, in that case such parameters would
>> be applied to all mdev device created for vendor driver and that is not
>> what we want to.
> 
> Then use attributes on the mdev device itself, this is not an
> acceptable interface.
> 
>>>>
>>>> With above example, vGPU device XML would look like:
>>>>
>>>>
>>>>  my-vgpu
>>>>  pci__86_00_0
>>>>  
>>>>
>>>>1
>>>>'frame_rate_limiter=0'
>>>>  
>>>>
>>>>
>>>> 'type id' is mandatory.  
>>>
>>> I was under the impression that the vendor supplied "name" was the one
>>> unique identifier.  We're certainly not going to create a registrar to
>>> hand out type ids to each vendor, so what makes type id unique?  
>>
>> Type id is unique, 'name' is the name (string) of device that would be
>> displayed in guest OS for that type of mdev device.
> 
> We were told at the BoF that name was the unique string which would be
> consistent and you again indicate below that "GRID-M60-0B" has a fixed
> set of attributes, so why do we care what type id is associated with
> that name?
>  
>>>  I have
>>> a hard time believing that a given vendor can even allocate unique type
>>> ids for their own devices.  Unique type id across vendors is not
>>> practical.  So which attribute are we actually using to identify which
>>> type of mdev device we need and why would we ever specify additional
>>> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
>>> M60-0B" has a fixed setup of those attributes?
>>>   
>>
>> Specifying attributes here is not our requirement. Yes we have fixed set
>> of attributes for "GRID-M60-0B" and on.
>> We are defining the attributed here for "display" class for all other
>> vendor of gpu can use.
>>
>>
>>>> 'group' is optional. It should be a unique number in the system among
>>>> all the groups created for mdev devices. Its usage is:
>>>>   - not needed if single vGPU device is being assigned to a domain.
>>>>   - only need to be set if multiple vGPUs need to be assigned to a
>>>> domain and vendor driver have 'requires_group' file in type id directory.
>>>>   - if type id directory include 'requires_group' and user tries to
>>>> assign multiple vGPUs to a domain without having  field in

Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-20 Thread Kirti Wankhede


On 9/20/2016 3:55 AM, Alex Williamson wrote:
> On Mon, 19 Sep 2016 23:50:56 +0200
> Paolo Bonzini <pbonz...@redhat.com> wrote:
> 
>> On 19/09/2016 23:36, Alex Williamson wrote:
>>> On Tue, 20 Sep 2016 02:05:52 +0530
>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:  
>>>> 'fb_length':
>>>> Read-only file. Mandatory.
>>>> Returns {K,M,G}, size of framebuffer.  
>>>
>>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
>>> just one user of mediated devices. [...]
>>>   
>>>> 'params'
>>>> Write-Only file. Optional.
>>>> String input. Libvirt would pass the string given in XML file to
>>>> this file and then create mdev device. Set empty string to clear params.
>>>> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
>>>> limiter for performance benchmarking, then create device of type 11. The
>>>> device created would have that parameter set by vendor driver.  
>>>
>>> Might as well just call it "magic", there's absolutely no ability to
>>> introspect what parameters are allowed or even valid here.  Can all
>>> parameters be changed dynamically?  Do parameter changes apply to
>>> existing devices or only future devices?  This is a poorly specified
>>> interface.  I'd prefer this be done via module options on the vendor
>>> driver.  
>>
>> Or additional files in the mdev's sysfs directory?
> 
> Sure, the vendor driver could certainly support that, it'd be vendor
> specific of course.
>

Right, it would be vendor specific. But if this is not set through the mdev
device XML, the user has to create the mdev device with XML first and then
manually set such parameters. Also, these parameters would not be retained if
the mdev device is destroyed and created again. That's why it would be simpler
for the user if libvirt set these parameters by reading them from the XML file.

If we move the 'params' file to the mdev device, could libvirt do the following
to create an mdev device:

* Read /sys/../\:86\:00.0/11/max_instances. Proceed only if it is greater
than 0, otherwise fail.

* Autogenerate UUID
* Create device:

echo "$UUID" > /sys/../\:86\:00.0/11/create

* Set extra params if the 'params' field exists in the device XML and a
'params' file exists in the mdev device directory:

echo "frame_rate_limiter=0" > /sys/bus/mdev/devices/$UUID/params

Kirti



Re: [libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-20 Thread Kirti Wankhede


On 9/20/2016 3:06 AM, Alex Williamson wrote:
> On Tue, 20 Sep 2016 02:05:52 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> Hi libvirt experts,
>>
>> Thanks for valuable input on v1 version of RFC.
>>
>> Quick brief, VFIO based mediated device framework provides a way to
>> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
>> and IBM's channel IO. This framework reuses VFIO APIs for all the
>> functionalities for mediated devices which are currently being used for
>> pass through devices. This framework introduces a set of new sysfs files
>> for device creation and its life cycle management.
>>
>> Here is the summary of discussion on v1:
>> 1. Discover mediated device:
>> As part of physical device initialization process, vendor driver will
>> register their physical devices, which will be used to create virtual
>> device (mediated device, aka mdev) to the mediated framework.
>>
>> Vendor driver should specify mdev_supported_types in directory format.
>> This format is class based, for example, display class directory format
>> should be as below. We need to define such set for each class of devices
>> which would be supported by mediated device framework.
>>
>>  --- mdev_destroy
> 
> I thought we replaced mdev_destroy with a remove attribute for each
> mdev device.
> 
>>  --- mdev_supported_types
>>  |-- 11
>>  |   |-- create
>>  |   |-- name
>>  |   |-- fb_length
>>  |   |-- resolution
>>  |   |-- heads
>>  |   |-- max_instances
>>  |   |-- params
>>  |   |-- requires_group
>>  |-- 12
>>  |   |-- create
>>  |   |-- name
>>  |   |-- fb_length
>>  |   |-- resolution
>>  |   |-- heads
>>  |   |-- max_instances
>>  |   |-- params
>>  |   |-- requires_group
>>  |-- 13
>>  |-- create
>>  |-- name
>>  |-- fb_length
>>  |-- resolution
>>  |-- heads
>>  |-- max_instances
>>  |-- params
>>  |-- requires_group
>>
>>
>> In the above example directory '11' represents a type id of mdev device.
>> 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
>> 'requires_group' would be Read-Only files that vendor would provide to
>> describe about that type.
>>
>> 'create':
>> Write-only file. Mandatory.
>> Accepts string to create mediated device.
>>
>> 'name':
>> Read-Only file. Mandatory.
>> Returns string, the name of that type id.
>>
>> 'fb_length':
>> Read-only file. Mandatory.
>> Returns {K,M,G}, size of framebuffer.
> 
> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> just one user of mediated devices.
> 

As per our discussion in the BoF, we would define a class and its attributes.
As I mentioned above, these are attributes of the "display" class.

>>
>> 'resolution':
>> Read-Only file. Mandatory.
>> Returns 'hres x vres' format. Maximum supported resolution.
> 
> Same.
> 
>>
>> 'heads':
>> Read-Only file. Mandatory.
>> Returns integer. Number of maximum heads supported.
> 
> Same.
> 
>>
>> 'max_instance':
>> Read-Only file. Mandatory.
>> Returns integer.  Returns maximum mdev device could be created
>> at the moment when this file is read. This count would be updated by
>> vendor driver. Before creating mdev device of this type, check if
>> max_instance is > 0.
> 
> Didn't we discuss this being being something like "available_instances"
> with a "devices" sub-directory linking current devices of that type?
>

I'm OK with 'available_instances' as the name of this file.
Yes, the mdev device directory will have links to the devices of that type. I
think that is part of the mdev core module discussion. I'm trying to list the
sysfs files for the libvirt interface here, to focus more on the libvirt
interface, hence I didn't add that. I'll make sure to add all such details
in future versions.


>> 'params'
>> Write-Only file. Optional.
>> String input. Libvirt would pass the string given in XML file to
>> this file and then create mdev device. Set empty string to clear params.
>> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
>> limiter for performance benchmarking, then create device of type 11. The
>> device created would have that parameter set by vendor driver.
> 
> Might as well just call it "magic", there's absolutely no a

[libvirt] [RFC v2] libvirt vGPU QEMU integration

2016-09-19 Thread Kirti Wankhede

Hi libvirt experts,

Thanks for valuable input on v1 version of RFC.

Quick brief: the VFIO based mediated device framework provides a way for
vendors to virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel
KVMGT and IBM's channel IO. This framework reuses the VFIO APIs for all the
functionality of mediated devices that is currently being used for
pass-through devices. This framework introduces a set of new sysfs files
for device creation and life cycle management.

Here is the summary of discussion on v1:
1. Discover mediated device:
As part of physical device initialization process, vendor driver will
register their physical devices, which will be used to create virtual
device (mediated device, aka mdev) to the mediated framework.

The vendor driver should specify mdev_supported_types in directory format.
This format is class based; for example, the display class directory format
should be as below. We need to define such a set for each class of devices
that would be supported by the mediated device framework.

 --- mdev_destroy
 --- mdev_supported_types
 |-- 11
 |   |-- create
 |   |-- name
 |   |-- fb_length
 |   |-- resolution
 |   |-- heads
 |   |-- max_instances
 |   |-- params
 |   |-- requires_group
 |-- 12
 |   |-- create
 |   |-- name
 |   |-- fb_length
 |   |-- resolution
 |   |-- heads
 |   |-- max_instances
 |   |-- params
 |   |-- requires_group
 |-- 13
 |-- create
 |-- name
 |-- fb_length
 |-- resolution
 |-- heads
 |-- max_instances
 |-- params
 |-- requires_group


In the above example, directory '11' represents a type id of an mdev device.
'name', 'fb_length', 'resolution', 'heads', 'max_instances' and
'requires_group' would be read-only files that the vendor would provide to
describe that type.

'create':
Write-only file. Mandatory.
Accepts string to create mediated device.

'name':
Read-Only file. Mandatory.
Returns string, the name of that type id.

'fb_length':
Read-only file. Mandatory.
Returns {K,M,G}, size of framebuffer.

'resolution':
Read-Only file. Mandatory.
Returns 'hres x vres' format. Maximum supported resolution.

'heads':
Read-Only file. Mandatory.
Returns integer. Number of maximum heads supported.

'max_instances':
Read-Only file. Mandatory.
Returns an integer: the maximum number of mdev devices that could be created
at the moment this file is read. This count would be updated by the
vendor driver. Before creating an mdev device of this type, check that
max_instances is > 0.

'params':
Write-Only file. Optional.
String input. Libvirt would pass the string given in the XML file to
this file and then create the mdev device. Setting an empty string clears the
params. For example, set the parameter 'frame_rate_limiter=0' to disable the
frame rate limiter for performance benchmarking, then create a device of type
11. The device created would have that parameter set by the vendor driver.

'requires_group':
Read-Only file. Optional.
This should be provided by the vendor driver if it needs to
group mdev devices in one domain, so that the vendor driver can use 'first
open' to commit resources of all mdev devices associated with that domain
and 'last close' to free them.
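
Purely as an illustration of how a management application could consume this
layout, a minimal C helper (error handling kept minimal; the path layout
follows the example above) could be:

#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Read one per-type attribute (e.g. "name" or "max_instances") of a given
 * type id under the parent device's mdev_supported_types directory. */
static int read_type_attr(const char *parent_syspath, const char *type_id,
                          const char *attr, char *out, size_t outlen)
{
    char path[PATH_MAX];
    FILE *f;

    snprintf(path, sizeof(path), "%s/mdev_supported_types/%s/%s",
             parent_syspath, type_id, attr);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)outlen, f)) {
        fclose(f);
        return -1;
    }
    out[strcspn(out, "\n")] = '\0';   /* strip trailing newline */
    fclose(f);
    return 0;
}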

The parent device would look like:

   
 pci__86_00_0
 
   0
   134
   0
   0
   
 
 
   
   GRID M60-0B
   512M
   2560x1600
   2
   16
   1
 
   
   GRID M60
   NVIDIA
 
   

2. Create/destroy mediated device

With above example, vGPU device XML would look like:

   
     <device>
       <name>my-vgpu</name>
       <parent>pci__86_00_0</parent>
       <capability type='mdev'>
         <type id='11'/>
         <group>1</group>
         <params>'frame_rate_limiter=0'</params>
       </capability>
     </device>

'type id' is mandatory.
'group' is optional. It should be a unique number in the system among
all the groups created for mdev devices. Its usage is:
  - not needed if a single vGPU device is being assigned to a domain.
  - only needs to be set if multiple vGPUs need to be assigned to a
domain and the vendor driver has a 'requires_group' file in the type id
directory.
  - if the type id directory includes 'requires_group' and the user tries to
assign multiple vGPUs to a domain without having a <group> field in the XML,
it will create a single vGPU.

'params' is an optional field. The user should set this field if extra
parameters need to be set for a particular vGPU device. Libvirt doesn't
need to parse these params; they are meant for the vendor driver.

Libvirt needs to follow this sequence to create a device:
* Read /sys/../\:86\:00.0/11/max_instances. Proceed only if it is greater
than 0, otherwise fail.

* Set extra params if the 'params' field exists in the device XML and a
'params' file exists in the type id directory:

echo "frame_rate_limiter=0" > /sys/../\:86\:00.0/11/params

* Autogenerate UUID
* Create device:

echo "$UUID:" > /sys/../\:86\:00.0/11/create

where <group> is optional. The group should be a unique number among all
the 
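
For illustration, the same create step could be driven from C roughly as below
(a sketch assuming libuuid for UUID generation; pass group as NULL when no
grouping is needed):

#include <stdio.h>
#include <uuid/uuid.h>

/* Write "UUID" or "UUID:group" to a type's create file, e.g.
 * <parent>/mdev_supported_types/11/create, as in the sequence above. */
static int create_mdev(const char *create_path, const char *group)
{
    uuid_t uu;
    char uu_str[37];              /* 36 characters plus terminating NUL */
    FILE *f;
    int ret;

    uuid_generate(uu);
    uuid_unparse(uu, uu_str);

    f = fopen(create_path, "w");
    if (!f)
        return -1;
    if (group)
        ret = fprintf(f, "%s:%s", uu_str, group);
    else
        ret = fprintf(f, "%s", uu_str);
    fclose(f);
    return ret < 0 ? -1 : 0;
}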

Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-08 Thread Kirti Wankhede


On 9/8/2016 3:43 AM, Alex Williamson wrote:
> On Wed, 7 Sep 2016 23:36:28 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 9/7/2016 10:14 PM, Alex Williamson wrote:
>>> On Wed, 7 Sep 2016 21:45:31 +0530
>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>   
>>>> On 9/7/2016 2:58 AM, Alex Williamson wrote:  
>>>>> On Wed, 7 Sep 2016 01:05:11 +0530
>>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>> 
>>>>>> On 9/6/2016 11:10 PM, Alex Williamson wrote:
>>>>>>> On Sat, 3 Sep 2016 22:04:56 +0530
>>>>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>>>>   
>>>>>>>> On 9/3/2016 3:18 AM, Paolo Bonzini wrote:  
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 02/09/2016 20:33, Kirti Wankhede wrote:

...

> 
> Philosophically, mdev devices should be entirely independent of one
> another.  A user can set the same iommu context for multiple mdevs
> by placing them in the same container.  A user should be able to
> stop using an mdev in one place and start using it somewhere else.
> It should be a fungible $TYPE device.  It's an NVIDIA-only requirement
> that imposes this association of mdev devices into groups and I don't
> particularly see it as beneficial to the mdev architecture.  So why
> make it a standard part of the interface?
> 

Yes, I agree. This might not be each vendor's requirement.


> We could do keying at the layer you suggest, assuming we can find
> something that doesn't restrict the user, but we could make that
> optional.  

We can key on the 'container'. Devices should be in the same VFIO 'container';
the open() call should fail if they are found to be in different containers.

> For instance, say we did key on pid, there could be an
> attribute in the supported types hierarchy to indicate this type
> supports(requires) pid-sets.  Each mdev device with this attribute
> would create a pid-group file in sysfs where libvirt could associate
> the device.  Only for those mdev devices requiring it.
> 

We are OK with this suggestion if it works for libvirt integration.
We can have a 'requires_group' file in each type directory under the supported
types.
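
As a rough sketch of the vendor-driver side, such a read-only 'requires_group'
file in the type directory could simply report 1 (illustrative only; the
attribute wiring is hypothetical):

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>

/* Hypothetical read-only 'requires_group' attribute for a type directory. */
static ssize_t requires_group_show(struct kobject *kobj,
                                   struct kobj_attribute *attr, char *buf)
{
    return sprintf(buf, "1\n");   /* this type needs its mdevs grouped */
}
static struct kobj_attribute requires_group_attr = __ATTR_RO(requires_group);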

Thanks,
Kirti

> The alternative is that we need to find some mechanism for this
> association that doesn't impose arbitrary requirements, and potentially
> usage restrictions on vendors that don't have this need.  Thanks,
> 
> Alex
> 



Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-07 Thread Kirti Wankhede


On 9/7/2016 10:14 PM, Alex Williamson wrote:
> On Wed, 7 Sep 2016 21:45:31 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 9/7/2016 2:58 AM, Alex Williamson wrote:
>>> On Wed, 7 Sep 2016 01:05:11 +0530
>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>   
>>>> On 9/6/2016 11:10 PM, Alex Williamson wrote:  
>>>>> On Sat, 3 Sep 2016 22:04:56 +0530
>>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>> 
>>>>>> On 9/3/2016 3:18 AM, Paolo Bonzini wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 02/09/2016 20:33, Kirti Wankhede wrote:  
>>>>>>>>  We could even do:  
>>>>>>>>>>
>>>>>>>>>> echo $UUID1:$GROUPA > create
>>>>>>>>>>
>>>>>>>>>> where $GROUPA is the group ID of a previously created mdev device 
>>>>>>>>>> into
>>>>>>>>>> which $UUID1 is to be created and added to the same group.  
>>>>>>>>   
>>>>>>>
>>>>>>> From the point of view of libvirt, I think I prefer Alex's idea.
>>>>>>>  could be an additional element in the nodedev-create XML:
>>>>>>>
>>>>>>> 
>>>>>>>   my-vgpu
>>>>>>>   pci__86_00_0
>>>>>>>   
>>>>>>> 
>>>>>>> 0695d332-7831-493f-9e71-1c85c8911a08
>>>>>>> group1
>>>>>>>   
>>>>>>> 
>>>>>>>
>>>>>>> (should group also be a UUID?)
>>>>>>>   
>>>>>>
>>>>>> No, this should be a unique number in a system, similar to iommu_group.  
>>>>>>   
>>>>>
>>>>> Sorry, just trying to catch up on this thread after a long weekend.
>>>>>
>>>>> We're talking about iommu groups here, we're not creating any sort of
>>>>> parallel grouping specific to mdev devices.
>>>>
>>>> I thought we were talking about group of mdev devices and not iommu
>>>> group. IIRC, there were concerns about it (this would be similar to
>>>> UUID+instance) and that would (ab)use iommu groups.  
>>>
>>> What constraints does a group, which is not an iommu group, place on the
>>> usage of the mdev devices?  What happens if we put two mdev devices in
>>> the same "mdev group" and then assign them to separate VMs/users?  I
>>> believe that the answer is that this theoretical "mdev group" doesn't
>>> actually impose any constraints on the devices within the group or how
>>> they're used.
>>>   
>>
>> We feel its not a good idea to try to associate device's iommu groups
>> with mdev device groups. That adds more complications.
>>
>> As in above nodedev-create xml, 'group1' could be a unique number that
>> can be generated by libvirt. Then to create mdev device:
>>
>>   echo $UUID1:group1 > create
>>
>> If user want to add more mdev devices to same group, he/she should use
>> same group number in next nodedev-create devices. So create commands
>> would be:
>>   echo $UUID2:group1 > create
>>   echo $UUID3:group1 > create
> 
> So groups return to being static, libvirt would need to destroy and
> create mdev devices specifically for use within the predefined group?

Yes.

> This imposes limitations on how mdev devices can be used (ie. the mdev
> pool option is once again removed).  We're also back to imposing
> grouping semantics on mdev devices that may not need them.  Do all mdev
> devices for a given user need to be put into the same group?  

Yes.

> Do groups
> span parent devices?  Do they span different vendor drivers?
> 

Yes and yes. Group number would be associated with mdev device
irrespective of its parent.


>> Each mdev device would store this group number in its mdev_device
>> structure.
>>
>> With this, we would add open() and close() callbacks from vfio_mdev
>> module for vendor driver to commit resources. Then we don't need
>> 'start'/'stop' or online/offline interface.
>>
>> To commit resources for all devices associated to that domain/user space
>> application, vendor driver can use 'first open()' and 'last close()' to
>> free those. Or

Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-07 Thread Kirti Wankhede


On 9/7/2016 2:58 AM, Alex Williamson wrote:
> On Wed, 7 Sep 2016 01:05:11 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 9/6/2016 11:10 PM, Alex Williamson wrote:
>>> On Sat, 3 Sep 2016 22:04:56 +0530
>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>   
>>>> On 9/3/2016 3:18 AM, Paolo Bonzini wrote:  
>>>>>
>>>>>
>>>>> On 02/09/2016 20:33, Kirti Wankhede wrote:
>>>>>>  We could even do:
>>>>>>>>
>>>>>>>> echo $UUID1:$GROUPA > create
>>>>>>>>
>>>>>>>> where $GROUPA is the group ID of a previously created mdev device into
>>>>>>>> which $UUID1 is to be created and added to the same group.
>>>>>> 
>>>>>
>>>>> From the point of view of libvirt, I think I prefer Alex's idea.
>>>>>  could be an additional element in the nodedev-create XML:
>>>>>
>>>>> 
>>>>>   my-vgpu
>>>>>   pci__86_00_0
>>>>>   
>>>>> 
>>>>> 0695d332-7831-493f-9e71-1c85c8911a08
>>>>> group1
>>>>>   
>>>>> 
>>>>>
>>>>> (should group also be a UUID?)
>>>>> 
>>>>
>>>> No, this should be a unique number in a system, similar to iommu_group.  
>>>
>>> Sorry, just trying to catch up on this thread after a long weekend.
>>>
>>> We're talking about iommu groups here, we're not creating any sort of
>>> parallel grouping specific to mdev devices.  
>>
>> I thought we were talking about group of mdev devices and not iommu
>> group. IIRC, there were concerns about it (this would be similar to
>> UUID+instance) and that would (ab)use iommu groups.
> 
> What constraints does a group, which is not an iommu group, place on the
> usage of the mdev devices?  What happens if we put two mdev devices in
> the same "mdev group" and then assign them to separate VMs/users?  I
> believe that the answer is that this theoretical "mdev group" doesn't
> actually impose any constraints on the devices within the group or how
> they're used.
> 

We feel it's not a good idea to try to associate devices' iommu groups
with mdev device groups. That adds more complications.

As in the above nodedev-create XML, 'group1' could be a unique number that
can be generated by libvirt. Then, to create the mdev device:

  echo $UUID1:group1 > create

If the user wants to add more mdev devices to the same group, he/she should use
the same group number in the next nodedev-create devices. So the create
commands would be:
  echo $UUID2:group1 > create
  echo $UUID3:group1 > create

Each mdev device would store this group number in its mdev_device
structure.

With this, we would add open() and close() callbacks from the vfio_mdev
module for the vendor driver to commit resources. Then we don't need a
'start'/'stop' or online/offline interface.

The vendor driver can use 'first open()' to commit resources for all devices
associated with that domain/user space application and 'last close()' to
free them. Or, if the vendor driver wants to commit resources for each device
separately, it can do so in each device's open() call. It will depend on the
vendor driver how it wants to implement this.

Libvirt doesn't have to do anything about the assigned group numbers while
managing mdev devices.

QEMU commandline parameter would be same as earlier (don't have to
mention group number here):

  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID1 \
  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID2

In case two mdev devices from the same group are assigned to different
domains, we can fail the open() call of the second device. How would the driver
know that they are being used by different domains? By checking the
<group1, pid> of the first device of 'group1': the two devices in the same
group should have the same pid in their open() calls.

To hot-plug an mdev device into a domain that already has an mdev
device assigned, the new mdev device should be created with the same
group number as the existing devices, and then hot-plugged. If there is
no mdev device in that domain yet, the group number should be a new
unique number.
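
As a rough sketch, assuming the '$UUID:group' create syntax proposed
above and QEMU's device_add monitor command (the exact sysfs paths were
still being settled at this point), hot-plugging a third vGPU into a
domain that already uses group1 might look like:

  # create the new mdev in the same group as the devices already
  # assigned to this domain
  echo "$UUID3:group1" > create

  # then hot-plug it into the running guest from the QEMU monitor
  # ('hostdev3' is just an illustrative device id)
  (qemu) device_add vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID3,id=hostdev3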

This simplifies the mdev grouping and also provides flexibility for the
vendor driver implementation.

Thanks,
Kirti



Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-06 Thread Kirti Wankhede


On 9/6/2016 11:10 PM, Alex Williamson wrote:
> On Sat, 3 Sep 2016 22:04:56 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 9/3/2016 3:18 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 02/09/2016 20:33, Kirti Wankhede wrote:  
>>>>  We could even do:  
>>>>>>
>>>>>> echo $UUID1:$GROUPA > create
>>>>>>
>>>>>> where $GROUPA is the group ID of a previously created mdev device into
>>>>>> which $UUID1 is to be created and added to the same group.  
>>>>   
>>>
>>> From the point of view of libvirt, I think I prefer Alex's idea.
>>>  could be an additional element in the nodedev-create XML:
>>>
>>> 
>>>   my-vgpu
>>>   pci__86_00_0
>>>   
>>> 
>>> 0695d332-7831-493f-9e71-1c85c8911a08
>>> group1
>>>   
>>> 
>>>
>>> (should group also be a UUID?)
>>>   
>>
>> No, this should be a unique number in a system, similar to iommu_group.
> 
> Sorry, just trying to catch up on this thread after a long weekend.
> 
> We're talking about iommu groups here, we're not creating any sort of
> parallel grouping specific to mdev devices.

I thought we were talking about a group of mdev devices and not an
iommu group. IIRC, there were concerns about it (this would be similar
to UUID+instance) and that it would (ab)use iommu groups.

I'm thinking about your suggestion, but would also like to know your
thoughts on how the sysfs interface would look; it's still not clear to
me. Or would it be better to have grouping at the mdev layer?

Kirti.

>  This is why my example
> created a device and then required the user to go find the group number
> given to that device in order to create another device within the same
> group.  iommu group numbering is not within the user's control and is
> not a uuid.  libvirt can refer to the group as anything it wants in the
> xml, but the host group number is allocated by the host, not under user
> control, is not persistent.  libvirt would just be giving it a name to
> know which devices are part of the same group.  Perhaps the runtime xml
> would fill in the group number once created.
> 
> There were also a lot of unanswered questions in my proposal, it's not
> clear that there's a standard algorithm for when mdev devices need to
> be grouped together.  Should we even allow groups to span multiple host
> devices?  Should they be allowed to span devices from different
> vendors?
>
> If we imagine a scenario of a group composed of a mix of Intel and
> NVIDIA vGPUs, what happens when an Intel device is opened first?  The
> NVIDIA driver wouldn't know about this, but it would know when the
> first NVIDIA device is opened and be able to establish p2p for the
> NVIDIA devices at that point.  Can we do what we need with that model?
> What if libvirt is asked to hot-add an NVIDIA vGPU?  It would need to
> do a create on the NVIDIA parent device with the existing group id, at
> which point the NVIDIA vendor driver could fail the device create if
> the p2p setup has already been done.  The Intel vendor driver might
> allow it.  Similar to open, the last close of the mdev device for a
> given vendor (which might not be the last close of mdev devices within
> the group) would need to trigger the offline process for that vendor.
> 
> That all sounds well and good... here's the kicker: iommu groups
> necessarily need to be part of the same iommu context, ie.
> vfio container.  How do we deal with vIOMMUs within the guest when we
> are intentionally forcing a set of devices within the same context?
> This is why it's _very_ beneficial on the host to create iommu groups
> with the smallest number of devices we can reasonably trust to be
> isolated.  We're backing ourselves into a corner if we tell libvirt
> that the standard process is to put all mdev devices into a single
> group.  The grouping/startup issue is still unresolved in my head.
> Thanks,
> 
> Alex
> 



Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-03 Thread Kirti Wankhede


On 9/3/2016 6:37 PM, Paolo Bonzini wrote:
> 
> 
> On 03/09/2016 13:56, John Ferlan wrote:
>> On 09/02/2016 05:48 PM, Paolo Bonzini wrote:
>>> On 02/09/2016 20:33, Kirti Wankhede wrote:
>>>>  We could even do:
>>>>>>
>>>>>> echo $UUID1:$GROUPA > create
>>>>>>
>>>>>> where $GROUPA is the group ID of a previously created mdev device into
>>>>>> which $UUID1 is to be created and added to the same group.
>>>> 
>>>
>>> >From the point of view of libvirt, I think I prefer Alex's idea.
>>>  could be an additional element in the nodedev-create XML:
>>>
>>> 
>>>   my-vgpu
>>>   pci__86_00_0
>>>   
>>> 
>>> 0695d332-7831-493f-9e71-1c85c8911a08
>>> group1
>>>   
>>> 
>>>
>>> (should group also be a UUID?)
>>

As I replied to the earlier mail too, the group number doesn't need to
be a UUID. It should be a unique number. I think in the BoF discussion
someone mentioned using the domain's unique number that libvirt
generates. That should also work.

>> As long as create_group handles all the work and all libvirt does is
>> call it, get the return status/error, and handle deleting the vGPU on
>> error, then I guess it's doable.
>>

Yes, that is the idea. Libvirt doesn't have to care about the groups.
With Alex's proposal, as you mentioned above, libvirt has to provide
the group number to mdev_create, check the return status and handle the
error case.

  echo $UUID1:$GROUP1 > mdev_create
  echo $UUID2:$GROUP1 > mdev_create
would create two mdev devices assigned to the same domain.


>> Alternatively having multiple  in the XML and performing a
>> single *mdev/create_group is an option.
> 
> I don't really like the idea of a single nodedev-create creating
> multiple devices, but that would work too.
> 
>> That is, what is the "output" from create_group that gets added to the
>> domain XML?  How is that found?
> 
> A new sysfs path is created, whose name depends on the UUID.  The UUID
> is used in a  element in the domain XML and the sysfs path
> appears in the QEMU command line.  Kirti and Neo had examples in their
> presentation at KVM Forum.
> 
> If you create multiple devices in the same group, they are added to the
> same IOMMU group so they must be used by the same VM.  However they
> don't have to be available from the beginning; they could be
> hotplugged/hot-unplugged later, since from the point of view of the VM
> those are just another PCI device.
> 
>> Also, once the domain is running can a
>> vGPU be added to the group?  Removed?  What allows/prevents?
> 
> Kirti?... :)

Yes, vGPUs can be hot-plugged or hot-unplugged. This also depends on
whether the vendor driver wants to support that. For example, if a
domain is running with two vGPUs $UUID1 and $UUID2 and the user tries
to hot-unplug vGPU $UUID2, the vendor driver knows that the domain is
running and the vGPU is being used in the guest, so the vendor driver
can fail the offline/close() call if it doesn't support hot-unplug.
Similarly, for hot-plug the vendor driver can fail the create call if
it doesn't support hot-plug.
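
A rough sketch of the corresponding hot-unplug flow, assuming the
mdev_destroy interface quoted elsewhere in this thread and QEMU's
device_del monitor command (the vendor driver may fail the close/destroy
step if it does not support hot-unplug):

  # detach the vGPU from the running guest via the QEMU monitor,
  # using whatever id it was plugged in with
  (qemu) device_del hostdev2

  # then destroy the mdev device on its parent; $PARENT stands for the
  # parent physical device's sysfs name
  echo "$UUID2" > /sys/bus/pci/devices/$PARENT/mdev_destroy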

> 
> In principle I don't think anything should block vGPUs from different
> groups being added to the same VM, but I have to defer to Alex and Kirti
> again on this.
> 

No, there should be one group per VM.

>>> Since John brought up the topic of minimal XML, in this case it will be
>>> like this:
>>>
>>> 
>>>   my-vgpu
>>>   pci__86_00_0
>>>   
>>> 
>>>   
>>> 
>>>
>>> The uuid will be autogenerated by libvirt and if there's no  (as
>>> is common for VMs with only 1 vGPU) it will be a single-device group.
>>
>> The  could be ignored as it seems existing libvirt code wants to
>> generate a name via udevGenerateDeviceName for other devices. I haven't
>> studied it long enough, but I believe that's how those pci_* names
>> created.
> 
> Yeah that makes sense.  So we get down to a minimal XML that has just
> parent, and capability with type in it; additional elements could be
> name (ignored anyway), and within capability uuid and group.
>

Yes, this seems good.
I would like to have one more capability here, pulling in a suggestion
from my previous mail: in the directory structure, a 'params' attribute
could take optional parameters. Libvirt can then set 'params' and then
create the mdev device. For example, if a param, say
'disable_console_vnc=1', is set for type 11, then devices created of
type 11 will have that param set unless it is cleared.

 └── mdev_supp
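
A minimal sketch of how the proposed 'params' attribute could be used,
assuming it sits next to 'create' in the per-type directory (the
attribute, its placement and 'disable_console_vnc=1' are part of this
proposal, not an existing interface):

  # run from the parent device's sysfs directory; set an optional
  # vendor parameter for type 11, which later creates of this type
  # inherit until the attribute is cleared
  echo "disable_console_vnc=1" > mdev_supported_types/11/params

  # create a device of type 11 with that parameter applied
  echo "$UUID" > mdev_supported_types/11/create

  # clear the parameter again
  echo "" > mdev_supported_types/11/params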

Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-03 Thread Kirti Wankhede


On 9/3/2016 5:27 AM, Laine Stump wrote:
> On 09/02/2016 05:44 PM, Paolo Bonzini wrote:
>>
>>
>> On 02/09/2016 22:19, John Ferlan wrote:
>>> We don't have such a pool for GPU's (yet) - although I suppose they
>>> could just become a class of storage pools.
>>>
>>> The issue being nodedev device objects are not saved between reboots.
>>> They are generated on the fly. Hence the "create-nodedev' API - notice
>>> there's no "define-nodedev' API, although I suppose one could be
>>> created. It's just more work to get this all to work properly.
>>
>> It can all be made transient to begin with.  The VM can be defined but
>> won't start unless the mdev(s) exist with the right UUIDs.
>>
 After creating the vGPU, if required by the host driver, all the other
 type ids would disappear from "virsh nodedev-dumpxml
 pci__86_00_0" too.
>>>
>>> Not wanting to make assumptions, but this reads as if I create one type
>>> 11 vGPU, then I can create no others on the host.  Maybe I'm reading it
>>> wrong - it's been a long week.
>>
>> Correct, at least for NVIDIA.
>>
>>> PCI devices have the "managed='yes|no'" attribute as well. That's what
>>> determines whether the device is to be detached from the host or not.
>>> That's been something very painful to manage for vfio and well libvirt!
>>
>> mdevs do not exist on the host (they do not have a driver on the host
>> because they are not PCI devices) so they do need any management.  At
>> least I hope that's good news. :)
> 
> What's your definition of "management"? They don't need the same type of
> management as a traditional hostdev, but they certainly don't just
> appear by magic! :-)
> 
> For standard PCI devices, the managed attribute says whether or not the
> device needs to be detached from the host driver and attached to
> vfio-pci. For other kinds of hostdev devices, we could decide that it
> meant something different. In this case, perhaps managed='yes' could
> mean that the vGPU will be created as needed, and destroyed when the
> guest is finished with it, and managed='no' could mean that we expect a
> vGPU to already exist, and just need starting.
> 
> Or not. Maybe that's a pointless distinction in this case. Just pointing
> out the option...
> 

Mediated devices are like virtual devices; there may be no direct
physical device associated with them. All mdev devices are owned by the
vfio_mdev module, which is similar to the vfio_pci module. I don't
think we need to interpret the 'managed' attribute for mdev devices the
same way as for standard PCI devices.
Once an mdev device is created, you would find its device directory
under /sys/bus/mdev/devices/.
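
For illustration, a small sketch of how created mdev devices could be
inspected from the shell, assuming the /sys/bus/mdev/devices layout
mentioned above (the per-UUID entries are expected to link back to the
device directory under its parent):

  # list all mdev devices on the host; each entry is named by its UUID
  ls /sys/bus/mdev/devices/

  # resolve an entry to see which parent device it was created under
  readlink -f /sys/bus/mdev/devices/$UUID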

Kirti.



Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-03 Thread Kirti Wankhede


On 9/3/2016 3:18 AM, Paolo Bonzini wrote:
> 
> 
> On 02/09/2016 20:33, Kirti Wankhede wrote:
>>  We could even do:
>>>>
>>>> echo $UUID1:$GROUPA > create
>>>>
>>>> where $GROUPA is the group ID of a previously created mdev device into
>>>> which $UUID1 is to be created and added to the same group.
>> 
> 
> From the point of view of libvirt, I think I prefer Alex's idea.
>  could be an additional element in the nodedev-create XML:
> 
> 
>   my-vgpu
>   pci__86_00_0
>   
> 
> 0695d332-7831-493f-9e71-1c85c8911a08
> group1
>   
> 
> 
> (should group also be a UUID?)
> 

No, this should be a unique number in a system, similar to iommu_group.

> Since John brought up the topic of minimal XML, in this case it will be
> like this:
> 
> 
>   my-vgpu
>   pci__86_00_0
>   
> 
>   
> 
> 
> The uuid will be autogenerated by libvirt and if there's no  (as
> is common for VMs with only 1 vGPU) it will be a single-device group.
> 

Right.

Kirti.

> Thanks,
> 
> Paolo
> 



Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-03 Thread Kirti Wankhede


On 9/3/2016 1:59 AM, John Ferlan wrote:
> 
> 
> On 09/02/2016 02:33 PM, Kirti Wankhede wrote:
>>
>> On 9/2/2016 10:55 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 02/09/2016 19:15, Kirti Wankhede wrote:
>>>> On 9/2/2016 3:35 PM, Paolo Bonzini wrote:
>>>>>
>>>>>  my-vgpu
>>>>>  pci__86_00_0
>>>>>  
>>>>>
>>>>>0695d332-7831-493f-9e71-1c85c8911a08
>>>>>  
>>>>>
>>>>>
>>>>> After creating the vGPU, if required by the host driver, all the other
>>>>> type ids would disappear from "virsh nodedev-dumpxml pci__86_00_0" 
>>>>> too.
>>>>
>>>> Thanks Paolo for details.
>>>> 'nodedev-create' parse the xml file and accordingly write to 'create'
>>>> file in sysfs to create mdev device. Right?
>>>> At this moment, does libvirt know which VM this device would be
>>>> associated with?
>>>
>>> No, the VM will associate to the nodedev through the UUID.  The nodedev
>>> is created separately from the VM.
>>>
>>>>> When dumping the mdev with nodedev-dumpxml, it could show more complete
>>>>> info, again taken from sysfs:
>>>>>
>>>>>
>>>>>  my-vgpu
>>>>>  pci__86_00_0
>>>>>  
>>>>>0695d332-7831-493f-9e71-1c85c8911a08
>>>>>
>>>>>
>>>>>  
>>>>>
>>>>>
>>>>>  
>>>>>  
>>>>>  ...
>>>>>  NVIDIA
>>>>>
>>>>>  
>>>>>
>>>>>
>>>>> Notice how the parent has mdev inside pci; the vGPU, if it has to have
>>>>> pci at all, would have it inside mdev.  This represents the difference
>>>>> between the mdev provider and the mdev device.
>>>>
>>>> Parent of mdev device might not always be a PCI device. I think we
>>>> shouldn't consider it as PCI capability.
>>>
>>> The  in the vGPU means that it _will_ be exposed
>>> as a PCI device by VFIO.
>>>
>>> The  in the physical GPU means that the GPU is a
>>> PCI device.
>>>
>>
>> Ok. Got that.
>>
>>>>> Random proposal for the domain XML too:
>>>>>
>>>>>   
>>>>> 
>>>>>   
>>>>>   0695d332-7831-493f-9e71-1c85c8911a08
>>>>> 
>>>>> 
>>>>>   
>>>>>
>>>>
>>>> When user wants to assign two mdev devices to one VM, user have to add
>>>> such two entries or group the two devices in one entry?
>>>
>>> Two entries, one per UUID, each with its own PCI address in the guest.
>>>
>>>> On other mail thread with same subject we are thinking of creating group
>>>> of mdev devices to assign multiple mdev devices to one VM.
>>>
>>> What is the advantage in managing mdev groups?  (Sorry didn't follow the
>>> other thread).
>>>
>>
>> When mdev device is created, resources from physical device is assigned
>> to this device. But resources are committed only when device goes
>> 'online' ('start' in v6 patch)
>> In case of multiple vGPUs in a VM for Nvidia vGPU solution, resources
>> for all vGPU devices in a VM are committed at one place. So we need to
>> know the vGPUs assigned to a VM before QEMU starts.
>>
>> Grouping would help here as Alex suggested in that mail. Pulling only
>> that part of discussion here:
>>
>>  It seems then that the grouping needs to affect the iommu group
>> so that
>>> you know that there's only a single owner for all the mdev devices
>>> within the group.  IIRC, the bus drivers don't have any visibility
>>> to opening and releasing of the group itself to trigger the
>>> online/offline, but they can track opening of the device file
>>> descriptors within the group.  Within the VFIO API the user cannot
>>> access the device without the device file descriptor, so a "first
>>> device opened" and "last device closed" trigger would provide the
>>> trigger points you need.  Some sort of new sysfs interface would need
>>> to be invented to allow this sort of man

Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-02 Thread Kirti Wankhede

On 9/2/2016 10:55 PM, Paolo Bonzini wrote:
> 
> 
> On 02/09/2016 19:15, Kirti Wankhede wrote:
>> On 9/2/2016 3:35 PM, Paolo Bonzini wrote:
>>>
>>>  my-vgpu
>>>  pci__86_00_0
>>>  
>>>
>>>0695d332-7831-493f-9e71-1c85c8911a08
>>>  
>>>
>>>
>>> After creating the vGPU, if required by the host driver, all the other
>>> type ids would disappear from "virsh nodedev-dumpxml pci__86_00_0" too.
>>
>> Thanks Paolo for details.
>> 'nodedev-create' parse the xml file and accordingly write to 'create'
>> file in sysfs to create mdev device. Right?
>> At this moment, does libvirt know which VM this device would be
>> associated with?
> 
> No, the VM will associate to the nodedev through the UUID.  The nodedev
> is created separately from the VM.
> 
>>> When dumping the mdev with nodedev-dumpxml, it could show more complete
>>> info, again taken from sysfs:
>>>
>>>
>>>  my-vgpu
>>>  pci__86_00_0
>>>  
>>>0695d332-7831-493f-9e71-1c85c8911a08
>>>
>>>
>>>  
>>>
>>>
>>>  
>>>  
>>>  ...
>>>  NVIDIA
>>>
>>>  
>>>
>>>
>>> Notice how the parent has mdev inside pci; the vGPU, if it has to have
>>> pci at all, would have it inside mdev.  This represents the difference
>>> between the mdev provider and the mdev device.
>>
>> Parent of mdev device might not always be a PCI device. I think we
>> shouldn't consider it as PCI capability.
> 
> The  in the vGPU means that it _will_ be exposed
> as a PCI device by VFIO.
> 
> The  in the physical GPU means that the GPU is a
> PCI device.
> 

Ok. Got that.

>>> Random proposal for the domain XML too:
>>>
>>>   
>>> 
>>>   
>>>   0695d332-7831-493f-9e71-1c85c8911a08
>>> 
>>> 
>>>   
>>>
>>
>> When user wants to assign two mdev devices to one VM, user have to add
>> such two entries or group the two devices in one entry?
> 
> Two entries, one per UUID, each with its own PCI address in the guest.
> 
>> On other mail thread with same subject we are thinking of creating group
>> of mdev devices to assign multiple mdev devices to one VM.
> 
> What is the advantage in managing mdev groups?  (Sorry didn't follow the
> other thread).
> 

When an mdev device is created, resources from the physical device are
assigned to it. But resources are committed only when the device goes
'online' ('start' in the v6 patch).
In the case of multiple vGPUs in a VM for the NVIDIA vGPU solution,
resources for all vGPU devices in a VM are committed in one place. So
we need to know the vGPUs assigned to a VM before QEMU starts.

Grouping would help here, as Alex suggested in that mail. Pulling only
that part of the discussion here:

 It seems then that the grouping needs to affect the iommu group
so that
> you know that there's only a single owner for all the mdev devices
> within the group.  IIRC, the bus drivers don't have any visibility
> to opening and releasing of the group itself to trigger the
> online/offline, but they can track opening of the device file
> descriptors within the group.  Within the VFIO API the user cannot
> access the device without the device file descriptor, so a "first
> device opened" and "last device closed" trigger would provide the
> trigger points you need.  Some sort of new sysfs interface would need
> to be invented to allow this sort of manipulation.
> Also we should probably keep sight of whether we feel this is
> sufficiently necessary for the complexity.  If we can get by with only
> doing this grouping at creation time then we could define the "create"
> interface in various ways.  For example:
>
> echo $UUID0 > create
>
> would create a single mdev named $UUID0 in it's own group.
>
> echo {$UUID0,$UUID1} > create
>
> could create mdev devices $UUID0 and $UUID1 grouped together.
>



I think this would create mdev devices of the same type on the same
parent device. We need to consider the case of multiple mdev devices of
different types and with different parents being grouped together.
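
As a rough sketch, using the '$UUID:$GROUP' create syntax quoted below
and the per-parent, per-type directory layout discussed elsewhere in
this thread (paths and type ids here are purely illustrative), grouping
devices of different types from different parents might look like:

  echo "$UUID1:$GROUPA" > /sys/.../$PARENT1/mdev_supported_types/11/create
  echo "$UUID2:$GROUPA" > /sys/.../$PARENT2/mdev_supported_types/12/create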


 We could even do:
>
> echo $UUID1:$GROUPA > create
>
> where $GROUPA is the group ID of a previously created mdev device into
> which $UUID1 is to be created and added to the same group.



I was thinking about:

  echo $UUID0 > create

would create mdev device

Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-02 Thread Kirti Wankhede
On 9/2/2016 3:35 PM, Paolo Bonzini wrote:
> 
> 
> On 02/09/2016 07:21, Kirti Wankhede wrote:
>> On 9/2/2016 10:18 AM, Michal Privoznik wrote:
>>> Okay, maybe I'm misunderstanding something. I just thought that users
>>> will consult libvirt's nodedev driver (e.g. virsh nodedev-list && virsh
>>> nodedev-dumpxml $id) to fetch vGPU capabilities and then use that info
>>> to construct domain XML.
>>
>> I'm not familiar with libvirt code, curious how libvirt's nodedev driver
>> enumerates devices in the system?
> 
> It looks at sysfs and/or the udev database and transforms what it finds
> there to XML.
> 
> I think people would consult the nodedev driver to fetch vGPU
> capabilities, use "virsh nodedev-create" to create the vGPU device on
> the host, and then somehow refer to the nodedev in the domain XML.
> 
> There isn't very much documentation on nodedev-create, but it's used
> mostly for NPIV (virtual fibre channel adapter) and the XML looks like this:
> 
>
>  scsi_host6
>  scsi_host5
>  
>
>  2001001b32a9da5e
>  2101001b32a9da5e
>
>  
>
> 
> so I suppose for vGPU it would look like this:
> 
>
>  my-vgpu
>  pci__86_00_0
>  
>
>0695d332-7831-493f-9e71-1c85c8911a08
>  
>
> 
> while the parent would have:
> 
>
>  pci__86_00_0
>  
>0
>134
>0
>0
>
>  
>  
>
>GRID M60-0B
>2
>45
>524288
>2560
>1600
>  
>
>GRID M60
>NVIDIA
>  
>
> 
> After creating the vGPU, if required by the host driver, all the other
> type ids would disappear from "virsh nodedev-dumpxml pci__86_00_0" too.
>

Thanks Paolo for the details.
'nodedev-create' parses the XML file and accordingly writes to the
'create' file in sysfs to create the mdev device. Right?
At this moment, does libvirt know which VM this device would be
associated with?

> When dumping the mdev with nodedev-dumpxml, it could show more complete
> info, again taken from sysfs:
> 
>
>  my-vgpu
>  pci__86_00_0
>  
>0695d332-7831-493f-9e71-1c85c8911a08
>
>
>  GRID M60-0B
>  2
>  45
>  524288
>  2560
>  1600
>
>
>  
>  
>  ...
>  NVIDIA
>
>  
>
> 
> Notice how the parent has mdev inside pci; the vGPU, if it has to have
> pci at all, would have it inside mdev.  This represents the difference
> between the mdev provider and the mdev device.
>

The parent of an mdev device might not always be a PCI device. I think
we shouldn't consider it a PCI capability.

> Random proposal for the domain XML too:
> 
>   
> 
>   
>   0695d332-7831-493f-9e71-1c85c8911a08
> 
> 
>   
>

When a user wants to assign two mdev devices to one VM, does the user
have to add two such entries, or group the two devices in one entry?
On another mail thread with the same subject we are thinking of
creating a group of mdev devices to assign multiple mdev devices to one
VM. Libvirt doesn't have to know about the group number, but libvirt
should add all mdev devices to a group. Is that possible to do before
starting the QEMU process?

Thanks,
Kirti


> Paolo
> 



Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-09-02 Thread Kirti Wankhede


On 9/2/2016 1:31 AM, Alex Williamson wrote:
> On Thu, 1 Sep 2016 23:52:02 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> Alex,
>> Thanks for summarizing the discussion.
>>
>> On 8/31/2016 9:18 PM, Alex Williamson wrote:
>>> On Wed, 31 Aug 2016 15:04:13 +0800
>>> Jike Song <jike.s...@intel.com> wrote:
>>>   
>>>> On 08/31/2016 02:12 PM, Tian, Kevin wrote:  
>>>>>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>>>>>> Sent: Wednesday, August 31, 2016 12:17 AM
>>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> At KVM Forum we had a BoF session primarily around the mediated device
>>>>>> sysfs interface.  I'd like to share what I think we agreed on and the
>>>>>> "problem areas" that still need some work so we can get the thoughts
>>>>>> and ideas from those who weren't able to attend.
>>>>>>
>>>>>> DanPB expressed some concern about the mdev_supported_types sysfs
>>>>>> interface, which exposes a flat csv file with fields like "type",
>>>>>> "number of instance", "vendor string", and then a bunch of type
>>>>>> specific fields like "framebuffer size", "resolution", "frame rate
>>>>>> limit", etc.  This is not entirely machine parsing friendly and sort of
>>>>>> abuses the sysfs concept of one value per file.  Example output taken
>>>>>> from Neo's libvirt RFC:
>>>>>>
>>>>>> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
>>>>>> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
>>>>>> framebuffer,
>>>>>> max_resolution
>>>>>> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
>>>>>> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
>>>>>> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
>>>>>> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
>>>>>> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
>>>>>> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
>>>>>> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
>>>>>> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
>>>>>>
>>>>>> The create/destroy then looks like this:
>>>>>>
>>>>>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>>>>>  /sys/bus/pci/devices/.../mdev_create
>>>>>>
>>>>>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>>>>>  /sys/bus/pci/devices/.../mdev_destroy
>>>>>>
>>>>>> "vendor_specific_argument_list" is nebulous.
>>>>>>
>>>>>> So the idea to fix this is to explode this into a directory structure,
>>>>>> something like:
>>>>>>
>>>>>> ├── mdev_destroy
>>>>>> └── mdev_supported_types
>>>>>> ├── 11
>>>>>> │   ├── create
>>>>>> │   ├── description
>>>>>> │   └── max_instances
>>>>>> ├── 12
>>>>>> │   ├── create
>>>>>> │   ├── description
>>>>>> │   └── max_instances
>>>>>> └── 13
>>>>>> ├── create
>>>>>> ├── description
>>>>>> └── max_instances
>>>>>>
>>>>>> Note that I'm only exposing the minimal attributes here for simplicity,
>>>>>> the other attributes would be included in separate files and we would
>>>>>> require vendors to create standard attributes for common device classes. 
>>>>>>
>>>>>
>>>>> I like this idea. All standard attributes are reflected into this 
>>>>> hierarchy.
>>>>> In the meantime, can we still allow optional vendor string in create 
>>>>> interface? libvirt doesn't need to know the meaning, but allows upper
>>>>> layer to do some vendor specific tweak 

Re: [libvirt] [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Kirti Wankhede


On 9/2/2016 10:18 AM, Michal Privoznik wrote:
> On 01.09.2016 18:59, Alex Williamson wrote:
>> On Thu, 1 Sep 2016 18:47:06 +0200
>> Michal Privoznik  wrote:
>>
>>> On 31.08.2016 08:12, Tian, Kevin wrote:
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, August 31, 2016 12:17 AM
>
> Hi folks,
>
> At KVM Forum we had a BoF session primarily around the mediated device
> sysfs interface.  I'd like to share what I think we agreed on and the
> "problem areas" that still need some work so we can get the thoughts
> and ideas from those who weren't able to attend.
>
> DanPB expressed some concern about the mdev_supported_types sysfs
> interface, which exposes a flat csv file with fields like "type",
> "number of instance", "vendor string", and then a bunch of type
> specific fields like "framebuffer size", "resolution", "frame rate
> limit", etc.  This is not entirely machine parsing friendly and sort of
> abuses the sysfs concept of one value per file.  Example output taken
> from Neo's libvirt RFC:
>
> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
> framebuffer,
> max_resolution
> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
>
> The create/destroy then looks like this:
>
> echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_create
>
> echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_destroy
>
> "vendor_specific_argument_list" is nebulous.
>
> So the idea to fix this is to explode this into a directory structure,
> something like:
>
> ├── mdev_destroy
> └── mdev_supported_types
> ├── 11
> │   ├── create
> │   ├── description
> │   └── max_instances
> ├── 12
> │   ├── create
> │   ├── description
> │   └── max_instances
> └── 13
> ├── create
> ├── description
> └── max_instances
>
> Note that I'm only exposing the minimal attributes here for simplicity,
> the other attributes would be included in separate files and we would
> require vendors to create standard attributes for common device classes.  

 I like this idea. All standard attributes are reflected into this 
 hierarchy.
 In the meantime, can we still allow optional vendor string in create 
 interface? libvirt doesn't need to know the meaning, but allows upper
 layer to do some vendor specific tweak if necessary.  
>>>
>>> This is not the best idea IMO. Libvirt is there to shadow differences
>>> between hypervisors. While doing that, we often hide differences between
>>> various types of HW too. Therefore in order to provide good abstraction
>>> we should make vendor specific string as small as possible (ideally an
>>> empty string). I mean I see it as bad idea to expose "vgpu_type_id" from
>>> example above in domain XML. What I think the better idea is if we let
>>> users chose resolution and frame buffer size, e.g.: >> resolution="1024x768" framebuffer="16"/> (just the first idea that came
>>> to my mind while writing this e-mail). The point is, XML part is
>>> completely free of any vendor-specific knobs.
>>
>> That's not really what you want though, a user actually cares whether
>> they get an Intel of NVIDIA vGPU, we can't specify it as just a
>> resolution and framebuffer size.  The user also doesn't want the model
>> changing each time the VM is started, so not only do you *need* to know
>> the vendor, you need to know the vendor model.  This is the only way to
>> provide a consistent VM.  So as we discussed at the BoF, the libvirt
>> xml will likely reference the vendor string, which will be a unique
>> identifier that encompasses all the additional attributes we expose.
>> Really the goal of the attributes is simply so you don't need a per
>> vendor magic decoder ring to figure out the basic features of a given
>> vendor string.  Thanks,
> 
> Okay, maybe I'm misunderstanding something. I just thought that users
> will consult libvirt's nodedev driver (e.g. virsh nodedev-list && virsh
> nodedev-dumpxml $id) to fetch vGPU capabilities and then use that info
> to construct 

Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Kirti Wankhede

Alex,
Thanks for summarizing the discussion.

On 8/31/2016 9:18 PM, Alex Williamson wrote:
> On Wed, 31 Aug 2016 15:04:13 +0800
> Jike Song  wrote:
> 
>> On 08/31/2016 02:12 PM, Tian, Kevin wrote:
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, August 31, 2016 12:17 AM

 Hi folks,

 At KVM Forum we had a BoF session primarily around the mediated device
 sysfs interface.  I'd like to share what I think we agreed on and the
 "problem areas" that still need some work so we can get the thoughts
 and ideas from those who weren't able to attend.

 DanPB expressed some concern about the mdev_supported_types sysfs
 interface, which exposes a flat csv file with fields like "type",
 "number of instance", "vendor string", and then a bunch of type
 specific fields like "framebuffer size", "resolution", "frame rate
 limit", etc.  This is not entirely machine parsing friendly and sort of
 abuses the sysfs concept of one value per file.  Example output taken
 from Neo's libvirt RFC:

 cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
 # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
 framebuffer,
 max_resolution
 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160

 The create/destroy then looks like this:

 echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_create

 echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_destroy

 "vendor_specific_argument_list" is nebulous.

 So the idea to fix this is to explode this into a directory structure,
 something like:

 ├── mdev_destroy
 └── mdev_supported_types
 ├── 11
 │   ├── create
 │   ├── description
 │   └── max_instances
 ├── 12
 │   ├── create
 │   ├── description
 │   └── max_instances
 └── 13
 ├── create
 ├── description
 └── max_instances

 Note that I'm only exposing the minimal attributes here for simplicity,
 the other attributes would be included in separate files and we would
 require vendors to create standard attributes for common device classes.  
>>>
>>> I like this idea. All standard attributes are reflected into this hierarchy.
>>> In the meantime, can we still allow optional vendor string in create 
>>> interface? libvirt doesn't need to know the meaning, but allows upper
>>> layer to do some vendor specific tweak if necessary.
>>>   
>>
>> Not sure whether this can done within MDEV framework (attrs provided by
>> vendor driver of course), or must be within the vendor driver.
> 
> The purpose of the sub-directories is that libvirt doesn't need to pass
> arbitrary, vendor strings to the create function, the attributes of the
> mdev device created are defined by the attributes in the sysfs
> directory where the create is done.  The user only provides a uuid for
> the device.  Arbitrary vendor parameters are a barrier, libvirt may not
> need to know the meaning, but would need to know when to apply them,
> which is just as bad.  Ultimately we want libvirt to be able to
> interact with sysfs without having an vendor specific knowledge.
> 

The above directory hierarchy looks fine to me. Along with the fixed
set of parameters, an optional field for extra parameters is also
required. Such parameters are needed for some specific testing or for
running benchmarks, for example to disable FRL (frame rate limiter) or
to disable console VNC when not required. Libvirt doesn't need to know
their details; it is just a string that the user can provide, and
libvirt needs to pass the string as-is to the vendor driver, which
would act accordingly.
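
A minimal sketch of what passing such an opaque string at create time
could look like, following the '$mdev_UUID:vendor_specific_argument_list'
form quoted above (the parameter names are purely hypothetical; whether
to allow such a string at all is exactly the open question here):

  # libvirt would pass the user-supplied string through unmodified;
  # only the vendor driver interprets it
  echo "$mdev_UUID:frl_config=0,disable_console_vnc=1" > \
/sys/bus/pci/devices/.../mdev_create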


 For vGPUs like NVIDIA where we don't support multiple types
 concurrently, this directory structure would update as mdev devices are
 created, removing no longer available types.  I carried forward  
>>>
>>> or keep the type with max_instances cleared to ZERO.
>>>  
>>
>> +1 :)
> 
> Possible yes, but why would the vendor driver report types that the
> user cannot create?  It just seems like superfluous information (well,
> except for the use I discover below).
> 

The directory structure for a physical GPU will be defined when the
device is registered to