Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-16 Thread Neo Jia
On Tue, Aug 16, 2016 at 02:51:03PM -0600, Alex Williamson wrote:
> On Tue, 16 Aug 2016 13:30:06 -0700
> Neo Jia  wrote:
> 
> > On Mon, Aug 15, 2016 at 04:47:41PM -0600, Alex Williamson wrote:
> > > On Mon, 15 Aug 2016 12:59:08 -0700
> > > Neo Jia  wrote:
> > >   
> > > > > > >
> > > > > > > I'm not sure a comma separated list makes sense here; for both
> > > > > > > simplicity in the kernel and more fine grained error reporting, we
> > > > > > > probably want to start/stop them individually.  Actually, why is it
> > > > > > > that we can't use the mediated device being opened and released to
> > > > > > > automatically signal to the backend vendor driver to commit and
> > > > > > > release resources?  I don't fully understand why userspace needs
> > > > > > > this interface.
> > > > >   
> > > 
> > > That doesn't give an individual user the ability to stop and start
> > > their devices though, because in order for a user to have write
> > > permissions there, they get permission to DoS other users by pumping
> > > arbitrary UUIDs into those files.  By placing start/stop per mdev, we
> > > have mdev level granularity of granting start/stop privileges.  Really
> > > though, do we want QEMU fumbling around through sysfs or do we want an
> > > interface through the vfio API to perform start/stop?  Thanks,  
> > 
> > Hi Alex,
> > 
> > I think those two suggestions make sense, so we will move the "start/stop"
> > under mdev sysfs. 
> > 
> > This will be incorporated in our next v7 patch, and by doing that, it
> > will make the locking scheme easier.
> 
> Thanks Neo.  Also note that the semantics change when we move to per
> device control.  It would be redundant to 'echo $UUID' into a start
> file which only controls a single device.  So that means we probably
> just want an 'echo 1'.  But if we can 'echo 1' then we can also 'echo
> 0', so we can reduce this to a single sysfs file.  Sysfs already has a
> common interface for starting and stopping devices, the "online" file.
> So I think we should probably move in that direction.  Additionally, an
> "online" file should support a _show() function, so if we have an Intel
> vGPU that perhaps does not need start/stop support, online could report
> "1" after create to show that it's already online, possibly even
> generate an error trying to change the online state.  Thanks,

Agreed. We will adopt a similar syntax and support the _show() function.

Thanks,
Neo

> 
> Alex
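To make the "online" semantics discussed above concrete, here is a small toy model in pure Python (no real sysfs, all names illustrative, and the initial offline state for vendors that need start/stop is an assumption, not something the thread specifies): a single per-device attribute replaces the global mdev_start/mdev_stop files, reads report the current state via a _show()-style method, and a vendor without start/stop support comes up already online and rejects state changes.

```python
class MdevOnline:
    """Toy model of a per-mdev 'online' sysfs attribute (illustrative only)."""

    def __init__(self, needs_start_stop=True):
        # Assumption: a vendor that needs no start/stop support is online
        # right after create; others begin offline until userspace writes "1".
        self.needs_start_stop = needs_start_stop
        self._online = not needs_start_stop

    def show(self):
        # Models _show(): reading the file reports the current state.
        return "1" if self._online else "0"

    def store(self, value):
        # Models _store(): writing "1"/"0" starts/stops the device; a vendor
        # without start/stop support errors on any attempted state change.
        if value not in ("0", "1"):
            raise ValueError("expected '0' or '1'")
        if not self.needs_start_stop:
            raise PermissionError("online state is fixed for this device")
        self._online = (value == "1")

vgpu = MdevOnline(needs_start_stop=True)
vgpu.store("1")
assert vgpu.show() == "1"

fixed = MdevOnline(needs_start_stop=False)
assert fixed.show() == "1"  # already online after create
```

Note how this also captures the DoS argument: permissions are scoped to one device's file, rather than to a global file accepting arbitrary UUIDs.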



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-16 Thread Alex Williamson
On Tue, 16 Aug 2016 04:52:30 +
"Tian, Kevin"  wrote:

> > From: Neo Jia [mailto:c...@nvidia.com]
> > Sent: Tuesday, August 16, 2016 12:17 PM
> > 
> > On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote:  
> > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > Sent: Tuesday, August 16, 2016 11:46 AM
> > > >
> > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote:  
> > > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > > Sent: Tuesday, August 16, 2016 3:59 AM  
> > > >  
> > > > > > > >
> > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned
> > > > > > > > to a VM in one shot, to commit resources of all vGPUs assigned
> > > > > > > > to a VM along with some common resources.  
> > > > > > >
> > > > > > > Kirti, can you elaborate the background of the above one-shot
> > > > > > > commit requirement?  It's hard to understand such a requirement.
> > > > > > >
> > > > > > > As I replied in another mail, I really hope start/stop becomes a
> > > > > > > per-mdev attribute instead of a global one, e.g.:
> > > > > > >
> > > > > > > echo "0/1" > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start
> > > > > > >
> > > > > > > In many scenarios the user space client may only want to talk to
> > > > > > > the mdev instance directly, w/o need to contact its parent device.
> > > > > > > Still take live migration for example: I don't think Qemu wants to
> > > > > > > know the parent device of assigned mdev instances.  
> > > > > >
> > > > > > Hi Kevin,
> > > > > >
> > > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody
> > > > > > to know the parent device.  You can just do
> > > > > >
> > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > > > > >
> > > > > > or
> > > > > >
> > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop
> > > > > >
> > > > > > without knowing the parent device.
> > > > > >  
> > > > >
> > > > > You can look at some existing sysfs example, e.g.:
> > > > >
> > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online
> > > > >
> > > > > You may also argue: why not use a global style?
> > > > >
> > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online
> > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline
> > > > >
> > > > > There are many similar examples...  
> > > >
> > > > Hi Kevin,
> > > >
> > > > My response above is to your question about the global sysfs entry,
> > > > as you don't want to have the global path because
> > > >
> > > > "I don't think Qemu wants to know parent device of assigned mdev
> > > > instances."
> > > >
> > > > So I just want to confirm with you (in case you missed it) that
> > > >
> > > > /sys/class/mdev/mdev_start | mdev_stop
> > > >
> > > > doesn't require knowledge of the parent device.
> > > >  
> > >
> > > Qemu is just one example, where your explanation of the parent device
> > > makes sense, but still it's not good for Qemu to populate /sys/class/mdev
> > > directly.  Qemu is passed the actual sysfs path of the assigned mdev
> > > instance, so any mdev attributes touched by Qemu should be put under
> > > that node (e.g. start/stop for the live migration usage I explained
> > > earlier).  
> > 
> > Exactly, QEMU is passed the actual sysfs path.
> > 
> > So QEMU doesn't touch the files /sys/class/mdev/mdev_start | mdev_stop
> > at all.
> > 
> > QEMU will take the sysfs path as input:
> > 
> >  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,id=vgpu0  
> 
> There's no need to pass "id=vgpu0" here.  If necessary, you can put id as
> an attribute under the sysfs mdev node:
> 
> /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id

QEMU needs an id parameter for devices; libvirt gives devices arbitrary
names, typically hostdev# for assigned devices.  This id is used to
reference the device in hmp/qmp commands.  It is not something the
mdev infrastructure should define.  Thanks,

Alex
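To illustrate the split Alex describes, a management layer might compose the QEMU -device option like the sketch below: the sysfs path identifies the mdev, while id is a QEMU-side name used later for hmp/qmp commands. The helper name, the hostdev# naming convention attributed to libvirt, and the sample UUID are illustrative, not part of the proposed mdev ABI.

```python
# Hypothetical helper: build the -device argument for an assigned mdev.
# The sysfsdev path comes from the mdev infrastructure; the id is an
# arbitrary QEMU device name chosen by the management layer (libvirt
# typically uses hostdev#), not something mdev defines.
def vfio_mdev_device_arg(uuid, index):
    sysfsdev = f"/sys/bus/mdev/devices/{uuid}"
    dev_id = f"hostdev{index}"
    return f"vfio-pci,sysfsdev={sysfsdev},id={dev_id}"

arg = vfio_mdev_device_arg("c0b26072-dd1b-4340-84fe-bf338c510818", 0)
assert arg == ("vfio-pci,sysfsdev=/sys/bus/mdev/devices/"
               "c0b26072-dd1b-4340-84fe-bf338c510818,id=hostdev0")
```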



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-16 Thread Tian, Kevin
> From: Neo Jia [mailto:c...@nvidia.com]
> Sent: Tuesday, August 16, 2016 1:44 PM
> 
> On Tue, Aug 16, 2016 at 04:52:30AM +, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:c...@nvidia.com]
> > > Sent: Tuesday, August 16, 2016 12:17 PM
> > >
> > > On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote:
> > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > Sent: Tuesday, August 16, 2016 11:46 AM
> > > > >
> > > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote:
> > > > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > > > Sent: Tuesday, August 16, 2016 3:59 AM
> > > > >
> > > > > > > > >
> > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned 
> > > > > > > > > to a VM
> in
> > > > > > > > > one shot to commit resources of all vGPUs assigned to a VM 
> > > > > > > > > along with
> > > > > > > > > some common resources.
> > > > > > > >
> > > > > > > > Kirti, can you elaborate the background about above one-shot 
> > > > > > > > commit
> > > > > > > > requirement? It's hard to understand such a requirement.
> > > > > > > >
> > > > > > > > As I replied in another mail, I really hope start/stop becomes
> > > > > > > > a per-mdev attribute instead of a global one, e.g.:
> > > > > > > >
> > > > > > > > echo "0/1" >
> > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start
> > > > > > > >
> > > > > > > > In many scenarios the user space client may only want to talk to 
> > > > > > > > the mdev
> > > > > > > > instance directly, w/o need to contact its parent device. Still 
> > > > > > > > take
> > > > > > > > live migration for example, I don't think Qemu wants to know 
> > > > > > > > parent
> > > > > > > > device of assigned mdev instances.
> > > > > > >
> > > > > > > Hi Kevin,
> > > > > > >
> > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require 
> > > > > > > anybody to
> know
> > > > > > > parent device. you can just do
> > > > > > >
> > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > > > > > >
> > > > > > > or
> > > > > > >
> > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop
> > > > > > >
> > > > > > > without knowing the parent device.
> > > > > > >
> > > > > >
> > > > > > You can look at some existing sysfs example, e.g.:
> > > > > >
> > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online
> > > > > >
> > > > > > You may also argue why not using a global style:
> > > > > >
> > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online
> > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline
> > > > > >
> > > > > > There are many similar examples...
> > > > >
> > > > > Hi Kevin,
> > > > >
> > > > > My response above is to your question about using the global sysfs 
> > > > > entry as you
> > > > > don't want to have the global path because
> > > > >
> > > > > "I don't think Qemu wants to know parent device of assigned mdev 
> > > > > instances.".
> > > > >
> > > > > So I just want to confirm with you (in case you missed it) that:
> > > > >
> > > > > /sys/class/mdev/mdev_start | mdev_stop
> > > > >
> > > > > doesn't require the knowledge of parent device.
> > > > >
> > > >
> > > > Qemu is just one example, where your explanation of parent device
> > > > makes sense but still it's not good for Qemu to populate /sys/class/mdev
> > > > directly. Qemu is passed with the actual sysfs path of assigned mdev
> > > > instance, so any mdev attributes touched by Qemu should be put under
> > > > that node (e.g. start/stop for live migration usage as I explained 
> > > > earlier).
> > >
> > > Exactly, qemu is passed with the actual sysfs path.
> > >
> > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at 
> > > all.
> > >
> > > QEMU will take the sysfs path as input:
> > >
> > >  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,id=vgpu0
> >
> > There's no need to pass "id=vgpu0" here.  If necessary, you can put id as
> > an attribute under the sysfs mdev node:
> >
> > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id
> 
> I think we have moved away from the device index based on Alex's comment,
> so the device path will be:
> 
>  /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818

pass /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818
as parameter, and then Qemu can access 'id' under that path. You
don't need to pass a separate 'id' field. That's my point.


> 
> >
> > >
> > > As you are saying, in live migration QEMU needs to access "start" and
> > > "stop".  Could you please share more details, such as how QEMU accesses
> > > the "start" and "stop" sysfs files, and when and what triggers that?
> > >
> >
> > A conceptual flow is as below:
> >
> > 1. Quiesce mdev activity on the parent device (e.g. stop scheduling, wait
> > for in-flight DMA to complete, etc.)
> >
> > echo "0" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start
> >
> > 2. Save mdev state:
> >
> > cat 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Neo Jia
On Tue, Aug 16, 2016 at 04:52:30AM +, Tian, Kevin wrote:
> > From: Neo Jia [mailto:c...@nvidia.com]
> > Sent: Tuesday, August 16, 2016 12:17 PM
> > 
> > On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > Sent: Tuesday, August 16, 2016 11:46 AM
> > > >
> > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote:
> > > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > > Sent: Tuesday, August 16, 2016 3:59 AM
> > > >
> > > > > > > >
> > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned 
> > > > > > > > to a VM in
> > > > > > > > one shot to commit resources of all vGPUs assigned to a VM 
> > > > > > > > along with
> > > > > > > > some common resources.
> > > > > > >
> > > > > > > Kirti, can you elaborate the background about above one-shot 
> > > > > > > commit
> > > > > > > requirement? It's hard to understand such a requirement.
> > > > > > >
> > > > > > > > As I replied in another mail, I really hope start/stop becomes
> > > > > > > > a per-mdev attribute instead of a global one, e.g.:
> > > > > > >
> > > > > > > echo "0/1" >
> > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start
> > > > > > >
> > > > > > > > In many scenarios the user space client may only want to talk 
> > > > > > > > to the mdev
> > > > > > > instance directly, w/o need to contact its parent device. Still 
> > > > > > > take
> > > > > > > live migration for example, I don't think Qemu wants to know 
> > > > > > > parent
> > > > > > > device of assigned mdev instances.
> > > > > >
> > > > > > Hi Kevin,
> > > > > >
> > > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody 
> > > > > > to know
> > > > > > parent device. you can just do
> > > > > >
> > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > > > > >
> > > > > > or
> > > > > >
> > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop
> > > > > >
> > > > > > without knowing the parent device.
> > > > > >
> > > > >
> > > > > You can look at some existing sysfs example, e.g.:
> > > > >
> > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online
> > > > >
> > > > > You may also argue why not using a global style:
> > > > >
> > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online
> > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline
> > > > >
> > > > > There are many similar examples...
> > > >
> > > > Hi Kevin,
> > > >
> > > > My response above is to your question about using the global sysfs 
> > > > entry as you
> > > > don't want to have the global path because
> > > >
> > > > "I don't think Qemu wants to know parent device of assigned mdev 
> > > > instances.".
> > > >
> > > > So I just want to confirm with you (in case you missed it) that:
> > > >
> > > > /sys/class/mdev/mdev_start | mdev_stop
> > > >
> > > > doesn't require the knowledge of parent device.
> > > >
> > >
> > > Qemu is just one example, where your explanation of parent device
> > > makes sense but still it's not good for Qemu to populate /sys/class/mdev
> > > directly. Qemu is passed with the actual sysfs path of assigned mdev
> > > instance, so any mdev attributes touched by Qemu should be put under
> > > that node (e.g. start/stop for live migration usage as I explained 
> > > earlier).
> > 
> > Exactly, qemu is passed with the actual sysfs path.
> > 
> > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at 
> > all.
> > 
> > QEMU will take the sysfs path as input:
> > 
> >  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,id=vgpu0
> 
> There's no need to pass "id=vgpu0" here.  If necessary, you can put id as
> an attribute under the sysfs mdev node:
> 
> /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id

I think we have moved away from the device index based on Alex's comment, so the
device path will be:

 /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818

> 
> > 
> > As you are saying, in live migration QEMU needs to access "start" and
> > "stop".  Could you please share more details, such as how QEMU accesses
> > the "start" and "stop" sysfs files, and when and what triggers that?
> > 
> 
> A conceptual flow is as below:
> 
> 1. Quiesce mdev activity on the parent device (e.g. stop scheduling, wait
> for in-flight DMA to complete, etc.)
> 
> echo "0" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start
> 
> 2. Save mdev state:
> 
> cat /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state > xxx
> 
> 3. xxx will be part of the final VM image and copied to a new machine
> 
> 4. Allocate/prepare mdev on the new machine for this VM
> 
> 5. Restore mdev state:
> 
> cat xxx > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state
> (might be a different path name)
> 
> 6. start mdev on the new parent device:
> 
> echo "1" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start

Thanks for the sequence, so based on above live migration, the access 
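The six-step save/restore flow Kevin outlines above can be sketched as a userspace sequence. This is a toy model, not the proposed kernel ABI: real code would write the mdev's sysfs "start"/"state" files, but here a temporary directory stands in for sysfs so the sequence is runnable, and the file names follow Kevin's hypothetical paths.

```python
import pathlib
import tempfile

def migrate(src_dev: pathlib.Path, dst_dev: pathlib.Path) -> bytes:
    # 1. Quiesce the source mdev (stop scheduling, drain in-flight DMA).
    (src_dev / "start").write_text("0")
    # 2. Save mdev state; this blob becomes part of the final VM image.
    state = (src_dev / "state").read_bytes()
    # 3-4. The image is copied to the new machine, where an mdev has been
    #      allocated/prepared for this VM (not modeled here).
    # 5. Restore mdev state on the destination device.
    (dst_dev / "state").write_bytes(state)
    # 6. Start the mdev on the new parent device.
    (dst_dev / "start").write_text("1")
    return state

with tempfile.TemporaryDirectory() as root:
    src = pathlib.Path(root, "src"); src.mkdir()
    dst = pathlib.Path(root, "dst"); dst.mkdir()
    (src / "state").write_bytes(b"opaque-vendor-state")
    migrate(src, dst)
    assert (dst / "state").read_bytes() == b"opaque-vendor-state"
    assert (dst / "start").read_text() == "1"
```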

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Tian, Kevin
> From: Neo Jia [mailto:c...@nvidia.com]
> Sent: Tuesday, August 16, 2016 12:17 PM
> 
> On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:c...@nvidia.com]
> > > Sent: Tuesday, August 16, 2016 11:46 AM
> > >
> > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote:
> > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > Sent: Tuesday, August 16, 2016 3:59 AM
> > >
> > > > > > >
> > > > > > > For NVIDIA vGPU solution we need to know all devices assigned to 
> > > > > > > a VM in
> > > > > > > one shot to commit resources of all vGPUs assigned to a VM along 
> > > > > > > with
> > > > > > > some common resources.
> > > > > >
> > > > > > Kirti, can you elaborate the background about above one-shot commit
> > > > > > requirement? It's hard to understand such a requirement.
> > > > > >
> > > > > > As I relied in another mail, I really hope start/stop become a 
> > > > > > per-mdev
> > > > > > attribute instead of global one, e.g.:
> > > > > >
> > > > > > echo "0/1" >
> /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start
> > > > > >
> > > > > > In many scenario the user space client may only want to talk to mdev
> > > > > > instance directly, w/o need to contact its parent device. Still take
> > > > > > live migration for example, I don't think Qemu wants to know parent
> > > > > > device of assigned mdev instances.
> > > > >
> > > > > Hi Kevin,
> > > > >
> > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody to
> > > > > know
> > > > > the parent device. You can just do
> > > > >
> > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > > > >
> > > > > or
> > > > >
> > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop
> > > > >
> > > > > without knowing the parent device.
> > > > >
> > > >
> > > > You can look at some existing sysfs example, e.g.:
> > > >
> > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online
> > > >
> > > > You may also argue why not using a global style:
> > > >
> > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online
> > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline
> > > >
> > > > There are many similar examples...
> > >
> > > Hi Kevin,
> > >
> > > My response above addresses your question about using the global sysfs
> > > entry, as you
> > > don't want to have the global path because
> > >
> > > "I don't think Qemu wants to know parent device of assigned mdev
> > > instances.".
> > >
> > > So I just want to confirm with you (in case you missed it) that:
> > >
> > > /sys/class/mdev/mdev_start | mdev_stop
> > >
> > > doesn't require the knowledge of parent device.
> > >
> >
> > Qemu is just one example, where your explanation of parent device
> > makes sense but still it's not good for Qemu to populate /sys/class/mdev
> > directly. Qemu is passed the actual sysfs path of the assigned mdev
> > instance, so any mdev attributes touched by Qemu should be put under
> > that node (e.g. start/stop for live migration usage as I explained earlier).
> 
> Exactly, qemu is passed the actual sysfs path.
> 
> So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at all.
> 
> QEMU will take the sysfs path as input:
> 
>  -device
> vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,id=vgpu0

There is no need to pass "id=vgpu0" here. If necessary, you can put id as an attribute
under the sysfs mdev node:

/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id

> 
> As you are saying, in live migration QEMU needs to access "start" and "stop".
> Could you
> please share more details, such as how QEMU accesses the "start" and "stop"
> sysfs,
> when and what triggers that?
> 

A conceptual flow is as follows:

1. Quiesce mdev activity on the parent device (e.g. stop scheduling, wait for
in-flight DMA to complete, etc.)

echo "0" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start

2. Save mdev state:

cat /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state > xxx

3. xxx will be part of the final VM image and copied to a new machine

4. Allocate/prepare mdev on the new machine for this VM

5. Restore mdev state:

cat xxx > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state
(might be a different path name)

6. start mdev on the new parent device:

echo "1" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start

Thanks
Kevin
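[Editorial sketch] The six-step flow above can be condensed into two shell helpers. Note that the per-mdev "start" and "state" attributes are only the interface *proposed* in this thread, not an existing kernel ABI, so the demo below exercises the sequence against a mock directory tree rather than a real /sys:

```shell
#!/bin/sh
# Sketch of the proposed live-migration flow. The per-mdev "start" and
# "state" sysfs attributes are hypothetical (as discussed in this thread);
# we run against a mock tree so the sequence itself can be demonstrated.

MDEV_ROOT="/sys/bus/mdev/devices"   # real path in the proposal

save_mdev() {     # save_mdev <uuid> <image-file>
    echo 0 > "$MDEV_ROOT/$1/start"     # 1. quiesce the device
    cat "$MDEV_ROOT/$1/state" > "$2"   # 2. snapshot opaque device state
}

restore_mdev() {  # restore_mdev <uuid> <image-file>
    cat "$2" > "$MDEV_ROOT/$1/state"   # 5. load state on the target parent
    echo 1 > "$MDEV_ROOT/$1/start"     # 6. resume the device
}

# --- demo against a mock sysfs tree ---
MDEV_ROOT="$(mktemp -d)"
uuid="c0b26072-dd1b-4340-84fe-bf338c510818-0"
mkdir -p "$MDEV_ROOT/$uuid"
echo 1 > "$MDEV_ROOT/$uuid/start"
printf 'opaque-vendor-state' > "$MDEV_ROOT/$uuid/state"

img="$MDEV_ROOT/vm.img"               # step 3: becomes part of the VM image
save_mdev "$uuid" "$img"
restore_mdev "$uuid" "$img"           # steps 4-6 on the (mock) new machine
```

In real use, only steps 3 and 4 (copying the image and allocating an mdev on the destination) would involve tooling outside these two helpers.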



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Neo Jia
On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote:
> > From: Neo Jia [mailto:c...@nvidia.com]
> > Sent: Tuesday, August 16, 2016 11:46 AM
> > 
> > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > Sent: Tuesday, August 16, 2016 3:59 AM
> > 
> > > > > >
> > > > > > For NVIDIA vGPU solution we need to know all devices assigned to a 
> > > > > > VM in
> > > > > > one shot to commit resources of all vGPUs assigned to a VM along 
> > > > > > with
> > > > > > some common resources.
> > > > >
> > > > > Kirti, can you elaborate the background about above one-shot commit
> > > > > requirement? It's hard to understand such a requirement.
> > > > >
> > > > > As I replied in another mail, I really hope start/stop becomes a
> > > > > per-mdev
> > > > > attribute instead of a global one, e.g.:
> > > > >
> > > > > echo "0/1" > 
> > > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start
> > > > >
> > > > > In many scenarios the user space client may only want to talk to the mdev
> > > > > instance directly, w/o need to contact its parent device. Still take
> > > > > live migration for example, I don't think Qemu wants to know parent
> > > > > device of assigned mdev instances.
> > > >
> > > > Hi Kevin,
> > > >
> > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody to
> > > > know
> > > > the parent device. You can just do
> > > >
> > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > > >
> > > > or
> > > >
> > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop
> > > >
> > > > without knowing the parent device.
> > > >
> > >
> > > You can look at some existing sysfs example, e.g.:
> > >
> > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online
> > >
> > > You may also argue why not using a global style:
> > >
> > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online
> > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline
> > >
> > > There are many similar examples...
> > 
> > Hi Kevin,
> > 
> > My response above addresses your question about using the global sysfs
> > entry, as you
> > don't want to have the global path because
> > 
> > "I don't think Qemu wants to know parent device of assigned mdev
> > instances.".
> > 
> > So I just want to confirm with you (in case you missed it) that:
> > 
> > /sys/class/mdev/mdev_start | mdev_stop
> > 
> > doesn't require the knowledge of parent device.
> > 
> 
> Qemu is just one example, where your explanation of parent device
> makes sense but still it's not good for Qemu to populate /sys/class/mdev
> directly. Qemu is passed the actual sysfs path of the assigned mdev
> instance, so any mdev attributes touched by Qemu should be put under 
> that node (e.g. start/stop for live migration usage as I explained earlier).

Exactly, qemu is passed the actual sysfs path.

So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at all.

QEMU will take the sysfs path as input:

 -device 
vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,id=vgpu0

As you are saying, in live migration QEMU needs to access "start" and "stop".
Could you
please share more details, such as how QEMU accesses the "start" and "stop" sysfs,
when and what triggers that?

Thanks,
Neo

> 

> Thanks
> Kevin



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Tian, Kevin
> From: Neo Jia [mailto:c...@nvidia.com]
> Sent: Tuesday, August 16, 2016 11:46 AM
> 
> On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:c...@nvidia.com]
> > > Sent: Tuesday, August 16, 2016 3:59 AM
> 
> > > > >
> > > > > For NVIDIA vGPU solution we need to know all devices assigned to a VM 
> > > > > in
> > > > > one shot to commit resources of all vGPUs assigned to a VM along with
> > > > > some common resources.
> > > >
> > > > Kirti, can you elaborate the background about above one-shot commit
> > > > requirement? It's hard to understand such a requirement.
> > > >
> > > > As I replied in another mail, I really hope start/stop becomes a per-mdev
> > > > attribute instead of a global one, e.g.:
> > > >
> > > > echo "0/1" > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start
> > > >
> > > > In many scenarios the user space client may only want to talk to the mdev
> > > > instance directly, w/o need to contact its parent device. Still take
> > > > live migration for example, I don't think Qemu wants to know parent
> > > > device of assigned mdev instances.
> > >
> > > Hi Kevin,
> > >
> > > Having a global /sys/class/mdev/mdev_start doesn't require anybody to know
> > > the parent device. You can just do
> > >
> > > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > >
> > > or
> > >
> > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop
> > >
> > > without knowing the parent device.
> > >
> >
> > You can look at some existing sysfs example, e.g.:
> >
> > echo "0/1" > /sys/bus/cpu/devices/cpu1/online
> >
> > You may also argue why not using a global style:
> >
> > echo "cpu1" > /sys/bus/cpu/devices/cpu_online
> > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline
> >
> > There are many similar examples...
> 
> Hi Kevin,
> 
> My response above addresses your question about using the global sysfs
> entry, as you
> don't want to have the global path because
> 
> "I don't think Qemu wants to know parent device of assigned mdev
> instances.".
> 
> So I just want to confirm with you (in case you missed it) that:
> 
> /sys/class/mdev/mdev_start | mdev_stop
> 
> doesn't require the knowledge of parent device.
> 

Qemu is just one example, where your explanation of parent device
makes sense but still it's not good for Qemu to populate /sys/class/mdev
directly. Qemu is passed the actual sysfs path of the assigned mdev
instance, so any mdev attributes touched by Qemu should be put under 
that node (e.g. start/stop for live migration usage as I explained earlier).

Thanks
Kevin



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Neo Jia
On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote:
> > From: Neo Jia [mailto:c...@nvidia.com]
> > Sent: Tuesday, August 16, 2016 3:59 AM

> > > >
> > > > For NVIDIA vGPU solution we need to know all devices assigned to a VM in
> > > > one shot to commit resources of all vGPUs assigned to a VM along with
> > > > some common resources.
> > >
> > > Kirti, can you elaborate the background about above one-shot commit
> > > requirement? It's hard to understand such a requirement.
> > >
> > > As I replied in another mail, I really hope start/stop becomes a per-mdev
> > > attribute instead of a global one, e.g.:
> > >
> > > echo "0/1" > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start
> > >
> > > In many scenarios the user space client may only want to talk to the mdev
> > > instance directly, w/o need to contact its parent device. Still take
> > > live migration for example, I don't think Qemu wants to know parent
> > > device of assigned mdev instances.
> > 
> > Hi Kevin,
> > 
> > Having a global /sys/class/mdev/mdev_start doesn't require anybody to know
> > the parent device. You can just do
> > 
> > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > 
> > or
> > 
> > echo "mdev_UUID" > /sys/class/mdev/mdev_stop
> > 
> > without knowing the parent device.
> > 
> 
> You can look at some existing sysfs example, e.g.:
> 
> echo "0/1" > /sys/bus/cpu/devices/cpu1/online
> 
> You may also argue why not using a global style:
> 
> echo "cpu1" > /sys/bus/cpu/devices/cpu_online
> echo "cpu1" > /sys/bus/cpu/devices/cpu_offline
> 
> There are many similar examples...

Hi Kevin,

My response above addresses your question about using the global sysfs entry, as you
don't want to have the global path because

"I don't think Qemu wants to know parent device of assigned mdev instances.".

So I just want to confirm with you (in case you missed it) that:

/sys/class/mdev/mdev_start | mdev_stop 

doesn't require the knowledge of parent device.

Thanks,
Neo

> 
> Thanks
> Kevin



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Tian, Kevin
> From: Neo Jia [mailto:c...@nvidia.com]
> Sent: Tuesday, August 16, 2016 7:24 AM
> 
> > > >
> > > > > >
> > > > > > For NVIDIA vGPU solution we need to know all devices assigned to a 
> > > > > > VM in
> > > > > > one shot to commit resources of all vGPUs assigned to a VM along 
> > > > > > with
> > > > > > some common resources.
> > > > >
> > > > > Kirti, can you elaborate the background about above one-shot commit
> > > > > requirement? It's hard to understand such a requirement.
> > > >
> > > > Agree, I know NVIDIA isn't planning to support hotplug initially, but
> > > > this seems like we're precluding hotplug from the design.  I don't
> > > > understand what's driving this one-shot requirement.
> > >
> > > Hi Alex,
> > >
> > > The requirement here is based on how our internal vGPU device model is
> > > designed, and
> > > with this we are able to pre-allocate resources required for multiple 
> > > virtual
> > > devices within same domain.
> > >
> > > And I don't think this syntax will stop us from supporting hotplug at all.
> > >
> > > For example, you can always create a virtual mdev and then do
> > >
> > > echo "mdev_UUID" > /sys/class/mdev/mdev_start
> > >
> > > then use QEMU monitor to add the device for hotplug.
> >
> > Hi Neo,
> >
> > I'm still not understanding the advantage you get from the "one-shot"
> > approach then if we can always add more mdevs by starting them later.
> > Are the hotplug mdevs somehow less capable than the initial set of
> > mdevs added in a single shot?  If the initial set is allocated
> > from the "same domain", does that give them some sort of hardware
> > locality/resource benefit?  Thanks,
> 
> Hi Alex,
> 
> At least we will not be able to guarantee some special hardware resources for the
> hotplug devices.
> 
> So from our point of view, we also have a dedicated internal SW entity to
> manage all
> virtual devices for each "domain/virtual machine", and such an SW entity will be
> created
> at virtual device start time.
> 
> This is why we need to do this in one-shot to support multiple virtual device
> per VM case.
> 

Is pre-allocation of the special hardware resource done one time for all mdev
instances?
Can it be done one-by-one, as long as each mdev is started before the VM is
launched?

If such a one-shot requirement is really necessary, it would be cleaner to
introduce an mdev group concept, so mdev instances with one-shot start
requirements can be put under an mdev group. Then you can do a one-shot start
by:

echo "0/1" > /sys/class/mdev/group/0/start

Thanks
Kevin
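[Editorial sketch] The group idea above would reduce a one-shot start to a loop over the group's members. Everything below (the group directory layout and the "members" file listing mdev UUIDs) is a hypothetical illustration of that proposal, exercised against mock directories since no such ABI exists:

```shell
#!/bin/sh
# Hypothetical "mdev group" one-shot start, per the
# /sys/class/mdev/group/0/start idea above. The "members" file layout
# is purely illustrative.

MDEV_ROOT="$(mktemp -d)"    # stand-in for /sys/bus/mdev/devices
GROUP_DIR="$(mktemp -d)"    # stand-in for /sys/class/mdev/group/0

# Two mdevs belonging to the same VM/domain.
for u in uuid-a uuid-b; do
    mkdir -p "$MDEV_ROOT/$u"
    echo 0 > "$MDEV_ROOT/$u/start"
    echo "$u" >> "$GROUP_DIR/members"
done

group_start() {  # group_start <0|1>: apply to every member in one shot
    while read -r u; do
        echo "$1" > "$MDEV_ROOT/$u/start"
    done < "$GROUP_DIR/members"
}

group_start 1
```

A kernel-side implementation could do the per-member commit atomically, which is what distinguishes this from userspace looping over individual start files.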



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Tian, Kevin
> From: Neo Jia [mailto:c...@nvidia.com]
> Sent: Tuesday, August 16, 2016 3:59 AM
> 
> On Mon, Aug 15, 2016 at 09:38:52AM +, Tian, Kevin wrote:
> > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > > Sent: Saturday, August 13, 2016 8:37 AM
> > >
> > >
> > >
> > > On 8/13/2016 2:46 AM, Alex Williamson wrote:
> > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > Kirti Wankhede  wrote:
> > > >
> > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > >>> Kirti Wankhede  wrote:
> > > >>>
> > > >>> This is used later by mdev_device_start() and mdev_device_stop() to 
> > > >>> get
> > > >>> the parent_device so it can call the start and stop ops callbacks
> > > >>> respectively.  That seems to imply that all of instances for a given
> > > >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> > > >>> still having a hard time buying into the uuid+instance plan when it
> > > >>> seems like each mdev_device should have an actual unique uuid.
> > > >>> Userspace tools can figure out which uuids to start for a given user, 
> > > >>> I
> > > >>> don't see much value in collecting them to instances within a uuid.
> > > >>>
> > > >>
> > > >> Initially we started discussion with VM_UUID+instance suggestion, where
> > > >> instance was introduced to support multiple devices in a VM.
> > > >
> > > > The instance number was never required in order to support multiple
> > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > > devices with that same UUID and therefore associate udev events to a
> > > > given VM.  Only then does an instance number become necessary since the
> > > > UUID needs to be static for a vGPUs within a VM.  This has always felt
> > > > like a very dodgy solution when we should probably just be querying
> > > > libvirt to give us a device to VM association.
> >
> > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> > for mdev in the basic design. It's bound to NVIDIA management stack too 
> > tightly.
> >
> > I'm OK to give enough flexibility for various upper level management stacks,
> > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> > option where either UUID or STRING could be optional? Upper management
> > stack can choose its own policy to identify a mdev:
> >
> > a) $UUID only, so each mdev is allocated with a unique UUID
> > b) STRING only, which could be an index (0, 1, 2, ...), or any combination
> > (vgpu0, vgpu1, etc.)
> > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> > a numeric index
> >
> > > >
> > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources of
> > > >> all instances of similar devices assigned to VM.
> > > >>
> > > >> For example, to create 2 devices:
> > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > >>
> > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > >>
> > > >> Commit resources for above devices with single 'mdev_start':
> > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > >>
> > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > >> 'instance', so 'mdev_create' would look like:
> > > >>
> > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > >>
> > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
> > > >> would be vendor specific parameters.
> > > >>
> > > >> Device nodes would be created as "$UUID1" and "$UUID2"
> > > >>
> > > >> Then 'mdev_start' would be:
> > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > >>
> > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > > >>
> > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop
> > > >
> > > > I'm not sure a comma separated list makes sense here, for both
> > > > simplicity in the kernel and more fine grained error reporting, we
> > > > probably want to start/stop them individually.  Actually, why is it
> > > > that we can't use the mediated device being opened and released to
> > > > automatically signal to the backend vendor driver to commit and release
> > > > resources? I don't fully understand why userspace needs this interface.
> >
> > There is a meaningful use of the start/stop interface, as required in live
> > migration support. Such an interface allows the vendor driver to quiesce
> > mdev activity on the source device before the mdev hardware state is snapshotted,
> > and then resume mdev activity on the dest device after its state is recovered.
> > Intel has implemented experimental live migration support in KVMGT (soon
> > to release), based on above two interfaces (plus another two to get/set
> > mdev state).
> >
> > > >
> > >
> > > For NVIDIA vGPU solution we need to know all 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, August 16, 2016 6:48 AM
> 
> On Mon, 15 Aug 2016 12:59:08 -0700
> Neo Jia  wrote:
> 
> > On Mon, Aug 15, 2016 at 09:38:52AM +, Tian, Kevin wrote:
> > > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > > > Sent: Saturday, August 13, 2016 8:37 AM
> > > >
> > > >
> > > >
> > > > On 8/13/2016 2:46 AM, Alex Williamson wrote:
> > > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > > Kirti Wankhede  wrote:
> > > > >
> > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > > >>> Kirti Wankhede  wrote:
> > > > >>>
> > > > >>> This is used later by mdev_device_start() and mdev_device_stop() to 
> > > > >>> get
> > > > >>> the parent_device so it can call the start and stop ops callbacks
> > > > >>> respectively.  That seems to imply that all of instances for a given
> > > > >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> > > > >>> still having a hard time buying into the uuid+instance plan when it
> > > > >>> seems like each mdev_device should have an actual unique uuid.
> > > > >>> Userspace tools can figure out which uuids to start for a given 
> > > > >>> user, I
> > > > >>> don't see much value in collecting them to instances within a uuid.
> > > > >>>
> > > > >>
> > > > >> Initially we started discussion with VM_UUID+instance suggestion, 
> > > > >> where
> > > > >> instance was introduced to support multiple devices in a VM.
> > > > >
> > > > > The instance number was never required in order to support multiple
> > > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > > > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > > > devices with that same UUID and therefore associate udev events to a
> > > > > given VM.  Only then does an instance number become necessary since 
> > > > > the
> > > > > UUID needs to be static for a vGPUs within a VM.  This has always felt
> > > > > like a very dodgy solution when we should probably just be querying
> > > > > libvirt to give us a device to VM association.
> > >
> > > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> > > for mdev in the basic design. It's bound to NVIDIA management stack too 
> > > tightly.
> > >
> > > I'm OK to give enough flexibility for various upper level management 
> > > stacks,
> > > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> > > option where either UUID or STRING could be optional? Upper management
> > > stack can choose its own policy to identify a mdev:
> > >
> > > a) $UUID only, so each mdev is allocated with a unique UUID
> > > b) STRING only, which could be an index (0, 1, 2, ...), or any combination
> > > (vgpu0, vgpu1, etc.)
> > > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> > > a numeric index
> > >
> > > > >
> > > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources 
> > > > >> of
> > > > >> all instances of similar devices assigned to VM.
> > > > >>
> > > > >> For example, to create 2 devices:
> > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > > >>
> > > > >> Commit resources for above devices with single 'mdev_start':
> > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > > >> 'instance', so 'mdev_create' would look like:
> > > > >>
> > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 
> > > > >> 'params'
> > > > >> would be vendor specific parameters.
> > > > >>
> > > > >> Device nodes would be created as "$UUID1" and "$UUID2"
> > > > >>
> > > > >> Then 'mdev_start' would be:
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > > > >>
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop
> > > > >
> > > > > I'm not sure a comma separated list makes sense here, for both
> > > > > simplicity in the kernel and more fine grained error reporting, we
> > > > > probably want to start/stop them individually.  Actually, why is it
> > > > > that we can't use the mediated device being opened and released to
> > > > > automatically signal to the backend vendor driver to commit and 
> > > > > release
> > > > > resources? I don't fully understand why userspace needs this 
> > > > > interface.
> > >
> > > There is a meaningful use of the start/stop interface, as required in live
> > > migration support. Such an interface allows the vendor driver to quiesce
> > > mdev activity on source device 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Neo Jia
On Mon, Aug 15, 2016 at 04:47:41PM -0600, Alex Williamson wrote:
> On Mon, 15 Aug 2016 12:59:08 -0700
> Neo Jia  wrote:
> 
> > On Mon, Aug 15, 2016 at 09:38:52AM +, Tian, Kevin wrote:
> > > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > > > Sent: Saturday, August 13, 2016 8:37 AM
> > > > 
> > > > 
> > > > 
> > > > On 8/13/2016 2:46 AM, Alex Williamson wrote:  
> > > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > > Kirti Wankhede  wrote:
> > > > >  
> > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:  
> > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > > >>> Kirti Wankhede  wrote:
> > > > >>>
> > > > >>> This is used later by mdev_device_start() and mdev_device_stop() to 
> > > > >>> get
> > > > >>> the parent_device so it can call the start and stop ops callbacks
> > > > >>> respectively.  That seems to imply that all of instances for a given
> > > > >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> > > > >>> still having a hard time buying into the uuid+instance plan when it
> > > > >>> seems like each mdev_device should have an actual unique uuid.
> > > > >>> Userspace tools can figure out which uuids to start for a given 
> > > > >>> user, I
> > > > >>> don't see much value in collecting them to instances within a uuid.
> > > > >>>  
> > > > >>
> > > > >> Initially we started discussion with VM_UUID+instance suggestion, 
> > > > >> where
> > > > >> instance was introduced to support multiple devices in a VM.  
> > > > >
> > > > > The instance number was never required in order to support multiple
> > > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > > > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > > > devices with that same UUID and therefore associate udev events to a
> > > > > given VM.  Only then does an instance number become necessary since 
> > > > > the
> > > > > UUID needs to be static for a vGPUs within a VM.  This has always felt
> > > > > like a very dodgy solution when we should probably just be querying
> > > > > libvirt to give us a device to VM association.  
> > > 
> > > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> > > for mdev in the basic design. It's bound to NVIDIA management stack too 
> > > tightly.
> > > 
> > > I'm OK to give enough flexibility for various upper level management 
> > > stacks,
> > > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> > > option where either UUID or STRING could be optional? Upper management 
> > > stack can choose its own policy to identify a mdev:
> > > 
> > > a) $UUID only, so each mdev is allocated with a unique UUID
> > > b) STRING only, which could be an index (0, 1, 2, ...), or any 
> > > combination 
> > > (vgpu0, vgpu1, etc.)
> > > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> > > a numeric index
> > >   
> > > > >  
> > > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources 
> > > > >> of
> > > > >> all instances of similar devices assigned to VM.
> > > > >>
> > > > >> For example, to create 2 devices:
> > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > > >>
> > > > >> Commit resources for above devices with single 'mdev_start':
> > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > > >> 'instance', so 'mdev_create' would look like:
> > > > >>
> > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 
> > > > >> 'params'
> > > > >> would be vendor specific parameters.
> > > > >>
> > > > >> Device nodes would be created as "$UUID1" and "$UUID2"
> > > > >>
> > > > >> Then 'mdev_start' would be:
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > > > >>
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop  
> > > > >
> > > > > I'm not sure a comma separated list makes sense here, for both
> > > > > simplicity in the kernel and more fine grained error reporting, we
> > > > > probably want to start/stop them individually.  Actually, why is it
> > > > > that we can't use the mediated device being opened and released to
> > > > > automatically signal to the backend vendor driver to commit and 
> > > > > release
> > > > > resources? I don't fully understand why userspace needs this 
> > > > > interface.  
> > > 
> > > There is a meaningful use of the start/stop interface, as required in live
> > > migration support. Such an interface allows the vendor driver to quiesce
> > > mdev activity on source device 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Neo Jia
On Mon, Aug 15, 2016 at 04:52:39PM -0600, Alex Williamson wrote:
> On Mon, 15 Aug 2016 15:09:30 -0700
> Neo Jia  wrote:
> 
> > On Mon, Aug 15, 2016 at 09:59:26AM -0600, Alex Williamson wrote:
> > > On Mon, 15 Aug 2016 09:38:52 +
> > > "Tian, Kevin"  wrote:
> > >   
> > > > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > > > > Sent: Saturday, August 13, 2016 8:37 AM
> > > > > 
> > > > > 
> > > > > 
> > > > > On 8/13/2016 2:46 AM, Alex Williamson wrote:
> > > > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > > > Kirti Wankhede  wrote:
> > > > > >
> > > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> > > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > > > >>> Kirti Wankhede  wrote:
> > > > > >>>
> > > > > >>> This is used later by mdev_device_start() and mdev_device_stop() 
> > > > > >>> to get
> > > > > >>> the parent_device so it can call the start and stop ops callbacks
> > > > > >>> respectively.  That seems to imply that all of instances for a 
> > > > > >>> given
> > > > > >>> uuid come from the same parent_device.  Where is that enforced?  
> > > > > >>> I'm
> > > > > >>> still having a hard time buying into the uuid+instance plan when 
> > > > > >>> it
> > > > > >>> seems like each mdev_device should have an actual unique uuid.
> > > > > >>> Userspace tools can figure out which uuids to start for a given 
> > > > > >>> user, I
> > > > > >>> don't see much value in collecting them to instances within a 
> > > > > >>> uuid.
> > > > > >>>
> > > > > >>
> > > > > >> Initially we started discussion with VM_UUID+instance suggestion, 
> > > > > >> where
> > > > > >> instance was introduced to support multiple devices in a VM.
> > > > > >
> > > > > > The instance number was never required in order to support multiple
> > > > > > devices in a VM, IIRC this UUID+instance scheme was to appease 
> > > > > > NVIDIA
> > > > > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > > > > devices with that same UUID and therefore associate udev events to a
> > > > > > given VM.  Only then does an instance number become necessary since 
> > > > > > the
> > > > > > UUID needs to be static for a vGPUs within a VM.  This has always 
> > > > > > felt
> > > > > > like a very dodgy solution when we should probably just be querying
> > > > > > libvirt to give us a device to VM association.
> > > > 
> > > > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> > > > for mdev in the basic design. It's bound to NVIDIA management stack too 
> > > > tightly.
> > > > 
> > > > I'm OK to give enough flexibility for various upper level management 
> > > > stacks,
> > > > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> > > > option where either UUID or STRING could be optional? Upper management 
> > > > stack can choose its own policy to identify a mdev:
> > > > 
> > > > a) $UUID only, so each mdev is allocated with a unique UUID
> > > > b) STRING only, which could be an index (0, 1, 2, ...), or any 
> > > > combination 
> > > > (vgpu0, vgpu1, etc.)
> > > > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> > > > a numeric index
> > > >   
> > > > > >
> > > > > >> 'mdev_create' creates device and 'mdev_start' is to commit 
> > > > > >> resources of
> > > > > >> all instances of similar devices assigned to VM.
> > > > > >>
> > > > > >> For example, to create 2 devices:
> > > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > > > >>
> > > > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > > > >>
> > > > > >> Commit resources for above devices with single 'mdev_start':
> > > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > > > >>
> > > > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > > > >> 'instance', so 'mdev_create' would look like:
> > > > > >>
> > > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > > > >>
> > > > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 
> > > > > >> 'params'
> > > > > >> would be vendor specific parameters.
> > > > > >>
> > > > > >> Device nodes would be created as "$UUID1" and "$UUID2"
> > > > > >>
> > > > > >> Then 'mdev_start' would be:
> > > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > > > >>
> > > > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > > > > >>
> > > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop
> > > > > >
> > > > > > I'm not sure a comma separated list makes sense here, for both
> > > > > > simplicity in the kernel and more fine grained error reporting, we
> > > > > > probably want to start/stop them individually.  Actually, why is it
> > > > > > that we can't use the mediated device being opened and 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Alex Williamson
On Mon, 15 Aug 2016 15:09:30 -0700
Neo Jia  wrote:

> On Mon, Aug 15, 2016 at 09:59:26AM -0600, Alex Williamson wrote:
> > On Mon, 15 Aug 2016 09:38:52 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > > > Sent: Saturday, August 13, 2016 8:37 AM
> > > > 
> > > > 
> > > > 
> > > > On 8/13/2016 2:46 AM, Alex Williamson wrote:
> > > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > > Kirti Wankhede  wrote:
> > > > >
> > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > > >>> Kirti Wankhede  wrote:
> > > > >>>
> > > > >>> This is used later by mdev_device_start() and mdev_device_stop() to 
> > > > >>> get
> > > > >>> the parent_device so it can call the start and stop ops callbacks
> > > > >>> respectively.  That seems to imply that all instances for a given
> > > > >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> > > > >>> still having a hard time buying into the uuid+instance plan when it
> > > > >>> seems like each mdev_device should have an actual unique uuid.
> > > > >>> Userspace tools can figure out which uuids to start for a given 
> > > > >>> user, I
> > > > >>> don't see much value in collecting them to instances within a uuid.
> > > > >>>
> > > > >>
> > > > >> Initially we started discussion with VM_UUID+instance suggestion, 
> > > > >> where
> > > > >> instance was introduced to support multiple devices in a VM.
> > > > >
> > > > > The instance number was never required in order to support multiple
> > > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > > > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > > > devices with that same UUID and therefore associate udev events to a
> > > > > given VM.  Only then does an instance number become necessary since 
> > > > > the
> > > > > UUID needs to be static for the vGPUs within a VM.  This has always felt
> > > > > like a very dodgy solution when we should probably just be querying
> > > > > libvirt to give us a device to VM association.
> > > 
> > > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> > > for mdev in the basic design. It's bound to NVIDIA management stack too 
> > > tightly.
> > > 
> > > I'm OK to give enough flexibility for various upper level management 
> > > stacks,
> > > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> > > option where either UUID or STRING could be optional? Upper management 
> > > stack can choose its own policy to identify an mdev:
> > > 
> > > a) $UUID only, so each mdev is allocated with a unique UUID
> > > b) STRING only, which could be an index (0, 1, 2, ...), or any 
> > > combination 
> > > (vgpu0, vgpu1, etc.)
> > > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> > > a numeric index
> > >   
> > > > >
> > > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources 
> > > > >> of
> > > > >> all instances of similar devices assigned to VM.
> > > > >>
> > > > >> For example, to create 2 devices:
> > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > > >>
> > > > >> Commit resources for above devices with single 'mdev_start':
> > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > > >> 'instance', so 'mdev_create' would look like:
> > > > >>
> > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 
> > > > >> 'params'
> > > > >> would be vendor specific parameters.
> > > > >>
> > > > >> Device nodes would be created as "$UUID1" and "$UUID2"
> > > > >>
> > > > >> Then 'mdev_start' would be:
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > > > >>
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop
> > > > >
> > > > > I'm not sure a comma separated list makes sense here, for both
> > > > > simplicity in the kernel and more fine grained error reporting, we
> > > > > probably want to start/stop them individually.  Actually, why is it
> > > > > that we can't use the mediated device being opened and released to
> > > > > automatically signal to the backend vendor driver to commit and 
> > > > > release
> > > > > resources? I don't fully understand why userspace needs this 
> > > > > interface.
> > > 
> > > There is a meaningful use of start/stop interface, as required in live
> > migration support. Such an interface allows the vendor driver to 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Alex Williamson
On Mon, 15 Aug 2016 12:59:08 -0700
Neo Jia  wrote:

> On Mon, Aug 15, 2016 at 09:38:52AM +, Tian, Kevin wrote:
> > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > > Sent: Saturday, August 13, 2016 8:37 AM
> > > 
> > > 
> > > 
> > > On 8/13/2016 2:46 AM, Alex Williamson wrote:  
> > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > Kirti Wankhede  wrote:
> > > >  
> > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:  
> > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > >>> Kirti Wankhede  wrote:
> > > >>>
> > > >>> This is used later by mdev_device_start() and mdev_device_stop() to 
> > > >>> get
> > > >>> the parent_device so it can call the start and stop ops callbacks
> > > >>> respectively.  That seems to imply that all instances for a given
> > > >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> > > >>> still having a hard time buying into the uuid+instance plan when it
> > > >>> seems like each mdev_device should have an actual unique uuid.
> > > >>> Userspace tools can figure out which uuids to start for a given user, 
> > > >>> I
> > > >>> don't see much value in collecting them to instances within a uuid.
> > > >>>  
> > > >>
> > > >> Initially we started discussion with VM_UUID+instance suggestion, where
> > > >> instance was introduced to support multiple devices in a VM.  
> > > >
> > > > The instance number was never required in order to support multiple
> > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > > devices with that same UUID and therefore associate udev events to a
> > > > given VM.  Only then does an instance number become necessary since the
> > > > UUID needs to be static for the vGPUs within a VM.  This has always felt
> > > > like a very dodgy solution when we should probably just be querying
> > > > libvirt to give us a device to VM association.  
> > 
> > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> > for mdev in the basic design. It's bound to NVIDIA management stack too 
> > tightly.
> > 
> > I'm OK to give enough flexibility for various upper level management stacks,
> > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> > option where either UUID or STRING could be optional? Upper management 
> > stack can choose its own policy to identify an mdev:
> > 
> > a) $UUID only, so each mdev is allocated with a unique UUID
> > b) STRING only, which could be an index (0, 1, 2, ...), or any combination 
> > (vgpu0, vgpu1, etc.)
> > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> > a numeric index
> >   
> > > >  
> > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources of
> > > >> all instances of similar devices assigned to VM.
> > > >>
> > > >> For example, to create 2 devices:
> > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > >>
> > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > >>
> > > >> Commit resources for above devices with single 'mdev_start':
> > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > >>
> > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > >> 'instance', so 'mdev_create' would look like:
> > > >>
> > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > >>
> > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
> > > >> would be vendor specific parameters.
> > > >>
> > > >> Device nodes would be created as "$UUID1" and "$UUID2"
> > > >>
> > > >> Then 'mdev_start' would be:
> > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > >>
> > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > > >>
> > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop  
> > > >
> > > > I'm not sure a comma separated list makes sense here, for both
> > > > simplicity in the kernel and more fine grained error reporting, we
> > > > probably want to start/stop them individually.  Actually, why is it
> > > > that we can't use the mediated device being opened and released to
> > > > automatically signal to the backend vendor driver to commit and release
> > > > resources? I don't fully understand why userspace needs this interface. 
> > > >  
> > 
> > There is a meaningful use of start/stop interface, as required in live
> > migration support. Such an interface allows the vendor driver to quiesce
> > mdev activity on the source device before the mdev hardware state is snapshotted,
> > and then resume mdev activity on dest device after its state is recovered.
> > Intel has implemented experimental live migration support in KVMGT (soon
> > to release), based on above two interfaces (plus another two to get/set
> > mdev state).
> >   
> > > >  
> > > 
> > > For NVIDIA vGPU 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Neo Jia
On Mon, Aug 15, 2016 at 09:59:26AM -0600, Alex Williamson wrote:
> On Mon, 15 Aug 2016 09:38:52 +
> "Tian, Kevin"  wrote:
> 
> > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > > Sent: Saturday, August 13, 2016 8:37 AM
> > > 
> > > 
> > > 
> > > On 8/13/2016 2:46 AM, Alex Williamson wrote:  
> > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > Kirti Wankhede  wrote:
> > > >  
> > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:  
> > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > >>> Kirti Wankhede  wrote:
> > > >>>
> > > >>> This is used later by mdev_device_start() and mdev_device_stop() to 
> > > >>> get
> > > >>> the parent_device so it can call the start and stop ops callbacks
> > > >>> respectively.  That seems to imply that all instances for a given
> > > >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> > > >>> still having a hard time buying into the uuid+instance plan when it
> > > >>> seems like each mdev_device should have an actual unique uuid.
> > > >>> Userspace tools can figure out which uuids to start for a given user, 
> > > >>> I
> > > >>> don't see much value in collecting them to instances within a uuid.
> > > >>>  
> > > >>
> > > >> Initially we started discussion with VM_UUID+instance suggestion, where
> > > >> instance was introduced to support multiple devices in a VM.  
> > > >
> > > > The instance number was never required in order to support multiple
> > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > > devices with that same UUID and therefore associate udev events to a
> > > > given VM.  Only then does an instance number become necessary since the
> > > > UUID needs to be static for the vGPUs within a VM.  This has always felt
> > > > like a very dodgy solution when we should probably just be querying
> > > > libvirt to give us a device to VM association.  
> > 
> > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> > for mdev in the basic design. It's bound to NVIDIA management stack too 
> > tightly.
> > 
> > I'm OK to give enough flexibility for various upper level management stacks,
> > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> > option where either UUID or STRING could be optional? Upper management 
> > stack can choose its own policy to identify an mdev:
> > 
> > a) $UUID only, so each mdev is allocated with a unique UUID
> > b) STRING only, which could be an index (0, 1, 2, ...), or any combination 
> > (vgpu0, vgpu1, etc.)
> > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> > a numeric index
> > 
> > > >  
> > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources of
> > > >> all instances of similar devices assigned to VM.
> > > >>
> > > >> For example, to create 2 devices:
> > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > >>
> > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > >>
> > > >> Commit resources for above devices with single 'mdev_start':
> > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > >>
> > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > >> 'instance', so 'mdev_create' would look like:
> > > >>
> > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > >>
> > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
> > > >> would be vendor specific parameters.
> > > >>
> > > >> Device nodes would be created as "$UUID1" and "$UUID2"
> > > >>
> > > >> Then 'mdev_start' would be:
> > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > >>
> > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > > >>
> > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop  
> > > >
> > > > I'm not sure a comma separated list makes sense here, for both
> > > > simplicity in the kernel and more fine grained error reporting, we
> > > > probably want to start/stop them individually.  Actually, why is it
> > > > that we can't use the mediated device being opened and released to
> > > > automatically signal to the backend vendor driver to commit and release
> > > > resources? I don't fully understand why userspace needs this interface. 
> > > >  
> > 
> > There is a meaningful use of start/stop interface, as required in live
> > migration support. Such an interface allows the vendor driver to quiesce
> > mdev activity on the source device before the mdev hardware state is snapshotted,
> > and then resume mdev activity on dest device after its state is recovered.
> > Intel has implemented experimental live migration support in KVMGT (soon
> > to release), based on above two interfaces (plus another two to get/set
> > mdev state).
> 
> Ok, that's actually an 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Neo Jia
On Mon, Aug 15, 2016 at 09:38:52AM +, Tian, Kevin wrote:
> > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > Sent: Saturday, August 13, 2016 8:37 AM
> > 
> > 
> > 
> > On 8/13/2016 2:46 AM, Alex Williamson wrote:
> > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > Kirti Wankhede  wrote:
> > >
> > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > >>> Kirti Wankhede  wrote:
> > >>>
> > >>> This is used later by mdev_device_start() and mdev_device_stop() to get
> > >>> the parent_device so it can call the start and stop ops callbacks
> > >>> respectively.  That seems to imply that all instances for a given
> > >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> > >>> still having a hard time buying into the uuid+instance plan when it
> > >>> seems like each mdev_device should have an actual unique uuid.
> > >>> Userspace tools can figure out which uuids to start for a given user, I
> > >>> don't see much value in collecting them to instances within a uuid.
> > >>>
> > >>
> > >> Initially we started discussion with VM_UUID+instance suggestion, where
> > >> instance was introduced to support multiple devices in a VM.
> > >
> > > The instance number was never required in order to support multiple
> > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > devices with that same UUID and therefore associate udev events to a
> > > given VM.  Only then does an instance number become necessary since the
> > > UUID needs to be static for the vGPUs within a VM.  This has always felt
> > > like a very dodgy solution when we should probably just be querying
> > > libvirt to give us a device to VM association.
> 
> Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> for mdev in the basic design. It's bound to NVIDIA management stack too 
> tightly.
> 
> I'm OK to give enough flexibility for various upper level management stacks,
> e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> option where either UUID or STRING could be optional? Upper management 
> stack can choose its own policy to identify an mdev:
> 
> a) $UUID only, so each mdev is allocated with a unique UUID
> b) STRING only, which could be an index (0, 1, 2, ...), or any combination 
> (vgpu0, vgpu1, etc.)
> c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> a numeric index
> 
> > >
> > >> 'mdev_create' creates device and 'mdev_start' is to commit resources of
> > >> all instances of similar devices assigned to VM.
> > >>
> > >> For example, to create 2 devices:
> > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > >>
> > >> "$UUID-0" and "$UUID-1" devices are created.
> > >>
> > >> Commit resources for above devices with single 'mdev_start':
> > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > >>
> > >> Considering $UUID to be a unique UUID of a device, we don't need
> > >> 'instance', so 'mdev_create' would look like:
> > >>
> > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > >>
> > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
> > >> would be vendor specific parameters.
> > >>
> > >> Device nodes would be created as "$UUID1" and "$UUID2"
> > >>
> > >> Then 'mdev_start' would be:
> > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > >>
> > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > >>
> > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop
> > >
> > > I'm not sure a comma separated list makes sense here, for both
> > > simplicity in the kernel and more fine grained error reporting, we
> > > probably want to start/stop them individually.  Actually, why is it
> > > that we can't use the mediated device being opened and released to
> > > automatically signal to the backend vendor driver to commit and release
> > > resources? I don't fully understand why userspace needs this interface.
> 
> There is a meaningful use of start/stop interface, as required in live
> migration support. Such an interface allows the vendor driver to quiesce
> mdev activity on the source device before the mdev hardware state is snapshotted,
> and then resume mdev activity on dest device after its state is recovered.
> Intel has implemented experimental live migration support in KVMGT (soon
> to release), based on above two interfaces (plus another two to get/set
> mdev state).
> 
> > >
> > 
> > For NVIDIA vGPU solution we need to know all devices assigned to a VM in
> > one shot to commit resources of all vGPUs assigned to a VM along with
> > some common resources.
> 
> Kirti, can you elaborate on the background of the above one-shot commit
> requirement? It's hard to understand such a requirement. 
> 
> As I replied in another 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Alex Williamson
On Mon, 15 Aug 2016 09:38:52 +
"Tian, Kevin"  wrote:

> > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > Sent: Saturday, August 13, 2016 8:37 AM
> > 
> > 
> > 
> > On 8/13/2016 2:46 AM, Alex Williamson wrote:  
> > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > Kirti Wankhede  wrote:
> > >  
> > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:  
> > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > >>> Kirti Wankhede  wrote:
> > >>>
> > >>> This is used later by mdev_device_start() and mdev_device_stop() to get
> > >>> the parent_device so it can call the start and stop ops callbacks
> > >>> respectively.  That seems to imply that all instances for a given
> > >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> > >>> still having a hard time buying into the uuid+instance plan when it
> > >>> seems like each mdev_device should have an actual unique uuid.
> > >>> Userspace tools can figure out which uuids to start for a given user, I
> > >>> don't see much value in collecting them to instances within a uuid.
> > >>>  
> > >>
> > >> Initially we started discussion with VM_UUID+instance suggestion, where
> > >> instance was introduced to support multiple devices in a VM.  
> > >
> > > The instance number was never required in order to support multiple
> > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > devices with that same UUID and therefore associate udev events to a
> > > given VM.  Only then does an instance number become necessary since the
> > > UUID needs to be static for the vGPUs within a VM.  This has always felt
> > > like a very dodgy solution when we should probably just be querying
> > > libvirt to give us a device to VM association.  
> 
> Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> for mdev in the basic design. It's bound to NVIDIA management stack too 
> tightly.
> 
> I'm OK to give enough flexibility for various upper level management stacks,
> e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> option where either UUID or STRING could be optional? Upper management 
> stack can choose its own policy to identify an mdev:
> 
> a) $UUID only, so each mdev is allocated with a unique UUID
> b) STRING only, which could be an index (0, 1, 2, ...), or any combination 
> (vgpu0, vgpu1, etc.)
> c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> a numeric index
> 
> > >  
> > >> 'mdev_create' creates device and 'mdev_start' is to commit resources of
> > >> all instances of similar devices assigned to VM.
> > >>
> > >> For example, to create 2 devices:
> > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > >>
> > >> "$UUID-0" and "$UUID-1" devices are created.
> > >>
> > >> Commit resources for above devices with single 'mdev_start':
> > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > >>
> > >> Considering $UUID to be a unique UUID of a device, we don't need
> > >> 'instance', so 'mdev_create' would look like:
> > >>
> > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > >>
> > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
> > >> would be vendor specific parameters.
> > >>
> > >> Device nodes would be created as "$UUID1" and "$UUID2"
> > >>
> > >> Then 'mdev_start' would be:
> > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > >>
> > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > >>
> > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop  
> > >
> > > I'm not sure a comma separated list makes sense here, for both
> > > simplicity in the kernel and more fine grained error reporting, we
> > > probably want to start/stop them individually.  Actually, why is it
> > > that we can't use the mediated device being opened and released to
> > > automatically signal to the backend vendor driver to commit and release
> > > resources? I don't fully understand why userspace needs this interface.  
> 
> There is a meaningful use of start/stop interface, as required in live
> migration support. Such an interface allows the vendor driver to quiesce
> mdev activity on the source device before the mdev hardware state is snapshotted,
> and then resume mdev activity on dest device after its state is recovered.
> Intel has implemented experimental live migration support in KVMGT (soon
> to release), based on above two interfaces (plus another two to get/set
> mdev state).

Ok, that's actually an interesting use case for start/stop.

> > 
> > For NVIDIA vGPU solution we need to know all devices assigned to a VM in
> > one shot to commit resources of all vGPUs assigned to a VM along with
> > some common resources.  
> 
> Kirti, can you elaborate on the background of the above one-shot commit

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Tian, Kevin
> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Saturday, August 13, 2016 8:37 AM
> 
> 
> 
> On 8/13/2016 2:46 AM, Alex Williamson wrote:
> > On Sat, 13 Aug 2016 00:14:39 +0530
> > Kirti Wankhede  wrote:
> >
> >> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> >>> On Thu, 4 Aug 2016 00:33:51 +0530
> >>> Kirti Wankhede  wrote:
> >>>
> >>> This is used later by mdev_device_start() and mdev_device_stop() to get
> >>> the parent_device so it can call the start and stop ops callbacks
> >>> respectively.  That seems to imply that all instances for a given
> >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> >>> still having a hard time buying into the uuid+instance plan when it
> >>> seems like each mdev_device should have an actual unique uuid.
> >>> Userspace tools can figure out which uuids to start for a given user, I
> >>> don't see much value in collecting them to instances within a uuid.
> >>>
> >>
> >> Initially we started discussion with VM_UUID+instance suggestion, where
> >> instance was introduced to support multiple devices in a VM.
> >
> > The instance number was never required in order to support multiple
> > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > management tools which wanted to re-use the VM UUID by creating vGPU
> > devices with that same UUID and therefore associate udev events to a
> > given VM.  Only then does an instance number become necessary since the
> > UUID needs to be static for the vGPUs within a VM.  This has always felt
> > like a very dodgy solution when we should probably just be querying
> > libvirt to give us a device to VM association.

Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
for mdev in the basic design. It's bound to NVIDIA management stack too tightly.

I'm OK to give enough flexibility for various upper level management stacks,
e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
option where either UUID or STRING could be optional? Upper management 
stack can choose its own policy to identify an mdev:

a) $UUID only, so each mdev is allocated with a unique UUID
b) STRING only, which could be an index (0, 1, 2, ...), or any combination 
(vgpu0, vgpu1, etc.)
c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
a numeric index
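
The three policies above can be sketched as identifier strings written to
mdev_create. This is only a dry-run sketch: the sysfs path and the
':'-separated write format are taken from the examples earlier in this
thread, not from a settled ABI, and the helper prints the command it
would run instead of touching sysfs.

```shell
# Dry-run helper: print the mdev_create write instead of performing it.
mdev_create() {  # $1 = mdev identifier, $2 = vendor-specific params
    printf 'echo "%s:%s" > /sys/devices/../mdev_create\n' "$1" "$2"
}

VM_UUID=12345678-1234-1234-1234-123456789abc

mdev_create "$VM_UUID" params      # a) unique UUID only
mdev_create vgpu0 params           # b) STRING only (an index or name)
mdev_create "$VM_UUID:0" params    # c) VM UUID + numeric index
```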

> >
> >> 'mdev_create' creates device and 'mdev_start' is to commit resources of
> >> all instances of similar devices assigned to VM.
> >>
> >> For example, to create 2 devices:
> >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> >>
> >> "$UUID-0" and "$UUID-1" devices are created.
> >>
> >> Commit resources for above devices with single 'mdev_start':
> >> # echo "$UUID" > /sys/class/mdev/mdev_start
> >>
> >> Considering $UUID to be a unique UUID of a device, we don't need
> >> 'instance', so 'mdev_create' would look like:
> >>
> >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> >>
> >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
> >> would be vendor specific parameters.
> >>
> >> Device nodes would be created as "$UUID1" and "$UUID2"
> >>
> >> Then 'mdev_start' would be:
> >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> >>
> >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> >>
> >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop
> >
> > I'm not sure a comma separated list makes sense here, for both
> > simplicity in the kernel and more fine grained error reporting, we
> > probably want to start/stop them individually.  Actually, why is it
> > that we can't use the mediated device being opened and released to
> > automatically signal to the backend vendor driver to commit and release
> > resources? I don't fully understand why userspace needs this interface.

There is a meaningful use of start/stop interface, as required in live
migration support. Such an interface allows the vendor driver to quiesce
mdev activity on the source device before the mdev hardware state is snapshotted,
and then resume mdev activity on dest device after its state is recovered.
Intel has implemented experimental live migration support in KVMGT (soon
to release), based on above two interfaces (plus another two to get/set
mdev state).
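
To make that sequencing concrete, the sketch below drives a hypothetical
per-device start node plus an assumed state node through a migration.
Both node names are placeholders invented for illustration (the KVMGT
interfaces are not spelled out in this thread), and the function only
prints the steps it would perform:

```shell
# Hypothetical migration flow via start/stop plus a get/set-state node.
# All sysfs node names below are illustrative assumptions, and this is
# a dry-run: each step is printed, not executed.
migrate_mdev() {  # $1 = mdev UUID, $2 = source sysfs root, $3 = dest sysfs root
    echo "echo 0 > $2/$1/start"         # quiesce mdev activity on the source
    echo "cat $2/$1/state > state.img"  # snapshot the mdev hardware state
    echo "cat state.img > $3/$1/state"  # recover the state on the destination
    echo "echo 1 > $3/$1/start"         # resume mdev activity on the destination
}

migrate_mdev 12345678-1234-1234-1234-123456789abc \
    /sys/class/mdev-src /sys/class/mdev-dst
```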

> >
> 
> For NVIDIA vGPU solution we need to know all devices assigned to a VM in
> one shot to commit resources of all vGPUs assigned to a VM along with
> some common resources.

Kirti, can you elaborate on the background of the above one-shot commit
requirement? It's hard to understand such a requirement. 

As I replied in another mail, I really hope start/stop becomes a per-mdev
attribute instead of a global one, e.g.:

echo "0/1" > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start
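
With a per-device node like that, a management script can start each mdev
individually and report failures per device — the fine-grained error
reporting a single comma-separated mdev_start write cannot give. The
path below assumes the proposed (not merged) per-mdev 'start' attribute:

```shell
# Start each mdev individually so every failure is attributable to a
# specific device.  The per-mdev 'start' node is the proposal from this
# thread, not an existing interface.
start_mdevs() {
    rc=0
    for uuid in "$@"; do
        node=/sys/class/mdev/$uuid/start
        if [ -w "$node" ]; then
            echo 1 > "$node"
        else
            echo "failed to start mdev $uuid" >&2
            rc=1
        fi
    done
    return $rc
}
# usage: start_mdevs "$UUID1" "$UUID2"
```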

In many scenarios the userspace client may only want to talk to the mdev
instance directly, w/o need to contact its 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-15 Thread Tian, Kevin
> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Friday, August 05, 2016 2:13 PM
> 
> On 8/4/2016 12:51 PM, Tian, Kevin wrote:
> >> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> >> Sent: Thursday, August 04, 2016 3:04 AM
> >>
> >>
> >> 2. Physical device driver interface
> >> This interface provides the vendor driver a set of APIs to manage
> >> physical device related work in their own driver. The APIs are:
> >> - supported_config: provide supported configuration list by the vendor
> >>driver
> >> - create: to allocate basic resources in vendor driver for a mediated
> >>  device.
> >> - destroy: to free resources in vendor driver when mediated device is
> >>   destroyed.
> >> - reset: to free and reallocate resources in vendor driver during reboot
> >
> > Currently I see the 'reset' callback invoked only from the VFIO ioctl
> > path. Do you think it makes sense to expose a sysfs 'reset' node too,
> > similar to what people see under a PCI device node?
> >
> 
> Not all vendor drivers might support resetting an mdev from sysfs. But those
> who want to support it can expose a 'reset' node using 'mdev_attr_groups' of
> 'struct parent_ops'.
> 

Yes, this way it works. Just wondering whether it makes sense to expose the
reset sysfs node by default if a reset callback is provided by the vendor
driver. :-)

Thanks
Kevin



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-12 Thread Kirti Wankhede


On 8/13/2016 2:46 AM, Alex Williamson wrote:
> On Sat, 13 Aug 2016 00:14:39 +0530
> Kirti Wankhede  wrote:
> 
>> On 8/10/2016 12:30 AM, Alex Williamson wrote:
>>> On Thu, 4 Aug 2016 00:33:51 +0530
>>> Kirti Wankhede  wrote:
>>>
>>> This is used later by mdev_device_start() and mdev_device_stop() to get
>>> the parent_device so it can call the start and stop ops callbacks
>>> respectively.  That seems to imply that all instances for a given
>>> uuid come from the same parent_device.  Where is that enforced?  I'm
>>> still having a hard time buying into the uuid+instance plan when it
>>> seems like each mdev_device should have an actual unique uuid.
>>> Userspace tools can figure out which uuids to start for a given user, I
>>> don't see much value in collecting them to instances within a uuid.
>>>   
>>
>> Initially we started discussion with VM_UUID+instance suggestion, where
>> instance was introduced to support multiple devices in a VM.
> 
> The instance number was never required in order to support multiple
> devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> management tools which wanted to re-use the VM UUID by creating vGPU
> devices with that same UUID and therefore associate udev events to a
> given VM.  Only then does an instance number become necessary since the
> UUID needs to be static for the vGPUs within a VM.  This has always felt
> like a very dodgy solution when we should probably just be querying
> libvirt to give us a device to VM association.
> 
>> 'mdev_create' creates a device and 'mdev_start' is to commit resources of
>> all instances of similar devices assigned to a VM.
>>
>> For example, to create 2 devices:
>> # echo "$UUID:0:params" > /sys/devices/../mdev_create
>> # echo "$UUID:1:params" > /sys/devices/../mdev_create
>>
>> "$UUID-0" and "$UUID-1" devices are created.
>>
>> Commit resources for above devices with single 'mdev_start':
>> # echo "$UUID" > /sys/class/mdev/mdev_start
>>
>> Considering $UUID to be a unique UUID of a device, we don't need
>> 'instance', so 'mdev_create' would look like:
>>
>> # echo "$UUID1:params" > /sys/devices/../mdev_create
>> # echo "$UUID2:params" > /sys/devices/../mdev_create
>>
>> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
>> would be vendor specific parameters.
>>
>> Device nodes would be created as "$UUID1" and "$UUID2"
>>
>> Then 'mdev_start' would be:
>> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
>>
>> Similarly 'mdev_stop' and 'mdev_destroy' would be:
>>
>> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop
> 
> I'm not sure a comma separated list makes sense here, for both
> simplicity in the kernel and more fine grained error reporting, we
> probably want to start/stop them individually.  Actually, why is it
> that we can't use the mediated device being opened and released to
> automatically signal to the backend vendor driver to commit and release
> resources? I don't fully understand why userspace needs this interface.
> 

For the NVIDIA vGPU solution we need to know all devices assigned to a VM in
one shot, to commit the resources of all vGPUs assigned to that VM along
with some common resources.

For the start callback, I can pass on the list of UUIDs as-is to the vendor
driver. Let the vendor driver decide whether to iterate over each device and
commit resources, or do it in one shot.

Thanks,
Kirti

>> and
>>
>> # echo "$UUID1" > /sys/devices/../mdev_destroy
>> # echo "$UUID2" > /sys/devices/../mdev_destroy
>>
>> Does this seem reasonable?
> 
> I've been hoping we could drop the instance numbers and create actual
> unique UUIDs per mediated device for a while ;)  Thanks,
> 
> Alex
> 



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-12 Thread Alex Williamson
On Sat, 13 Aug 2016 00:14:39 +0530
Kirti Wankhede  wrote:

> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> > On Thu, 4 Aug 2016 00:33:51 +0530
> > Kirti Wankhede  wrote:
> > 
> > This is used later by mdev_device_start() and mdev_device_stop() to get
> > the parent_device so it can call the start and stop ops callbacks
> > respectively.  That seems to imply that all instances for a given
> > uuid come from the same parent_device.  Where is that enforced?  I'm
> > still having a hard time buying into the uuid+instance plan when it
> > seems like each mdev_device should have an actual unique uuid.
> > Userspace tools can figure out which uuids to start for a given user, I
> > don't see much value in collecting them to instances within a uuid.
> >   
> 
> Initially we started discussion with VM_UUID+instance suggestion, where
> instance was introduced to support multiple devices in a VM.

The instance number was never required in order to support multiple
devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
management tools which wanted to re-use the VM UUID by creating vGPU
devices with that same UUID and therefore associate udev events to a
given VM.  Only then does an instance number become necessary since the
UUID needs to be static for the vGPUs within a VM.  This has always felt
like a very dodgy solution when we should probably just be querying
libvirt to give us a device to VM association.

> 'mdev_create' creates a device and 'mdev_start' is to commit resources of
> all instances of similar devices assigned to a VM.
> 
> For example, to create 2 devices:
> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> 
> "$UUID-0" and "$UUID-1" devices are created.
> 
> Commit resources for above devices with single 'mdev_start':
> # echo "$UUID" > /sys/class/mdev/mdev_start
> 
> Considering $UUID to be a unique UUID of a device, we don't need
> 'instance', so 'mdev_create' would look like:
> 
> # echo "$UUID1:params" > /sys/devices/../mdev_create
> # echo "$UUID2:params" > /sys/devices/../mdev_create
> 
> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
> would be vendor specific parameters.
> 
> Device nodes would be created as "$UUID1" and "$UUID2"
> 
> Then 'mdev_start' would be:
> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> 
> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> 
> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop

I'm not sure a comma separated list makes sense here, for both
simplicity in the kernel and more fine grained error reporting, we
probably want to start/stop them individually.  Actually, why is it
that we can't use the mediated device being opened and released to
automatically signal to the backend vendor driver to commit and release
resources? I don't fully understand why userspace needs this interface.

> and
> 
> # echo "$UUID1" > /sys/devices/../mdev_destroy
> # echo "$UUID2" > /sys/devices/../mdev_destroy
> 
> Does this seem reasonable?

I've been hoping we could drop the instance numbers and create actual
unique UUIDs per mediated device for a while ;)  Thanks,

Alex



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-12 Thread Kirti Wankhede


On 8/10/2016 12:30 AM, Alex Williamson wrote:
> On Thu, 4 Aug 2016 00:33:51 +0530
> Kirti Wankhede  wrote:
> 
> This is used later by mdev_device_start() and mdev_device_stop() to get
> the parent_device so it can call the start and stop ops callbacks
> respectively.  That seems to imply that all instances for a given
> uuid come from the same parent_device.  Where is that enforced?  I'm
> still having a hard time buying into the uuid+instance plan when it
> seems like each mdev_device should have an actual unique uuid.
> Userspace tools can figure out which uuids to start for a given user, I
> don't see much value in collecting them to instances within a uuid.
> 

Initially we started discussion with VM_UUID+instance suggestion, where
instance was introduced to support multiple devices in a VM.
'mdev_create' creates a device and 'mdev_start' is to commit resources of
all instances of similar devices assigned to a VM.

For example, to create 2 devices:
# echo "$UUID:0:params" > /sys/devices/../mdev_create
# echo "$UUID:1:params" > /sys/devices/../mdev_create

"$UUID-0" and "$UUID-1" devices are created.

Commit resources for above devices with single 'mdev_start':
# echo "$UUID" > /sys/class/mdev/mdev_start

Considering $UUID to be a unique UUID of a device, we don't need
'instance', so 'mdev_create' would look like:

# echo "$UUID1:params" > /sys/devices/../mdev_create
# echo "$UUID2:params" > /sys/devices/../mdev_create

where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
would be vendor specific parameters.

Device nodes would be created as "$UUID1" and "$UUID2"

Then 'mdev_start' would be:
# echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start

Similarly 'mdev_stop' and 'mdev_destroy' would be:

# echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop

and

# echo "$UUID1" > /sys/devices/../mdev_destroy
# echo "$UUID2" > /sys/devices/../mdev_destroy

Does this seem reasonable?

Thanks,
Kirti



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-09 Thread Alex Williamson
On Thu, 4 Aug 2016 00:33:51 +0530
Kirti Wankhede  wrote:

> Design for Mediated Device Driver:
> Main purpose of this driver is to provide a common interface for mediated
> device management that can be used by different drivers of different
> devices.
> 
> This module provides a generic interface to create the device, add it to
> mediated bus, add device to IOMMU group and then add it to vfio group.
> 
> Below is the high-level block diagram, with NVIDIA, Intel and IBM devices
> as examples, since these are the devices which are going to actively use
> this module as of now.
> 
>  +---+
>  |   |
>  | +---+ |  mdev_register_driver() +--+
>  | |   | +<+ __init() |
>  | |   | | |  |
>  | |  mdev | +>+  |<-> VFIO user
>  | |  bus  | | probe()/remove()| vfio_mpci.ko |APIs
>  | |  driver   | | |  |
>  | |   | | +--+
>  | |   | |  mdev_register_driver() +--+
>  | |   | +<+ __init() |
>  | |   | | |  |
>  | |   | +>+  |<-> VFIO user
>  | +---+ | probe()/remove()| vfio_mccw.ko |APIs
>  |   | |  |
>  |  MDEV CORE| +--+
>  |   MODULE  |
>  |   mdev.ko |
>  | +---+ |  mdev_register_device() +--+
>  | |   | +<+  |
>  | |   | | |  nvidia.ko   |<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | | Physical  | |
>  | |  device   | |  mdev_register_device() +--+
>  | | interface | |<+  |
>  | |   | | |  i915.ko |<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | |   | |
>  | |   | |  mdev_register_device() +--+
>  | |   | +<+  |
>  | |   | | | ccw_device.ko|<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | +---+ |
>  +---+
> 
> Core driver provides two types of registration interfaces:
> 1. Registration interface for mediated bus driver:
> 
> /**
>   * struct mdev_driver - Mediated device's driver
>   * @name: driver name
>   * @probe: called when new device created
>   * @remove:called when device removed
>   * @match: called when new device or driver is added for this bus.
>   Return 1 if given device can be handled by given driver and
>   zero otherwise.
>   * @driver:device driver structure
>   *
>   **/
> struct mdev_driver {
>  const char *name;
>  int  (*probe)  (struct device *dev);
>  void (*remove) (struct device *dev);
>  int  (*match)(struct device *dev);
>  struct device_driverdriver;
> };
> 
> int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
> 
> A mediated device's driver should use this interface to register with the
> core driver. With this, the mediated device driver for such devices is
> responsible for adding the mediated device to the VFIO group.
> 
> 2. Physical device driver interface
> This interface provides the vendor driver a set of APIs to manage
> physical device related work in their own driver. The APIs are:
> - supported_config: provide supported configuration list by the vendor
>   driver
> - create: to allocate basic resources in vendor driver for a mediated
> device.
> - destroy: to free resources in vendor driver when mediated device is
>  destroyed.
> - reset: to free and reallocate resources in vendor driver during reboot
> - start: to initiate mediated device initialization process from vendor
>driver
> - shutdown: to teardown mediated device resources during teardown.
> - read : read emulation callback.
> - write: write emulation callback.
> - set_irqs: send interrupt configuration information that VMM sets.
> - get_region_info: to provide region size and its flags for the mediated
>  device.
> - validate_map_request: to validate remap pfn request.
> 
> This registration interface should be used by vendor drivers to register
> each physical device to mdev core driver.
> Locks to serialize above callbacks are removed. If required, vendor driver
> can have locks to serialize above 

Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-05 Thread Kirti Wankhede


On 8/4/2016 12:51 PM, Tian, Kevin wrote:
>> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
>> Sent: Thursday, August 04, 2016 3:04 AM
>>
>>
>> 2. Physical device driver interface
>> This interface provides the vendor driver a set of APIs to manage
>> physical device related work in their own driver. The APIs are:
>> - supported_config: provide supported configuration list by the vendor
>>  driver
>> - create: to allocate basic resources in vendor driver for a mediated
>>device.
>> - destroy: to free resources in vendor driver when mediated device is
>> destroyed.
>> - reset: to free and reallocate resources in vendor driver during reboot
> 
> Currently I see the 'reset' callback invoked only from the VFIO ioctl path.
> Do you think it makes sense to expose a sysfs 'reset' node too, similar to
> what people see under a PCI device node?
> 

Not all vendor drivers might support resetting an mdev from sysfs. But those
who want to support it can expose a 'reset' node using 'mdev_attr_groups' of
'struct parent_ops'.


>> - start: to initiate mediated device initialization process from vendor
>>   driver
>> - shutdown: to teardown mediated device resources during teardown.
> 
> I think 'shutdown' should be 'stop' based on actual code.
>

Thanks for catching that, yes I missed updating it here.

Thanks,
Kirti

> Thanks
> Kevin
> 



Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver

2016-08-04 Thread Tian, Kevin
> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Thursday, August 04, 2016 3:04 AM
> 
> 
> 2. Physical device driver interface
> This interface provides the vendor driver a set of APIs to manage
> physical device related work in their own driver. The APIs are:
> - supported_config: provide supported configuration list by the vendor
>   driver
> - create: to allocate basic resources in vendor driver for a mediated
> device.
> - destroy: to free resources in vendor driver when mediated device is
>  destroyed.
> - reset: to free and reallocate resources in vendor driver during reboot

Currently I see the 'reset' callback invoked only from the VFIO ioctl path.
Do you think it makes sense to expose a sysfs 'reset' node too, similar to
what people see under a PCI device node?

> - start: to initiate mediated device initialization process from vendor
>driver
> - shutdown: to teardown mediated device resources during teardown.

I think 'shutdown' should be 'stop' based on actual code.

Thanks
Kevin