Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-15 Thread Dong Jia
On Tue, 7 Jun 2016 22:29:30 -0600
Alex Williamson  wrote:

> On Wed, 8 Jun 2016 11:18:42 +0800
> Dong Jia  wrote:
> 
> > On Tue, 7 Jun 2016 19:39:21 -0600
> > Alex Williamson  wrote:
> > 
> > > On Wed, 8 Jun 2016 01:18:42 +
> > > "Tian, Kevin"  wrote:
> > >   
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Wednesday, June 08, 2016 6:42 AM
> > > > > 
> > > > > On Tue, 7 Jun 2016 03:03:32 +
> > > > > "Tian, Kevin"  wrote:
> > > > > 
> > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > > > > >
> > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > > > > Neo Jia  wrote:
> > > > > > >
> > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:
> > > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > > > > Neo Jia  wrote:
> > > > > > > > >
> > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > > > > This intends to handle an intercepted channel I/O 
> > > > > > > > > instruction. It
> > > > > > > > > basically need to do the following thing:
> > > > > > > >
> > > > > > > > May I ask how and when QEMU knows that he needs to issue such 
> > > > > > > > VFIO ioctl at
> > > > > > > > first place?
> > > > > > >
> > > > > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > > > > emulated device in QEMU that's trying to tell the mediated device 
> > > > > > > when
> > > > > > > to start an operation when we probably should be passing through
> > > > > > > whatever i/o operations indicate that status directly to the 
> > > > > > > mediated
> > > > > > > device. Thanks,
> > > > > > >
> > > > > > > Alex
> > > > > >
> > > > > > Below is copied from Dong's earlier post which said clear that
> > > > > > a guest cmd submission will trigger the whole flow:
> > > > > >
> > > > > > 
> > > > > > Explanation:
> > > > > > Q1-Q4: Qemu side process.
> > > > > > K1-K6: Kernel side process.
> > > > > >
> > > > > > Q1. Intercept a ssch instruction.
> > > > > > Q2. Translate the guest ccw program to a user space ccw program
> > > > > > (u_ccwchain).
> > > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > > > > K2. Translate the user space ccw program to a kernel space ccw
> > > > > > program, which becomes runnable for a real device.
> > > > > > K3. With the necessary information contained in the orb passed 
> > > > > > in
> > > > > > by Qemu, issue the k_ccwchain to the device, and wait event 
> > > > > > q
> > > > > > for the I/O result.
> > > > > > K4. Interrupt handler gets the I/O result, and wakes up the 
> > > > > > wait q.
> > > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result 
> > > > > > to
> > > > > > update the user space irb.
> > > > > > K6. Copy irb and scsw back to user space.
> > > > > > Q4. Update the irb for the guest.
> > > > > > 
> > > > > 
> > > > > Right, but this was the pre-mediated device approach, now we no longer
> > > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > > > > QEMU if those are operations that are not visible to the mediated
> > > > > device; which they very well might be, since it's described as an
> > > > > instruction rather than an i/o operation.  It's not terrible if that's
> > > > > the case, vfio-pci has its own ioctl for doing a hot reset.
> > Dear Alex, Kevin and Neo,
> > 
> > 'ssch' is a privileged I/O instruction, which should be finally issued
> > to the dedicated subchannel of the physical device.
> > 
> > BTW, I did remove step Q2 with all of the user-space translation code,
> > according to your comments in another thread.
> > 
> > > > 
> > > >   
> > > > > 
> > > > > > My understanding is that such thing belongs to how device is 
> > > > > > mediated
> > > > > > (so device driver specific), instead of something to be abstracted 
> > > > > > in
> > > > > > VFIO which manages resource but doesn't care how resource is used.
> > > > > >
> > > > > > Actually we have same requirement in vGPU case, that a guest driver
> > > > > > needs submit GPU commands through some MMIO register. vGPU device
> > > > > > model will intercept the submission request (in its own way), do its
> > > > > > necessary scan/audit to ensure correctness/security, and then submit
> > > > > > to physical GPU through vendor specific interface.
> > > > > >
> > > > > > No difference with channel I/O here.
> > > > > 
> > > > > Well, if the GPU command is submitted through an MMIO register, is 
> > > > > that
> > > > > MMIO register part of the mediated device?  If so, could the mediated
> > > > > device recognize the command and do the scan/audit 

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-08 Thread Neo Jia
On Wed, Jun 08, 2016 at 02:13:49PM +0800, Dong Jia wrote:
> On Tue, 7 Jun 2016 20:48:42 -0700
> Neo Jia  wrote:
> 
> > On Wed, Jun 08, 2016 at 11:18:42AM +0800, Dong Jia wrote:
> > > On Tue, 7 Jun 2016 19:39:21 -0600
> > > Alex Williamson  wrote:
> > > 
> > > > On Wed, 8 Jun 2016 01:18:42 +
> > > > "Tian, Kevin"  wrote:
> > > > 
> > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > Sent: Wednesday, June 08, 2016 6:42 AM
> > > > > > 
> > > > > > On Tue, 7 Jun 2016 03:03:32 +
> > > > > > "Tian, Kevin"  wrote:
> > > > > >   
> > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > > > > > >
> > > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > > > > > Neo Jia  wrote:
> > > > > > > >  
> > > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > > > > > Neo Jia  wrote:
> > > > > > > > > >
> > > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > > > > > This intends to handle an intercepted channel I/O 
> > > > > > > > > > instruction. It
> > > > > > > > > > basically need to do the following thing:  
> > > > > > > > >
> > > > > > > > > May I ask how and when QEMU knows that he needs to issue such 
> > > > > > > > > VFIO ioctl at
> > > > > > > > > first place?  
> > > > > > > >
> > > > > > > > Yep, this is my question as well.  It sounds a bit like there's 
> > > > > > > > an
> > > > > > > > emulated device in QEMU that's trying to tell the mediated 
> > > > > > > > device when
> > > > > > > > to start an operation when we probably should be passing through
> > > > > > > > whatever i/o operations indicate that status directly to the 
> > > > > > > > mediated
> > > > > > > > device. Thanks,
> > > > > > > >
> > > > > > > > Alex  
> > > > > > >
> > > > > > > Below is copied from Dong's earlier post which said clear that
> > > > > > > a guest cmd submission will trigger the whole flow:
> > > > > > >
> > > > > > > 
> > > > > > > Explanation:
> > > > > > > Q1-Q4: Qemu side process.
> > > > > > > K1-K6: Kernel side process.
> > > > > > >
> > > > > > > Q1. Intercept a ssch instruction.
> > > > > > > Q2. Translate the guest ccw program to a user space ccw program
> > > > > > > (u_ccwchain).
> > > > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > > > > > K2. Translate the user space ccw program to a kernel space ccw
> > > > > > > program, which becomes runnable for a real device.
> > > > > > > K3. With the necessary information contained in the orb 
> > > > > > > passed in
> > > > > > > by Qemu, issue the k_ccwchain to the device, and wait 
> > > > > > > event q
> > > > > > > for the I/O result.
> > > > > > > K4. Interrupt handler gets the I/O result, and wakes up the 
> > > > > > > wait q.
> > > > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the 
> > > > > > > result to
> > > > > > > update the user space irb.
> > > > > > > K6. Copy irb and scsw back to user space.
> > > > > > > Q4. Update the irb for the guest.
> > > > > > >   
> > > > > > 
> > > > > > Right, but this was the pre-mediated device approach, now we no 
> > > > > > longer
> > > > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > > > > > QEMU if those are operations that are not visible to the mediated
> > > > > > device; which they very well might be, since it's described as an
> > > > > > instruction rather than an i/o operation.  It's not terrible if 
> > > > > > that's
> > > > > > the case, vfio-pci has its own ioctl for doing a hot reset.  
> > > Dear Alex, Kevin and Neo,
> > > 
> > > 'ssch' is a privileged I/O instruction, which should be finally issued
> > > to the dedicated subchannel of the physical device.
> > > 
> > > BTW, I did remove step Q2 with all of the user-space translation code,
> > > according to your comments in another thread.
> > > 
> > > > > 
> > > > > 
> > > > > >   
> > > > > > > My understanding is that such thing belongs to how device is 
> > > > > > > mediated
> > > > > > > (so device driver specific), instead of something to be 
> > > > > > > abstracted in
> > > > > > > VFIO which manages resource but doesn't care how resource is used.
> > > > > > >
> > > > > > > Actually we have same requirement in vGPU case, that a guest 
> > > > > > > driver
> > > > > > > needs submit GPU commands through some MMIO register. vGPU device
> > > > > > > model will intercept the submission request (in its own way), do 
> > > > > > > its
> > > > > > > necessary scan/audit to ensure correctness/security, and then 
> > > > > > > submit
> > > > > > > to physical GPU through vendor specific interface.
> > > > > > >
> > > > > > > No 

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-08 Thread Dong Jia
On Tue, 7 Jun 2016 20:48:42 -0700
Neo Jia  wrote:

> On Wed, Jun 08, 2016 at 11:18:42AM +0800, Dong Jia wrote:
> > On Tue, 7 Jun 2016 19:39:21 -0600
> > Alex Williamson  wrote:
> > 
> > > On Wed, 8 Jun 2016 01:18:42 +
> > > "Tian, Kevin"  wrote:
> > > 
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Wednesday, June 08, 2016 6:42 AM
> > > > > 
> > > > > On Tue, 7 Jun 2016 03:03:32 +
> > > > > "Tian, Kevin"  wrote:
> > > > >   
> > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > > > > >
> > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > > > > Neo Jia  wrote:
> > > > > > >  
> > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > > > > Neo Jia  wrote:
> > > > > > > > >
> > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > > > > This intends to handle an intercepted channel I/O 
> > > > > > > > > instruction. It
> > > > > > > > > basically need to do the following thing:  
> > > > > > > >
> > > > > > > > May I ask how and when QEMU knows that he needs to issue such 
> > > > > > > > VFIO ioctl at
> > > > > > > > first place?  
> > > > > > >
> > > > > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > > > > emulated device in QEMU that's trying to tell the mediated device 
> > > > > > > when
> > > > > > > to start an operation when we probably should be passing through
> > > > > > > whatever i/o operations indicate that status directly to the 
> > > > > > > mediated
> > > > > > > device. Thanks,
> > > > > > >
> > > > > > > Alex  
> > > > > >
> > > > > > Below is copied from Dong's earlier post which said clear that
> > > > > > a guest cmd submission will trigger the whole flow:
> > > > > >
> > > > > > 
> > > > > > Explanation:
> > > > > > Q1-Q4: Qemu side process.
> > > > > > K1-K6: Kernel side process.
> > > > > >
> > > > > > Q1. Intercept a ssch instruction.
> > > > > > Q2. Translate the guest ccw program to a user space ccw program
> > > > > > (u_ccwchain).
> > > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > > > > K2. Translate the user space ccw program to a kernel space ccw
> > > > > > program, which becomes runnable for a real device.
> > > > > > K3. With the necessary information contained in the orb passed 
> > > > > > in
> > > > > > by Qemu, issue the k_ccwchain to the device, and wait event 
> > > > > > q
> > > > > > for the I/O result.
> > > > > > K4. Interrupt handler gets the I/O result, and wakes up the 
> > > > > > wait q.
> > > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result 
> > > > > > to
> > > > > > update the user space irb.
> > > > > > K6. Copy irb and scsw back to user space.
> > > > > > Q4. Update the irb for the guest.
> > > > > >   
> > > > > 
> > > > > Right, but this was the pre-mediated device approach, now we no longer
> > > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > > > > QEMU if those are operations that are not visible to the mediated
> > > > > device; which they very well might be, since it's described as an
> > > > > instruction rather than an i/o operation.  It's not terrible if that's
> > > > > the case, vfio-pci has its own ioctl for doing a hot reset.  
> > Dear Alex, Kevin and Neo,
> > 
> > 'ssch' is a privileged I/O instruction, which should be finally issued
> > to the dedicated subchannel of the physical device.
> > 
> > BTW, I did remove step Q2 with all of the user-space translation code,
> > according to your comments in another thread.
> > 
> > > > 
> > > > 
> > > > >   
> > > > > > My understanding is that such thing belongs to how device is 
> > > > > > mediated
> > > > > > (so device driver specific), instead of something to be abstracted 
> > > > > > in
> > > > > > VFIO which manages resource but doesn't care how resource is used.
> > > > > >
> > > > > > Actually we have same requirement in vGPU case, that a guest driver
> > > > > > needs submit GPU commands through some MMIO register. vGPU device
> > > > > > model will intercept the submission request (in its own way), do its
> > > > > > necessary scan/audit to ensure correctness/security, and then submit
> > > > > > to physical GPU through vendor specific interface.
> > > > > >
> > > > > > No difference with channel I/O here.  
> > > > > 
> > > > > Well, if the GPU command is submitted through an MMIO register, is 
> > > > > that
> > > > > MMIO register part of the mediated device?  If so, could the mediated
> > > > > device recognize the command and do the scan/audit itself?  QEMU must
> > > > > not be the point at which mediation occurs for 

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Alex Williamson
On Wed, 8 Jun 2016 11:18:42 +0800
Dong Jia  wrote:

> On Tue, 7 Jun 2016 19:39:21 -0600
> Alex Williamson  wrote:
> 
> > On Wed, 8 Jun 2016 01:18:42 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Wednesday, June 08, 2016 6:42 AM
> > > > 
> > > > On Tue, 7 Jun 2016 03:03:32 +
> > > > "Tian, Kevin"  wrote:
> > > > 
> > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > > > >
> > > > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > > > Neo Jia  wrote:
> > > > > >
> > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:
> > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > > > Neo Jia  wrote:
> > > > > > > >
> > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > > > This intends to handle an intercepted channel I/O instruction. 
> > > > > > > > It
> > > > > > > > basically need to do the following thing:
> > > > > > >
> > > > > > > May I ask how and when QEMU knows that he needs to issue such 
> > > > > > > VFIO ioctl at
> > > > > > > first place?
> > > > > >
> > > > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > > > emulated device in QEMU that's trying to tell the mediated device 
> > > > > > when
> > > > > > to start an operation when we probably should be passing through
> > > > > > whatever i/o operations indicate that status directly to the 
> > > > > > mediated
> > > > > > device. Thanks,
> > > > > >
> > > > > > Alex
> > > > >
> > > > > Below is copied from Dong's earlier post which said clear that
> > > > > a guest cmd submission will trigger the whole flow:
> > > > >
> > > > > 
> > > > > Explanation:
> > > > > Q1-Q4: Qemu side process.
> > > > > K1-K6: Kernel side process.
> > > > >
> > > > > Q1. Intercept a ssch instruction.
> > > > > Q2. Translate the guest ccw program to a user space ccw program
> > > > > (u_ccwchain).
> > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > > > K2. Translate the user space ccw program to a kernel space ccw
> > > > > program, which becomes runnable for a real device.
> > > > > K3. With the necessary information contained in the orb passed in
> > > > > by Qemu, issue the k_ccwchain to the device, and wait event q
> > > > > for the I/O result.
> > > > > K4. Interrupt handler gets the I/O result, and wakes up the wait 
> > > > > q.
> > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > > > > update the user space irb.
> > > > > K6. Copy irb and scsw back to user space.
> > > > > Q4. Update the irb for the guest.
> > > > > 
> > > > 
> > > > Right, but this was the pre-mediated device approach, now we no longer
> > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > > > QEMU if those are operations that are not visible to the mediated
> > > > device; which they very well might be, since it's described as an
> > > > instruction rather than an i/o operation.  It's not terrible if that's
> > > > the case, vfio-pci has its own ioctl for doing a hot reset.
> Dear Alex, Kevin and Neo,
> 
> 'ssch' is a privileged I/O instruction, which should be finally issued
> to the dedicated subchannel of the physical device.
> 
> BTW, I did remove step Q2 with all of the user-space translation code,
> according to your comments in another thread.
> 
> > > 
> > >   
> > > > 
> > > > > My understanding is that such thing belongs to how device is mediated
> > > > > (so device driver specific), instead of something to be abstracted in
> > > > > VFIO which manages resource but doesn't care how resource is used.
> > > > >
> > > > > Actually we have same requirement in vGPU case, that a guest driver
> > > > > needs submit GPU commands through some MMIO register. vGPU device
> > > > > model will intercept the submission request (in its own way), do its
> > > > > necessary scan/audit to ensure correctness/security, and then submit
> > > > > to physical GPU through vendor specific interface.
> > > > >
> > > > > No difference with channel I/O here.
> > > > 
> > > > Well, if the GPU command is submitted through an MMIO register, is that
> > > > MMIO register part of the mediated device?  If so, could the mediated
> > > > device recognize the command and do the scan/audit itself?  QEMU must
> > > > not be the point at which mediation occurs for security purposes, QEMU
> > > > is userspace and userspace is not to be trusted.  I'm still open to
> > > > ioctls where it makes sense, as above, we have PCI specific ioctls and
> > > > already, but we need to evaluate each one, why it needs to exist, and
> > > > whether we can skip it if the 

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Neo Jia
On Wed, Jun 08, 2016 at 11:18:42AM +0800, Dong Jia wrote:
> On Tue, 7 Jun 2016 19:39:21 -0600
> Alex Williamson  wrote:
> 
> > On Wed, 8 Jun 2016 01:18:42 +
> > "Tian, Kevin"  wrote:
> > 
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Wednesday, June 08, 2016 6:42 AM
> > > > 
> > > > On Tue, 7 Jun 2016 03:03:32 +
> > > > "Tian, Kevin"  wrote:
> > > >   
> > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > > > >
> > > > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > > > Neo Jia  wrote:
> > > > > >  
> > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > > > Neo Jia  wrote:
> > > > > > > >
> > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > > > This intends to handle an intercepted channel I/O instruction. 
> > > > > > > > It
> > > > > > > > basically need to do the following thing:  
> > > > > > >
> > > > > > > May I ask how and when QEMU knows that he needs to issue such 
> > > > > > > VFIO ioctl at
> > > > > > > first place?  
> > > > > >
> > > > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > > > emulated device in QEMU that's trying to tell the mediated device 
> > > > > > when
> > > > > > to start an operation when we probably should be passing through
> > > > > > whatever i/o operations indicate that status directly to the 
> > > > > > mediated
> > > > > > device. Thanks,
> > > > > >
> > > > > > Alex  
> > > > >
> > > > > Below is copied from Dong's earlier post which said clear that
> > > > > a guest cmd submission will trigger the whole flow:
> > > > >
> > > > > 
> > > > > Explanation:
> > > > > Q1-Q4: Qemu side process.
> > > > > K1-K6: Kernel side process.
> > > > >
> > > > > Q1. Intercept a ssch instruction.
> > > > > Q2. Translate the guest ccw program to a user space ccw program
> > > > > (u_ccwchain).
> > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > > > K2. Translate the user space ccw program to a kernel space ccw
> > > > > program, which becomes runnable for a real device.
> > > > > K3. With the necessary information contained in the orb passed in
> > > > > by Qemu, issue the k_ccwchain to the device, and wait event q
> > > > > for the I/O result.
> > > > > K4. Interrupt handler gets the I/O result, and wakes up the wait 
> > > > > q.
> > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > > > > update the user space irb.
> > > > > K6. Copy irb and scsw back to user space.
> > > > > Q4. Update the irb for the guest.
> > > > >   
> > > > 
> > > > Right, but this was the pre-mediated device approach, now we no longer
> > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > > > QEMU if those are operations that are not visible to the mediated
> > > > device; which they very well might be, since it's described as an
> > > > instruction rather than an i/o operation.  It's not terrible if that's
> > > > the case, vfio-pci has its own ioctl for doing a hot reset.  
> Dear Alex, Kevin and Neo,
> 
> 'ssch' is a privileged I/O instruction, which should be finally issued
> to the dedicated subchannel of the physical device.
> 
> BTW, I did remove step Q2 with all of the user-space translation code,
> according to your comments in another thread.
> 
> > > 
> > > 
> > > >   
> > > > > My understanding is that such thing belongs to how device is mediated
> > > > > (so device driver specific), instead of something to be abstracted in
> > > > > VFIO which manages resource but doesn't care how resource is used.
> > > > >
> > > > > Actually we have same requirement in vGPU case, that a guest driver
> > > > > needs submit GPU commands through some MMIO register. vGPU device
> > > > > model will intercept the submission request (in its own way), do its
> > > > > necessary scan/audit to ensure correctness/security, and then submit
> > > > > to physical GPU through vendor specific interface.
> > > > >
> > > > > No difference with channel I/O here.  
> > > > 
> > > > Well, if the GPU command is submitted through an MMIO register, is that
> > > > MMIO register part of the mediated device?  If so, could the mediated
> > > > device recognize the command and do the scan/audit itself?  QEMU must
> > > > not be the point at which mediation occurs for security purposes, QEMU
> > > > is userspace and userspace is not to be trusted.  I'm still open to
> > > > ioctls where it makes sense, as above, we have PCI specific ioctls and
> > > > already, but we need to evaluate each one, why it needs to exist, and
> > > > whether we can skip it if the mediated device can trigger the action on
> > > > 

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Dong Jia
On Tue, 7 Jun 2016 19:39:21 -0600
Alex Williamson  wrote:

> On Wed, 8 Jun 2016 01:18:42 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Wednesday, June 08, 2016 6:42 AM
> > > 
> > > On Tue, 7 Jun 2016 03:03:32 +
> > > "Tian, Kevin"  wrote:
> > >   
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > > >
> > > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > > Neo Jia  wrote:
> > > > >  
> > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > > Neo Jia  wrote:
> > > > > > >
> > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > > This intends to handle an intercepted channel I/O instruction. It
> > > > > > > basically need to do the following thing:  
> > > > > >
> > > > > > May I ask how and when QEMU knows that he needs to issue such VFIO 
> > > > > > ioctl at
> > > > > > first place?  
> > > > >
> > > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > > emulated device in QEMU that's trying to tell the mediated device when
> > > > > to start an operation when we probably should be passing through
> > > > > whatever i/o operations indicate that status directly to the mediated
> > > > > device. Thanks,
> > > > >
> > > > > Alex  
> > > >
> > > > Below is copied from Dong's earlier post which said clear that
> > > > a guest cmd submission will trigger the whole flow:
> > > >
> > > > 
> > > > Explanation:
> > > > Q1-Q4: Qemu side process.
> > > > K1-K6: Kernel side process.
> > > >
> > > > Q1. Intercept a ssch instruction.
> > > > Q2. Translate the guest ccw program to a user space ccw program
> > > > (u_ccwchain).
> > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > > K2. Translate the user space ccw program to a kernel space ccw
> > > > program, which becomes runnable for a real device.
> > > > K3. With the necessary information contained in the orb passed in
> > > > by Qemu, issue the k_ccwchain to the device, and wait event q
> > > > for the I/O result.
> > > > K4. Interrupt handler gets the I/O result, and wakes up the wait q.
> > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > > > update the user space irb.
> > > > K6. Copy irb and scsw back to user space.
> > > > Q4. Update the irb for the guest.
> > > >   
> > > 
> > > Right, but this was the pre-mediated device approach, now we no longer
> > > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > > QEMU if those are operations that are not visible to the mediated
> > > device; which they very well might be, since it's described as an
> > > instruction rather than an i/o operation.  It's not terrible if that's
> > > the case, vfio-pci has its own ioctl for doing a hot reset.  
Dear Alex, Kevin and Neo,

'ssch' is a privileged I/O instruction, which should be finally issued
to the dedicated subchannel of the physical device.

BTW, I did remove step Q2 with all of the user-space translation code,
according to your comments in another thread.

> > 
> > 
> > >   
> > > > My understanding is that such thing belongs to how device is mediated
> > > > (so device driver specific), instead of something to be abstracted in
> > > > VFIO which manages resource but doesn't care how resource is used.
> > > >
> > > > Actually we have same requirement in vGPU case, that a guest driver
> > > > needs submit GPU commands through some MMIO register. vGPU device
> > > > model will intercept the submission request (in its own way), do its
> > > > necessary scan/audit to ensure correctness/security, and then submit
> > > > to physical GPU through vendor specific interface.
> > > >
> > > > No difference with channel I/O here.  
> > > 
> > > Well, if the GPU command is submitted through an MMIO register, is that
> > > MMIO register part of the mediated device?  If so, could the mediated
> > > device recognize the command and do the scan/audit itself?  QEMU must
> > > not be the point at which mediation occurs for security purposes, QEMU
> > > is userspace and userspace is not to be trusted.  I'm still open to
> > > ioctls where it makes sense, as above, we have PCI specific ioctls and
> > > already, but we need to evaluate each one, why it needs to exist, and
> > > whether we can skip it if the mediated device can trigger the action on
> > > its own.  After all, that's why we're using the vfio api, so we can
> > > re-use much of the existing infrastructure, especially for a vGPU that
> > > exposes itself as a PCI device.  Thanks,
> > >   
> > 
> > My point is that a guest submission on vGPU is just a normal trapped 
> > register write, which is 

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Alex Williamson
On Wed, 8 Jun 2016 01:18:42 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Wednesday, June 08, 2016 6:42 AM
> > 
> > On Tue, 7 Jun 2016 03:03:32 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Tuesday, June 07, 2016 3:31 AM
> > > >
> > > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > > Neo Jia  wrote:
> > > >  
> > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > > Neo Jia  wrote:
> > > > > >
> > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > > This intends to handle an intercepted channel I/O instruction. It
> > > > > > basically need to do the following thing:  
> > > > >
> > > > > May I ask how and when QEMU knows that he needs to issue such VFIO 
> > > > > ioctl at
> > > > > first place?  
> > > >
> > > > Yep, this is my question as well.  It sounds a bit like there's an
> > > > emulated device in QEMU that's trying to tell the mediated device when
> > > > to start an operation when we probably should be passing through
> > > > whatever i/o operations indicate that status directly to the mediated
> > > > device. Thanks,
> > > >
> > > > Alex  
> > >
> > > Below is copied from Dong's earlier post which said clear that
> > > a guest cmd submission will trigger the whole flow:
> > >
> > > 
> > > Explanation:
> > > Q1-Q4: Qemu side process.
> > > K1-K6: Kernel side process.
> > >
> > > Q1. Intercept a ssch instruction.
> > > Q2. Translate the guest ccw program to a user space ccw program
> > > (u_ccwchain).
> > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > > K2. Translate the user space ccw program to a kernel space ccw
> > > program, which becomes runnable for a real device.
> > > K3. With the necessary information contained in the orb passed in
> > > by Qemu, issue the k_ccwchain to the device, and wait event q
> > > for the I/O result.
> > > K4. Interrupt handler gets the I/O result, and wakes up the wait q.
> > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > > update the user space irb.
> > > K6. Copy irb and scsw back to user space.
> > > Q4. Update the irb for the guest.
> > >   
> > 
> > Right, but this was the pre-mediated device approach, now we no longer
> > need step Q2 so we really only need Q1 and therefore Q3 to exist in
> > QEMU if those are operations that are not visible to the mediated
> > device; which they very well might be, since it's described as an
> > instruction rather than an i/o operation.  It's not terrible if that's
> > the case, vfio-pci has its own ioctl for doing a hot reset.  
> 
> 
> 
> >   
> > > My understanding is that such thing belongs to how device is mediated
> > > (so device driver specific), instead of something to be abstracted in
> > > VFIO which manages resource but doesn't care how resource is used.
> > >
> > > Actually we have same requirement in vGPU case, that a guest driver
> > > needs submit GPU commands through some MMIO register. vGPU device
> > > model will intercept the submission request (in its own way), do its
> > > necessary scan/audit to ensure correctness/security, and then submit
> > > to physical GPU through vendor specific interface.
> > >
> > > No difference with channel I/O here.  
> > 
> > Well, if the GPU command is submitted through an MMIO register, is that
> > MMIO register part of the mediated device?  If so, could the mediated
> > device recognize the command and do the scan/audit itself?  QEMU must
> > not be the point at which mediation occurs for security purposes, QEMU
> > is userspace and userspace is not to be trusted.  I'm still open to
> > ioctls where it makes sense, as above, we have PCI specific ioctls and
> > already, but we need to evaluate each one, why it needs to exist, and
> > whether we can skip it if the mediated device can trigger the action on
> > its own.  After all, that's why we're using the vfio api, so we can
> > re-use much of the existing infrastructure, especially for a vGPU that
> > exposes itself as a PCI device.  Thanks,
> >   
> 
> My point is that a guest submission on vGPU is just a normal trapped 
> register write, which is forwarded from Qemu to VFIO through pwrite 
> interface and then hit mediated vGPU device. The mediated device
> will recognize this register write as a submission request and then do
> necessary scan (looks we are saying same thing) and then submit to
> physical device driver. If loading ccw cmds on channel i/o are also 
> through some I/O registers, it can be implemented same way w/o
> introducing new ioctl. The r/w handler of mediated device can figure
> out whether it's a ccw submission or not. But my understanding might 
> be wrong here.

I 

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, June 08, 2016 6:42 AM
> 
> On Tue, 7 Jun 2016 03:03:32 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Tuesday, June 07, 2016 3:31 AM
> > >
> > > On Mon, 6 Jun 2016 10:44:25 -0700
> > > Neo Jia  wrote:
> > >
> > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:
> > > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > > Neo Jia  wrote:
> > > > >
> > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > > This intends to handle an intercepted channel I/O instruction. It
> > > > > basically need to do the following thing:
> > > >
> > > > May I ask how and when QEMU knows that he needs to issue such VFIO 
> > > > ioctl at
> > > > first place?
> > >
> > > Yep, this is my question as well.  It sounds a bit like there's an
> > > emulated device in QEMU that's trying to tell the mediated device when
> > > to start an operation when we probably should be passing through
> > > whatever i/o operations indicate that status directly to the mediated
> > > device. Thanks,
> > >
> > > Alex
> >
> > Below is copied from Dong's earlier post which said clear that
> > a guest cmd submission will trigger the whole flow:
> >
> > 
> > Explanation:
> > Q1-Q4: Qemu side process.
> > K1-K6: Kernel side process.
> >
> > Q1. Intercept a ssch instruction.
> > Q2. Translate the guest ccw program to a user space ccw program
> > (u_ccwchain).
> > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> > K1. Copy from u_ccwchain to kernel (k_ccwchain).
> > K2. Translate the user space ccw program to a kernel space ccw
> > program, which becomes runnable for a real device.
> > K3. With the necessary information contained in the orb passed in
> > by Qemu, issue the k_ccwchain to the device, and wait event q
> > for the I/O result.
> > K4. Interrupt handler gets the I/O result, and wakes up the wait q.
> > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> > update the user space irb.
> > K6. Copy irb and scsw back to user space.
> > Q4. Update the irb for the guest.
> > 
> 
> Right, but this was the pre-mediated device approach, now we no longer
> need step Q2 so we really only need Q1 and therefore Q3 to exist in
> QEMU if those are operations that are not visible to the mediated
> device; which they very well might be, since it's described as an
> instruction rather than an i/o operation.  It's not terrible if that's
> the case, vfio-pci has its own ioctl for doing a hot reset.



> 
> > My understanding is that such thing belongs to how device is mediated
> > (so device driver specific), instead of something to be abstracted in
> > VFIO which manages resource but doesn't care how resource is used.
> >
> > Actually we have same requirement in vGPU case, that a guest driver
> > needs submit GPU commands through some MMIO register. vGPU device
> > model will intercept the submission request (in its own way), do its
> > necessary scan/audit to ensure correctness/security, and then submit
> > to physical GPU through vendor specific interface.
> >
> > No difference with channel I/O here.
> 
> Well, if the GPU command is submitted through an MMIO register, is that
> MMIO register part of the mediated device?  If so, could the mediated
> device recognize the command and do the scan/audit itself?  QEMU must
> not be the point at which mediation occurs for security purposes, QEMU
> is userspace and userspace is not to be trusted.  I'm still open to
> ioctls where it makes sense, as above, we have PCI specific ioctls and
> already, but we need to evaluate each one, why it needs to exist, and
> whether we can skip it if the mediated device can trigger the action on
> its own.  After all, that's why we're using the vfio api, so we can
> re-use much of the existing infrastructure, especially for a vGPU that
> exposes itself as a PCI device.  Thanks,
> 

My point is that a guest submission on vGPU is just a normal trapped 
register write, which is forwarded from Qemu to VFIO through pwrite 
interface and then hit mediated vGPU device. The mediated device
will recognize this register write as a submission request and then do
necessary scan (looks we are saying same thing) and then submit to
physical device driver. If loading ccw cmds on channel i/o are also 
through some I/O registers, it can be implemented same way w/o
introducing new ioctl. The r/w handler of mediated device can figure
out whether it's a ccw submission or not. But my understanding might 
be wrong here.

Thanks
Kevin



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-07 Thread Alex Williamson
On Tue, 7 Jun 2016 03:03:32 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Tuesday, June 07, 2016 3:31 AM
> > 
> > On Mon, 6 Jun 2016 10:44:25 -0700
> > Neo Jia  wrote:
> >   
> > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:  
> > > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > > Neo Jia  wrote:
> > > >
> > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > > This intends to handle an intercepted channel I/O instruction. It
> > > > basically need to do the following thing:  
> > >
> > > May I ask how and when QEMU knows that he needs to issue such VFIO ioctl 
> > > at
> > > first place?  
> > 
> > Yep, this is my question as well.  It sounds a bit like there's an
> > emulated device in QEMU that's trying to tell the mediated device when
> > to start an operation when we probably should be passing through
> > whatever i/o operations indicate that status directly to the mediated
> > device. Thanks,
> > 
> > Alex  
> 
> Below is copied from Dong's earlier post which said clear that
> a guest cmd submission will trigger the whole flow:
> 
> 
> Explanation:
> Q1-Q4: Qemu side process.
> K1-K6: Kernel side process.
> 
> Q1. Intercept a ssch instruction.
> Q2. Translate the guest ccw program to a user space ccw program
> (u_ccwchain).
> Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> K1. Copy from u_ccwchain to kernel (k_ccwchain).
> K2. Translate the user space ccw program to a kernel space ccw
> program, which becomes runnable for a real device.
> K3. With the necessary information contained in the orb passed in
> by Qemu, issue the k_ccwchain to the device, and wait event q
> for the I/O result.
> K4. Interrupt handler gets the I/O result, and wakes up the wait q.
> K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> update the user space irb.
> K6. Copy irb and scsw back to user space.
> Q4. Update the irb for the guest.
> 

Right, but this was the pre-mediated device approach, now we no longer
need step Q2 so we really only need Q1 and therefore Q3 to exist in
QEMU if those are operations that are not visible to the mediated
device; which they very well might be, since it's described as an
instruction rather than an i/o operation.  It's not terrible if that's
the case, vfio-pci has its own ioctl for doing a hot reset.
 
> My understanding is that such thing belongs to how device is mediated
> (so device driver specific), instead of something to be abstracted in 
> VFIO which manages resource but doesn't care how resource is used.
> 
> Actually we have same requirement in vGPU case, that a guest driver 
> needs submit GPU commands through some MMIO register. vGPU device 
> model will intercept the submission request (in its own way), do its 
> necessary scan/audit to ensure correctness/security, and then submit 
> to physical GPU through vendor specific interface. 
> 
> No difference with channel I/O here.

Well, if the GPU command is submitted through an MMIO register, is that
MMIO register part of the mediated device?  If so, could the mediated
device recognize the command and do the scan/audit itself?  QEMU must
not be the point at which mediation occurs for security purposes, QEMU
is userspace and userspace is not to be trusted.  I'm still open to
ioctls where it makes sense, as above, we have PCI specific ioctls and
already, but we need to evaluate each one, why it needs to exist, and
whether we can skip it if the mediated device can trigger the action on
its own.  After all, that's why we're using the vfio api, so we can
re-use much of the existing infrastructure, especially for a vGPU that
exposes itself as a PCI device.  Thanks,

Alex



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-06 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, June 07, 2016 3:31 AM
> 
> On Mon, 6 Jun 2016 10:44:25 -0700
> Neo Jia  wrote:
> 
> > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:
> > > On Sun, 5 Jun 2016 23:27:42 -0700
> > > Neo Jia  wrote:
> > >
> > > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > > This intends to handle an intercepted channel I/O instruction. It
> > > basically need to do the following thing:
> >
> > May I ask how and when QEMU knows that he needs to issue such VFIO ioctl at
> > first place?
> 
> Yep, this is my question as well.  It sounds a bit like there's an
> emulated device in QEMU that's trying to tell the mediated device when
> to start an operation when we probably should be passing through
> whatever i/o operations indicate that status directly to the mediated
> device. Thanks,
> 
> Alex

Below is copied from Dong's earlier post which said clear that
a guest cmd submission will trigger the whole flow:


Explanation:
Q1-Q4: Qemu side process.
K1-K6: Kernel side process.

Q1. Intercept a ssch instruction.
Q2. Translate the guest ccw program to a user space ccw program
(u_ccwchain).
Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
K1. Copy from u_ccwchain to kernel (k_ccwchain).
K2. Translate the user space ccw program to a kernel space ccw
program, which becomes runnable for a real device.
K3. With the necessary information contained in the orb passed in
by Qemu, issue the k_ccwchain to the device, and wait event q
for the I/O result.
K4. Interrupt handler gets the I/O result, and wakes up the wait q.
K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
update the user space irb.
K6. Copy irb and scsw back to user space.
Q4. Update the irb for the guest.


My understanding is that such thing belongs to how device is mediated
(so device driver specific), instead of something to be abstracted in 
VFIO which manages resource but doesn't care how resource is used.

Actually we have same requirement in vGPU case, that a guest driver 
needs submit GPU commands through some MMIO register. vGPU device 
model will intercept the submission request (in its own way), do its 
necessary scan/audit to ensure correctness/security, and then submit 
to physical GPU through vendor specific interface. 

No difference with channel I/O here.

Thanks
Kevin



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-06 Thread Alex Williamson
On Mon, 6 Jun 2016 10:44:25 -0700
Neo Jia  wrote:

> On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:
> > On Sun, 5 Jun 2016 23:27:42 -0700
> > Neo Jia  wrote:
> > 
> > 2. VFIO_DEVICE_CCW_CMD_REQUEST
> > This intends to handle an intercepted channel I/O instruction. It
> > basically need to do the following thing:  
> 
> May I ask how and when QEMU knows that he needs to issue such VFIO ioctl at
> first place?

Yep, this is my question as well.  It sounds a bit like there's an
emulated device in QEMU that's trying to tell the mediated device when
to start an operation when we probably should be passing through
whatever i/o operations indicate that status directly to the mediated
device. Thanks,

Alex



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-06 Thread Neo Jia
On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote:
> On Sun, 5 Jun 2016 23:27:42 -0700
> Neo Jia  wrote:
> 
> 2. VFIO_DEVICE_CCW_CMD_REQUEST
> This intends to handle an intercepted channel I/O instruction. It
> basically need to do the following thing:

May I ask how and when QEMU knows that he needs to issue such VFIO ioctl at
first place?

Thanks,
Neo

>   a. Copy the raw data of the CCW program (a group of chained CCWs) from
>  user into kernel space buffers.
>   b. Do CCW program translation based on the raw data to get a
>  real-device runnable CCW program. We'd pin pages for those CCWs
>  which have memory space pointers for their offload, and update the
>  CCW program with the pinned results (phys).
>   c. Issue the translated CCW program to a real-device to perform the
>  I/O operation, and wait for the I/O result interrupt.
>   d. Once we got the I/O result, copy the result back to user, and
>  unpin the pages.
> 
> Step c could only be done by the physical device driver, since it's it
> that the int_handler belongs to.
> Step b and d should be done by the physical device driver. Or we'd
> pin/unpin pages in the mediated device driver?
> 
> That's why I asked for the new callback.
> 



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-06 Thread Dong Jia
On Sun, 5 Jun 2016 23:27:42 -0700
Neo Jia  wrote:

> On Mon, Jun 06, 2016 at 02:01:48PM +0800, Dong Jia wrote:
> > On Mon, 6 Jun 2016 10:57:49 +0530
> > Kirti Wankhede  wrote:
> > 
> > > 
> > > 
> > > On 6/3/2016 2:27 PM, Dong Jia wrote:
> > > > On Wed, 25 May 2016 01:28:15 +0530
> > > > Kirti Wankhede  wrote:
> > > > 
> > > > 
> > > > ...snip...
> > > > 
> > > >> +struct phy_device_ops {
> > > >> +  struct module   *owner;
> > > >> +  const struct attribute_group **dev_attr_groups;
> > > >> +  const struct attribute_group **mdev_attr_groups;
> > > >> +
> > > >> +  int (*supported_config)(struct device *dev, char *config);
> > > >> +  int (*create)(struct device *dev, uuid_le uuid,
> > > >> +uint32_t instance, char *mdev_params);
> > > >> +  int (*destroy)(struct device *dev, uuid_le uuid,
> > > >> + uint32_t instance);
> > > >> +  int (*start)(uuid_le uuid);
> > > >> +  int (*shutdown)(uuid_le uuid);
> > > >> +  ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t 
> > > >> count,
> > > >> +  enum mdev_emul_space address_space, loff_t pos);
> > > >> +  ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t 
> > > >> count,
> > > >> +   enum mdev_emul_space address_space, loff_t 
> > > >> pos);
> > > >> +  int (*set_irqs)(struct mdev_device *vdev, uint32_t flags,
> > > >> +  unsigned int index, unsigned int start,
> > > >> +  unsigned int count, void *data);
> > > >> +  int (*get_region_info)(struct mdev_device *vdev, int 
> > > >> region_index,
> > > >> +   struct pci_region_info *region_info);
> > > >> +  int (*validate_map_request)(struct mdev_device *vdev,
> > > >> +  unsigned long virtaddr,
> > > >> +  unsigned long *pfn, unsigned 
> > > >> long *size,
> > > >> +  pgprot_t *prot);
> > > >> +};
> > > > 
> > > > Dear Kirti:
> > > > 
> > > > When I rebased my vfio-ccw patches on this series, I found I need an
> > > > extra 'ioctl' callback in phy_device_ops.
> > > > 
> > > 
> > > Thanks for taking closer look. As per my knowledge ccw is not PCI
> > > device, right? Correct me if I'm wrong.
> > Dear Kirti:
> > 
> > You are right. CCW is different to PCI. The official term is 'Channel
> > I/O device'. They use 'Channels' (co-processors) and CCWs (channel
> > command words) to handle I/O operations.
> > 
> > > I'm curious to know. Are you planning to write a driver (vfio-mccw) for
> > > mediated ccw device?
> > I wrote two drivers:
> > 1. A vfio-pccw driver for the physical ccw device, which will reigister
> > the device and callbacks to mdev framework. With this, I could create
> > a mediated ccw device for the physical one then.
> > 2. A vfio-mccw driver for the mediated ccw device, which will add
> > itself to a vfio_group, mimiced what vfio-mpci did.
> > 
> > The problem is, vfio-mccw need to implement new ioctls besides the
> > existing ones (VFIO_DEVICE_GET_INFO, etc). And these ioctls really need
> > the physical device help to handle.
> 
> Hi Dong,
> 
> Could you please help us understand a bit more about the new VFIO ioctl?
Dear Neo,

Sure, with pleasure.
Since I tried not to bring too much ccw specific technical details
here, I wrote quite briefly. Please feel free to ask me for
supplements. :>

> Since it is a new ioctl it is send down by QEMU in this case right?
Right.

> More details?
As mentioned in the former emails, I currently added two new ioctl
commands.

1. VFIO_DEVICE_CCW_HOT_RESET
Both the name and the purpose of this command looks pretty the same
with VFIO_DEVICE_PCI_HOT_RESET.
Since Kevin proposed an individual callback for hot-reset handling, I
believe this will not be a problem (only if you accept the proposal).

2. VFIO_DEVICE_CCW_CMD_REQUEST
This intends to handle an intercepted channel I/O instruction. It
basically need to do the following thing:
  a. Copy the raw data of the CCW program (a group of chained CCWs) from
 user into kernel space buffers.
  b. Do CCW program translation based on the raw data to get a
 real-device runnable CCW program. We'd pin pages for those CCWs
 which have memory space pointers for their offload, and update the
 CCW program with the pinned results (phys).
  c. Issue the translated CCW program to a real-device to perform the
 I/O operation, and wait for the I/O result interrupt.
  d. Once we got the I/O result, copy the result back to user, and
 unpin the pages.

Step c could only be done by the physical device driver, since it's it
that the int_handler belongs to.
Step b and d should be done by the physical device driver. Or we'd
pin/unpin pages in the mediated device driver?

That's why I asked for 

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-06 Thread Neo Jia
On Mon, Jun 06, 2016 at 02:01:48PM +0800, Dong Jia wrote:
> On Mon, 6 Jun 2016 10:57:49 +0530
> Kirti Wankhede  wrote:
> 
> > 
> > 
> > On 6/3/2016 2:27 PM, Dong Jia wrote:
> > > On Wed, 25 May 2016 01:28:15 +0530
> > > Kirti Wankhede  wrote:
> > > 
> > > 
> > > ...snip...
> > > 
> > >> +struct phy_device_ops {
> > >> +struct module   *owner;
> > >> +const struct attribute_group **dev_attr_groups;
> > >> +const struct attribute_group **mdev_attr_groups;
> > >> +
> > >> +int (*supported_config)(struct device *dev, char *config);
> > >> +int (*create)(struct device *dev, uuid_le uuid,
> > >> +  uint32_t instance, char *mdev_params);
> > >> +int (*destroy)(struct device *dev, uuid_le uuid,
> > >> +   uint32_t instance);
> > >> +int (*start)(uuid_le uuid);
> > >> +int (*shutdown)(uuid_le uuid);
> > >> +ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t 
> > >> count,
> > >> +enum mdev_emul_space address_space, loff_t pos);
> > >> +ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t 
> > >> count,
> > >> + enum mdev_emul_space address_space, loff_t 
> > >> pos);
> > >> +int (*set_irqs)(struct mdev_device *vdev, uint32_t flags,
> > >> +unsigned int index, unsigned int start,
> > >> +unsigned int count, void *data);
> > >> +int (*get_region_info)(struct mdev_device *vdev, int 
> > >> region_index,
> > >> + struct pci_region_info *region_info);
> > >> +int (*validate_map_request)(struct mdev_device *vdev,
> > >> +unsigned long virtaddr,
> > >> +unsigned long *pfn, unsigned 
> > >> long *size,
> > >> +pgprot_t *prot);
> > >> +};
> > > 
> > > Dear Kirti:
> > > 
> > > When I rebased my vfio-ccw patches on this series, I found I need an
> > > extra 'ioctl' callback in phy_device_ops.
> > > 
> > 
> > Thanks for taking closer look. As per my knowledge ccw is not PCI
> > device, right? Correct me if I'm wrong.
> Dear Kirti:
> 
> You are right. CCW is different to PCI. The official term is 'Channel
> I/O device'. They use 'Channels' (co-processors) and CCWs (channel
> command words) to handle I/O operations.
> 
> > I'm curious to know. Are you planning to write a driver (vfio-mccw) for
> > mediated ccw device?
> I wrote two drivers:
> 1. A vfio-pccw driver for the physical ccw device, which will reigister
> the device and callbacks to mdev framework. With this, I could create
> a mediated ccw device for the physical one then.
> 2. A vfio-mccw driver for the mediated ccw device, which will add
> itself to a vfio_group, mimiced what vfio-mpci did.
> 
> The problem is, vfio-mccw need to implement new ioctls besides the
> existing ones (VFIO_DEVICE_GET_INFO, etc). And these ioctls really need
> the physical device help to handle.

Hi Dong,

Could you please help us understand a bit more about the new VFIO ioctl? Since 
it is
a new ioctl it is send down by QEMU in this case right? More details?

Thanks,
Neo

> 
> > 
> > Thanks,
> > Kirti
> > 
> > > The ccw physical device only supports one ccw mediated device. And I
> > > have two new ioctl commands for the ccw mediated device. One is 
> > > to hot-reset the resource in the physical device that allocated for
> > > the mediated device, the other is to do an I/O instruction translation
> > > and perform an I/O operation on the physical device. I found the
> > > existing callbacks could not meet my requirements.
> > > 
> > > Something like the following would be fine for my case:
> > >   int (*ioctl)(struct mdev_device *vdev,
> > >unsigned int cmd,
> > >unsigned long arg);
> > > 
> > > What do you think about this?
> > > 
> > > 
> > > Dong Jia
> > > 
> > 
> 
> 
> Dong Jia
> 



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-06 Thread Dong Jia
On Mon, 6 Jun 2016 10:57:49 +0530
Kirti Wankhede  wrote:

> 
> 
> On 6/3/2016 2:27 PM, Dong Jia wrote:
> > On Wed, 25 May 2016 01:28:15 +0530
> > Kirti Wankhede  wrote:
> > 
> > 
> > ...snip...
> > 
> >> +struct phy_device_ops {
> >> +  struct module   *owner;
> >> +  const struct attribute_group **dev_attr_groups;
> >> +  const struct attribute_group **mdev_attr_groups;
> >> +
> >> +  int (*supported_config)(struct device *dev, char *config);
> >> +  int (*create)(struct device *dev, uuid_le uuid,
> >> +uint32_t instance, char *mdev_params);
> >> +  int (*destroy)(struct device *dev, uuid_le uuid,
> >> + uint32_t instance);
> >> +  int (*start)(uuid_le uuid);
> >> +  int (*shutdown)(uuid_le uuid);
> >> +  ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count,
> >> +  enum mdev_emul_space address_space, loff_t pos);
> >> +  ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count,
> >> +   enum mdev_emul_space address_space, loff_t pos);
> >> +  int (*set_irqs)(struct mdev_device *vdev, uint32_t flags,
> >> +  unsigned int index, unsigned int start,
> >> +  unsigned int count, void *data);
> >> +  int (*get_region_info)(struct mdev_device *vdev, int region_index,
> >> +   struct pci_region_info *region_info);
> >> +  int (*validate_map_request)(struct mdev_device *vdev,
> >> +  unsigned long virtaddr,
> >> +  unsigned long *pfn, unsigned long *size,
> >> +  pgprot_t *prot);
> >> +};
> > 
> > Dear Kirti:
> > 
> > When I rebased my vfio-ccw patches on this series, I found I need an
> > extra 'ioctl' callback in phy_device_ops.
> > 
> 
> Thanks for taking closer look. As per my knowledge ccw is not PCI
> device, right? Correct me if I'm wrong.
Dear Kirti:

You are right. CCW is different to PCI. The official term is 'Channel
I/O device'. They use 'Channels' (co-processors) and CCWs (channel
command words) to handle I/O operations.

> I'm curious to know. Are you planning to write a driver (vfio-mccw) for
> mediated ccw device?
I wrote two drivers:
1. A vfio-pccw driver for the physical ccw device, which will reigister
the device and callbacks to mdev framework. With this, I could create
a mediated ccw device for the physical one then.
2. A vfio-mccw driver for the mediated ccw device, which will add
itself to a vfio_group, mimiced what vfio-mpci did.

The problem is, vfio-mccw need to implement new ioctls besides the
existing ones (VFIO_DEVICE_GET_INFO, etc). And these ioctls really need
the physical device help to handle.

> 
> Thanks,
> Kirti
> 
> > The ccw physical device only supports one ccw mediated device. And I
> > have two new ioctl commands for the ccw mediated device. One is 
> > to hot-reset the resource in the physical device that allocated for
> > the mediated device, the other is to do an I/O instruction translation
> > and perform an I/O operation on the physical device. I found the
> > existing callbacks could not meet my requirements.
> > 
> > Something like the following would be fine for my case:
> > int (*ioctl)(struct mdev_device *vdev,
> >  unsigned int cmd,
> >  unsigned long arg);
> > 
> > What do you think about this?
> > 
> > 
> > Dong Jia
> > 
> 


Dong Jia




Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-05 Thread Kirti Wankhede


On 6/3/2016 2:27 PM, Dong Jia wrote:
> On Wed, 25 May 2016 01:28:15 +0530
> Kirti Wankhede  wrote:
> 
> 
> ...snip...
> 
>> +struct phy_device_ops {
>> +struct module   *owner;
>> +const struct attribute_group **dev_attr_groups;
>> +const struct attribute_group **mdev_attr_groups;
>> +
>> +int (*supported_config)(struct device *dev, char *config);
>> +int (*create)(struct device *dev, uuid_le uuid,
>> +  uint32_t instance, char *mdev_params);
>> +int (*destroy)(struct device *dev, uuid_le uuid,
>> +   uint32_t instance);
>> +int (*start)(uuid_le uuid);
>> +int (*shutdown)(uuid_le uuid);
>> +ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count,
>> +enum mdev_emul_space address_space, loff_t pos);
>> +ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count,
>> + enum mdev_emul_space address_space, loff_t pos);
>> +int (*set_irqs)(struct mdev_device *vdev, uint32_t flags,
>> +unsigned int index, unsigned int start,
>> +unsigned int count, void *data);
>> +int (*get_region_info)(struct mdev_device *vdev, int region_index,
>> + struct pci_region_info *region_info);
>> +int (*validate_map_request)(struct mdev_device *vdev,
>> +unsigned long virtaddr,
>> +unsigned long *pfn, unsigned long *size,
>> +pgprot_t *prot);
>> +};
> 
> Dear Kirti:
> 
> When I rebased my vfio-ccw patches on this series, I found I need an
> extra 'ioctl' callback in phy_device_ops.
> 

Thanks for taking closer look. As per my knowledge ccw is not PCI
device, right? Correct me if I'm wrong. I'm curious to know. Are you
planning to write a driver (vfio-mccw) for mediated ccw device?

Thanks,
Kirti

> The ccw physical device only supports one ccw mediated device. And I
> have two new ioctl commands for the ccw mediated device. One is 
> to hot-reset the resource in the physical device that allocated for
> the mediated device, the other is to do an I/O instruction translation
> and perform an I/O operation on the physical device. I found the
> existing callbacks could not meet my requirements.
> 
> Something like the following would be fine for my case:
>   int (*ioctl)(struct mdev_device *vdev,
>unsigned int cmd,
>unsigned long arg);
> 
> What do you think about this?
> 
> 
> Dong Jia
> 



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-05 Thread Dong Jia
On Fri, 3 Jun 2016 09:40:16 +
"Tian, Kevin"  wrote:

> > From: Dong Jia [mailto:bjsdj...@linux.vnet.ibm.com]
> > Sent: Friday, June 03, 2016 4:58 PM
> > 
> > 
> > ...snip...
> > 
> > > +struct phy_device_ops {
> > > + struct module   *owner;
> > > + const struct attribute_group **dev_attr_groups;
> > > + const struct attribute_group **mdev_attr_groups;
> > > +
> > > + int (*supported_config)(struct device *dev, char *config);
> > > + int (*create)(struct device *dev, uuid_le uuid,
> > > +   uint32_t instance, char *mdev_params);
> > > + int (*destroy)(struct device *dev, uuid_le uuid,
> > > +uint32_t instance);
> > > + int (*start)(uuid_le uuid);
> > > + int (*shutdown)(uuid_le uuid);
> > > + ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count,
> > > + enum mdev_emul_space address_space, loff_t pos);
> > > + ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count,
> > > +  enum mdev_emul_space address_space, loff_t pos);
> > > + int (*set_irqs)(struct mdev_device *vdev, uint32_t flags,
> > > + unsigned int index, unsigned int start,
> > > + unsigned int count, void *data);
> > > + int (*get_region_info)(struct mdev_device *vdev, int region_index,
> > > +  struct pci_region_info *region_info);
> > > + int (*validate_map_request)(struct mdev_device *vdev,
> > > + unsigned long virtaddr,
> > > + unsigned long *pfn, unsigned long *size,
> > > + pgprot_t *prot);
> > > +};
> > 
> > Dear Kirti:
> > 
> > When I rebased my vfio-ccw patches on this series, I found I need an
> > extra 'ioctl' callback in phy_device_ops.
> > 
> > The ccw physical device only supports one ccw mediated device. And I
> > have two new ioctl commands for the ccw mediated device. One is
> > to hot-reset the resource in the physical device that allocated for
> > the mediated device, the other is to do an I/O instruction translation
> > and perform an I/O operation on the physical device. I found the
> > existing callbacks could not meet my requirements.
> > 
> > Something like the following would be fine for my case:
> > int (*ioctl)(struct mdev_device *vdev,
> >  unsigned int cmd,
> >  unsigned long arg);
> > 
> > What do you think about this?
> > 
> 
> 'reset' should be generic. better to define an individual callback
> for it (then we can also expose a node under vgpu path in sysfs).
> 
Sounds reasonable for me. :>

> Thanks
> Kevin
> 




Dong Jia




Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-03 Thread Tian, Kevin
> From: Dong Jia [mailto:bjsdj...@linux.vnet.ibm.com]
> Sent: Friday, June 03, 2016 4:58 PM
> 
> 
> ...snip...
> 
> > +struct phy_device_ops {
> > +   struct module   *owner;
> > +   const struct attribute_group **dev_attr_groups;
> > +   const struct attribute_group **mdev_attr_groups;
> > +
> > +   int (*supported_config)(struct device *dev, char *config);
> > +   int (*create)(struct device *dev, uuid_le uuid,
> > + uint32_t instance, char *mdev_params);
> > +   int (*destroy)(struct device *dev, uuid_le uuid,
> > +  uint32_t instance);
> > +   int (*start)(uuid_le uuid);
> > +   int (*shutdown)(uuid_le uuid);
> > +   ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count,
> > +   enum mdev_emul_space address_space, loff_t pos);
> > +   ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count,
> > +enum mdev_emul_space address_space, loff_t pos);
> > +   int (*set_irqs)(struct mdev_device *vdev, uint32_t flags,
> > +   unsigned int index, unsigned int start,
> > +   unsigned int count, void *data);
> > +   int (*get_region_info)(struct mdev_device *vdev, int region_index,
> > +struct pci_region_info *region_info);
> > +   int (*validate_map_request)(struct mdev_device *vdev,
> > +   unsigned long virtaddr,
> > +   unsigned long *pfn, unsigned long *size,
> > +   pgprot_t *prot);
> > +};
> 
> Dear Kirti:
> 
> When I rebased my vfio-ccw patches on this series, I found I need an
> extra 'ioctl' callback in phy_device_ops.
> 
> The ccw physical device only supports one ccw mediated device. And I
> have two new ioctl commands for the ccw mediated device. One is
> to hot-reset the resource in the physical device that allocated for
> the mediated device, the other is to do an I/O instruction translation
> and perform an I/O operation on the physical device. I found the
> existing callbacks could not meet my requirements.
> 
> Something like the following would be fine for my case:
>   int (*ioctl)(struct mdev_device *vdev,
>unsigned int cmd,
>unsigned long arg);
> 
> What do you think about this?
> 

'reset' should be generic. better to define an individual callback
for it (then we can also expose a node under vgpu path in sysfs).

Thanks
Kevin



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-06-03 Thread Dong Jia
On Wed, 25 May 2016 01:28:15 +0530
Kirti Wankhede  wrote:


...snip...

> +struct phy_device_ops {
> + struct module   *owner;
> + const struct attribute_group **dev_attr_groups;
> + const struct attribute_group **mdev_attr_groups;
> +
> + int (*supported_config)(struct device *dev, char *config);
> + int (*create)(struct device *dev, uuid_le uuid,
> +   uint32_t instance, char *mdev_params);
> + int (*destroy)(struct device *dev, uuid_le uuid,
> +uint32_t instance);
> + int (*start)(uuid_le uuid);
> + int (*shutdown)(uuid_le uuid);
> + ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count,
> + enum mdev_emul_space address_space, loff_t pos);
> + ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count,
> +  enum mdev_emul_space address_space, loff_t pos);
> + int (*set_irqs)(struct mdev_device *vdev, uint32_t flags,
> + unsigned int index, unsigned int start,
> + unsigned int count, void *data);
> + int (*get_region_info)(struct mdev_device *vdev, int region_index,
> +  struct pci_region_info *region_info);
> + int (*validate_map_request)(struct mdev_device *vdev,
> + unsigned long virtaddr,
> + unsigned long *pfn, unsigned long *size,
> + pgprot_t *prot);
> +};

Dear Kirti:

When I rebased my vfio-ccw patches on this series, I found I need an
extra 'ioctl' callback in phy_device_ops.

The ccw physical device only supports one ccw mediated device. And I
have two new ioctl commands for the ccw mediated device. One is 
to hot-reset the resource in the physical device that allocated for
the mediated device, the other is to do an I/O instruction translation
and perform an I/O operation on the physical device. I found the
existing callbacks could not meet my requirements.

Something like the following would be fine for my case:
int (*ioctl)(struct mdev_device *vdev,
 unsigned int cmd,
 unsigned long arg);

What do you think about this?


Dong Jia




Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-05-27 Thread Tian, Kevin
> From: Kirti Wankhede
> Sent: Wednesday, May 25, 2016 10:47 PM
> 
> 
> >> +static struct devices_list {
> >> +  struct list_headdev_list;
> >> +  struct mutexlist_lock;
> >> +} mdevices, phy_devices;
> >
> > phy_devices -> pdevices? and similarly we can use pdev/mdev
> > pair in other places...
> >
> 
> 'pdevices' sometimes also refers to 'pointer to devices' that's the
> reason I perfer to use phy_devices to represent 'physical devices'

well, I think it should be clear in this context where 'p' means 'physical'.
Just like frequently used pdev in pci files for pci_dev...

> 
> 
> >> +static struct mdev_device *find_mdev_device(uuid_le uuid, int instance)
> >
> > can we just call it "struct mdev* or "mdevice"? "dev_device" looks
> redundant.
> >
> 
> 'struct mdev_device' represents 'device structure for device created by
> mdev module'. Still that doesn't satisfy major folks, I'm open to change
> it.

IMO 'mdev' should be clearly enough to represent a mediated device

> 
> 
> > Sorry I may have to ask same question since I didn't get an answer yet.
> > what exactly does 'instance' mean here? since uuid is unique, why do
> > we need match instance too?
> >
> 
> 'uuid' could be UUID of a VM for whom it is created. To support mutiple
> mediated devices for same VM, name should be unique. Hence we need a
> instance number to identify each mediated device uniquely in one VM.

If UUID alone cannot universally identify a mediated device, what's the
point of explicitly tagging it in kernel? Either we assign a new UUID for
this mdev itself, or possibly better define it as a string? Management 
stack can pass any ID/name in string format which is sufficiently to identify 
mdev to its own purpose? Then in this framework we just do simple
string match...

> 
> 
> 
> >> +  if (phy_dev->ops->destroy) {
> >> +  if (phy_dev->ops->destroy(phy_dev->dev, mdevice->uuid,
> >> +mdevice->instance)) {
> >> +  mutex_unlock(_devices.list_lock);
> >
> > a warning message is preferred. Also better to return -EBUSY here.
> >
> 
> mdev_destroy_device() is called from 2 paths, one is sysfs mdev_destroy
> and mdev_unregister_device(). For the later case, return from here will
> any ways ignored. mdev_unregister_device() is called from the remove
> function of physical device and that doesn't care about return error, it
> just removes the device from subsystem.

Regardless of whether caller will handle error, this function itself should
return error since it makes sense in other path e.g. sysfs to let user
know what's happening.

> 
> >> +  return;
> >> +  }
> >> +  }
> >> +
> >> +  mdev_remove_attribute_group(>dev,
> >> +  phy_dev->ops->mdev_attr_groups);
> >> +  mdevice->phy_dev = NULL;
> >
> > Am I missing something here? You didn't remove this mdev node from
> > the list, and below...
> >
> 
> device_unregister() calls put_device(dev) and if refcount is zero its
> release function is called, which is mdev_device_release(), that is
> hooked during device_register(). This node is removed from list from
> mdev_device_release().

I'm not sure whether there'll be some race condition here, since you
put a defunc mdev on the list... 

> 
> >> +  phy_dev->dev = dev;
> >> +  phy_dev->ops = ops;
> >> +
> >> +  mutex_lock(_devices.list_lock);
> >> +  ret = mdev_create_sysfs_files(dev);
> >> +  if (ret)
> >> +  goto add_sysfs_error;
> >> +
> >> +  ret = mdev_add_attribute_group(dev, ops->dev_attr_groups);
> >> +  if (ret)
> >> +  goto add_group_error;
> >
> > any reason to include sysfs operations inside the mutex which is
> > purely about phy_devices list?
> >
> 
> dev_attr_groups attribute is for physical device, hence inside
> phy_devices.list_lock.

phy_devices.list_lock only protects the list, when you plan to add a
new phy_device node after it's initialized and get ready. sysfs
attribute setup is still part of initialization.


Thanks
Kevin



Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-05-26 Thread Alex Williamson
On Thu, 26 May 2016 14:33:39 +0530
Kirti Wankhede  wrote:

> Thanks Alex.
> 
> I'll consider all the nits and fix those in next version of patch.
> 
> More below:
> 
> On 5/26/2016 4:09 AM, Alex Williamson wrote:
> > On Wed, 25 May 2016 01:28:15 +0530
> > Kirti Wankhede  wrote:
> >  
> 
> ...
> 
> >> +
> >> +config MDEV
> >> +tristate "Mediated device driver framework"
> >> +depends on VFIO
> >> +default n
> >> +help
> >> +MDEV provides a framework to virtualize device without  
> SR-IOV cap
> >> +See Documentation/mdev.txt for more details.  
> >
> > I don't see that file anywhere in this series.  
> 
> Yes, missed this file in this patch. I'll add it in next version of patch.
> Since mdev module is moved in vfio directory, should I place this file
> in vfio directory, Documentation/vfio/mdev.txt? or keep documentation of
> mdev module within vfio.txt itself?

Maybe just call it vfio-mediated-device.txt

> >> +  if (phy_dev) {
> >> +  mutex_lock(_devices.list_lock);
> >> +
> >> +  /*
> >> +  * If vendor driver doesn't return success that means vendor
> >> +  * driver doesn't support hot-unplug
> >> +  */
> >> +  if (phy_dev->ops->destroy) {
> >> +  if (phy_dev->ops->destroy(phy_dev->dev, mdevice->uuid,
> >> +mdevice->instance)) {
> >> +  mutex_unlock(_devices.list_lock);
> >> +  return;
> >> +  }
> >> +  }
> >> +
> >> +  mdev_remove_attribute_group(>dev,
> >> +  phy_dev->ops->mdev_attr_groups);
> >> +  mdevice->phy_dev = NULL;
> >> +  mutex_unlock(_devices.list_lock);  
> >
> > Locking here appears arbitrary, how does the above code interact with
> > phy_devices.dev_list?
> >  
> 
> Sorry for not being clear about phy_devices.list_lock, probably I
> shouldn't have named it 'list_lock'. This lock is also to synchronize
> register_device & unregister_device and physical device specific
> callbacks: supported_config, create, destroy, start and shutdown.
> Although supported_config, create and destroy are per phy_device
> specific callbacks while start and shutdown could refer to multiple
> phy_devices indirectly when there are multiple mdev devices of same type
> on different physical devices. There could be race condition in start
> callback and destroy & unregister_device. I'm revisiting this lock again
> and will see to use per phy device lock for phy_device specific callbacks.
> 
> 
> >> +struct mdev_device {
> >> +  struct kref kref;
> >> +  struct device   dev;
> >> +  struct phy_device   *phy_dev;
> >> +  struct iommu_group  *group;
> >> +  void*iommu_data;
> >> +  uuid_le uuid;
> >> +  uint32_tinstance;
> >> +  void*driver_data;
> >> +  struct mutexops_lock;
> >> +  struct list_headnext;
> >> +};  
> >
> > Could this be in the private header?  Seems like this should be opaque
> > outside of mdev core.
> >  
> 
> No, this structure is used in mediated device call back functions to
> vendor driver so that vendor driver could identify mdev device, similar
> to pci_dev structure in pci bus subsystem. (I'll remove kref which is
> not being used at all.)

Personally I'd prefer to see more use of reference counting and less
locking, especially since the locking is mostly ineffective in this
version.

> >> + * @read: Read emulation callback
> >> + *@mdev: mediated device structure
> >> + *@buf: read buffer
> >> + *@count: number bytes to read
> >> + *@address_space: specifies for which address
> >> + *space the request is: pci_config_space, IO
> >> + *register space or MMIO space.  
> >
> > Seems like I asked before and it's no more clear in the code, how do we
> > handle multiple spaces for various types?  ie. a device might have
> > multiple MMIO spaces.
> >  
> >> + *@pos: offset from base address.  
> 
> Sorry, updated the code but missed to update comment here.
> pos = base_address + offset
> (its not 'pos' anymore, will rename it to addr)
> 
> so vendor driver is aware about base addresses of multiple MMIO spaces
> and its size, they can identify MMIO space based on addr.

Why not let the vendor driver provide vfio_region_info directly,
including the offset within the device file descriptor thedn the
mediated device core simply pass read/write through without caring what
the address space is?  Thanks,

Alex
 
> >> +/*
> >> + * Physical Device
> >> + */
> >> +struct phy_device {
> >> +  struct device   *dev;
> >> +  const struct phy_device_ops *ops;
> >> +  struct list_head  

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-05-26 Thread Kirti Wankhede
Thanks Alex.

I'll consider all the nits and fix those in next version of patch.

More below:

On 5/26/2016 4:09 AM, Alex Williamson wrote:
> On Wed, 25 May 2016 01:28:15 +0530
> Kirti Wankhede  wrote:
>

...

>> +
>> +config MDEV
>> +tristate "Mediated device driver framework"
>> +depends on VFIO
>> +default n
>> +help
>> +MDEV provides a framework to virtualize device without
SR-IOV cap
>> +See Documentation/mdev.txt for more details.
>
> I don't see that file anywhere in this series.

Yes, missed this file in this patch. I'll add it in next version of patch.
Since mdev module is moved in vfio directory, should I place this file
in vfio directory, Documentation/vfio/mdev.txt? or keep documentation of
mdev module within vfio.txt itself?


>> +if (phy_dev) {
>> +mutex_lock(_devices.list_lock);
>> +
>> +/*
>> +* If vendor driver doesn't return success that means vendor
>> +* driver doesn't support hot-unplug
>> +*/
>> +if (phy_dev->ops->destroy) {
>> +if (phy_dev->ops->destroy(phy_dev->dev, mdevice->uuid,
>> +  mdevice->instance)) {
>> +mutex_unlock(_devices.list_lock);
>> +return;
>> +}
>> +}
>> +
>> +mdev_remove_attribute_group(>dev,
>> +phy_dev->ops->mdev_attr_groups);
>> +mdevice->phy_dev = NULL;
>> +mutex_unlock(_devices.list_lock);
>
> Locking here appears arbitrary, how does the above code interact with
> phy_devices.dev_list?
>

Sorry for not being clear about phy_devices.list_lock, probably I
shouldn't have named it 'list_lock'. This lock is also to synchronize
register_device & unregister_device and physical device specific
callbacks: supported_config, create, destroy, start and shutdown.
Although supported_config, create and destroy are per phy_device
specific callbacks while start and shutdown could refer to multiple
phy_devices indirectly when there are multiple mdev devices of same type
on different physical devices. There could be race condition in start
callback and destroy & unregister_device. I'm revisiting this lock again
and will see to use per phy device lock for phy_device specific callbacks.


>> +struct mdev_device {
>> +struct kref kref;
>> +struct device   dev;
>> +struct phy_device   *phy_dev;
>> +struct iommu_group  *group;
>> +void*iommu_data;
>> +uuid_le uuid;
>> +uint32_tinstance;
>> +void*driver_data;
>> +struct mutexops_lock;
>> +struct list_headnext;
>> +};
>
> Could this be in the private header?  Seems like this should be opaque
> outside of mdev core.
>

No, this structure is used in mediated device call back functions to
vendor driver so that vendor driver could identify mdev device, similar
to pci_dev structure in pci bus subsystem. (I'll remove kref which is
not being used at all.)


>> + * @read:   Read emulation callback
>> + *  @mdev: mediated device structure
>> + *  @buf: read buffer
>> + *  @count: number bytes to read
>> + *  @address_space: specifies for which address
>> + *  space the request is: pci_config_space, IO
>> + *  register space or MMIO space.
>
> Seems like I asked before and it's no more clear in the code, how do we
> handle multiple spaces for various types?  ie. a device might have
> multiple MMIO spaces.
>
>> + *  @pos: offset from base address.

Sorry, updated the code but missed to update comment here.
pos = base_address + offset
(its not 'pos' anymore, will rename it to addr)

so vendor driver is aware about base addresses of multiple MMIO spaces
and its size, they can identify MMIO space based on addr.

>> +/*
>> + * Physical Device
>> + */
>> +struct phy_device {
>> +struct device   *dev;
>> +const struct phy_device_ops *ops;
>> +struct list_headnext;
>> +};
>
> I would really like to be able to use the mediated device interface to
> create a purely virtual device, is the expectation that my physical
> device interface would create a virtual struct device which would
> become the parent and control point in sysfs for creating all the mdev
> devices? Should we be calling this a host_device or mdev_parent_dev in
> that case since there's really no requirement that it be a physical
> device?

Makes sense. I'll rename it to parent_device.

Thanks,
Kirti.




Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-05-25 Thread Alex Williamson
On Wed, 25 May 2016 01:28:15 +0530
Kirti Wankhede  wrote:

> Design for Mediated Device Driver:
> Main purpose of this driver is to provide a common interface for mediated
> device management that can be used by differnt drivers of different
> devices.
> 
> This module provides a generic interface to create the device, add it to
> mediated bus, add device to IOMMU group and then add it to vfio group.
> 
> Below is the high Level block diagram, with Nvidia, Intel and IBM devices
> as example, since these are the devices which are going to actively use
> this module as of now.
> 
>  +---+
>  |   |
>  | +---+ |  mdev_register_driver() +--+
>  | |   | +<+ __init() |
>  | |   | | |  |
>  | |  mdev | +>+  |<-> VFIO user
>  | |  bus  | | probe()/remove()| vfio_mpci.ko |APIs
>  | |  driver   | | |  |
>  | |   | | +--+
>  | |   | |  mdev_register_driver() +--+
>  | |   | +<+ __init() |
>  | |   | | |  |
>  | |   | +>+  |<-> VFIO user
>  | +---+ | probe()/remove()| vfio_mccw.ko |APIs
>  |   | |  |
>  |  MDEV CORE| +--+
>  |   MODULE  |
>  |   mdev.ko |
>  | +---+ |  mdev_register_device() +--+
>  | |   | +<+  |
>  | |   | | |  nvidia.ko   |<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | | Physical  | |
>  | |  device   | |  mdev_register_device() +--+
>  | | interface | |<+  |
>  | |   | | |  i915.ko |<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | |   | |
>  | |   | |  mdev_register_device() +--+
>  | |   | +<+  |
>  | |   | | | ccw_device.ko|<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | +---+ |
>  +---+
> 
> Core driver provides two types of registration interfaces:
> 1. Registration interface for mediated bus driver:
> 
> /**
>   * struct mdev_driver - Mediated device's driver
>   * @name: driver name
>   * @probe: called when new device created
>   * @remove:called when device removed
>   * @match: called when new device or driver is added for this bus.
>   Return 1 if given device can be handled by given driver and
>   zero otherwise.
>   * @driver:device driver structure
>   *
>   **/
> struct mdev_driver {
>  const char *name;
>  int  (*probe)  (struct device *dev);
>  void (*remove) (struct device *dev);
>int  (*match)(struct device *dev);
>  struct device_driverdriver;
> };
> 
> int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
> 
> Mediated device's driver for mdev should use this interface to register
> with Core driver. With this, mediated devices driver for such devices is
> responsible to add mediated device to VFIO group.
> 
> 2. Physical device driver interface
> This interface provides vendor driver the set APIs to manage physical
> device related work in their own driver. APIs are :
> - supported_config: provide supported configuration list by the vendor
>   driver
> - create: to allocate basic resources in vendor driver for a mediated
> device.
> - destroy: to free resources in vendor driver when mediated device is
>  destroyed.
> - start: to initiate mediated device initialization process from vendor
>driver when VM boots and before QEMU starts.
> - shutdown: to teardown mediated device resources during VM teardown.
> - read : read emulation callback.
> - write: write emulation callback.
> - set_irqs: send interrupt configuration information that QEMU sets.
> - get_region_info: to provide region size and its flags for the mediated
>  device.
> - validate_map_request: to validate remap pfn request.

nit, vfio is a userspace driver interface where QEMU is simply a user
of that interface.  We should never assume QEMU is the only user.

> 
> This registration interface should be used by vendor drivers to register
> each physical device to mdev core driver.
> 
> 

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-05-25 Thread Kirti Wankhede

On 5/25/2016 1:25 PM, Tian, Kevin wrote:
>> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
>> Sent: Wednesday, May 25, 2016 3:58 AM
>>

...

>> +
>> +config MDEV
>> +tristate "Mediated device driver framework"
>
> Sorry not a native speaker. Is it cleaner to say "Driver framework for
Mediated
> Devices" or "Mediated Device Framework"? Should we focus on driver or
device
> here?
>

Both, device and driver. This framework provides way to register
physical *devices* and also register *driver* for mediated devices.


>> +depends on VFIO
>> +default n
>> +help
>> +MDEV provides a framework to virtualize device without
SR-IOV cap
>> +See Documentation/mdev.txt for more details.
>
> Looks Documentation/mdev.txt is not included in this version.
>

Yes, will have Documentation/mdev.txt in next version of patch.


>> +static struct devices_list {
>> +struct list_headdev_list;
>> +struct mutexlist_lock;
>> +} mdevices, phy_devices;
>
> phy_devices -> pdevices? and similarly we can use pdev/mdev
> pair in other places...
>

'pdevices' sometimes also refers to 'pointer to devices' that's the
reason I perfer to use phy_devices to represent 'physical devices'


>> +static struct mdev_device *find_mdev_device(uuid_le uuid, int instance)
>
> can we just call it "struct mdev* or "mdevice"? "dev_device" looks
redundant.
>

'struct mdev_device' represents 'device structure for device created by
mdev module'. Still that doesn't satisfy major folks, I'm open to change
it.


> Sorry I may have to ask same question since I didn't get an answer yet.
> what exactly does 'instance' mean here? since uuid is unique, why do
> we need match instance too?
>

'uuid' could be UUID of a VM for whom it is created. To support mutiple
mediated devices for same VM, name should be unique. Hence we need a
instance number to identify each mediated device uniquely in one VM.



>> +if (phy_dev->ops->destroy) {
>> +if (phy_dev->ops->destroy(phy_dev->dev, mdevice->uuid,
>> +  mdevice->instance)) {
>> +mutex_unlock(_devices.list_lock);
>
> a warning message is preferred. Also better to return -EBUSY here.
>

mdev_destroy_device() is called from 2 paths, one is sysfs mdev_destroy
and mdev_unregister_device(). For the later case, return from here will
any ways ignored. mdev_unregister_device() is called from the remove
function of physical device and that doesn't care about return error, it
just removes the device from subsystem.

>> +return;
>> +}
>> +}
>> +
>> +mdev_remove_attribute_group(>dev,
>> +phy_dev->ops->mdev_attr_groups);
>> +mdevice->phy_dev = NULL;
>
> Am I missing something here? You didn't remove this mdev node from
> the list, and below...
>

device_unregister() calls put_device(dev) and if refcount is zero its
release function is called, which is mdev_device_release(), that is
hooked during device_register(). This node is removed from list from
mdev_device_release().


>> +mutex_unlock(_devices.list_lock);
>
> you should use mutex of mdevices list
>

No, this lock is for phy_dev.


>> +phy_dev->dev = dev;
>> +phy_dev->ops = ops;
>> +
>> +mutex_lock(_devices.list_lock);
>> +ret = mdev_create_sysfs_files(dev);
>> +if (ret)
>> +goto add_sysfs_error;
>> +
>> +ret = mdev_add_attribute_group(dev, ops->dev_attr_groups);
>> +if (ret)
>> +goto add_group_error;
>
> any reason to include sysfs operations inside the mutex which is
> purely about phy_devices list?
>

dev_attr_groups attribute is for physical device, hence inside
phy_devices.list_lock.

* @dev_attr_groups:Default attributes of the physical device.


>> +void mdev_unregister_device(struct device *dev)
>> +{
>> +struct phy_device *phy_dev;
>> +struct mdev_device *vdev = NULL;
>> +
>> +phy_dev = find_physical_device(dev);
>> +
>> +if (!phy_dev)
>> +return;
>> +
>> +dev_info(dev, "MDEV: Unregistering\n");
>> +
>> +while ((vdev = find_next_mdev_device(phy_dev)))
>> +mdev_destroy_device(vdev);
>
> Need check return value here since ops->destroy may fail.
>

See my comment above.


>> +static void mdev_device_release(struct device *dev)
>
> what's the difference between this release and earlier destroy version?
>

See my comment above


>> +static void __exit mdev_exit(void)
>> +{
>
> should we check any remaining mdev/pdev which are not cleaned
> up correctly here?
>

If there are any physical device registered with this module then the
usage count is not zero so rmmod would anyways fail.

All mdev_devices, which are created for any physical device,  are
destroyed from mdev_unregister_device(physial_device);

Hence no need to explicitly add the code here which would never get used.

>> +

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-05-25 Thread Tian, Kevin
> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Wednesday, May 25, 2016 3:58 AM
> 
> Design for Mediated Device Driver:
> Main purpose of this driver is to provide a common interface for mediated
> device management that can be used by differnt drivers of different
> devices.
> 
> This module provides a generic interface to create the device, add it to
> mediated bus, add device to IOMMU group and then add it to vfio group.
> 
> Below is the high Level block diagram, with Nvidia, Intel and IBM devices
> as example, since these are the devices which are going to actively use
> this module as of now.
> 
>  +---+
>  |   |
>  | +---+ |  mdev_register_driver() +--+
>  | |   | +<+ __init() |
>  | |   | | |  |
>  | |  mdev | +>+  |<-> VFIO user
>  | |  bus  | | probe()/remove()| vfio_mpci.ko |APIs
>  | |  driver   | | |  |
>  | |   | | +--+
>  | |   | |  mdev_register_driver() +--+
>  | |   | +<+ __init() |
>  | |   | | |  |
>  | |   | +>+  |<-> VFIO user
>  | +---+ | probe()/remove()| vfio_mccw.ko |APIs
>  |   | |  |
>  |  MDEV CORE| +--+
>  |   MODULE  |
>  |   mdev.ko |
>  | +---+ |  mdev_register_device() +--+
>  | |   | +<+  |
>  | |   | | |  nvidia.ko   |<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | | Physical  | |
>  | |  device   | |  mdev_register_device() +--+
>  | | interface | |<+  |
>  | |   | | |  i915.ko |<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | |   | |
>  | |   | |  mdev_register_device() +--+
>  | |   | +<+  |
>  | |   | | | ccw_device.ko|<-> physical
>  | |   | +>+  |device
>  | |   | |callback +--+
>  | +---+ |
>  +---+
> 
> Core driver provides two types of registration interfaces:
> 1. Registration interface for mediated bus driver:
> 
> /**
>   * struct mdev_driver - Mediated device's driver
>   * @name: driver name
>   * @probe: called when new device created
>   * @remove:called when device removed
>   * @match: called when new device or driver is added for this bus.
>   Return 1 if given device can be handled by given driver and
>   zero otherwise.
>   * @driver:device driver structure
>   *
>   **/
> struct mdev_driver {
>  const char *name;
>  int  (*probe)  (struct device *dev);
>  void (*remove) (struct device *dev);
>int  (*match)(struct device *dev);
>  struct device_driverdriver;
> };
> 
> int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
> 
> Mediated device's driver for mdev should use this interface to register
> with Core driver. With this, mediated devices driver for such devices is
> responsible to add mediated device to VFIO group.
> 
> 2. Physical device driver interface
> This interface provides vendor driver the set APIs to manage physical
> device related work in their own driver. APIs are :
> - supported_config: provide supported configuration list by the vendor
>   driver
> - create: to allocate basic resources in vendor driver for a mediated
> device.
> - destroy: to free resources in vendor driver when mediated device is
>  destroyed.
> - start: to initiate mediated device initialization process from vendor
>driver when VM boots and before QEMU starts.
> - shutdown: to teardown mediated device resources during VM teardown.
> - read : read emulation callback.
> - write: write emulation callback.
> - set_irqs: send interrupt configuration information that QEMU sets.
> - get_region_info: to provide region size and its flags for the mediated
>  device.
> - validate_map_request: to validate remap pfn request.
> 
> This registration interface should be used by vendor drivers to register
> each physical device to mdev core driver.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: