Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Tue, 7 Jun 2016 22:29:30 -0600 Alex Williamson wrote: > On Wed, 8 Jun 2016 11:18:42 +0800 > Dong Jia wrote: > > > On Tue, 7 Jun 2016 19:39:21 -0600 > > Alex Williamson wrote: > > > > > On Wed, 8 Jun 2016 01:18:42 + > > > "Tian, Kevin" wrote: > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > Sent: Wednesday, June 08, 2016 6:42 AM > > > > > > > > > > On Tue, 7 Jun 2016 03:03:32 + > > > > > "Tian, Kevin" wrote: > > > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > > Sent: Tuesday, June 07, 2016 3:31 AM > > > > > > > > > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700 > > > > > > > Neo Jia wrote: > > > > > > > > > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > > > > > > > Neo Jia wrote: > > > > > > > > > > > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > > > > > > > This intends to handle an intercepted channel I/O > > > > > > > > > instruction. It > > > > > > > > > basically need to do the following thing: > > > > > > > > > > > > > > > > May I ask how and when QEMU knows that he needs to issue such > > > > > > > > VFIO ioctl at > > > > > > > > first place? > > > > > > > > > > > > > > Yep, this is my question as well. It sounds a bit like there's an > > > > > > > emulated device in QEMU that's trying to tell the mediated device > > > > > > > when > > > > > > > to start an operation when we probably should be passing through > > > > > > > whatever i/o operations indicate that status directly to the > > > > > > > mediated > > > > > > > device. Thanks, > > > > > > > > > > > > > > Alex > > > > > > > > > > > > Below is copied from Dong's earlier post which said clear that > > > > > > a guest cmd submission will trigger the whole flow: > > > > > > > > > > > > > > > > > > Explanation: > > > > > > Q1-Q4: Qemu side process. > > > > > > K1-K6: Kernel side process. > > > > > > > > > > > > Q1. Intercept a ssch instruction. > > > > > > Q2. Translate the guest ccw program to a user space ccw program > > > > > > (u_ccwchain). > > > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). > > > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain). > > > > > > K2. Translate the user space ccw program to a kernel space ccw > > > > > > program, which becomes runnable for a real device. > > > > > > K3. With the necessary information contained in the orb passed > > > > > > in > > > > > > by Qemu, issue the k_ccwchain to the device, and wait event > > > > > > q > > > > > > for the I/O result. > > > > > > K4. Interrupt handler gets the I/O result, and wakes up the > > > > > > wait q. > > > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result > > > > > > to > > > > > > update the user space irb. > > > > > > K6. Copy irb and scsw back to user space. > > > > > > Q4. Update the irb for the guest. > > > > > > > > > > > > > > > > Right, but this was the pre-mediated device approach, now we no longer > > > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in > > > > > QEMU if those are operations that are not visible to the mediated > > > > > device; which they very well might be, since it's described as an > > > > > instruction rather than an i/o operation. It's not terrible if that's > > > > > the case, vfio-pci has its own ioctl for doing a hot reset. > > Dear Alex, Kevin and Neo, > > > > 'ssch' is a privileged I/O instruction, which should be finally issued > > to the dedicated subchannel of the physical device. > > > > BTW, I did remove step Q2 with all of the user-space translation code, > > according to your comments in another thread. > > > > > > > > > > > > > > > > > > > > > My understanding is that such thing belongs to how device is > > > > > > mediated > > > > > > (so device driver specific), instead of something to be abstracted > > > > > > in > > > > > > VFIO which manages resource but doesn't care how resource is used. > > > > > > > > > > > > Actually we have same requirement in vGPU case, that a guest driver > > > > > > needs submit GPU commands through some MMIO register. vGPU device > > > > > > model will intercept the submission request (in its own way), do its > > > > > > necessary scan/audit to ensure correctness/security, and then submit > > > > > > to physical GPU through vendor specific interface. > > > > > > > > > > > > No difference with channel I/O here. > > > > > > > > > > Well, if the GPU command is submitted through an MMIO register, is > > > > > that > > > > > MMIO register part of the mediated device? If so, could the mediated > > > > > device recognize the command and do the scan/audit itself? QEMU must > > > > > not be the point at which mediation occurs for security purposes, QEMU > > > > > is userspace and userspace is not to be trusted. I'm still
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Wed, Jun 08, 2016 at 02:13:49PM +0800, Dong Jia wrote: > On Tue, 7 Jun 2016 20:48:42 -0700 > Neo Jia wrote: > > > On Wed, Jun 08, 2016 at 11:18:42AM +0800, Dong Jia wrote: > > > On Tue, 7 Jun 2016 19:39:21 -0600 > > > Alex Williamson wrote: > > > > > > > On Wed, 8 Jun 2016 01:18:42 + > > > > "Tian, Kevin" wrote: > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > Sent: Wednesday, June 08, 2016 6:42 AM > > > > > > > > > > > > On Tue, 7 Jun 2016 03:03:32 + > > > > > > "Tian, Kevin" wrote: > > > > > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > > > Sent: Tuesday, June 07, 2016 3:31 AM > > > > > > > > > > > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700 > > > > > > > > Neo Jia wrote: > > > > > > > > > > > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > > > > > > > > Neo Jia wrote: > > > > > > > > > > > > > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > > > > > > > > This intends to handle an intercepted channel I/O > > > > > > > > > > instruction. It > > > > > > > > > > basically need to do the following thing: > > > > > > > > > > > > > > > > > > May I ask how and when QEMU knows that he needs to issue such > > > > > > > > > VFIO ioctl at > > > > > > > > > first place? > > > > > > > > > > > > > > > > Yep, this is my question as well. It sounds a bit like there's > > > > > > > > an > > > > > > > > emulated device in QEMU that's trying to tell the mediated > > > > > > > > device when > > > > > > > > to start an operation when we probably should be passing through > > > > > > > > whatever i/o operations indicate that status directly to the > > > > > > > > mediated > > > > > > > > device. Thanks, > > > > > > > > > > > > > > > > Alex > > > > > > > > > > > > > > Below is copied from Dong's earlier post which said clear that > > > > > > > a guest cmd submission will trigger the whole flow: > > > > > > > > > > > > > > > > > > > > > Explanation: > > > > > > > Q1-Q4: Qemu side process. > > > > > > > K1-K6: Kernel side process. > > > > > > > > > > > > > > Q1. Intercept a ssch instruction. > > > > > > > Q2. Translate the guest ccw program to a user space ccw program > > > > > > > (u_ccwchain). > > > > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). > > > > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain). > > > > > > > K2. Translate the user space ccw program to a kernel space ccw > > > > > > > program, which becomes runnable for a real device. > > > > > > > K3. With the necessary information contained in the orb > > > > > > > passed in > > > > > > > by Qemu, issue the k_ccwchain to the device, and wait > > > > > > > event q > > > > > > > for the I/O result. > > > > > > > K4. Interrupt handler gets the I/O result, and wakes up the > > > > > > > wait q. > > > > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the > > > > > > > result to > > > > > > > update the user space irb. > > > > > > > K6. Copy irb and scsw back to user space. > > > > > > > Q4. Update the irb for the guest. > > > > > > > > > > > > > > > > > > > Right, but this was the pre-mediated device approach, now we no > > > > > > longer > > > > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in > > > > > > QEMU if those are operations that are not visible to the mediated > > > > > > device; which they very well might be, since it's described as an > > > > > > instruction rather than an i/o operation. It's not terrible if > > > > > > that's > > > > > > the case, vfio-pci has its own ioctl for doing a hot reset. > > > Dear Alex, Kevin and Neo, > > > > > > 'ssch' is a privileged I/O instruction, which should be finally issued > > > to the dedicated subchannel of the physical device. > > > > > > BTW, I did remove step Q2 with all of the user-space translation code, > > > according to your comments in another thread. > > > > > > > > > > > > > > > > > > > > > > > > > > My understanding is that such thing belongs to how device is > > > > > > > mediated > > > > > > > (so device driver specific), instead of something to be > > > > > > > abstracted in > > > > > > > VFIO which manages resource but doesn't care how resource is used. > > > > > > > > > > > > > > Actually we have same requirement in vGPU case, that a guest > > > > > > > driver > > > > > > > needs submit GPU commands through some MMIO register. vGPU device > > > > > > > model will intercept the submission request (in its own way), do > > > > > > > its > > > > > > > necessary scan/audit to ensure correctness/security, and then > > > > > > > submit > > > > > > > to physical GPU through vendor specific interface. > > > > > > > > > > > > > > No difference with channel I/O here. > > > > > > > > > > > > Well, if the GPU command is submitted through an MMIO register, is
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Tue, 7 Jun 2016 20:48:42 -0700 Neo Jia wrote: > On Wed, Jun 08, 2016 at 11:18:42AM +0800, Dong Jia wrote: > > On Tue, 7 Jun 2016 19:39:21 -0600 > > Alex Williamson wrote: > > > > > On Wed, 8 Jun 2016 01:18:42 + > > > "Tian, Kevin" wrote: > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > Sent: Wednesday, June 08, 2016 6:42 AM > > > > > > > > > > On Tue, 7 Jun 2016 03:03:32 + > > > > > "Tian, Kevin" wrote: > > > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > > Sent: Tuesday, June 07, 2016 3:31 AM > > > > > > > > > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700 > > > > > > > Neo Jia wrote: > > > > > > > > > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > > > > > > > Neo Jia wrote: > > > > > > > > > > > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > > > > > > > This intends to handle an intercepted channel I/O > > > > > > > > > instruction. It > > > > > > > > > basically need to do the following thing: > > > > > > > > > > > > > > > > May I ask how and when QEMU knows that he needs to issue such > > > > > > > > VFIO ioctl at > > > > > > > > first place? > > > > > > > > > > > > > > Yep, this is my question as well. It sounds a bit like there's an > > > > > > > emulated device in QEMU that's trying to tell the mediated device > > > > > > > when > > > > > > > to start an operation when we probably should be passing through > > > > > > > whatever i/o operations indicate that status directly to the > > > > > > > mediated > > > > > > > device. Thanks, > > > > > > > > > > > > > > Alex > > > > > > > > > > > > Below is copied from Dong's earlier post which said clear that > > > > > > a guest cmd submission will trigger the whole flow: > > > > > > > > > > > > > > > > > > Explanation: > > > > > > Q1-Q4: Qemu side process. > > > > > > K1-K6: Kernel side process. > > > > > > > > > > > > Q1. Intercept a ssch instruction. > > > > > > Q2. Translate the guest ccw program to a user space ccw program > > > > > > (u_ccwchain). > > > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). > > > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain). > > > > > > K2. Translate the user space ccw program to a kernel space ccw > > > > > > program, which becomes runnable for a real device. > > > > > > K3. With the necessary information contained in the orb passed > > > > > > in > > > > > > by Qemu, issue the k_ccwchain to the device, and wait event > > > > > > q > > > > > > for the I/O result. > > > > > > K4. Interrupt handler gets the I/O result, and wakes up the > > > > > > wait q. > > > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result > > > > > > to > > > > > > update the user space irb. > > > > > > K6. Copy irb and scsw back to user space. > > > > > > Q4. Update the irb for the guest. > > > > > > > > > > > > > > > > Right, but this was the pre-mediated device approach, now we no longer > > > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in > > > > > QEMU if those are operations that are not visible to the mediated > > > > > device; which they very well might be, since it's described as an > > > > > instruction rather than an i/o operation. It's not terrible if that's > > > > > the case, vfio-pci has its own ioctl for doing a hot reset. > > Dear Alex, Kevin and Neo, > > > > 'ssch' is a privileged I/O instruction, which should be finally issued > > to the dedicated subchannel of the physical device. > > > > BTW, I did remove step Q2 with all of the user-space translation code, > > according to your comments in another thread. > > > > > > > > > > > > > > > > > > > > > My understanding is that such thing belongs to how device is > > > > > > mediated > > > > > > (so device driver specific), instead of something to be abstracted > > > > > > in > > > > > > VFIO which manages resource but doesn't care how resource is used. > > > > > > > > > > > > Actually we have same requirement in vGPU case, that a guest driver > > > > > > needs submit GPU commands through some MMIO register. vGPU device > > > > > > model will intercept the submission request (in its own way), do its > > > > > > necessary scan/audit to ensure correctness/security, and then submit > > > > > > to physical GPU through vendor specific interface. > > > > > > > > > > > > No difference with channel I/O here. > > > > > > > > > > Well, if the GPU command is submitted through an MMIO register, is > > > > > that > > > > > MMIO register part of the mediated device? If so, could the mediated > > > > > device recognize the command and do the scan/audit itself? QEMU must > > > > > not be the point at which mediation occurs for security purposes, QEMU > > > > > is userspace and userspace is not to be trusted. I'm still open to > > > > > ioctls where
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Wed, 8 Jun 2016 11:18:42 +0800 Dong Jia wrote: > On Tue, 7 Jun 2016 19:39:21 -0600 > Alex Williamson wrote: > > > On Wed, 8 Jun 2016 01:18:42 + > > "Tian, Kevin" wrote: > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > Sent: Wednesday, June 08, 2016 6:42 AM > > > > > > > > On Tue, 7 Jun 2016 03:03:32 + > > > > "Tian, Kevin" wrote: > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > Sent: Tuesday, June 07, 2016 3:31 AM > > > > > > > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700 > > > > > > Neo Jia wrote: > > > > > > > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > > > > > > Neo Jia wrote: > > > > > > > > > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > > > > > > This intends to handle an intercepted channel I/O instruction. > > > > > > > > It > > > > > > > > basically need to do the following thing: > > > > > > > > > > > > > > May I ask how and when QEMU knows that he needs to issue such > > > > > > > VFIO ioctl at > > > > > > > first place? > > > > > > > > > > > > Yep, this is my question as well. It sounds a bit like there's an > > > > > > emulated device in QEMU that's trying to tell the mediated device > > > > > > when > > > > > > to start an operation when we probably should be passing through > > > > > > whatever i/o operations indicate that status directly to the > > > > > > mediated > > > > > > device. Thanks, > > > > > > > > > > > > Alex > > > > > > > > > > Below is copied from Dong's earlier post which said clear that > > > > > a guest cmd submission will trigger the whole flow: > > > > > > > > > > > > > > > Explanation: > > > > > Q1-Q4: Qemu side process. > > > > > K1-K6: Kernel side process. > > > > > > > > > > Q1. Intercept a ssch instruction. > > > > > Q2. Translate the guest ccw program to a user space ccw program > > > > > (u_ccwchain). > > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). > > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain). > > > > > K2. Translate the user space ccw program to a kernel space ccw > > > > > program, which becomes runnable for a real device. > > > > > K3. With the necessary information contained in the orb passed in > > > > > by Qemu, issue the k_ccwchain to the device, and wait event q > > > > > for the I/O result. > > > > > K4. Interrupt handler gets the I/O result, and wakes up the wait > > > > > q. > > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to > > > > > update the user space irb. > > > > > K6. Copy irb and scsw back to user space. > > > > > Q4. Update the irb for the guest. > > > > > > > > > > > > > Right, but this was the pre-mediated device approach, now we no longer > > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in > > > > QEMU if those are operations that are not visible to the mediated > > > > device; which they very well might be, since it's described as an > > > > instruction rather than an i/o operation. It's not terrible if that's > > > > the case, vfio-pci has its own ioctl for doing a hot reset. > Dear Alex, Kevin and Neo, > > 'ssch' is a privileged I/O instruction, which should be finally issued > to the dedicated subchannel of the physical device. > > BTW, I did remove step Q2 with all of the user-space translation code, > according to your comments in another thread. > > > > > > > > > > > > > > > > My understanding is that such thing belongs to how device is mediated > > > > > (so device driver specific), instead of something to be abstracted in > > > > > VFIO which manages resource but doesn't care how resource is used. > > > > > > > > > > Actually we have same requirement in vGPU case, that a guest driver > > > > > needs submit GPU commands through some MMIO register. vGPU device > > > > > model will intercept the submission request (in its own way), do its > > > > > necessary scan/audit to ensure correctness/security, and then submit > > > > > to physical GPU through vendor specific interface. > > > > > > > > > > No difference with channel I/O here. > > > > > > > > Well, if the GPU command is submitted through an MMIO register, is that > > > > MMIO register part of the mediated device? If so, could the mediated > > > > device recognize the command and do the scan/audit itself? QEMU must > > > > not be the point at which mediation occurs for security purposes, QEMU > > > > is userspace and userspace is not to be trusted. I'm still open to > > > > ioctls where it makes sense, as above, we have PCI specific ioctls and > > > > already, but we need to evaluate each one, why it needs to exist, and > > > > whether we can skip it if the mediated device can trigger the action on > > > > its own. After all, that's why we're using the vfio api, so we can > > > > re-use much of
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Wed, Jun 08, 2016 at 11:18:42AM +0800, Dong Jia wrote: > On Tue, 7 Jun 2016 19:39:21 -0600 > Alex Williamson wrote: > > > On Wed, 8 Jun 2016 01:18:42 + > > "Tian, Kevin" wrote: > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > Sent: Wednesday, June 08, 2016 6:42 AM > > > > > > > > On Tue, 7 Jun 2016 03:03:32 + > > > > "Tian, Kevin" wrote: > > > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > > Sent: Tuesday, June 07, 2016 3:31 AM > > > > > > > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700 > > > > > > Neo Jia wrote: > > > > > > > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > > > > > > Neo Jia wrote: > > > > > > > > > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > > > > > > This intends to handle an intercepted channel I/O instruction. > > > > > > > > It > > > > > > > > basically need to do the following thing: > > > > > > > > > > > > > > May I ask how and when QEMU knows that he needs to issue such > > > > > > > VFIO ioctl at > > > > > > > first place? > > > > > > > > > > > > Yep, this is my question as well. It sounds a bit like there's an > > > > > > emulated device in QEMU that's trying to tell the mediated device > > > > > > when > > > > > > to start an operation when we probably should be passing through > > > > > > whatever i/o operations indicate that status directly to the > > > > > > mediated > > > > > > device. Thanks, > > > > > > > > > > > > Alex > > > > > > > > > > Below is copied from Dong's earlier post which said clear that > > > > > a guest cmd submission will trigger the whole flow: > > > > > > > > > > > > > > > Explanation: > > > > > Q1-Q4: Qemu side process. > > > > > K1-K6: Kernel side process. > > > > > > > > > > Q1. Intercept a ssch instruction. > > > > > Q2. Translate the guest ccw program to a user space ccw program > > > > > (u_ccwchain). > > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). > > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain). > > > > > K2. Translate the user space ccw program to a kernel space ccw > > > > > program, which becomes runnable for a real device. > > > > > K3. With the necessary information contained in the orb passed in > > > > > by Qemu, issue the k_ccwchain to the device, and wait event q > > > > > for the I/O result. > > > > > K4. Interrupt handler gets the I/O result, and wakes up the wait > > > > > q. > > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to > > > > > update the user space irb. > > > > > K6. Copy irb and scsw back to user space. > > > > > Q4. Update the irb for the guest. > > > > > > > > > > > > > Right, but this was the pre-mediated device approach, now we no longer > > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in > > > > QEMU if those are operations that are not visible to the mediated > > > > device; which they very well might be, since it's described as an > > > > instruction rather than an i/o operation. It's not terrible if that's > > > > the case, vfio-pci has its own ioctl for doing a hot reset. > Dear Alex, Kevin and Neo, > > 'ssch' is a privileged I/O instruction, which should be finally issued > to the dedicated subchannel of the physical device. > > BTW, I did remove step Q2 with all of the user-space translation code, > according to your comments in another thread. > > > > > > > > > > > > > > > > My understanding is that such thing belongs to how device is mediated > > > > > (so device driver specific), instead of something to be abstracted in > > > > > VFIO which manages resource but doesn't care how resource is used. > > > > > > > > > > Actually we have same requirement in vGPU case, that a guest driver > > > > > needs submit GPU commands through some MMIO register. vGPU device > > > > > model will intercept the submission request (in its own way), do its > > > > > necessary scan/audit to ensure correctness/security, and then submit > > > > > to physical GPU through vendor specific interface. > > > > > > > > > > No difference with channel I/O here. > > > > > > > > Well, if the GPU command is submitted through an MMIO register, is that > > > > MMIO register part of the mediated device? If so, could the mediated > > > > device recognize the command and do the scan/audit itself? QEMU must > > > > not be the point at which mediation occurs for security purposes, QEMU > > > > is userspace and userspace is not to be trusted. I'm still open to > > > > ioctls where it makes sense, as above, we have PCI specific ioctls and > > > > already, but we need to evaluate each one, why it needs to exist, and > > > > whether we can skip it if the mediated device can trigger the action on > > > > its own. After all, that's why we're using the vfio api, so we can > > > > re-use much of the existing infr
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Tue, 7 Jun 2016 19:39:21 -0600 Alex Williamson wrote: > On Wed, 8 Jun 2016 01:18:42 + > "Tian, Kevin" wrote: > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > Sent: Wednesday, June 08, 2016 6:42 AM > > > > > > On Tue, 7 Jun 2016 03:03:32 + > > > "Tian, Kevin" wrote: > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > > Sent: Tuesday, June 07, 2016 3:31 AM > > > > > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700 > > > > > Neo Jia wrote: > > > > > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > > > > > Neo Jia wrote: > > > > > > > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > > > > > This intends to handle an intercepted channel I/O instruction. It > > > > > > > basically need to do the following thing: > > > > > > > > > > > > May I ask how and when QEMU knows that he needs to issue such VFIO > > > > > > ioctl at > > > > > > first place? > > > > > > > > > > Yep, this is my question as well. It sounds a bit like there's an > > > > > emulated device in QEMU that's trying to tell the mediated device when > > > > > to start an operation when we probably should be passing through > > > > > whatever i/o operations indicate that status directly to the mediated > > > > > device. Thanks, > > > > > > > > > > Alex > > > > > > > > Below is copied from Dong's earlier post which said clear that > > > > a guest cmd submission will trigger the whole flow: > > > > > > > > > > > > Explanation: > > > > Q1-Q4: Qemu side process. > > > > K1-K6: Kernel side process. > > > > > > > > Q1. Intercept a ssch instruction. > > > > Q2. Translate the guest ccw program to a user space ccw program > > > > (u_ccwchain). > > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). > > > > K1. Copy from u_ccwchain to kernel (k_ccwchain). > > > > K2. Translate the user space ccw program to a kernel space ccw > > > > program, which becomes runnable for a real device. > > > > K3. With the necessary information contained in the orb passed in > > > > by Qemu, issue the k_ccwchain to the device, and wait event q > > > > for the I/O result. > > > > K4. Interrupt handler gets the I/O result, and wakes up the wait q. > > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to > > > > update the user space irb. > > > > K6. Copy irb and scsw back to user space. > > > > Q4. Update the irb for the guest. > > > > > > > > > > Right, but this was the pre-mediated device approach, now we no longer > > > need step Q2 so we really only need Q1 and therefore Q3 to exist in > > > QEMU if those are operations that are not visible to the mediated > > > device; which they very well might be, since it's described as an > > > instruction rather than an i/o operation. It's not terrible if that's > > > the case, vfio-pci has its own ioctl for doing a hot reset. Dear Alex, Kevin and Neo, 'ssch' is a privileged I/O instruction, which should be finally issued to the dedicated subchannel of the physical device. BTW, I did remove step Q2 with all of the user-space translation code, according to your comments in another thread. > > > > > > > > > > > My understanding is that such thing belongs to how device is mediated > > > > (so device driver specific), instead of something to be abstracted in > > > > VFIO which manages resource but doesn't care how resource is used. > > > > > > > > Actually we have same requirement in vGPU case, that a guest driver > > > > needs submit GPU commands through some MMIO register. vGPU device > > > > model will intercept the submission request (in its own way), do its > > > > necessary scan/audit to ensure correctness/security, and then submit > > > > to physical GPU through vendor specific interface. > > > > > > > > No difference with channel I/O here. > > > > > > Well, if the GPU command is submitted through an MMIO register, is that > > > MMIO register part of the mediated device? If so, could the mediated > > > device recognize the command and do the scan/audit itself? QEMU must > > > not be the point at which mediation occurs for security purposes, QEMU > > > is userspace and userspace is not to be trusted. I'm still open to > > > ioctls where it makes sense, as above, we have PCI specific ioctls and > > > already, but we need to evaluate each one, why it needs to exist, and > > > whether we can skip it if the mediated device can trigger the action on > > > its own. After all, that's why we're using the vfio api, so we can > > > re-use much of the existing infrastructure, especially for a vGPU that > > > exposes itself as a PCI device. Thanks, > > > > > > > My point is that a guest submission on vGPU is just a normal trapped > > register write, which is forwarded from Qemu to VFIO through pwrite > > interface and then hit mediated vGPU device. The mediated device
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Wed, 8 Jun 2016 01:18:42 + "Tian, Kevin" wrote: > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > Sent: Wednesday, June 08, 2016 6:42 AM > > > > On Tue, 7 Jun 2016 03:03:32 + > > "Tian, Kevin" wrote: > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > > Sent: Tuesday, June 07, 2016 3:31 AM > > > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700 > > > > Neo Jia wrote: > > > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > > > > Neo Jia wrote: > > > > > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > > > > This intends to handle an intercepted channel I/O instruction. It > > > > > > basically need to do the following thing: > > > > > > > > > > May I ask how and when QEMU knows that he needs to issue such VFIO > > > > > ioctl at > > > > > first place? > > > > > > > > Yep, this is my question as well. It sounds a bit like there's an > > > > emulated device in QEMU that's trying to tell the mediated device when > > > > to start an operation when we probably should be passing through > > > > whatever i/o operations indicate that status directly to the mediated > > > > device. Thanks, > > > > > > > > Alex > > > > > > Below is copied from Dong's earlier post which said clear that > > > a guest cmd submission will trigger the whole flow: > > > > > > > > > Explanation: > > > Q1-Q4: Qemu side process. > > > K1-K6: Kernel side process. > > > > > > Q1. Intercept a ssch instruction. > > > Q2. Translate the guest ccw program to a user space ccw program > > > (u_ccwchain). > > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). > > > K1. Copy from u_ccwchain to kernel (k_ccwchain). > > > K2. Translate the user space ccw program to a kernel space ccw > > > program, which becomes runnable for a real device. > > > K3. With the necessary information contained in the orb passed in > > > by Qemu, issue the k_ccwchain to the device, and wait event q > > > for the I/O result. > > > K4. Interrupt handler gets the I/O result, and wakes up the wait q. > > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to > > > update the user space irb. > > > K6. Copy irb and scsw back to user space. > > > Q4. Update the irb for the guest. > > > > > > > Right, but this was the pre-mediated device approach, now we no longer > > need step Q2 so we really only need Q1 and therefore Q3 to exist in > > QEMU if those are operations that are not visible to the mediated > > device; which they very well might be, since it's described as an > > instruction rather than an i/o operation. It's not terrible if that's > > the case, vfio-pci has its own ioctl for doing a hot reset. > > > > > > > > My understanding is that such thing belongs to how device is mediated > > > (so device driver specific), instead of something to be abstracted in > > > VFIO which manages resource but doesn't care how resource is used. > > > > > > Actually we have same requirement in vGPU case, that a guest driver > > > needs submit GPU commands through some MMIO register. vGPU device > > > model will intercept the submission request (in its own way), do its > > > necessary scan/audit to ensure correctness/security, and then submit > > > to physical GPU through vendor specific interface. > > > > > > No difference with channel I/O here. > > > > Well, if the GPU command is submitted through an MMIO register, is that > > MMIO register part of the mediated device? If so, could the mediated > > device recognize the command and do the scan/audit itself? QEMU must > > not be the point at which mediation occurs for security purposes, QEMU > > is userspace and userspace is not to be trusted. I'm still open to > > ioctls where it makes sense, as above, we have PCI specific ioctls and > > already, but we need to evaluate each one, why it needs to exist, and > > whether we can skip it if the mediated device can trigger the action on > > its own. After all, that's why we're using the vfio api, so we can > > re-use much of the existing infrastructure, especially for a vGPU that > > exposes itself as a PCI device. Thanks, > > > > My point is that a guest submission on vGPU is just a normal trapped > register write, which is forwarded from Qemu to VFIO through pwrite > interface and then hit mediated vGPU device. The mediated device > will recognize this register write as a submission request and then do > necessary scan (looks we are saying same thing) and then submit to > physical device driver. If loading ccw cmds on channel i/o are also > through some I/O registers, it can be implemented same way w/o > introducing new ioctl. The r/w handler of mediated device can figure > out whether it's a ccw submission or not. But my understanding might > be wrong here. I think we're in violent agreement ;)
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
> From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Wednesday, June 08, 2016 6:42 AM > > On Tue, 7 Jun 2016 03:03:32 + > "Tian, Kevin" wrote: > > > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > > Sent: Tuesday, June 07, 2016 3:31 AM > > > > > > On Mon, 6 Jun 2016 10:44:25 -0700 > > > Neo Jia wrote: > > > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > > > Neo Jia wrote: > > > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > > > This intends to handle an intercepted channel I/O instruction. It > > > > > basically need to do the following thing: > > > > > > > > May I ask how and when QEMU knows that he needs to issue such VFIO > > > > ioctl at > > > > first place? > > > > > > Yep, this is my question as well. It sounds a bit like there's an > > > emulated device in QEMU that's trying to tell the mediated device when > > > to start an operation when we probably should be passing through > > > whatever i/o operations indicate that status directly to the mediated > > > device. Thanks, > > > > > > Alex > > > > Below is copied from Dong's earlier post which said clear that > > a guest cmd submission will trigger the whole flow: > > > > > > Explanation: > > Q1-Q4: Qemu side process. > > K1-K6: Kernel side process. > > > > Q1. Intercept a ssch instruction. > > Q2. Translate the guest ccw program to a user space ccw program > > (u_ccwchain). > > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). > > K1. Copy from u_ccwchain to kernel (k_ccwchain). > > K2. Translate the user space ccw program to a kernel space ccw > > program, which becomes runnable for a real device. > > K3. With the necessary information contained in the orb passed in > > by Qemu, issue the k_ccwchain to the device, and wait event q > > for the I/O result. > > K4. Interrupt handler gets the I/O result, and wakes up the wait q. > > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to > > update the user space irb. > > K6. Copy irb and scsw back to user space. > > Q4. Update the irb for the guest. > > > > Right, but this was the pre-mediated device approach, now we no longer > need step Q2 so we really only need Q1 and therefore Q3 to exist in > QEMU if those are operations that are not visible to the mediated > device; which they very well might be, since it's described as an > instruction rather than an i/o operation. It's not terrible if that's > the case, vfio-pci has its own ioctl for doing a hot reset. > > > My understanding is that such thing belongs to how device is mediated > > (so device driver specific), instead of something to be abstracted in > > VFIO which manages resource but doesn't care how resource is used. > > > > Actually we have same requirement in vGPU case, that a guest driver > > needs submit GPU commands through some MMIO register. vGPU device > > model will intercept the submission request (in its own way), do its > > necessary scan/audit to ensure correctness/security, and then submit > > to physical GPU through vendor specific interface. > > > > No difference with channel I/O here. > > Well, if the GPU command is submitted through an MMIO register, is that > MMIO register part of the mediated device? If so, could the mediated > device recognize the command and do the scan/audit itself? QEMU must > not be the point at which mediation occurs for security purposes, QEMU > is userspace and userspace is not to be trusted. I'm still open to > ioctls where it makes sense, as above, we have PCI specific ioctls and > already, but we need to evaluate each one, why it needs to exist, and > whether we can skip it if the mediated device can trigger the action on > its own. After all, that's why we're using the vfio api, so we can > re-use much of the existing infrastructure, especially for a vGPU that > exposes itself as a PCI device. Thanks, > My point is that a guest submission on vGPU is just a normal trapped register write, which is forwarded from Qemu to VFIO through pwrite interface and then hit mediated vGPU device. The mediated device will recognize this register write as a submission request and then do necessary scan (looks we are saying same thing) and then submit to physical device driver. If loading ccw cmds on channel i/o are also through some I/O registers, it can be implemented same way w/o introducing new ioctl. The r/w handler of mediated device can figure out whether it's a ccw submission or not. But my understanding might be wrong here. Thanks Kevin
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Tue, 7 Jun 2016 03:03:32 + "Tian, Kevin" wrote: > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > Sent: Tuesday, June 07, 2016 3:31 AM > > > > On Mon, 6 Jun 2016 10:44:25 -0700 > > Neo Jia wrote: > > > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > > Neo Jia wrote: > > > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > > This intends to handle an intercepted channel I/O instruction. It > > > > basically need to do the following thing: > > > > > > May I ask how and when QEMU knows that he needs to issue such VFIO ioctl > > > at > > > first place? > > > > Yep, this is my question as well. It sounds a bit like there's an > > emulated device in QEMU that's trying to tell the mediated device when > > to start an operation when we probably should be passing through > > whatever i/o operations indicate that status directly to the mediated > > device. Thanks, > > > > Alex > > Below is copied from Dong's earlier post which said clear that > a guest cmd submission will trigger the whole flow: > > > Explanation: > Q1-Q4: Qemu side process. > K1-K6: Kernel side process. > > Q1. Intercept a ssch instruction. > Q2. Translate the guest ccw program to a user space ccw program > (u_ccwchain). > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). > K1. Copy from u_ccwchain to kernel (k_ccwchain). > K2. Translate the user space ccw program to a kernel space ccw > program, which becomes runnable for a real device. > K3. With the necessary information contained in the orb passed in > by Qemu, issue the k_ccwchain to the device, and wait event q > for the I/O result. > K4. Interrupt handler gets the I/O result, and wakes up the wait q. > K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to > update the user space irb. > K6. Copy irb and scsw back to user space. > Q4. Update the irb for the guest. > Right, but this was the pre-mediated device approach, now we no longer need step Q2 so we really only need Q1 and therefore Q3 to exist in QEMU if those are operations that are not visible to the mediated device; which they very well might be, since it's described as an instruction rather than an i/o operation. It's not terrible if that's the case, vfio-pci has its own ioctl for doing a hot reset. > My understanding is that such thing belongs to how device is mediated > (so device driver specific), instead of something to be abstracted in > VFIO which manages resource but doesn't care how resource is used. > > Actually we have same requirement in vGPU case, that a guest driver > needs submit GPU commands through some MMIO register. vGPU device > model will intercept the submission request (in its own way), do its > necessary scan/audit to ensure correctness/security, and then submit > to physical GPU through vendor specific interface. > > No difference with channel I/O here. Well, if the GPU command is submitted through an MMIO register, is that MMIO register part of the mediated device? If so, could the mediated device recognize the command and do the scan/audit itself? QEMU must not be the point at which mediation occurs for security purposes, QEMU is userspace and userspace is not to be trusted. I'm still open to ioctls where it makes sense, as above, we have PCI specific ioctls and already, but we need to evaluate each one, why it needs to exist, and whether we can skip it if the mediated device can trigger the action on its own. After all, that's why we're using the vfio api, so we can re-use much of the existing infrastructure, especially for a vGPU that exposes itself as a PCI device. Thanks, Alex
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
> From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Tuesday, June 07, 2016 3:31 AM > > On Mon, 6 Jun 2016 10:44:25 -0700 > Neo Jia wrote: > > > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > > On Sun, 5 Jun 2016 23:27:42 -0700 > > > Neo Jia wrote: > > > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > > This intends to handle an intercepted channel I/O instruction. It > > > basically need to do the following thing: > > > > May I ask how and when QEMU knows that he needs to issue such VFIO ioctl at > > first place? > > Yep, this is my question as well. It sounds a bit like there's an > emulated device in QEMU that's trying to tell the mediated device when > to start an operation when we probably should be passing through > whatever i/o operations indicate that status directly to the mediated > device. Thanks, > > Alex Below is copied from Dong's earlier post which said clear that a guest cmd submission will trigger the whole flow: Explanation: Q1-Q4: Qemu side process. K1-K6: Kernel side process. Q1. Intercept a ssch instruction. Q2. Translate the guest ccw program to a user space ccw program (u_ccwchain). Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb). K1. Copy from u_ccwchain to kernel (k_ccwchain). K2. Translate the user space ccw program to a kernel space ccw program, which becomes runnable for a real device. K3. With the necessary information contained in the orb passed in by Qemu, issue the k_ccwchain to the device, and wait event q for the I/O result. K4. Interrupt handler gets the I/O result, and wakes up the wait q. K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to update the user space irb. K6. Copy irb and scsw back to user space. Q4. Update the irb for the guest. My understanding is that such thing belongs to how device is mediated (so device driver specific), instead of something to be abstracted in VFIO which manages resource but doesn't care how resource is used. Actually we have same requirement in vGPU case, that a guest driver needs submit GPU commands through some MMIO register. vGPU device model will intercept the submission request (in its own way), do its necessary scan/audit to ensure correctness/security, and then submit to physical GPU through vendor specific interface. No difference with channel I/O here. Thanks Kevin
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Mon, 6 Jun 2016 10:44:25 -0700 Neo Jia wrote: > On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > > On Sun, 5 Jun 2016 23:27:42 -0700 > > Neo Jia wrote: > > > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > > This intends to handle an intercepted channel I/O instruction. It > > basically need to do the following thing: > > May I ask how and when QEMU knows that he needs to issue such VFIO ioctl at > first place? Yep, this is my question as well. It sounds a bit like there's an emulated device in QEMU that's trying to tell the mediated device when to start an operation when we probably should be passing through whatever i/o operations indicate that status directly to the mediated device. Thanks, Alex
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Mon, Jun 06, 2016 at 04:29:11PM +0800, Dong Jia wrote: > On Sun, 5 Jun 2016 23:27:42 -0700 > Neo Jia wrote: > > 2. VFIO_DEVICE_CCW_CMD_REQUEST > This intends to handle an intercepted channel I/O instruction. It > basically need to do the following thing: May I ask how and when QEMU knows that he needs to issue such VFIO ioctl at first place? Thanks, Neo > a. Copy the raw data of the CCW program (a group of chained CCWs) from > user into kernel space buffers. > b. Do CCW program translation based on the raw data to get a > real-device runnable CCW program. We'd pin pages for those CCWs > which have memory space pointers for their offload, and update the > CCW program with the pinned results (phys). > c. Issue the translated CCW program to a real-device to perform the > I/O operation, and wait for the I/O result interrupt. > d. Once we got the I/O result, copy the result back to user, and > unpin the pages. > > Step c could only be done by the physical device driver, since it's it > that the int_handler belongs to. > Step b and d should be done by the physical device driver. Or we'd > pin/unpin pages in the mediated device driver? > > That's why I asked for the new callback. >
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Sun, 5 Jun 2016 23:27:42 -0700 Neo Jia wrote: > On Mon, Jun 06, 2016 at 02:01:48PM +0800, Dong Jia wrote: > > On Mon, 6 Jun 2016 10:57:49 +0530 > > Kirti Wankhede wrote: > > > > > > > > > > > On 6/3/2016 2:27 PM, Dong Jia wrote: > > > > On Wed, 25 May 2016 01:28:15 +0530 > > > > Kirti Wankhede wrote: > > > > > > > > > > > > ...snip... > > > > > > > >> +struct phy_device_ops { > > > >> + struct module *owner; > > > >> + const struct attribute_group **dev_attr_groups; > > > >> + const struct attribute_group **mdev_attr_groups; > > > >> + > > > >> + int (*supported_config)(struct device *dev, char *config); > > > >> + int (*create)(struct device *dev, uuid_le uuid, > > > >> +uint32_t instance, char *mdev_params); > > > >> + int (*destroy)(struct device *dev, uuid_le uuid, > > > >> + uint32_t instance); > > > >> + int (*start)(uuid_le uuid); > > > >> + int (*shutdown)(uuid_le uuid); > > > >> + ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t > > > >> count, > > > >> + enum mdev_emul_space address_space, loff_t pos); > > > >> + ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t > > > >> count, > > > >> + enum mdev_emul_space address_space, loff_t > > > >> pos); > > > >> + int (*set_irqs)(struct mdev_device *vdev, uint32_t flags, > > > >> + unsigned int index, unsigned int start, > > > >> + unsigned int count, void *data); > > > >> + int (*get_region_info)(struct mdev_device *vdev, int > > > >> region_index, > > > >> + struct pci_region_info *region_info); > > > >> + int (*validate_map_request)(struct mdev_device *vdev, > > > >> + unsigned long virtaddr, > > > >> + unsigned long *pfn, unsigned > > > >> long *size, > > > >> + pgprot_t *prot); > > > >> +}; > > > > > > > > Dear Kirti: > > > > > > > > When I rebased my vfio-ccw patches on this series, I found I need an > > > > extra 'ioctl' callback in phy_device_ops. > > > > > > > > > > Thanks for taking closer look. As per my knowledge ccw is not PCI > > > device, right? Correct me if I'm wrong. > > Dear Kirti: > > > > You are right. CCW is different to PCI. The official term is 'Channel > > I/O device'. They use 'Channels' (co-processors) and CCWs (channel > > command words) to handle I/O operations. > > > > > I'm curious to know. Are you planning to write a driver (vfio-mccw) for > > > mediated ccw device? > > I wrote two drivers: > > 1. A vfio-pccw driver for the physical ccw device, which will reigister > > the device and callbacks to mdev framework. With this, I could create > > a mediated ccw device for the physical one then. > > 2. A vfio-mccw driver for the mediated ccw device, which will add > > itself to a vfio_group, mimiced what vfio-mpci did. > > > > The problem is, vfio-mccw need to implement new ioctls besides the > > existing ones (VFIO_DEVICE_GET_INFO, etc). And these ioctls really need > > the physical device help to handle. > > Hi Dong, > > Could you please help us understand a bit more about the new VFIO ioctl? Dear Neo, Sure, with pleasure. Since I tried not to bring too much ccw specific technical details here, I wrote quite briefly. Please feel free to ask me for supplements. :> > Since it is a new ioctl it is send down by QEMU in this case right? Right. > More details? As mentioned in the former emails, I currently added two new ioctl commands. 1. VFIO_DEVICE_CCW_HOT_RESET Both the name and the purpose of this command looks pretty the same with VFIO_DEVICE_PCI_HOT_RESET. Since Kevin proposed an individual callback for hot-reset handling, I believe this will not be a problem (only if you accept the proposal). 2. VFIO_DEVICE_CCW_CMD_REQUEST This intends to handle an intercepted channel I/O instruction. It basically need to do the following thing: a. Copy the raw data of the CCW program (a group of chained CCWs) from user into kernel space buffers. b. Do CCW program translation based on the raw data to get a real-device runnable CCW program. We'd pin pages for those CCWs which have memory space pointers for their offload, and update the CCW program with the pinned results (phys). c. Issue the translated CCW program to a real-device to perform the I/O operation, and wait for the I/O result interrupt. d. Once we got the I/O result, copy the result back to user, and unpin the pages. Step c could only be done by the physical device driver, since it's it that the int_handler belongs to. Step b and d should be done by the physical device driver. Or we'd pin/unpin pages in the mediated device driver? That's why I asked for the new callback. > > Thanks, > Neo > > > > > > > > > Than
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Mon, Jun 06, 2016 at 02:01:48PM +0800, Dong Jia wrote: > On Mon, 6 Jun 2016 10:57:49 +0530 > Kirti Wankhede wrote: > > > > > > > On 6/3/2016 2:27 PM, Dong Jia wrote: > > > On Wed, 25 May 2016 01:28:15 +0530 > > > Kirti Wankhede wrote: > > > > > > > > > ...snip... > > > > > >> +struct phy_device_ops { > > >> +struct module *owner; > > >> +const struct attribute_group **dev_attr_groups; > > >> +const struct attribute_group **mdev_attr_groups; > > >> + > > >> +int (*supported_config)(struct device *dev, char *config); > > >> +int (*create)(struct device *dev, uuid_le uuid, > > >> + uint32_t instance, char *mdev_params); > > >> +int (*destroy)(struct device *dev, uuid_le uuid, > > >> + uint32_t instance); > > >> +int (*start)(uuid_le uuid); > > >> +int (*shutdown)(uuid_le uuid); > > >> +ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t > > >> count, > > >> +enum mdev_emul_space address_space, loff_t pos); > > >> +ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t > > >> count, > > >> + enum mdev_emul_space address_space, loff_t > > >> pos); > > >> +int (*set_irqs)(struct mdev_device *vdev, uint32_t flags, > > >> +unsigned int index, unsigned int start, > > >> +unsigned int count, void *data); > > >> +int (*get_region_info)(struct mdev_device *vdev, int > > >> region_index, > > >> + struct pci_region_info *region_info); > > >> +int (*validate_map_request)(struct mdev_device *vdev, > > >> +unsigned long virtaddr, > > >> +unsigned long *pfn, unsigned > > >> long *size, > > >> +pgprot_t *prot); > > >> +}; > > > > > > Dear Kirti: > > > > > > When I rebased my vfio-ccw patches on this series, I found I need an > > > extra 'ioctl' callback in phy_device_ops. > > > > > > > Thanks for taking closer look. As per my knowledge ccw is not PCI > > device, right? Correct me if I'm wrong. > Dear Kirti: > > You are right. CCW is different to PCI. The official term is 'Channel > I/O device'. They use 'Channels' (co-processors) and CCWs (channel > command words) to handle I/O operations. > > > I'm curious to know. Are you planning to write a driver (vfio-mccw) for > > mediated ccw device? > I wrote two drivers: > 1. A vfio-pccw driver for the physical ccw device, which will reigister > the device and callbacks to mdev framework. With this, I could create > a mediated ccw device for the physical one then. > 2. A vfio-mccw driver for the mediated ccw device, which will add > itself to a vfio_group, mimiced what vfio-mpci did. > > The problem is, vfio-mccw need to implement new ioctls besides the > existing ones (VFIO_DEVICE_GET_INFO, etc). And these ioctls really need > the physical device help to handle. Hi Dong, Could you please help us understand a bit more about the new VFIO ioctl? Since it is a new ioctl it is send down by QEMU in this case right? More details? Thanks, Neo > > > > > Thanks, > > Kirti > > > > > The ccw physical device only supports one ccw mediated device. And I > > > have two new ioctl commands for the ccw mediated device. One is > > > to hot-reset the resource in the physical device that allocated for > > > the mediated device, the other is to do an I/O instruction translation > > > and perform an I/O operation on the physical device. I found the > > > existing callbacks could not meet my requirements. > > > > > > Something like the following would be fine for my case: > > > int (*ioctl)(struct mdev_device *vdev, > > >unsigned int cmd, > > >unsigned long arg); > > > > > > What do you think about this? > > > > > > > > > Dong Jia > > > > > > > > Dong Jia >
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Mon, 6 Jun 2016 10:57:49 +0530 Kirti Wankhede wrote: > > > On 6/3/2016 2:27 PM, Dong Jia wrote: > > On Wed, 25 May 2016 01:28:15 +0530 > > Kirti Wankhede wrote: > > > > > > ...snip... > > > >> +struct phy_device_ops { > >> + struct module *owner; > >> + const struct attribute_group **dev_attr_groups; > >> + const struct attribute_group **mdev_attr_groups; > >> + > >> + int (*supported_config)(struct device *dev, char *config); > >> + int (*create)(struct device *dev, uuid_le uuid, > >> +uint32_t instance, char *mdev_params); > >> + int (*destroy)(struct device *dev, uuid_le uuid, > >> + uint32_t instance); > >> + int (*start)(uuid_le uuid); > >> + int (*shutdown)(uuid_le uuid); > >> + ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count, > >> + enum mdev_emul_space address_space, loff_t pos); > >> + ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count, > >> + enum mdev_emul_space address_space, loff_t pos); > >> + int (*set_irqs)(struct mdev_device *vdev, uint32_t flags, > >> + unsigned int index, unsigned int start, > >> + unsigned int count, void *data); > >> + int (*get_region_info)(struct mdev_device *vdev, int region_index, > >> + struct pci_region_info *region_info); > >> + int (*validate_map_request)(struct mdev_device *vdev, > >> + unsigned long virtaddr, > >> + unsigned long *pfn, unsigned long *size, > >> + pgprot_t *prot); > >> +}; > > > > Dear Kirti: > > > > When I rebased my vfio-ccw patches on this series, I found I need an > > extra 'ioctl' callback in phy_device_ops. > > > > Thanks for taking closer look. As per my knowledge ccw is not PCI > device, right? Correct me if I'm wrong. Dear Kirti: You are right. CCW is different to PCI. The official term is 'Channel I/O device'. They use 'Channels' (co-processors) and CCWs (channel command words) to handle I/O operations. > I'm curious to know. Are you planning to write a driver (vfio-mccw) for > mediated ccw device? I wrote two drivers: 1. A vfio-pccw driver for the physical ccw device, which will reigister the device and callbacks to mdev framework. With this, I could create a mediated ccw device for the physical one then. 2. A vfio-mccw driver for the mediated ccw device, which will add itself to a vfio_group, mimiced what vfio-mpci did. The problem is, vfio-mccw need to implement new ioctls besides the existing ones (VFIO_DEVICE_GET_INFO, etc). And these ioctls really need the physical device help to handle. > > Thanks, > Kirti > > > The ccw physical device only supports one ccw mediated device. And I > > have two new ioctl commands for the ccw mediated device. One is > > to hot-reset the resource in the physical device that allocated for > > the mediated device, the other is to do an I/O instruction translation > > and perform an I/O operation on the physical device. I found the > > existing callbacks could not meet my requirements. > > > > Something like the following would be fine for my case: > > int (*ioctl)(struct mdev_device *vdev, > > unsigned int cmd, > > unsigned long arg); > > > > What do you think about this? > > > > > > Dong Jia > > > Dong Jia
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On 6/3/2016 2:27 PM, Dong Jia wrote: > On Wed, 25 May 2016 01:28:15 +0530 > Kirti Wankhede wrote: > > > ...snip... > >> +struct phy_device_ops { >> +struct module *owner; >> +const struct attribute_group **dev_attr_groups; >> +const struct attribute_group **mdev_attr_groups; >> + >> +int (*supported_config)(struct device *dev, char *config); >> +int (*create)(struct device *dev, uuid_le uuid, >> + uint32_t instance, char *mdev_params); >> +int (*destroy)(struct device *dev, uuid_le uuid, >> + uint32_t instance); >> +int (*start)(uuid_le uuid); >> +int (*shutdown)(uuid_le uuid); >> +ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count, >> +enum mdev_emul_space address_space, loff_t pos); >> +ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count, >> + enum mdev_emul_space address_space, loff_t pos); >> +int (*set_irqs)(struct mdev_device *vdev, uint32_t flags, >> +unsigned int index, unsigned int start, >> +unsigned int count, void *data); >> +int (*get_region_info)(struct mdev_device *vdev, int region_index, >> + struct pci_region_info *region_info); >> +int (*validate_map_request)(struct mdev_device *vdev, >> +unsigned long virtaddr, >> +unsigned long *pfn, unsigned long *size, >> +pgprot_t *prot); >> +}; > > Dear Kirti: > > When I rebased my vfio-ccw patches on this series, I found I need an > extra 'ioctl' callback in phy_device_ops. > Thanks for taking closer look. As per my knowledge ccw is not PCI device, right? Correct me if I'm wrong. I'm curious to know. Are you planning to write a driver (vfio-mccw) for mediated ccw device? Thanks, Kirti > The ccw physical device only supports one ccw mediated device. And I > have two new ioctl commands for the ccw mediated device. One is > to hot-reset the resource in the physical device that allocated for > the mediated device, the other is to do an I/O instruction translation > and perform an I/O operation on the physical device. I found the > existing callbacks could not meet my requirements. > > Something like the following would be fine for my case: > int (*ioctl)(struct mdev_device *vdev, >unsigned int cmd, >unsigned long arg); > > What do you think about this? > > > Dong Jia >
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Fri, 3 Jun 2016 09:40:16 + "Tian, Kevin" wrote: > > From: Dong Jia [mailto:bjsdj...@linux.vnet.ibm.com] > > Sent: Friday, June 03, 2016 4:58 PM > > > > > > ...snip... > > > > > +struct phy_device_ops { > > > + struct module *owner; > > > + const struct attribute_group **dev_attr_groups; > > > + const struct attribute_group **mdev_attr_groups; > > > + > > > + int (*supported_config)(struct device *dev, char *config); > > > + int (*create)(struct device *dev, uuid_le uuid, > > > + uint32_t instance, char *mdev_params); > > > + int (*destroy)(struct device *dev, uuid_le uuid, > > > +uint32_t instance); > > > + int (*start)(uuid_le uuid); > > > + int (*shutdown)(uuid_le uuid); > > > + ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count, > > > + enum mdev_emul_space address_space, loff_t pos); > > > + ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count, > > > + enum mdev_emul_space address_space, loff_t pos); > > > + int (*set_irqs)(struct mdev_device *vdev, uint32_t flags, > > > + unsigned int index, unsigned int start, > > > + unsigned int count, void *data); > > > + int (*get_region_info)(struct mdev_device *vdev, int region_index, > > > + struct pci_region_info *region_info); > > > + int (*validate_map_request)(struct mdev_device *vdev, > > > + unsigned long virtaddr, > > > + unsigned long *pfn, unsigned long *size, > > > + pgprot_t *prot); > > > +}; > > > > Dear Kirti: > > > > When I rebased my vfio-ccw patches on this series, I found I need an > > extra 'ioctl' callback in phy_device_ops. > > > > The ccw physical device only supports one ccw mediated device. And I > > have two new ioctl commands for the ccw mediated device. One is > > to hot-reset the resource in the physical device that allocated for > > the mediated device, the other is to do an I/O instruction translation > > and perform an I/O operation on the physical device. I found the > > existing callbacks could not meet my requirements. > > > > Something like the following would be fine for my case: > > int (*ioctl)(struct mdev_device *vdev, > > unsigned int cmd, > > unsigned long arg); > > > > What do you think about this? > > > > 'reset' should be generic. better to define an individual callback > for it (then we can also expose a node under vgpu path in sysfs). > Sounds reasonable for me. :> > Thanks > Kevin > Dong Jia
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
> From: Dong Jia [mailto:bjsdj...@linux.vnet.ibm.com] > Sent: Friday, June 03, 2016 4:58 PM > > > ...snip... > > > +struct phy_device_ops { > > + struct module *owner; > > + const struct attribute_group **dev_attr_groups; > > + const struct attribute_group **mdev_attr_groups; > > + > > + int (*supported_config)(struct device *dev, char *config); > > + int (*create)(struct device *dev, uuid_le uuid, > > + uint32_t instance, char *mdev_params); > > + int (*destroy)(struct device *dev, uuid_le uuid, > > + uint32_t instance); > > + int (*start)(uuid_le uuid); > > + int (*shutdown)(uuid_le uuid); > > + ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count, > > + enum mdev_emul_space address_space, loff_t pos); > > + ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count, > > +enum mdev_emul_space address_space, loff_t pos); > > + int (*set_irqs)(struct mdev_device *vdev, uint32_t flags, > > + unsigned int index, unsigned int start, > > + unsigned int count, void *data); > > + int (*get_region_info)(struct mdev_device *vdev, int region_index, > > +struct pci_region_info *region_info); > > + int (*validate_map_request)(struct mdev_device *vdev, > > + unsigned long virtaddr, > > + unsigned long *pfn, unsigned long *size, > > + pgprot_t *prot); > > +}; > > Dear Kirti: > > When I rebased my vfio-ccw patches on this series, I found I need an > extra 'ioctl' callback in phy_device_ops. > > The ccw physical device only supports one ccw mediated device. And I > have two new ioctl commands for the ccw mediated device. One is > to hot-reset the resource in the physical device that allocated for > the mediated device, the other is to do an I/O instruction translation > and perform an I/O operation on the physical device. I found the > existing callbacks could not meet my requirements. > > Something like the following would be fine for my case: > int (*ioctl)(struct mdev_device *vdev, >unsigned int cmd, >unsigned long arg); > > What do you think about this? > 'reset' should be generic. better to define an individual callback for it (then we can also expose a node under vgpu path in sysfs). Thanks Kevin
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Wed, 25 May 2016 01:28:15 +0530 Kirti Wankhede wrote: ...snip... > +struct phy_device_ops { > + struct module *owner; > + const struct attribute_group **dev_attr_groups; > + const struct attribute_group **mdev_attr_groups; > + > + int (*supported_config)(struct device *dev, char *config); > + int (*create)(struct device *dev, uuid_le uuid, > + uint32_t instance, char *mdev_params); > + int (*destroy)(struct device *dev, uuid_le uuid, > +uint32_t instance); > + int (*start)(uuid_le uuid); > + int (*shutdown)(uuid_le uuid); > + ssize_t (*read)(struct mdev_device *vdev, char *buf, size_t count, > + enum mdev_emul_space address_space, loff_t pos); > + ssize_t (*write)(struct mdev_device *vdev, char *buf, size_t count, > + enum mdev_emul_space address_space, loff_t pos); > + int (*set_irqs)(struct mdev_device *vdev, uint32_t flags, > + unsigned int index, unsigned int start, > + unsigned int count, void *data); > + int (*get_region_info)(struct mdev_device *vdev, int region_index, > + struct pci_region_info *region_info); > + int (*validate_map_request)(struct mdev_device *vdev, > + unsigned long virtaddr, > + unsigned long *pfn, unsigned long *size, > + pgprot_t *prot); > +}; Dear Kirti: When I rebased my vfio-ccw patches on this series, I found I need an extra 'ioctl' callback in phy_device_ops. The ccw physical device only supports one ccw mediated device. And I have two new ioctl commands for the ccw mediated device. One is to hot-reset the resource in the physical device that allocated for the mediated device, the other is to do an I/O instruction translation and perform an I/O operation on the physical device. I found the existing callbacks could not meet my requirements. Something like the following would be fine for my case: int (*ioctl)(struct mdev_device *vdev, unsigned int cmd, unsigned long arg); What do you think about this? Dong Jia
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
> From: Kirti Wankhede > Sent: Wednesday, May 25, 2016 10:47 PM > > > >> +static struct devices_list { > >> + struct list_headdev_list; > >> + struct mutexlist_lock; > >> +} mdevices, phy_devices; > > > > phy_devices -> pdevices? and similarly we can use pdev/mdev > > pair in other places... > > > > 'pdevices' sometimes also refers to 'pointer to devices' that's the > reason I perfer to use phy_devices to represent 'physical devices' well, I think it should be clear in this context where 'p' means 'physical'. Just like frequently used pdev in pci files for pci_dev... > > > >> +static struct mdev_device *find_mdev_device(uuid_le uuid, int instance) > > > > can we just call it "struct mdev* or "mdevice"? "dev_device" looks > redundant. > > > > 'struct mdev_device' represents 'device structure for device created by > mdev module'. Still that doesn't satisfy major folks, I'm open to change > it. IMO 'mdev' should be clearly enough to represent a mediated device > > > > Sorry I may have to ask same question since I didn't get an answer yet. > > what exactly does 'instance' mean here? since uuid is unique, why do > > we need match instance too? > > > > 'uuid' could be UUID of a VM for whom it is created. To support mutiple > mediated devices for same VM, name should be unique. Hence we need a > instance number to identify each mediated device uniquely in one VM. If UUID alone cannot universally identify a mediated device, what's the point of explicitly tagging it in kernel? Either we assign a new UUID for this mdev itself, or possibly better define it as a string? Management stack can pass any ID/name in string format which is sufficiently to identify mdev to its own purpose? Then in this framework we just do simple string match... > > > > >> + if (phy_dev->ops->destroy) { > >> + if (phy_dev->ops->destroy(phy_dev->dev, mdevice->uuid, > >> +mdevice->instance)) { > >> + mutex_unlock(&phy_devices.list_lock); > > > > a warning message is preferred. Also better to return -EBUSY here. > > > > mdev_destroy_device() is called from 2 paths, one is sysfs mdev_destroy > and mdev_unregister_device(). For the later case, return from here will > any ways ignored. mdev_unregister_device() is called from the remove > function of physical device and that doesn't care about return error, it > just removes the device from subsystem. Regardless of whether caller will handle error, this function itself should return error since it makes sense in other path e.g. sysfs to let user know what's happening. > > >> + return; > >> + } > >> + } > >> + > >> + mdev_remove_attribute_group(&mdevice->dev, > >> + phy_dev->ops->mdev_attr_groups); > >> + mdevice->phy_dev = NULL; > > > > Am I missing something here? You didn't remove this mdev node from > > the list, and below... > > > > device_unregister() calls put_device(dev) and if refcount is zero its > release function is called, which is mdev_device_release(), that is > hooked during device_register(). This node is removed from list from > mdev_device_release(). I'm not sure whether there'll be some race condition here, since you put a defunc mdev on the list... > > >> + phy_dev->dev = dev; > >> + phy_dev->ops = ops; > >> + > >> + mutex_lock(&phy_devices.list_lock); > >> + ret = mdev_create_sysfs_files(dev); > >> + if (ret) > >> + goto add_sysfs_error; > >> + > >> + ret = mdev_add_attribute_group(dev, ops->dev_attr_groups); > >> + if (ret) > >> + goto add_group_error; > > > > any reason to include sysfs operations inside the mutex which is > > purely about phy_devices list? > > > > dev_attr_groups attribute is for physical device, hence inside > phy_devices.list_lock. phy_devices.list_lock only protects the list, when you plan to add a new phy_device node after it's initialized and get ready. sysfs attribute setup is still part of initialization. Thanks Kevin
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Thu, 26 May 2016 14:33:39 +0530 Kirti Wankhede wrote: > Thanks Alex. > > I'll consider all the nits and fix those in next version of patch. > > More below: > > On 5/26/2016 4:09 AM, Alex Williamson wrote: > > On Wed, 25 May 2016 01:28:15 +0530 > > Kirti Wankhede wrote: > > > > ... > > >> + > >> +config MDEV > >> +tristate "Mediated device driver framework" > >> +depends on VFIO > >> +default n > >> +help > >> +MDEV provides a framework to virtualize device without > SR-IOV cap > >> +See Documentation/mdev.txt for more details. > > > > I don't see that file anywhere in this series. > > Yes, missed this file in this patch. I'll add it in next version of patch. > Since mdev module is moved in vfio directory, should I place this file > in vfio directory, Documentation/vfio/mdev.txt? or keep documentation of > mdev module within vfio.txt itself? Maybe just call it vfio-mediated-device.txt > >> + if (phy_dev) { > >> + mutex_lock(&phy_devices.list_lock); > >> + > >> + /* > >> + * If vendor driver doesn't return success that means vendor > >> + * driver doesn't support hot-unplug > >> + */ > >> + if (phy_dev->ops->destroy) { > >> + if (phy_dev->ops->destroy(phy_dev->dev, mdevice->uuid, > >> +mdevice->instance)) { > >> + mutex_unlock(&phy_devices.list_lock); > >> + return; > >> + } > >> + } > >> + > >> + mdev_remove_attribute_group(&mdevice->dev, > >> + phy_dev->ops->mdev_attr_groups); > >> + mdevice->phy_dev = NULL; > >> + mutex_unlock(&phy_devices.list_lock); > > > > Locking here appears arbitrary, how does the above code interact with > > phy_devices.dev_list? > > > > Sorry for not being clear about phy_devices.list_lock, probably I > shouldn't have named it 'list_lock'. This lock is also to synchronize > register_device & unregister_device and physical device specific > callbacks: supported_config, create, destroy, start and shutdown. > Although supported_config, create and destroy are per phy_device > specific callbacks while start and shutdown could refer to multiple > phy_devices indirectly when there are multiple mdev devices of same type > on different physical devices. There could be race condition in start > callback and destroy & unregister_device. I'm revisiting this lock again > and will see to use per phy device lock for phy_device specific callbacks. > > > >> +struct mdev_device { > >> + struct kref kref; > >> + struct device dev; > >> + struct phy_device *phy_dev; > >> + struct iommu_group *group; > >> + void*iommu_data; > >> + uuid_le uuid; > >> + uint32_tinstance; > >> + void*driver_data; > >> + struct mutexops_lock; > >> + struct list_headnext; > >> +}; > > > > Could this be in the private header? Seems like this should be opaque > > outside of mdev core. > > > > No, this structure is used in mediated device call back functions to > vendor driver so that vendor driver could identify mdev device, similar > to pci_dev structure in pci bus subsystem. (I'll remove kref which is > not being used at all.) Personally I'd prefer to see more use of reference counting and less locking, especially since the locking is mostly ineffective in this version. > >> + * @read: Read emulation callback > >> + *@mdev: mediated device structure > >> + *@buf: read buffer > >> + *@count: number bytes to read > >> + *@address_space: specifies for which address > >> + *space the request is: pci_config_space, IO > >> + *register space or MMIO space. > > > > Seems like I asked before and it's no more clear in the code, how do we > > handle multiple spaces for various types? ie. a device might have > > multiple MMIO spaces. > > > >> + *@pos: offset from base address. > > Sorry, updated the code but missed to update comment here. > pos = base_address + offset > (its not 'pos' anymore, will rename it to addr) > > so vendor driver is aware about base addresses of multiple MMIO spaces > and its size, they can identify MMIO space based on addr. Why not let the vendor driver provide vfio_region_info directly, including the offset within the device file descriptor thedn the mediated device core simply pass read/write through without caring what the address space is? Thanks, Alex > >> +/* > >> + * Physical Device > >> + */ > >> +struct phy_device { > >> + struct device *dev; > >> + const struct phy_device_ops *ops; > >> + struct list_headnext; > >> +};
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
Thanks Alex. I'll consider all the nits and fix those in next version of patch. More below: On 5/26/2016 4:09 AM, Alex Williamson wrote: > On Wed, 25 May 2016 01:28:15 +0530 > Kirti Wankhede wrote: > ... >> + >> +config MDEV >> +tristate "Mediated device driver framework" >> +depends on VFIO >> +default n >> +help >> +MDEV provides a framework to virtualize device without SR-IOV cap >> +See Documentation/mdev.txt for more details. > > I don't see that file anywhere in this series. Yes, missed this file in this patch. I'll add it in next version of patch. Since mdev module is moved in vfio directory, should I place this file in vfio directory, Documentation/vfio/mdev.txt? or keep documentation of mdev module within vfio.txt itself? >> +if (phy_dev) { >> +mutex_lock(&phy_devices.list_lock); >> + >> +/* >> +* If vendor driver doesn't return success that means vendor >> +* driver doesn't support hot-unplug >> +*/ >> +if (phy_dev->ops->destroy) { >> +if (phy_dev->ops->destroy(phy_dev->dev, mdevice->uuid, >> + mdevice->instance)) { >> +mutex_unlock(&phy_devices.list_lock); >> +return; >> +} >> +} >> + >> +mdev_remove_attribute_group(&mdevice->dev, >> +phy_dev->ops->mdev_attr_groups); >> +mdevice->phy_dev = NULL; >> +mutex_unlock(&phy_devices.list_lock); > > Locking here appears arbitrary, how does the above code interact with > phy_devices.dev_list? > Sorry for not being clear about phy_devices.list_lock, probably I shouldn't have named it 'list_lock'. This lock is also to synchronize register_device & unregister_device and physical device specific callbacks: supported_config, create, destroy, start and shutdown. Although supported_config, create and destroy are per phy_device specific callbacks while start and shutdown could refer to multiple phy_devices indirectly when there are multiple mdev devices of same type on different physical devices. There could be race condition in start callback and destroy & unregister_device. I'm revisiting this lock again and will see to use per phy device lock for phy_device specific callbacks. >> +struct mdev_device { >> +struct kref kref; >> +struct device dev; >> +struct phy_device *phy_dev; >> +struct iommu_group *group; >> +void*iommu_data; >> +uuid_le uuid; >> +uint32_tinstance; >> +void*driver_data; >> +struct mutexops_lock; >> +struct list_headnext; >> +}; > > Could this be in the private header? Seems like this should be opaque > outside of mdev core. > No, this structure is used in mediated device call back functions to vendor driver so that vendor driver could identify mdev device, similar to pci_dev structure in pci bus subsystem. (I'll remove kref which is not being used at all.) >> + * @read: Read emulation callback >> + * @mdev: mediated device structure >> + * @buf: read buffer >> + * @count: number bytes to read >> + * @address_space: specifies for which address >> + * space the request is: pci_config_space, IO >> + * register space or MMIO space. > > Seems like I asked before and it's no more clear in the code, how do we > handle multiple spaces for various types? ie. a device might have > multiple MMIO spaces. > >> + * @pos: offset from base address. Sorry, updated the code but missed to update comment here. pos = base_address + offset (its not 'pos' anymore, will rename it to addr) so vendor driver is aware about base addresses of multiple MMIO spaces and its size, they can identify MMIO space based on addr. >> +/* >> + * Physical Device >> + */ >> +struct phy_device { >> +struct device *dev; >> +const struct phy_device_ops *ops; >> +struct list_headnext; >> +}; > > I would really like to be able to use the mediated device interface to > create a purely virtual device, is the expectation that my physical > device interface would create a virtual struct device which would > become the parent and control point in sysfs for creating all the mdev > devices? Should we be calling this a host_device or mdev_parent_dev in > that case since there's really no requirement that it be a physical > device? Makes sense. I'll rename it to parent_device. Thanks, Kirti.
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On Wed, 25 May 2016 01:28:15 +0530 Kirti Wankhede wrote: > Design for Mediated Device Driver: > Main purpose of this driver is to provide a common interface for mediated > device management that can be used by differnt drivers of different > devices. > > This module provides a generic interface to create the device, add it to > mediated bus, add device to IOMMU group and then add it to vfio group. > > Below is the high Level block diagram, with Nvidia, Intel and IBM devices > as example, since these are the devices which are going to actively use > this module as of now. > > +---+ > | | > | +---+ | mdev_register_driver() +--+ > | | | +<+ __init() | > | | | | | | > | | mdev | +>+ |<-> VFIO user > | | bus | | probe()/remove()| vfio_mpci.ko |APIs > | | driver | | | | > | | | | +--+ > | | | | mdev_register_driver() +--+ > | | | +<+ __init() | > | | | | | | > | | | +>+ |<-> VFIO user > | +---+ | probe()/remove()| vfio_mccw.ko |APIs > | | | | > | MDEV CORE| +--+ > | MODULE | > | mdev.ko | > | +---+ | mdev_register_device() +--+ > | | | +<+ | > | | | | | nvidia.ko |<-> physical > | | | +>+ |device > | | | |callback +--+ > | | Physical | | > | | device | | mdev_register_device() +--+ > | | interface | |<+ | > | | | | | i915.ko |<-> physical > | | | +>+ |device > | | | |callback +--+ > | | | | > | | | | mdev_register_device() +--+ > | | | +<+ | > | | | | | ccw_device.ko|<-> physical > | | | +>+ |device > | | | |callback +--+ > | +---+ | > +---+ > > Core driver provides two types of registration interfaces: > 1. Registration interface for mediated bus driver: > > /** > * struct mdev_driver - Mediated device's driver > * @name: driver name > * @probe: called when new device created > * @remove:called when device removed > * @match: called when new device or driver is added for this bus. > Return 1 if given device can be handled by given driver and > zero otherwise. > * @driver:device driver structure > * > **/ > struct mdev_driver { > const char *name; > int (*probe) (struct device *dev); > void (*remove) (struct device *dev); >int (*match)(struct device *dev); > struct device_driverdriver; > }; > > int mdev_register_driver(struct mdev_driver *drv, struct module *owner); > void mdev_unregister_driver(struct mdev_driver *drv); > > Mediated device's driver for mdev should use this interface to register > with Core driver. With this, mediated devices driver for such devices is > responsible to add mediated device to VFIO group. > > 2. Physical device driver interface > This interface provides vendor driver the set APIs to manage physical > device related work in their own driver. APIs are : > - supported_config: provide supported configuration list by the vendor > driver > - create: to allocate basic resources in vendor driver for a mediated > device. > - destroy: to free resources in vendor driver when mediated device is > destroyed. > - start: to initiate mediated device initialization process from vendor >driver when VM boots and before QEMU starts. > - shutdown: to teardown mediated device resources during VM teardown. > - read : read emulation callback. > - write: write emulation callback. > - set_irqs: send interrupt configuration information that QEMU sets. > - get_region_info: to provide region size and its flags for the mediated > device. > - validate_map_request: to validate remap pfn request. nit, vfio is a userspace driver interface where QEMU is simply a user of that interface. We should never assume QEMU is the only user. > > This registration interface should be used by vendor drivers to register > each physical device to mdev core driver. > > Signed-off-by: Kirti Wankhede
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
On 5/25/2016 1:25 PM, Tian, Kevin wrote: >> From: Kirti Wankhede [mailto:kwankh...@nvidia.com] >> Sent: Wednesday, May 25, 2016 3:58 AM >> ... >> + >> +config MDEV >> +tristate "Mediated device driver framework" > > Sorry not a native speaker. Is it cleaner to say "Driver framework for Mediated > Devices" or "Mediated Device Framework"? Should we focus on driver or device > here? > Both, device and driver. This framework provides way to register physical *devices* and also register *driver* for mediated devices. >> +depends on VFIO >> +default n >> +help >> +MDEV provides a framework to virtualize device without SR-IOV cap >> +See Documentation/mdev.txt for more details. > > Looks Documentation/mdev.txt is not included in this version. > Yes, will have Documentation/mdev.txt in next version of patch. >> +static struct devices_list { >> +struct list_headdev_list; >> +struct mutexlist_lock; >> +} mdevices, phy_devices; > > phy_devices -> pdevices? and similarly we can use pdev/mdev > pair in other places... > 'pdevices' sometimes also refers to 'pointer to devices' that's the reason I perfer to use phy_devices to represent 'physical devices' >> +static struct mdev_device *find_mdev_device(uuid_le uuid, int instance) > > can we just call it "struct mdev* or "mdevice"? "dev_device" looks redundant. > 'struct mdev_device' represents 'device structure for device created by mdev module'. Still that doesn't satisfy major folks, I'm open to change it. > Sorry I may have to ask same question since I didn't get an answer yet. > what exactly does 'instance' mean here? since uuid is unique, why do > we need match instance too? > 'uuid' could be UUID of a VM for whom it is created. To support mutiple mediated devices for same VM, name should be unique. Hence we need a instance number to identify each mediated device uniquely in one VM. >> +if (phy_dev->ops->destroy) { >> +if (phy_dev->ops->destroy(phy_dev->dev, mdevice->uuid, >> + mdevice->instance)) { >> +mutex_unlock(&phy_devices.list_lock); > > a warning message is preferred. Also better to return -EBUSY here. > mdev_destroy_device() is called from 2 paths, one is sysfs mdev_destroy and mdev_unregister_device(). For the later case, return from here will any ways ignored. mdev_unregister_device() is called from the remove function of physical device and that doesn't care about return error, it just removes the device from subsystem. >> +return; >> +} >> +} >> + >> +mdev_remove_attribute_group(&mdevice->dev, >> +phy_dev->ops->mdev_attr_groups); >> +mdevice->phy_dev = NULL; > > Am I missing something here? You didn't remove this mdev node from > the list, and below... > device_unregister() calls put_device(dev) and if refcount is zero its release function is called, which is mdev_device_release(), that is hooked during device_register(). This node is removed from list from mdev_device_release(). >> +mutex_unlock(&phy_devices.list_lock); > > you should use mutex of mdevices list > No, this lock is for phy_dev. >> +phy_dev->dev = dev; >> +phy_dev->ops = ops; >> + >> +mutex_lock(&phy_devices.list_lock); >> +ret = mdev_create_sysfs_files(dev); >> +if (ret) >> +goto add_sysfs_error; >> + >> +ret = mdev_add_attribute_group(dev, ops->dev_attr_groups); >> +if (ret) >> +goto add_group_error; > > any reason to include sysfs operations inside the mutex which is > purely about phy_devices list? > dev_attr_groups attribute is for physical device, hence inside phy_devices.list_lock. * @dev_attr_groups:Default attributes of the physical device. >> +void mdev_unregister_device(struct device *dev) >> +{ >> +struct phy_device *phy_dev; >> +struct mdev_device *vdev = NULL; >> + >> +phy_dev = find_physical_device(dev); >> + >> +if (!phy_dev) >> +return; >> + >> +dev_info(dev, "MDEV: Unregistering\n"); >> + >> +while ((vdev = find_next_mdev_device(phy_dev))) >> +mdev_destroy_device(vdev); > > Need check return value here since ops->destroy may fail. > See my comment above. >> +static void mdev_device_release(struct device *dev) > > what's the difference between this release and earlier destroy version? > See my comment above >> +static void __exit mdev_exit(void) >> +{ > > should we check any remaining mdev/pdev which are not cleaned > up correctly here? > If there are any physical device registered with this module then the usage count is not zero so rmmod would anyways fail. All mdev_devices, which are created for any physical device, are destroyed from mdev_unregister_device(physial_device); Hence no need to explicitly add the code here which would never get
Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
> From: Kirti Wankhede [mailto:kwankh...@nvidia.com] > Sent: Wednesday, May 25, 2016 3:58 AM > > Design for Mediated Device Driver: > Main purpose of this driver is to provide a common interface for mediated > device management that can be used by differnt drivers of different > devices. > > This module provides a generic interface to create the device, add it to > mediated bus, add device to IOMMU group and then add it to vfio group. > > Below is the high Level block diagram, with Nvidia, Intel and IBM devices > as example, since these are the devices which are going to actively use > this module as of now. > > +---+ > | | > | +---+ | mdev_register_driver() +--+ > | | | +<+ __init() | > | | | | | | > | | mdev | +>+ |<-> VFIO user > | | bus | | probe()/remove()| vfio_mpci.ko |APIs > | | driver | | | | > | | | | +--+ > | | | | mdev_register_driver() +--+ > | | | +<+ __init() | > | | | | | | > | | | +>+ |<-> VFIO user > | +---+ | probe()/remove()| vfio_mccw.ko |APIs > | | | | > | MDEV CORE| +--+ > | MODULE | > | mdev.ko | > | +---+ | mdev_register_device() +--+ > | | | +<+ | > | | | | | nvidia.ko |<-> physical > | | | +>+ |device > | | | |callback +--+ > | | Physical | | > | | device | | mdev_register_device() +--+ > | | interface | |<+ | > | | | | | i915.ko |<-> physical > | | | +>+ |device > | | | |callback +--+ > | | | | > | | | | mdev_register_device() +--+ > | | | +<+ | > | | | | | ccw_device.ko|<-> physical > | | | +>+ |device > | | | |callback +--+ > | +---+ | > +---+ > > Core driver provides two types of registration interfaces: > 1. Registration interface for mediated bus driver: > > /** > * struct mdev_driver - Mediated device's driver > * @name: driver name > * @probe: called when new device created > * @remove:called when device removed > * @match: called when new device or driver is added for this bus. > Return 1 if given device can be handled by given driver and > zero otherwise. > * @driver:device driver structure > * > **/ > struct mdev_driver { > const char *name; > int (*probe) (struct device *dev); > void (*remove) (struct device *dev); >int (*match)(struct device *dev); > struct device_driverdriver; > }; > > int mdev_register_driver(struct mdev_driver *drv, struct module *owner); > void mdev_unregister_driver(struct mdev_driver *drv); > > Mediated device's driver for mdev should use this interface to register > with Core driver. With this, mediated devices driver for such devices is > responsible to add mediated device to VFIO group. > > 2. Physical device driver interface > This interface provides vendor driver the set APIs to manage physical > device related work in their own driver. APIs are : > - supported_config: provide supported configuration list by the vendor > driver > - create: to allocate basic resources in vendor driver for a mediated > device. > - destroy: to free resources in vendor driver when mediated device is > destroyed. > - start: to initiate mediated device initialization process from vendor >driver when VM boots and before QEMU starts. > - shutdown: to teardown mediated device resources during VM teardown. > - read : read emulation callback. > - write: write emulation callback. > - set_irqs: send interrupt configuration information that QEMU sets. > - get_region_info: to provide region size and its flags for the mediated > device. > - validate_map_request: to validate remap pfn request. > > This registration interface should be used by vendor drivers to register > each physical device to mdev core driver. > > Signed-off-by: Kirti Wankhede > Signed-off-by: Neo Jia > Change-Id: I88f4482f7608f40550a152c5f882b64271287c62 > --- > drivers/vfi
[Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver
Design for Mediated Device Driver: Main purpose of this driver is to provide a common interface for mediated device management that can be used by differnt drivers of different devices. This module provides a generic interface to create the device, add it to mediated bus, add device to IOMMU group and then add it to vfio group. Below is the high Level block diagram, with Nvidia, Intel and IBM devices as example, since these are the devices which are going to actively use this module as of now. +---+ | | | +---+ | mdev_register_driver() +--+ | | | +<+ __init() | | | | | | | | | mdev | +>+ |<-> VFIO user | | bus | | probe()/remove()| vfio_mpci.ko |APIs | | driver | | | | | | | | +--+ | | | | mdev_register_driver() +--+ | | | +<+ __init() | | | | | | | | | | +>+ |<-> VFIO user | +---+ | probe()/remove()| vfio_mccw.ko |APIs | | | | | MDEV CORE| +--+ | MODULE | | mdev.ko | | +---+ | mdev_register_device() +--+ | | | +<+ | | | | | | nvidia.ko |<-> physical | | | +>+ |device | | | |callback +--+ | | Physical | | | | device | | mdev_register_device() +--+ | | interface | |<+ | | | | | | i915.ko |<-> physical | | | +>+ |device | | | |callback +--+ | | | | | | | | mdev_register_device() +--+ | | | +<+ | | | | | | ccw_device.ko|<-> physical | | | +>+ |device | | | |callback +--+ | +---+ | +---+ Core driver provides two types of registration interfaces: 1. Registration interface for mediated bus driver: /** * struct mdev_driver - Mediated device's driver * @name: driver name * @probe: called when new device created * @remove:called when device removed * @match: called when new device or driver is added for this bus. Return 1 if given device can be handled by given driver and zero otherwise. * @driver:device driver structure * **/ struct mdev_driver { const char *name; int (*probe) (struct device *dev); void (*remove) (struct device *dev); int (*match)(struct device *dev); struct device_driverdriver; }; int mdev_register_driver(struct mdev_driver *drv, struct module *owner); void mdev_unregister_driver(struct mdev_driver *drv); Mediated device's driver for mdev should use this interface to register with Core driver. With this, mediated devices driver for such devices is responsible to add mediated device to VFIO group. 2. Physical device driver interface This interface provides vendor driver the set APIs to manage physical device related work in their own driver. APIs are : - supported_config: provide supported configuration list by the vendor driver - create: to allocate basic resources in vendor driver for a mediated device. - destroy: to free resources in vendor driver when mediated device is destroyed. - start: to initiate mediated device initialization process from vendor driver when VM boots and before QEMU starts. - shutdown: to teardown mediated device resources during VM teardown. - read : read emulation callback. - write: write emulation callback. - set_irqs: send interrupt configuration information that QEMU sets. - get_region_info: to provide region size and its flags for the mediated device. - validate_map_request: to validate remap pfn request. This registration interface should be used by vendor drivers to register each physical device to mdev core driver. Signed-off-by: Kirti Wankhede Signed-off-by: Neo Jia Change-Id: I88f4482f7608f40550a152c5f882b64271287c62 --- drivers/vfio/Kconfig | 1 + drivers/vfio/Makefile| 1 + drivers/vfio/mdev/Kconfig| 11 + drivers/vfio/mdev/Makefile | 5 + drivers/vfio/mdev/mdev-core.c| 462 +++ drivers/vfio/mdev/mdev-driver.c | 139 dr