Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-18 Thread Jason Wang


On 2019/7/10 下午3:22, Jason Wang wrote:

Yeah, that's a major concern. If it's true, is it something
that's not acceptable?



I think not, but I don't know if anyone else cares about this.





And I do see some new RFCs for VFIO to add more DMA APIs.

Are there any pointers?



I don't remember the details, but it should be something related to
SVA support in the recent Intel IOMMU work.



E.g. this series:

https://www.spinics.net/lists/iommu/msg37146.html

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-10 Thread Jason Wang


On 2019/7/10 下午2:22, Tiwei Bie wrote:

On Wed, Jul 10, 2019 at 10:26:10AM +0800, Jason Wang wrote:

On 2019/7/9 下午2:33, Tiwei Bie wrote:

On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:

On 2019/7/8 下午2:16, Tiwei Bie wrote:

On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:

On Thu, 4 Jul 2019 14:21:34 +0800
Tiwei Bie  wrote:

On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:

On 2019/7/3 下午9:08, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:

On 2019/7/3 下午7:52, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:

On 2019/7/3 下午5:13, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==========================

A new VFIO device type is introduced: vfio-vhost. This addresses
some comments from here: https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions in this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to set up the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in the message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
	/* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
	} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in the above structure.

Still a comment like in V1: what's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..

I believe either of the following should be better:

- using vhost ioctls, we can start from SET_VRING_KICK/SET_VRING_CALL and
extend them with e.g. a notify region. The advantage is that all existing
userspace programs could be reused without modification (or with minimal
modification). And the vhost API hides lots of details that don't need to
be understood by the application (e.g. in the container case).

Do you mean reusing vhost's ioctls on the VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
the existing vfio_mdev) for the mdev device?

Can we simply add them to the ioctl of mdev_parent_ops?

Right, either way, these ioctls just need to be added to the ioctl of
the mdev_parent_ops. But another thing we also need to consider is
which file descriptor the userspace will do the ioctl() on. So I'm
wondering, do you mean letting the userspace do the ioctl() on the
VFIO device fd of the mdev device?

Yes.

Got it! I'm not sure what Alex's opinion is on this. If we all
agree with this, I can do it this way.


Is there any other way btw?

Just a quick thought.. maybe it's a totally bad idea. I was thinking
whether it would be odd to do non-VFIO ioctls on a VFIO device
fd. So I was wondering whether it's possible to allow binding
another mdev driver (e.g. vhost_mdev) to the supported mdev
devices. The new mdev driver, vhost_mdev, can provide similar
ways to let userspace open the mdev device and do the vhost ioctls
on it. To distinguish them from the vfio_mdev-compatible mdev devices,
the device API of the new vhost_mdev-compatible mdev devices
might be e.g. "vhost-net" for net?

So in the VFIO case, the device will be for passthrough directly. And
in the VHOST case, the device can be used to accelerate existing
virtualized devices.

What do you think?

VFIO really can't prevent vendor specific ioctls on the device file
descriptor for mdevs, but a) we'd want to be sure the ioctl address
space can't collide with ioctls we'd use for vfio defined purposes and
b) maybe the VFIO user API isn't what you want in the first place if
you intend to mostly/entirely ignore the defined ioctl set and replace
them with your own.  In the case of the latter, you're also not getting
the advantages of the existing VFIO userspace code, so why expose a
VFIO device at all.

Yeah, I totally agree.

I guess the original idea is to reuse the VFIO DMA/IOMMU API for this. Then
we have the chance to reuse VFIO code in QEMU for dealing with e.g. a vIOMMU.

Yeah, you are right. We have several choices here:

#1. We expose a VFIO device, so we can reuse the VFIO container/group
  based DMA API and potentially reuse a lot of VFIO code in QEMU.

  But in this case, we have two choices for the VFIO device interface
  (i.e. the interface on top of VFIO device fd):

  A) we may invent a new vhost protocol (as demonstrated by the code
 in this RFC) on VFIO device fd to make it work in VFIO's way,

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-10 Thread Tiwei Bie

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-09 Thread Jason Wang


Yeah, you are right. We have several choices here:

#1. We expose a VFIO device, so we can reuse the VFIO container/group
 based DMA API and potentially reuse a lot of VFIO code in QEMU.

 But in this case, we have two choices for the VFIO device interface
 (i.e. the interface on top of VFIO device fd):

 A) we may invent a new vhost protocol (as demonstrated by the code
in this RFC) on VFIO device fd to make it work in VFIO's way,
i.e. regions and irqs.

 B) Or as you proposed, instead of inventing a new vhost protocol,
  

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-09 Thread Tiwei Bie

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-08 Thread Jason Wang


The mdev interface does provide a general interface for creating and
managing virtual devices, vfio-mdev is just one driver on the mdev
bus.  Parav (Mellanox) has been doing work on mdev-core to help clean
out vfio-isms from the interface, aiui, with the intent of implementing
another mdev bus driver for using the devices within the kernel.

Great to know this! I found below series after some searching:

https://lkml.org/lkml/2019/3/8/821

In above series, the new mlx5_core mdev driver will do the probe
by calling mlx5_get_core_dev() first on the parent device of the
mdev device. In vhost_mdev, maybe we can also keep track of all
the 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-08 Thread Tiwei Bie

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-05 Thread Alex Williamson
> > > device?
> > >   
> > 
> > Yes.  
> 
> Got it! I'm not sure what's Alex opinion on this. If we all
> agree with this, I can do it in this way.
> 
> > Is there any other way btw?  
> 
> Just a quick thought.. Maybe totally a bad idea. I was thinking
> whether it would be odd to do non-VFIO's ioctls on VFIO's device
> fd. So I was wondering whether it's possible to allow binding
> another mdev driver (e.g. vhost_mdev) to the supported mdev
> devices. The new mdev driver, vhost_mdev, can provide similar
> ways to let userspace open the mdev device and do the vhost ioctls
> on it. To distinguish with the vfio_mdev compatible mdev devices,
> the device API of the new vhost_mdev compatible mdev devices
> might be e.g. "vhost-net" for net?
> 
> So in VFIO case, the device will be for passthru directly. And
> in VHOST case, the device can be used to accelerate the existing
> virtualized devices.
> 
> How do you think?

VFIO really can't prevent vendor specific ioctls on the device file
descriptor for mdevs, but a) we'd want to be sure the ioctl address
space can't collide with ioctls we'd use for vfio defined purposes and
b) maybe the VFIO user API isn't what you want in the first place if
you intend to mostly/entirely ignore the defined ioctl set and replace
them with your own.  In the case of the latter, you're also not getting
the advantages of the existing VFIO userspace code, so why expose a
VFIO device at all.

The mdev interface does provide a general 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-04 Thread Tiwei Bie
On Fri, Jul 05, 2019 at 08:30:00AM +0800, Jason Wang wrote:
> On 2019/7/4 下午3:02, Tiwei Bie wrote:
> > On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:
> > > On 2019/7/4 下午2:21, Tiwei Bie wrote:
> > > > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > > > Details about this can be found here:
> > > > > > > > > > 
> > > > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > > > 
> > > > > > > > > > What's new in this version
> > > > > > > > > > ==
> > > > > > > > > > 
> > > > > > > > > > A new VFIO device type is introduced - vfio-vhost. This 
> > > > > > > > > > addressed
> > > > > > > > > > some comments from 
> > > > > > > > > > here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > > > 
> > > > > > > > > > Below is the updated device interface:
> > > > > > > > > > 
> > > > > > > > > > Currently, there are two regions of this device: 1) 
> > > > > > > > > > CONFIG_REGION
> > > > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to 
> > > > > > > > > > setup the
> > > > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), 
> > > > > > > > > > which
> > > > > > > > > > can be used to notify the device.
> > > > > > > > > > 
> > > > > > > > > > 1. CONFIG_REGION
> > > > > > > > > > 
> > > > > > > > > > The region described by CONFIG_REGION is the main control 
> > > > > > > > > > interface.
> > > > > > > > > > Messages will be written to or read from this region.
> > > > > > > > > > 
> > > > > > > > > > The message type is determined by the `request` field in 
> > > > > > > > > > message
> > > > > > > > > > header. The message size is encoded in the message header 
> > > > > > > > > > too.
> > > > > > > > > > The message format looks like this:
> > > > > > > > > > 
> > > > > > > > > > struct vhost_vfio_op {
> > > > > > > > > > __u64 request;
> > > > > > > > > > __u32 flags;
> > > > > > > > > > /* Flag values: */
> > > > > > > > > >   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need 
> > > > > > > > > > reply */
> > > > > > > > > > __u32 size;
> > > > > > > > > > union {
> > > > > > > > > > __u64 u64;
> > > > > > > > > > struct vhost_vring_state state;
> > > > > > > > > > struct vhost_vring_addr addr;
> > > > > > > > > > } payload;
> > > > > > > > > > };
> > > > > > > > > > 
> > > > > > > > > > The existing vhost-kernel ioctl cmds are reused as the 
> > > > > > > > > > message
> > > > > > > > > > requests in above structure.
> > > > > > > > > Still a comments like V1. What's the advantage of inventing a 
> > > > > > > > > new protocol?
> > > > > > > > I'm trying to make it work in VFIO's way..
> > > > > > > > 
> > > > > > > > > I believe either of the following should be better:
> > > > > > > > > 
> > > > > > > > > - using vhost ioctl,  we can start from 
> > > > > > > > > SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > > > extend it with e.g notify region. The advantages is that all 
> > > > > > > > > exist userspace
> > > > > > > > > program could be reused without modification (or minimal 
> > > > > > > > > modification). And
> > > > > > > > > vhost API hides lots of details that is not necessary to be 
> > > > > > > > > understood by
> > > > > > > > > application (e.g in the case of container).
> > > > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > > > using the existing vfio_mdev) for mdev device?
> > > > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > > > > Right, either way, these ioctls have to be and just need to be
> > > > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > > > also need to consider is that which file descriptor the userspace
> > > > > > will do the ioctl() on. So I'm wondering do you mean let the
> > > > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > > > device?
> > > > > > 
> > > > > Yes.
> > > > Got it! I'm not sure what's Alex opinion on this. If we all
> > > > agree with this, I can do it in this way.
> > > > 
> > > > > Is there any other way btw?
> > > > Just a quick thought.. Maybe totally a bad idea.
> > > 
> > > It's not for sure :)
> > Thanks!
> > 
> > > 
> > > >I was thinking
> > > > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > > > fd. So I was wondering whether it's possible to allow binding
> > > > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > > > devices. The new mdev driver, vhost_mdev, can provide similar
> > > > ways to let userspace open the mdev device and do the vhost 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-04 Thread Jason Wang


On 2019/7/4 下午3:02, Tiwei Bie wrote:

On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:

On 2019/7/4 下午2:21, Tiwei Bie wrote:

On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:

On 2019/7/3 下午9:08, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:

On 2019/7/3 下午7:52, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:

On 2019/7/3 下午5:13, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here:https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
	/* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
	} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.

Still a comments like V1. What's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..


I believe either of the following should be better:

- using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
extend it with e.g notify region. The advantages is that all exist userspace
program could be reused without modification (or minimal modification). And
vhost API hides lots of details that is not necessary to be understood by
application (e.g in the case of container).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

Can we simply add them into ioctl of mdev_parent_ops?

Right, either way, these ioctls have to be and just need to be
added in the ioctl of the mdev_parent_ops. But another thing we
also need to consider is that which file descriptor the userspace
will do the ioctl() on. So I'm wondering do you mean let the
userspace do the ioctl() on the VFIO device fd of the mdev
device?


Yes.

Got it! I'm not sure what's Alex opinion on this. If we all
agree with this, I can do it in this way.


Is there any other way btw?

Just a quick thought.. Maybe totally a bad idea.


It's not for sure :)

Thanks!




   I was thinking
whether it would be odd to do non-VFIO's ioctls on VFIO's device
fd. So I was wondering whether it's possible to allow binding
another mdev driver (e.g. vhost_mdev) to the supported mdev
devices. The new mdev driver, vhost_mdev, can provide similar
ways to let userspace open the mdev device and do the vhost ioctls
on it. To distinguish with the vfio_mdev compatible mdev devices,
the device API of the new vhost_mdev compatible mdev devices
might be e.g. "vhost-net" for net?

So in VFIO case, the device will be for passthru directly. And
in VHOST case, the device can be used to accelerate the existing
virtualized devices.

How do you think?


If my understanding is correct, there will be no VFIO ioctl if we go for
vhost_mdev?

Yeah, exactly. If we go for vhost_mdev, we may have some vhost nodes
in /dev similar to what /dev/vfio/* does to handle the $UUID and open
the device (e.g. similar to VFIO_GROUP_GET_DEVICE_FD in VFIO). And
to setup the device, we can try to reuse the ioctls of the existing
kernel vhost as much as possible.



Interesting; actually, I've considered something similar. I think there
should be no issues other than DMA:

- Need to invent a new API for DMA mapping other than SET_MEM_TABLE?
(Which is too heavyweight.)

- Need to consider a way to co-work with both the on-chip IOMMU (your
proposal should be fine) and Scalable IOV.


Thanks




Thanks,
Tiwei


Thanks



Thanks,
Tiwei

Thanks


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-04 Thread Tiwei Bie
On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:
> On 2019/7/4 下午2:21, Tiwei Bie wrote:
> > On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > > > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > > > Details about this can be found here:
> > > > > > > > 
> > > > > > > > https://lwn.net/Articles/750770/
> > > > > > > > 
> > > > > > > > What's new in this version
> > > > > > > > ==
> > > > > > > > 
> > > > > > > > A new VFIO device type is introduced - vfio-vhost. This 
> > > > > > > > addressed
> > > > > > > > some comments from 
> > > > > > > > here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > > > 
> > > > > > > > Below is the updated device interface:
> > > > > > > > 
> > > > > > > > Currently, there are two regions of this device: 1) 
> > > > > > > > CONFIG_REGION
> > > > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > > > can be used to notify the device.
> > > > > > > > 
> > > > > > > > 1. CONFIG_REGION
> > > > > > > > 
> > > > > > > > The region described by CONFIG_REGION is the main control 
> > > > > > > > interface.
> > > > > > > > Messages will be written to or read from this region.
> > > > > > > > 
> > > > > > > > The message type is determined by the `request` field in message
> > > > > > > > header. The message size is encoded in the message header too.
> > > > > > > > The message format looks like this:
> > > > > > > > 
> > > > > > > > struct vhost_vfio_op {
> > > > > > > > __u64 request;
> > > > > > > > __u32 flags;
> > > > > > > > /* Flag values: */
> > > > > > > >  #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > > > __u32 size;
> > > > > > > > union {
> > > > > > > > __u64 u64;
> > > > > > > > struct vhost_vring_state state;
> > > > > > > > struct vhost_vring_addr addr;
> > > > > > > > } payload;
> > > > > > > > };
> > > > > > > > 
> > > > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > > > requests in above structure.
> > > > > > > Still a comments like V1. What's the advantage of inventing a new 
> > > > > > > protocol?
> > > > > > I'm trying to make it work in VFIO's way..
> > > > > > 
> > > > > > > I believe either of the following should be better:
> > > > > > > 
> > > > > > > - using vhost ioctl,  we can start from 
> > > > > > > SET_VRING_KICK/SET_VRING_CALL and
> > > > > > > extend it with e.g notify region. The advantages is that all 
> > > > > > > exist userspace
> > > > > > > program could be reused without modification (or minimal 
> > > > > > > modification). And
> > > > > > > vhost API hides lots of details that is not necessary to be 
> > > > > > > understood by
> > > > > > > application (e.g in the case of container).
> > > > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > > > using the existing vfio_mdev) for mdev device?
> > > > > Can we simply add them into ioctl of mdev_parent_ops?
> > > > Right, either way, these ioctls have to be and just need to be
> > > > added in the ioctl of the mdev_parent_ops. But another thing we
> > > > also need to consider is that which file descriptor the userspace
> > > > will do the ioctl() on. So I'm wondering do you mean let the
> > > > userspace do the ioctl() on the VFIO device fd of the mdev
> > > > device?
> > > > 
> > > Yes.
> > Got it! I'm not sure what's Alex opinion on this. If we all
> > agree with this, I can do it in this way.
> > 
> > > Is there any other way btw?
> > Just a quick thought.. Maybe totally a bad idea.
> 
> 
> It's not for sure :)

Thanks!

> 
> 
> >   I was thinking
> > whether it would be odd to do non-VFIO's ioctls on VFIO's device
> > fd. So I was wondering whether it's possible to allow binding
> > another mdev driver (e.g. vhost_mdev) to the supported mdev
> > devices. The new mdev driver, vhost_mdev, can provide similar
> > ways to let userspace open the mdev device and do the vhost ioctls
> > on it. To distinguish with the vfio_mdev compatible mdev devices,
> > the device API of the new vhost_mdev compatible mdev devices
> > might be e.g. "vhost-net" for net?
> > 
> > So in VFIO case, the device will be for passthru directly. And
> > in VHOST case, the device can be used to accelerate the existing
> > virtualized devices.
> > 
> > How do you think?
> 
> 
> If my understanding is correct, there will be no VFIO ioctl if we go for
> vhost_mdev?

Yeah, exactly. If we go for vhost_mdev, we may have some vhost nodes
in /dev similar to what /dev/vfio/* does to handle the $UUID and open
the device (e.g. similar to VFIO_GROUP_GET_DEVICE_FD in VFIO). And
to setup the device, we can try to reuse the ioctls of the existing
kernel vhost as much as possible.

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-04 Thread Jason Wang


On 2019/7/4 下午2:21, Tiwei Bie wrote:

On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:

On 2019/7/3 下午9:08, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:

On 2019/7/3 下午7:52, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:

On 2019/7/3 下午5:13, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here:https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
	/* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
	} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.

Still a comments like V1. What's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..


I believe either of the following should be better:

- using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
extend it with e.g notify region. The advantages is that all exist userspace
program could be reused without modification (or minimal modification). And
vhost API hides lots of details that is not necessary to be understood by
application (e.g in the case of container).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

Can we simply add them into ioctl of mdev_parent_ops?

Right, either way, these ioctls have to be and just need to be
added in the ioctl of the mdev_parent_ops. But another thing we
also need to consider is that which file descriptor the userspace
will do the ioctl() on. So I'm wondering do you mean let the
userspace do the ioctl() on the VFIO device fd of the mdev
device?


Yes.

Got it! I'm not sure what's Alex opinion on this. If we all
agree with this, I can do it in this way.


Is there any other way btw?

Just a quick thought.. Maybe totally a bad idea.



It's not for sure :)



  I was thinking
whether it would be odd to do non-VFIO's ioctls on VFIO's device
fd. So I was wondering whether it's possible to allow binding
another mdev driver (e.g. vhost_mdev) to the supported mdev
devices. The new mdev driver, vhost_mdev, can provide similar
ways to let userspace open the mdev device and do the vhost ioctls
on it. To distinguish with the vfio_mdev compatible mdev devices,
the device API of the new vhost_mdev compatible mdev devices
might be e.g. "vhost-net" for net?

So in VFIO case, the device will be for passthru directly. And
in VHOST case, the device can be used to accelerate the existing
virtualized devices.

How do you think?



If my understanding is correct, there will be no VFIO ioctl if we go for 
vhost_mdev?


Thanks




Thanks,
Tiwei

Thanks



Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-04 Thread Tiwei Bie
On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> On 2019/7/3 下午9:08, Tiwei Bie wrote:
> > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > > > Details about this can be found here:
> > > > > > 
> > > > > > https://lwn.net/Articles/750770/
> > > > > > 
> > > > > > What's new in this version
> > > > > > ==
> > > > > > 
> > > > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > > > some comments from here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > 
> > > > > > Below is the updated device interface:
> > > > > > 
> > > > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > can be used to notify the device.
> > > > > > 
> > > > > > 1. CONFIG_REGION
> > > > > > 
> > > > > > The region described by CONFIG_REGION is the main control interface.
> > > > > > Messages will be written to or read from this region.
> > > > > > 
> > > > > > The message type is determined by the `request` field in message
> > > > > > header. The message size is encoded in the message header too.
> > > > > > The message format looks like this:
> > > > > > 
> > > > > > struct vhost_vfio_op {
> > > > > > __u64 request;
> > > > > > __u32 flags;
> > > > > > /* Flag values: */
> > > > > > #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > __u32 size;
> > > > > > union {
> > > > > > __u64 u64;
> > > > > > struct vhost_vring_state state;
> > > > > > struct vhost_vring_addr addr;
> > > > > > } payload;
> > > > > > };
> > > > > > 
> > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > requests in above structure.
> > > > > Still a comments like V1. What's the advantage of inventing a new 
> > > > > protocol?
> > > > I'm trying to make it work in VFIO's way..
> > > > 
> > > > > I believe either of the following should be better:
> > > > > 
> > > > > - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL 
> > > > > and
> > > > > extend it with e.g notify region. The advantages is that all exist 
> > > > > userspace
> > > > > program could be reused without modification (or minimal 
> > > > > modification). And
> > > > > vhost API hides lots of details that is not necessary to be 
> > > > > understood by
> > > > > application (e.g in the case of container).
> > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > using the existing vfio_mdev) for mdev device?
> > > Can we simply add them into ioctl of mdev_parent_ops?
> > Right, either way, these ioctls have to be and just need to be
> > added in the ioctl of the mdev_parent_ops. But another thing we
> > also need to consider is that which file descriptor the userspace
> > will do the ioctl() on. So I'm wondering do you mean let the
> > userspace do the ioctl() on the VFIO device fd of the mdev
> > device?
> > 
> 
> Yes.

Got it! I'm not sure what Alex's opinion is on this. If we all
agree on this, I can do it this way.

> Is there any other way btw?

Just a quick thought... maybe it's totally a bad idea. I was thinking
whether it would be odd to do non-VFIO ioctls on VFIO's device
fd. So I was wondering whether it's possible to allow binding
another mdev driver (e.g. vhost_mdev) to the supported mdev
devices. The new mdev driver, vhost_mdev, can provide similar
ways to let userspace open the mdev device and do the vhost ioctls
on it. To distinguish it from the vfio_mdev compatible mdev devices,
the device API of the new vhost_mdev compatible mdev devices
might be e.g. "vhost-net" for net?

So in the VFIO case, the device will be used for passthrough directly. And
in the VHOST case, the device can be used to accelerate the existing
virtualized devices.

What do you think?

Thanks,
Tiwei
> 
> Thanks
> 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Jason Wang


On 2019/7/3 下午9:08, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:

On 2019/7/3 下午7:52, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:

On 2019/7/3 下午5:13, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here:https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
	/* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
	} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.

Still a comments like V1. What's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..


I believe either of the following should be better:

- using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
extend it with e.g notify region. The advantages is that all exist userspace
program could be reused without modification (or minimal modification). And
vhost API hides lots of details that is not necessary to be understood by
application (e.g in the case of container).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

Can we simply add them into ioctl of mdev_parent_ops?

Right, either way, these ioctls have to be and just need to be
added in the ioctl of the mdev_parent_ops. But another thing we
also need to consider is that which file descriptor the userspace
will do the ioctl() on. So I'm wondering do you mean let the
userspace do the ioctl() on the VFIO device fd of the mdev
device?



Yes. Is there any other way btw?

Thanks


Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Wed, Jul 03, 2019 at 12:31:57PM -0600, Alex Williamson wrote:
> On Wed,  3 Jul 2019 17:13:39 +0800
> Tiwei Bie  wrote:
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 8f10748dac79..6c5718ab7eeb 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -201,6 +201,7 @@ struct vfio_device_info {
> >  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)  /* vfio-amba device */
> >  #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)  /* vfio-ccw device */
> >  #define VFIO_DEVICE_FLAGS_AP    (1 << 5)  /* vfio-ap device */
> > +#define VFIO_DEVICE_FLAGS_VHOST (1 << 6)  /* vfio-vhost device */
> >  	__u32	num_regions;	/* Max region index + 1 */
> >  	__u32	num_irqs;	/* Max IRQ index + 1 */
> >  };
> > @@ -217,6 +218,7 @@ struct vfio_device_info {
> >  #define VFIO_DEVICE_API_AMBA_STRING"vfio-amba"
> >  #define VFIO_DEVICE_API_CCW_STRING "vfio-ccw"
> >  #define VFIO_DEVICE_API_AP_STRING  "vfio-ap"
> > +#define VFIO_DEVICE_API_VHOST_STRING   "vfio-vhost"
> >  
> >  /**
> >   * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
> > @@ -573,6 +575,23 @@ enum {
> > VFIO_CCW_NUM_IRQS
> >  };
> >  
> > +/*
> > + * The vfio-vhost bus driver makes use of the following fixed region and
> > + * IRQ index mapping. Unimplemented regions return a size of zero.
> > + * Unimplemented IRQ types return a count of zero.
> > + */
> > +
> > +enum {
> > +   VFIO_VHOST_CONFIG_REGION_INDEX,
> > +   VFIO_VHOST_NOTIFY_REGION_INDEX,
> > +   VFIO_VHOST_NUM_REGIONS
> > +};
> > +
> > +enum {
> > +   VFIO_VHOST_VQ_IRQ_INDEX,
> > +   VFIO_VHOST_NUM_IRQS
> > +};
> > +
> 
> Note that the vfio API has evolved a bit since vfio-pci started this
> way, with fixed indexes for pre-defined region types.  We now support
> device specific regions which can be identified by a capability within
> the REGION_INFO ioctl return data.  This allows a bit more flexibility,
> at the cost of complexity, but the infrastructure already exists in
> kernel and QEMU to make it relatively easy.  I think we'll have the
> same support for interrupts soon too.  If you continue to pursue the
> vfio-vhost direction you might want to consider these before committing
> to fixed indexes.  Thanks,

Thanks for the details! Will give it a try!

Thanks,
Tiwei


Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Alex Williamson
On Wed,  3 Jul 2019 17:13:39 +0800
Tiwei Bie  wrote:
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8f10748dac79..6c5718ab7eeb 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -201,6 +201,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)  /* vfio-amba device */
>  #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)  /* vfio-ccw device */
>  #define VFIO_DEVICE_FLAGS_AP    (1 << 5)  /* vfio-ap device */
> +#define VFIO_DEVICE_FLAGS_VHOST (1 << 6)  /* vfio-vhost device */
>  	__u32	num_regions;	/* Max region index + 1 */
>  	__u32	num_irqs;	/* Max IRQ index + 1 */
>  };
> @@ -217,6 +218,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_API_AMBA_STRING  "vfio-amba"
>  #define VFIO_DEVICE_API_CCW_STRING   "vfio-ccw"
>  #define VFIO_DEVICE_API_AP_STRING"vfio-ap"
> +#define VFIO_DEVICE_API_VHOST_STRING "vfio-vhost"
>  
>  /**
>   * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
> @@ -573,6 +575,23 @@ enum {
>   VFIO_CCW_NUM_IRQS
>  };
>  
> +/*
> + * The vfio-vhost bus driver makes use of the following fixed region and
> + * IRQ index mapping. Unimplemented regions return a size of zero.
> + * Unimplemented IRQ types return a count of zero.
> + */
> +
> +enum {
> + VFIO_VHOST_CONFIG_REGION_INDEX,
> + VFIO_VHOST_NOTIFY_REGION_INDEX,
> + VFIO_VHOST_NUM_REGIONS
> +};
> +
> +enum {
> + VFIO_VHOST_VQ_IRQ_INDEX,
> + VFIO_VHOST_NUM_IRQS
> +};
> +

Note that the vfio API has evolved a bit since vfio-pci started this
way, with fixed indexes for pre-defined region types.  We now support
device specific regions which can be identified by a capability within
the REGION_INFO ioctl return data.  This allows a bit more flexibility,
at the cost of complexity, but the infrastructure already exists in
kernel and QEMU to make it relatively easy.  I think we'll have the
same support for interrupts soon too.  If you continue to pursue the
vfio-vhost direction you might want to consider these before committing
to fixed indexes.  Thanks,

Alex
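The capability-based discovery Alex describes boils down to walking a capability chain appended to the buffer returned by VFIO_DEVICE_GET_REGION_INFO when VFIO_REGION_INFO_FLAG_CAPS is set. A minimal userspace sketch of that walk (the header struct mirrors the vfio uapi; `put_cap()` is a test-only helper for building a fake buffer, not part of any real API):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of struct vfio_info_cap_header from <linux/vfio.h>. */
struct vfio_info_cap_header {
        uint16_t id;
        uint16_t version;
        uint32_t next;  /* offset of the next capability; 0 ends the chain */
};

/* Walk the capability chain appended to a VFIO_DEVICE_GET_REGION_INFO
 * result.  `info` points at the start of the returned buffer of `argsz`
 * bytes and `cap_offset` is info->cap_offset.  Returns the offset of the
 * capability with the given id, or 0 if it is not present. */
static uint32_t vfio_find_cap(const uint8_t *info, size_t argsz,
                              uint32_t cap_offset, uint16_t id)
{
        while (cap_offset &&
               cap_offset + sizeof(struct vfio_info_cap_header) <= argsz) {
                struct vfio_info_cap_header hdr;

                memcpy(&hdr, info + cap_offset, sizeof(hdr));
                if (hdr.id == id)
                        return cap_offset;
                cap_offset = hdr.next;
        }
        return 0;
}

/* Test-only helper: write a capability header into a fake info buffer. */
static void put_cap(uint8_t *info, uint32_t off, uint16_t id, uint32_t next)
{
        struct vfio_info_cap_header hdr = {
                .id = id, .version = 1, .next = next,
        };

        memcpy(info + off, &hdr, sizeof(hdr));
}
```

With this scheme, a vhost-vfio device could publish its config/notify regions as device-specific region capabilities rather than the fixed indexes in the patch.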


Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> On 2019/7/3 7:52 PM, Tiwei Bie wrote:
> > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > On 2019/7/3 5:13 PM, Tiwei Bie wrote:
> > > > Details about this can be found here:
> > > > 
> > > > https://lwn.net/Articles/750770/
> > > > 
> > > > What's new in this version
> > > > ==========================
> > > > 
> > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > > > 
> > > > Below is the updated device interface:
> > > > 
> > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > can be used to notify the device.
> > > > 
> > > > 1. CONFIG_REGION
> > > > 
> > > > The region described by CONFIG_REGION is the main control interface.
> > > > Messages will be written to or read from this region.
> > > > 
> > > > The message type is determined by the `request` field in message
> > > > header. The message size is encoded in the message header too.
> > > > The message format looks like this:
> > > > 
> > > > struct vhost_vfio_op {
> > > > __u64 request;
> > > > __u32 flags;
> > > > /* Flag values: */
> > > >#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > __u32 size;
> > > > union {
> > > > __u64 u64;
> > > > struct vhost_vring_state state;
> > > > struct vhost_vring_addr addr;
> > > > } payload;
> > > > };
> > > > 
> > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > requests in above structure.
> > > 
> > > Still a comment like v1: what's the advantage of inventing a new
> > > protocol?
> > I'm trying to make it work in VFIO's way..
> > 
> > > I believe either of the following should be better:
> > > 
> > > - using vhost ioctls, we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > extend them with e.g. a notify region. The advantage is that all existing
> > > userspace programs could be reused without modification (or with minimal
> > > modification). And the vhost API hides lots of details that don't need to
> > > be understood by the application (e.g. in the case of containers).
> > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > or introducing another mdev driver (i.e. vhost_mdev instead of
> > using the existing vfio_mdev) for mdev device?
> 
> 
> Can we simply add them into ioctl of mdev_parent_ops?

Right, either way, these ioctls just need to be added to the ioctl
of the mdev_parent_ops. But another thing we also need to consider
is which file descriptor the userspace will do the ioctl() on. So
I'm wondering, do you mean letting the userspace do the ioctl() on
the VFIO device fd of the mdev device?

> 
> 
> > 
[...]
> > > > 3. VFIO interrupt ioctl API
> > > > 
> > > > The VFIO interrupt ioctl API is used to set up device interrupts.
> > > > IRQ-bypass can also be supported.
> > > > 
> > > > Currently, the data path interrupt can be configured via the
> > > > VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
> > > 
> > > How about the DMA API? Do you expect to use the VFIO IOMMU API or vhost
> > > SET_MEM_TABLE? The VFIO IOMMU API is more generic for sure, but with
> > > SET_MEM_TABLE, DMA can be done at the level of the parent device, which
> > > means it can work for e.g. a card with an on-chip IOMMU.
> > Agree. This RFC assumes userspace will use the VFIO IOMMU API
> > to do the DMA programming. But as you said, there could be
> > a problem when using cards with an on-chip IOMMU.
> 
> 
> Yes, another issue is that SET_MEM_TABLE cannot be used to update just a
> part of the table. This seems less flexible than the VFIO API, but it
> could be extended.

Agree.

> 
> 
> > 
> > > And what's the plan for vIOMMU?
> > As this RFC assumes userspace will use the VFIO IOMMU API, userspace
> > just needs to follow the same approach as the vfio-pci device
> > in QEMU to support vIOMMU.
> 
> 
> Right, this is more a question for the QEMU part. It means it needs to go
> through the ordinary VFIO path to get all the notifier/listener support
> from the vIOMMU.

Yeah.

> 
> 
> > 
> > > 
> > > > Signed-off-by: Tiwei Bie 
> > > > ---
> > > >drivers/vhost/Makefile |   2 +
> > > >drivers/vhost/vdpa.c   | 770 +
> > > >include/linux/vdpa_mdev.h  |  72 
> > > >include/uapi/linux/vfio.h  |  19 +
> > > >include/uapi/linux/vhost.h |  25 ++
> > > >5 files changed, 888 insertions(+)
> > > >create mode 100644 drivers/vhost/vdpa.c
> > > >create mode 100644 include/linux/vdpa_mdev.h
> > > 
> > > We probably need a sample parent device implementation. It could be a
> > > software datapath; e.g. we can start from a virtio-net device in the guest
> > > or a
> > > 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Jason Wang


On 2019/7/3 7:52 PM, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:

On 2019/7/3 5:13 PM, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==========================

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here: https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
__u64 request;
__u32 flags;
/* Flag values: */
   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
__u32 size;
union {
__u64 u64;
struct vhost_vring_state state;
struct vhost_vring_addr addr;
} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.


Still a comment like v1: what's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..


I believe either of the following should be better:

- using vhost ioctls, we can start from SET_VRING_KICK/SET_VRING_CALL and
extend them with e.g. a notify region. The advantage is that all existing
userspace programs could be reused without modification (or with minimal
modification). And the vhost API hides lots of details that don't need to
be understood by the application (e.g. in the case of containers).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?



Can we simply add them into ioctl of mdev_parent_ops?





- using the PCI layout, you don't even need to re-invent a notify region at
all, and we can pass it through to the guest.

As you said previously, virtio has transports other than PCI, and it
will look a bit odd when using those other transports..



Yes.





Personally, I prefer vhost ioctl.

+1




[...]

3. VFIO interrupt ioctl API

The VFIO interrupt ioctl API is used to set up device interrupts.
IRQ-bypass can also be supported.

Currently, the data path interrupt can be configured via the
VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.


How about the DMA API? Do you expect to use the VFIO IOMMU API or vhost
SET_MEM_TABLE? The VFIO IOMMU API is more generic for sure, but with
SET_MEM_TABLE, DMA can be done at the level of the parent device, which
means it can work for e.g. a card with an on-chip IOMMU.

Agree. This RFC assumes userspace will use the VFIO IOMMU API
to do the DMA programming. But as you said, there could be
a problem when using cards with an on-chip IOMMU.



Yes, another issue is that SET_MEM_TABLE cannot be used to update just a
part of the table. This seems less flexible than the VFIO API, but it
could be extended.






And what's the plan for vIOMMU?

As this RFC assumes userspace will use the VFIO IOMMU API, userspace
just needs to follow the same approach as the vfio-pci device
in QEMU to support vIOMMU.



Right, this is more a question for the QEMU part. It means it needs to
go through the ordinary VFIO path to get all the notifier/listener
support from the vIOMMU.








Signed-off-by: Tiwei Bie 
---
   drivers/vhost/Makefile |   2 +
   drivers/vhost/vdpa.c   | 770 +
   include/linux/vdpa_mdev.h  |  72 
   include/uapi/linux/vfio.h  |  19 +
   include/uapi/linux/vhost.h |  25 ++
   5 files changed, 888 insertions(+)
   create mode 100644 drivers/vhost/vdpa.c
   create mode 100644 include/linux/vdpa_mdev.h


We probably need a sample parent device implementation. It could be a
software datapath; e.g. we can start from a virtio-net device in the guest
or a vhost/tap on the host.

Yeah, something like this would be interesting!



Plan to do something like that :) ?

Thanks




Thanks,
Tiwei


Thanks




Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> On 2019/7/3 5:13 PM, Tiwei Bie wrote:
> > Details about this can be found here:
> > 
> > https://lwn.net/Articles/750770/
> > 
> > What's new in this version
> > ==========================
> > 
> > A new VFIO device type is introduced - vfio-vhost. This addressed
> > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > 
> > Below is the updated device interface:
> > 
> > Currently, there are two regions of this device: 1) CONFIG_REGION
> > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > can be used to notify the device.
> > 
> > 1. CONFIG_REGION
> > 
> > The region described by CONFIG_REGION is the main control interface.
> > Messages will be written to or read from this region.
> > 
> > The message type is determined by the `request` field in message
> > header. The message size is encoded in the message header too.
> > The message format looks like this:
> > 
> > struct vhost_vfio_op {
> > __u64 request;
> > __u32 flags;
> > /* Flag values: */
> >   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > __u32 size;
> > union {
> > __u64 u64;
> > struct vhost_vring_state state;
> > struct vhost_vring_addr addr;
> > } payload;
> > };
> > 
> > The existing vhost-kernel ioctl cmds are reused as the message
> > requests in above structure.
> 
> 
> Still a comment like v1: what's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..

> I believe either of the following should be better:
> 
> - using vhost ioctls, we can start from SET_VRING_KICK/SET_VRING_CALL and
> extend them with e.g. a notify region. The advantage is that all existing
> userspace programs could be reused without modification (or with minimal
> modification). And the vhost API hides lots of details that don't need to
> be understood by the application (e.g. in the case of containers).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

> 
> - using the PCI layout, you don't even need to re-invent a notify region at
> all, and we can pass it through to the guest.

As you said previously, virtio has transports other than PCI, and it
will look a bit odd when using those other transports..

> 
> Personally, I prefer vhost ioctl.

+1

> 
> 
> > 
[...]
> > 
> > 3. VFIO interrupt ioctl API
> > 
> > The VFIO interrupt ioctl API is used to set up device interrupts.
> > IRQ-bypass can also be supported.
> > 
> > Currently, the data path interrupt can be configured via the
> > VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
> 
> 
> How about the DMA API? Do you expect to use the VFIO IOMMU API or vhost
> SET_MEM_TABLE? The VFIO IOMMU API is more generic for sure, but with
> SET_MEM_TABLE, DMA can be done at the level of the parent device, which
> means it can work for e.g. a card with an on-chip IOMMU.

Agree. This RFC assumes userspace will use the VFIO IOMMU API
to do the DMA programming. But as you said, there could be
a problem when using cards with an on-chip IOMMU.

> 
> And what's the plan for vIOMMU?

As this RFC assumes userspace will use the VFIO IOMMU API, userspace
just needs to follow the same approach as the vfio-pci device
in QEMU to support vIOMMU.
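For reference, the DMA programming referred to here is the type1 VFIO_IOMMU_MAP_DMA ioctl. A hedged sketch of building its argument (the struct mirrors <linux/vfio.h>; `container_fd` is assumed to come from the usual VFIO container/group setup and the ioctl call itself is only shown in a comment):

```c
#include <stdint.h>
#include <string.h>

/* Mirror of struct vfio_iommu_type1_dma_map from <linux/vfio.h>. */
struct vfio_iommu_type1_dma_map {
        uint32_t argsz;
        uint32_t flags;
#define VFIO_DMA_MAP_FLAG_READ  (1 << 0)
#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)
        uint64_t vaddr;  /* process virtual address of the backing memory */
        uint64_t iova;   /* IOVA the device will use */
        uint64_t size;   /* length of the mapping, page aligned */
};

/* Build a read/write mapping request for one guest memory region. */
static struct vfio_iommu_type1_dma_map
vfio_dma_map_args(uint64_t vaddr, uint64_t iova, uint64_t size)
{
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = vaddr;
        map.iova  = iova;
        map.size  = size;
        return map;
        /* then: ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map); */
}
```

Incremental updates (the flexibility point raised against SET_MEM_TABLE) fall out naturally here: each region is mapped or unmapped with its own ioctl.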

> 
> 
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> >   drivers/vhost/Makefile |   2 +
> >   drivers/vhost/vdpa.c   | 770 +
> >   include/linux/vdpa_mdev.h  |  72 
> >   include/uapi/linux/vfio.h  |  19 +
> >   include/uapi/linux/vhost.h |  25 ++
> >   5 files changed, 888 insertions(+)
> >   create mode 100644 drivers/vhost/vdpa.c
> >   create mode 100644 include/linux/vdpa_mdev.h
> 
> 
> We probably need a sample parent device implementation. It could be a
> software datapath; e.g. we can start from a virtio-net device in the guest
> or a vhost/tap on the host.

Yeah, something like this would be interesting!

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Jason Wang


On 2019/7/3 5:13 PM, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==========================

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here: https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
__u64 request;
__u32 flags;
/* Flag values: */
  #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
__u32 size;
union {
__u64 u64;
struct vhost_vring_state state;
struct vhost_vring_addr addr;
} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.



Still a comment like v1: what's the advantage of inventing a new
protocol? I believe either of the following should be better:


- using vhost ioctls, we can start from SET_VRING_KICK/SET_VRING_CALL
and extend them with e.g. a notify region. The advantage is that all
existing userspace programs could be reused without modification (or
with minimal modification). And the vhost API hides lots of details that
don't need to be understood by the application (e.g. in the case of
containers).


- using the PCI layout, you don't even need to re-invent a notify region
at all, and we can pass it through to the guest.


Personally, I prefer vhost ioctl.




Each message will be written to or read from this region at offset 0:

int vhost_vfio_write(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
struct vhost_vfio *vfio = dev->opaque;
int ret;

ret = pwrite64(vfio->device_fd, op, count, vfio->config_offset);
if (ret != count)
return -1;

return 0;
}

int vhost_vfio_read(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
struct vhost_vfio *vfio = dev->opaque;
uint64_t request = op->request;
int ret;

ret = pread64(vfio->device_fd, op, count, vfio->config_offset);
if (ret != count || request != op->request)
return -1;

return 0;
}

It's quite straightforward to set things on the device. Just write
the message to the device directly:

int vhost_vfio_set_features(struct vhost_dev *dev, uint64_t features)
{
struct vhost_vfio_op op;

op.request = VHOST_SET_FEATURES;
op.flags = 0;
op.size = sizeof(features);
op.payload.u64 = features;

return vhost_vfio_write(dev, &op);
}

To get things from the device, two steps are needed.
Take VHOST_GET_FEATURES as an example:

int vhost_vfio_get_features(struct vhost_dev *dev, uint64_t *features)
{
struct vhost_vfio_op op;
int ret;

op.request = VHOST_GET_FEATURES;
op.flags = VHOST_VFIO_NEED_REPLY;
op.size = 0;

/* Just need to write the header */
ret = vhost_vfio_write(dev, &op);
if (ret != 0)
goto out;

/* `op` wasn't changed during write */
op.flags = 0;
op.size = sizeof(*features);

ret = vhost_vfio_read(dev, &op);
if (ret != 0)
goto out;

*features = op.payload.u64;
out:
return ret;
}

2. NOTIFY_REGION (mmap-able)

The region described by NOTIFY_REGION will be used to notify
the device.

Each queue will have a page for notification, which can be mapped
into the VM (if the hardware supports it), so the virtio driver in
the VM will be able to notify the device directly.

The region described by NOTIFY_REGION is also writable. If
the accelerator's notification register(s) cannot be mapped to
the VM, write() can also be used to notify the device. Something
like this:

void notify_relay(void *opaque)
{
..
offset = host_page_size * queue_idx;

ret = pwrite64(vfio->device_fd, &queue_idx, sizeof(queue_idx),
vfio->notify_offset + offset);
..
}

3. VFIO interrupt ioctl API

The VFIO interrupt ioctl API is used to set up device interrupts.
IRQ-bypass can also be supported.

Currently, the data path interrupt can be configured via the
VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.



How about the DMA API? Do you expect to use the VFIO IOMMU API or vhost
SET_MEM_TABLE? The VFIO IOMMU API is more generic for sure, but with
SET_MEM_TABLE, DMA can be done at

[RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==========================

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here: https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
        __u64 request;
        __u32 flags;
        /* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether a reply is needed */
        __u32 size;
        union {
                __u64 u64;
                struct vhost_vring_state state;
                struct vhost_vring_addr addr;
        } payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.

Each message will be written to or read from this region at offset 0:

int vhost_vfio_write(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
        int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
        struct vhost_vfio *vfio = dev->opaque;
        int ret;

        ret = pwrite64(vfio->device_fd, op, count, vfio->config_offset);
        if (ret != count)
                return -1;

        return 0;
}

int vhost_vfio_read(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
        int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
        struct vhost_vfio *vfio = dev->opaque;
        uint64_t request = op->request;
        int ret;

        ret = pread64(vfio->device_fd, op, count, vfio->config_offset);
        if (ret != count || request != op->request)
                return -1;

        return 0;
}
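VHOST_VFIO_OP_HDR_SIZE is used above but not defined in the snippet; presumably it is the fixed header before the payload union. A sketch of how it would fall out of the struct layout (the mirror below shortens the payload union to its u64 arm; the real union also carries vhost_vring_state/vhost_vring_addr, which does not change the header offset):

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the RFC's message layout (payload shortened). */
struct vhost_vfio_op {
        uint64_t request;
        uint32_t flags;
        uint32_t size;
        union {
                uint64_t u64;
        } payload;
};

/* Everything before the payload union is the fixed message header. */
#define VHOST_VFIO_OP_HDR_SIZE offsetof(struct vhost_vfio_op, payload)
```

With the 8-byte `request` followed by two 4-byte fields, the header comes out to 16 bytes, so a VHOST_SET_FEATURES message is 16 + sizeof(__u64) = 24 bytes on the wire.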

It's quite straightforward to set things on the device. Just write
the message to the device directly:

int vhost_vfio_set_features(struct vhost_dev *dev, uint64_t features)
{
        struct vhost_vfio_op op;

        op.request = VHOST_SET_FEATURES;
        op.flags = 0;
        op.size = sizeof(features);
        op.payload.u64 = features;

        return vhost_vfio_write(dev, &op);
}

To get things from the device, two steps are needed.
Take VHOST_GET_FEATURES as an example:

int vhost_vfio_get_features(struct vhost_dev *dev, uint64_t *features)
{
        struct vhost_vfio_op op;
        int ret;

        op.request = VHOST_GET_FEATURES;
        op.flags = VHOST_VFIO_NEED_REPLY;
        op.size = 0;

        /* Just need to write the header */
        ret = vhost_vfio_write(dev, &op);
        if (ret != 0)
                goto out;

        /* `op` wasn't changed during write */
        op.flags = 0;
        op.size = sizeof(*features);

        ret = vhost_vfio_read(dev, &op);
        if (ret != 0)
                goto out;

        *features = op.payload.u64;
out:
        return ret;
}

2. NOTIFY_REGION (mmap-able)

The region described by NOTIFY_REGION will be used to notify
the device.

Each queue will have a page for notification, which can be mapped
into the VM (if the hardware supports it), so the virtio driver in
the VM will be able to notify the device directly.

The region described by NOTIFY_REGION is also writable. If
the accelerator's notification register(s) cannot be mapped to
the VM, write() can also be used to notify the device. Something
like this:

void notify_relay(void *opaque)
{
        ..
        offset = host_page_size * queue_idx;

        ret = pwrite64(vfio->device_fd, &queue_idx, sizeof(queue_idx),
                       vfio->notify_offset + offset);
        ..
}
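The mmap path implied by the one-page-per-queue layout can be sketched as follows; the offset helper reflects the RFC's layout, while the mmap call in the comment uses hypothetical fds and offsets (error handling omitted):

```c
#include <stddef.h>
#include <stdint.h>

/* Per the RFC, each virtqueue owns one host page inside NOTIFY_REGION.
 * Offset of queue `qidx`'s doorbell page from the start of the region: */
static uint64_t notify_page_offset(size_t host_page_size, unsigned int qidx)
{
        return (uint64_t)host_page_size * qidx;
}

/*
 * When the page is mmap-able, the doorbell can be mapped once and kicked
 * with a plain store instead of a syscall per notification:
 *
 *     void *db = mmap(NULL, host_page_size, PROT_READ | PROT_WRITE,
 *                     MAP_SHARED, vfio->device_fd,
 *                     notify_mmap_offset +
 *                     notify_page_offset(host_page_size, queue_idx));
 *     *(volatile uint16_t *)db = queue_idx;   // kick queue queue_idx
 *
 * Otherwise, fall back to the write() path shown in notify_relay() above.
 */
```

The same page can be inserted into the guest's address space, which is what lets the guest virtio driver kick the hardware without a VM exit through QEMU.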

3. VFIO interrupt ioctl API

The VFIO interrupt ioctl API is used to set up device interrupts.
IRQ-bypass can also be supported.

Currently, the data path interrupt can be configured via the
VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
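Wiring a virtqueue's callfd to the device interrupt would go through the standard VFIO_DEVICE_SET_IRQS path. A sketch of building its variable-length argument (the struct and flag values mirror <linux/vfio.h>; VFIO_VHOST_VQ_IRQ_INDEX is the index proposed by this RFC, and the one-vector-per-queue `start` mapping is an assumption):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of struct vfio_irq_set and its flag values from <linux/vfio.h>. */
struct vfio_irq_set {
        uint32_t argsz;
        uint32_t flags;
#define VFIO_IRQ_SET_DATA_EVENTFD   (1 << 2)
#define VFIO_IRQ_SET_ACTION_TRIGGER (1 << 5)
        uint32_t index;
        uint32_t start;
        uint32_t count;
        uint8_t  data[];
};

#define VFIO_VHOST_VQ_IRQ_INDEX 0   /* index proposed by this RFC */

/* Fill `buf` with a VFIO_DEVICE_SET_IRQS argument that routes queue
 * `qidx`'s interrupt to `callfd` (the virtqueue's call eventfd).
 * Returns the total size to pass as argsz, or 0 if `buf` is too small.
 * The caller would then do: ioctl(device_fd, VFIO_DEVICE_SET_IRQS, buf) */
static uint32_t vq_irq_set_args(void *buf, size_t bufsz,
                                uint32_t qidx, int32_t callfd)
{
        uint32_t sz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
        struct vfio_irq_set *s = buf;

        if (bufsz < sz)
                return 0;
        memset(s, 0, sz);
        s->argsz = sz;
        s->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
        s->index = VFIO_VHOST_VQ_IRQ_INDEX;
        s->start = qidx;        /* one vector per virtqueue (assumed) */
        s->count = 1;
        memcpy(s->data, &callfd, sizeof(callfd));
        return sz;
}
```

Once the eventfd is registered this way, IRQ bypass can hand the vector directly to KVM, so device interrupts reach the guest without a userspace round trip.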

Signed-off-by: Tiwei Bie 
---
 drivers/vhost/Makefile |   2 +
 drivers/vhost/vdpa.c   | 770 +
 include/linux/vdpa_mdev.h  |  72 
 include/uapi/linux/vfio.h  |  19 +
 include/uapi/linux/vhost.h |  25 ++
 5 files changed, 888 insertions(+)
 create mode 100644 drivers/vhost/vdpa.c
 create mode 100644 include/linux/vdpa_mdev.h

diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..cabb71095940 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,4 +10,6 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_VFIO) += vdpa.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
new file mode 100644
index ..5c9426e2a091
--- /dev/null
+++ b/drivers/vhost/vdpa.c
@@ -0,0 +1,770