Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-07 Thread Yongji Xie
On Wed, Jul 7, 2021 at 4:53 PM Stefan Hajnoczi  wrote:
>
> On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:
> > +static bool vduse_validate_config(struct vduse_dev_config *config)
> > +{
>
> The name field needs to be NUL terminated?
>

I think so.

> > + case VDUSE_CREATE_DEV: {
> > + struct vduse_dev_config config;
> > + unsigned long size = offsetof(struct vduse_dev_config, 
> > config);
> > + void *buf;
> > +
> > + ret = -EFAULT;
> > + if (copy_from_user(, argp, size))
> > + break;
> > +
> > + ret = -EINVAL;
> > + if (vduse_validate_config() == false)
> > + break;
> > +
> > + buf = vmemdup_user(argp + size, config.config_size);
> > + if (IS_ERR(buf)) {
> > + ret = PTR_ERR(buf);
> > + break;
> > + }
> > + ret = vduse_create_dev(, buf, control->api_version);
> > + break;
> > + }
> > + case VDUSE_DESTROY_DEV: {
> > + char name[VDUSE_NAME_MAX];
> > +
> > + ret = -EFAULT;
> > + if (copy_from_user(name, argp, VDUSE_NAME_MAX))
> > + break;
>
> Is this missing a NUL terminator?

Oh, yes. Looks like I need to set '\0' to name[VDUSE_VDUSE_NAME_MAX - 1] here.

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-07 Thread Stefan Hajnoczi
On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:
> +static bool vduse_validate_config(struct vduse_dev_config *config)
> +{

The name field needs to be NUL terminated?

> + case VDUSE_CREATE_DEV: {
> + struct vduse_dev_config config;
> + unsigned long size = offsetof(struct vduse_dev_config, config);
> + void *buf;
> +
> + ret = -EFAULT;
> + if (copy_from_user(, argp, size))
> + break;
> +
> + ret = -EINVAL;
> + if (vduse_validate_config() == false)
> + break;
> +
> + buf = vmemdup_user(argp + size, config.config_size);
> + if (IS_ERR(buf)) {
> + ret = PTR_ERR(buf);
> + break;
> + }
> + ret = vduse_create_dev(, buf, control->api_version);
> + break;
> + }
> + case VDUSE_DESTROY_DEV: {
> + char name[VDUSE_NAME_MAX];
> +
> + ret = -EFAULT;
> + if (copy_from_user(name, argp, VDUSE_NAME_MAX))
> + break;

Is this missing a NUL terminator?


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-01 Thread Jason Wang


在 2021/7/1 下午6:26, Yongji Xie 写道:

On Thu, Jul 1, 2021 at 3:55 PM Jason Wang  wrote:


在 2021/7/1 下午2:50, Yongji Xie 写道:

On Wed, Jun 30, 2021 at 5:51 PM Stefan Hajnoczi  wrote:

On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote:

On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:

On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:

+/* ioctls */
+
+struct vduse_dev_config {
+ char name[VDUSE_NAME_MAX]; /* vduse device name */
+ __u32 vendor_id; /* virtio vendor id */
+ __u32 device_id; /* virtio device id */
+ __u64 features; /* device features */
+ __u64 bounce_size; /* bounce buffer size for iommu */
+ __u16 vq_size_max; /* the max size of virtqueue */

The VIRTIO specification allows per-virtqueue sizes. A device can have
two virtqueues, where the first one allows up to 1024 descriptors and
the second one allows only 128 descriptors, for example.


Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
that now. All virtqueues have the same maximum size.

I see struct vpda_config_ops only supports a per-device max vq size:
u16 (*get_vq_num_max)(struct vdpa_device *vdev);

virtio-pci supports per-virtqueue sizes because the struct
virtio_pci_common_cfg->queue_size register is per-queue (controlled by
queue_select).


Oh, yes. I miss queue_select.


I guess this is a question for Jason: will vdpa will keep this limitation?
If yes, then VDUSE can stick to it too without running into problems in
the future.


I think it's better to extend the get_vq_num_max() per virtqueue.

Currently, vDPA assumes the parent to have a global max size. This seems
to work on most of the parents but not vp-vDPA (which could be backed by
QEMU, in that case cvq's size is smaller).

Fortunately, we haven't enabled had cvq support in the userspace now.

I can post the fixes.


OK. If so, it looks like we need to support the per-vq configuration.
I wonder if it's better to use something like: VDUSE_CREATE_DEVICE ->
VDUSE_SETUP_VQ -> VDUSE_SETUP_VQ -> ... -> VDUSE_ENABLE_DEVICE to do
initialization rather than only use VDUSE_CREATE_DEVICE.



This should be fine.

Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-01 Thread Yongji Xie
On Thu, Jul 1, 2021 at 3:55 PM Jason Wang  wrote:
>
>
> 在 2021/7/1 下午2:50, Yongji Xie 写道:
> > On Wed, Jun 30, 2021 at 5:51 PM Stefan Hajnoczi  wrote:
> >> On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote:
> >>> On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  
> >>> wrote:
>  On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:
> > +/* ioctls */
> > +
> > +struct vduse_dev_config {
> > + char name[VDUSE_NAME_MAX]; /* vduse device name */
> > + __u32 vendor_id; /* virtio vendor id */
> > + __u32 device_id; /* virtio device id */
> > + __u64 features; /* device features */
> > + __u64 bounce_size; /* bounce buffer size for iommu */
> > + __u16 vq_size_max; /* the max size of virtqueue */
>  The VIRTIO specification allows per-virtqueue sizes. A device can have
>  two virtqueues, where the first one allows up to 1024 descriptors and
>  the second one allows only 128 descriptors, for example.
> 
> >>> Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
> >>> that now. All virtqueues have the same maximum size.
> >> I see struct vpda_config_ops only supports a per-device max vq size:
> >> u16 (*get_vq_num_max)(struct vdpa_device *vdev);
> >>
> >> virtio-pci supports per-virtqueue sizes because the struct
> >> virtio_pci_common_cfg->queue_size register is per-queue (controlled by
> >> queue_select).
> >>
> > Oh, yes. I miss queue_select.
> >
> >> I guess this is a question for Jason: will vdpa will keep this limitation?
> >> If yes, then VDUSE can stick to it too without running into problems in
> >> the future.
>
>
> I think it's better to extend the get_vq_num_max() per virtqueue.
>
> Currently, vDPA assumes the parent to have a global max size. This seems
> to work on most of the parents but not vp-vDPA (which could be backed by
> QEMU, in that case cvq's size is smaller).
>
> Fortunately, we haven't enabled had cvq support in the userspace now.
>
> I can post the fixes.
>

OK. If so, it looks like we need to support the per-vq configuration.
I wonder if it's better to use something like: VDUSE_CREATE_DEVICE ->
VDUSE_SETUP_VQ -> VDUSE_SETUP_VQ -> ... -> VDUSE_ENABLE_DEVICE to do
initialization rather than only use VDUSE_CREATE_DEVICE.

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-01 Thread Jason Wang


在 2021/7/1 下午2:50, Yongji Xie 写道:

On Wed, Jun 30, 2021 at 5:51 PM Stefan Hajnoczi  wrote:

On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote:

On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:

On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:

+/* ioctls */
+
+struct vduse_dev_config {
+ char name[VDUSE_NAME_MAX]; /* vduse device name */
+ __u32 vendor_id; /* virtio vendor id */
+ __u32 device_id; /* virtio device id */
+ __u64 features; /* device features */
+ __u64 bounce_size; /* bounce buffer size for iommu */
+ __u16 vq_size_max; /* the max size of virtqueue */

The VIRTIO specification allows per-virtqueue sizes. A device can have
two virtqueues, where the first one allows up to 1024 descriptors and
the second one allows only 128 descriptors, for example.


Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
that now. All virtqueues have the same maximum size.

I see struct vpda_config_ops only supports a per-device max vq size:
u16 (*get_vq_num_max)(struct vdpa_device *vdev);

virtio-pci supports per-virtqueue sizes because the struct
virtio_pci_common_cfg->queue_size register is per-queue (controlled by
queue_select).


Oh, yes. I miss queue_select.


I guess this is a question for Jason: will vdpa will keep this limitation?
If yes, then VDUSE can stick to it too without running into problems in
the future.



I think it's better to extend the get_vq_num_max() per virtqueue.

Currently, vDPA assumes the parent to have a global max size. This seems 
to work on most of the parents but not vp-vDPA (which could be backed by 
QEMU, in that case cvq's size is smaller).


Fortunately, we haven't enabled had cvq support in the userspace now.

I can post the fixes.





+ __u16 padding; /* padding */
+ __u32 vq_num; /* the number of virtqueues */
+ __u32 vq_align; /* the allocation alignment of virtqueue's metadata */

I'm not sure what this is?


  This will be used by vring_create_virtqueue() too.

If there is no official definition for the meaning of this value then
"/* same as vring_create_virtqueue()'s vring_align parameter */" would
be clearer. That way the reader knows what to research in order to
understand how this field works.


OK.


I don't remember but maybe it was used to support vrings when the
host/guest have non-4KB page sizes. I wonder if anyone has an official
definition for this value?

Not sure. Maybe we might need some alignment which is less than
PAGE_SIZE sometimes.



So I see CCW always use 4096, but I'm not sure whether or not it's 
smaller than PAGE_SIZE.


Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-01 Thread Yongji Xie
On Wed, Jun 30, 2021 at 5:51 PM Stefan Hajnoczi  wrote:
>
> On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote:
> > On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:
> > >
> > > On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:
> > > > +/* ioctls */
> > > > +
> > > > +struct vduse_dev_config {
> > > > + char name[VDUSE_NAME_MAX]; /* vduse device name */
> > > > + __u32 vendor_id; /* virtio vendor id */
> > > > + __u32 device_id; /* virtio device id */
> > > > + __u64 features; /* device features */
> > > > + __u64 bounce_size; /* bounce buffer size for iommu */
> > > > + __u16 vq_size_max; /* the max size of virtqueue */
> > >
> > > The VIRTIO specification allows per-virtqueue sizes. A device can have
> > > two virtqueues, where the first one allows up to 1024 descriptors and
> > > the second one allows only 128 descriptors, for example.
> > >
> >
> > Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
> > that now. All virtqueues have the same maximum size.
>
> I see struct vpda_config_ops only supports a per-device max vq size:
> u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>
> virtio-pci supports per-virtqueue sizes because the struct
> virtio_pci_common_cfg->queue_size register is per-queue (controlled by
> queue_select).
>

Oh, yes. I miss queue_select.

> I guess this is a question for Jason: will vdpa will keep this limitation?
> If yes, then VDUSE can stick to it too without running into problems in
> the future.
>
> > > > + __u16 padding; /* padding */
> > > > + __u32 vq_num; /* the number of virtqueues */
> > > > + __u32 vq_align; /* the allocation alignment of virtqueue's 
> > > > metadata */
> > >
> > > I'm not sure what this is?
> > >
> >
> >  This will be used by vring_create_virtqueue() too.
>
> If there is no official definition for the meaning of this value then
> "/* same as vring_create_virtqueue()'s vring_align parameter */" would
> be clearer. That way the reader knows what to research in order to
> understand how this field works.
>

OK.

> I don't remember but maybe it was used to support vrings when the
> host/guest have non-4KB page sizes. I wonder if anyone has an official
> definition for this value?

Not sure. Maybe we might need some alignment which is less than
PAGE_SIZE sometimes.

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-30 Thread Stefan Hajnoczi
On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote:
> On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:
> >
> > On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:
> > > +/* ioctls */
> > > +
> > > +struct vduse_dev_config {
> > > + char name[VDUSE_NAME_MAX]; /* vduse device name */
> > > + __u32 vendor_id; /* virtio vendor id */
> > > + __u32 device_id; /* virtio device id */
> > > + __u64 features; /* device features */
> > > + __u64 bounce_size; /* bounce buffer size for iommu */
> > > + __u16 vq_size_max; /* the max size of virtqueue */
> >
> > The VIRTIO specification allows per-virtqueue sizes. A device can have
> > two virtqueues, where the first one allows up to 1024 descriptors and
> > the second one allows only 128 descriptors, for example.
> >
> 
> Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
> that now. All virtqueues have the same maximum size.

I see struct vpda_config_ops only supports a per-device max vq size:
u16 (*get_vq_num_max)(struct vdpa_device *vdev);

virtio-pci supports per-virtqueue sizes because the struct
virtio_pci_common_cfg->queue_size register is per-queue (controlled by
queue_select).

I guess this is a question for Jason: will vdpa will keep this limitation?
If yes, then VDUSE can stick to it too without running into problems in
the future.

> > > + __u16 padding; /* padding */
> > > + __u32 vq_num; /* the number of virtqueues */
> > > + __u32 vq_align; /* the allocation alignment of virtqueue's metadata 
> > > */
> >
> > I'm not sure what this is?
> >
> 
>  This will be used by vring_create_virtqueue() too.

If there is no official definition for the meaning of this value then
"/* same as vring_create_virtqueue()'s vring_align parameter */" would
be clearer. That way the reader knows what to research in order to
understand how this field works.

I don't remember but maybe it was used to support vrings when the
host/guest have non-4KB page sizes. I wonder if anyone has an official
definition for this value?


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Jason Wang


在 2021/6/29 上午11:56, Yongji Xie 写道:

On Tue, Jun 29, 2021 at 11:29 AM Jason Wang  wrote:


在 2021/6/29 上午10:26, Yongji Xie 写道:

On Mon, Jun 28, 2021 at 12:40 PM Jason Wang  wrote:

在 2021/6/25 下午12:19, Yongji Xie 写道:

2b) for set_status(): simply relay the message to userspace, reply is no
needed. Userspace will use a command to update the status when the
datapath is stop. The the status could be fetched via get_stats().

2b looks more spec complaint.


Looks good to me. And I think we can use the reply of the message to
update the status instead of introducing a new command.


Just notice this part in virtio_finalize_features():

   virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
   status = dev->config->get_status(dev);
   if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {

So we no reply doesn't work for FEATURES_OK.

So my understanding is:

1) We must not use noreply for set_status()
2) We can use noreply for get_status(), but it requires a new ioctl to
update the status.

So it looks to me we need synchronize for both get_status() and
set_status().


We should not send messages to userspace in the FEATURES_OK case. So
the synchronization is not necessary.


As discussed previously, there could be a device that mandates some
features (VIRTIO_F_RING_PACKED). So it can choose to not accept
FEATURES_OK is packed virtqueue is not negotiated.

In this case we need to relay the message to userspace.


OK, I see. If so, I prefer to only use noreply for set_status(). We do
not set the status bit if the message is failed. In this way, we don't
need to change lots of virtio core codes to handle the failure of
set_status()/get_status().



It should work.

Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Yongji Xie
On Tue, Jun 29, 2021 at 11:29 AM Jason Wang  wrote:
>
>
> 在 2021/6/29 上午10:26, Yongji Xie 写道:
> > On Mon, Jun 28, 2021 at 12:40 PM Jason Wang  wrote:
> >>
> >> 在 2021/6/25 下午12:19, Yongji Xie 写道:
>  2b) for set_status(): simply relay the message to userspace, reply is no
>  needed. Userspace will use a command to update the status when the
>  datapath is stop. The the status could be fetched via get_stats().
> 
>  2b looks more spec complaint.
> 
> >>> Looks good to me. And I think we can use the reply of the message to
> >>> update the status instead of introducing a new command.
> >>>
> >> Just notice this part in virtio_finalize_features():
> >>
> >>   virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> >>   status = dev->config->get_status(dev);
> >>   if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
> >>
> >> So we no reply doesn't work for FEATURES_OK.
> >>
> >> So my understanding is:
> >>
> >> 1) We must not use noreply for set_status()
> >> 2) We can use noreply for get_status(), but it requires a new ioctl to
> >> update the status.
> >>
> >> So it looks to me we need synchronize for both get_status() and
> >> set_status().
> >>
> > We should not send messages to userspace in the FEATURES_OK case. So
> > the synchronization is not necessary.
>
>
> As discussed previously, there could be a device that mandates some
> features (VIRTIO_F_RING_PACKED). So it can choose to not accept
> FEATURES_OK is packed virtqueue is not negotiated.
>
> In this case we need to relay the message to userspace.
>

OK, I see. If so, I prefer to only use noreply for set_status(). We do
not set the status bit if the message is failed. In this way, we don't
need to change lots of virtio core codes to handle the failure of
set_status()/get_status().

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Jason Wang


在 2021/6/29 上午10:26, Yongji Xie 写道:

On Mon, Jun 28, 2021 at 12:40 PM Jason Wang  wrote:


在 2021/6/25 下午12:19, Yongji Xie 写道:

2b) for set_status(): simply relay the message to userspace, reply is no
needed. Userspace will use a command to update the status when the
datapath is stop. The the status could be fetched via get_stats().

2b looks more spec complaint.


Looks good to me. And I think we can use the reply of the message to
update the status instead of introducing a new command.


Just notice this part in virtio_finalize_features():

  virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
  status = dev->config->get_status(dev);
  if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {

So we no reply doesn't work for FEATURES_OK.

So my understanding is:

1) We must not use noreply for set_status()
2) We can use noreply for get_status(), but it requires a new ioctl to
update the status.

So it looks to me we need synchronize for both get_status() and
set_status().


We should not send messages to userspace in the FEATURES_OK case. So
the synchronization is not necessary.



As discussed previously, there could be a device that mandates some 
features (VIRTIO_F_RING_PACKED). So it can choose to not accept 
FEATURES_OK is packed virtqueue is not negotiated.


In this case we need to relay the message to userspace.

Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Yongji Xie
On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:
>
> On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:
> > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> > new file mode 100644
> > index ..f21b2e51b5c8
> > --- /dev/null
> > +++ b/include/uapi/linux/vduse.h
> > @@ -0,0 +1,143 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +#ifndef _UAPI_VDUSE_H_
> > +#define _UAPI_VDUSE_H_
> > +
> > +#include 
> > +
> > +#define VDUSE_API_VERSION0
> > +
> > +#define VDUSE_NAME_MAX   256
> > +
> > +/* the control messages definition for read/write */
> > +
> > +enum vduse_req_type {
> > + /* Get the state for virtqueue from userspace */
> > + VDUSE_GET_VQ_STATE,
> > + /* Notify userspace to start the dataplane, no reply */
> > + VDUSE_START_DATAPLANE,
> > + /* Notify userspace to stop the dataplane, no reply */
> > + VDUSE_STOP_DATAPLANE,
> > + /* Notify userspace to update the memory mapping in device IOTLB */
> > + VDUSE_UPDATE_IOTLB,
> > +};
> > +
> > +struct vduse_vq_state {
> > + __u32 index; /* virtqueue index */
> > + __u32 avail_idx; /* virtqueue state (last_avail_idx) */
> > +};
> > +
> > +struct vduse_iova_range {
> > + __u64 start; /* start of the IOVA range */
> > + __u64 last; /* end of the IOVA range */
>
> Please clarify whether this describes a closed range [start, last] or an
> open range [start, last).
>

OK.

> > +};
> > +
> > +struct vduse_dev_request {
> > + __u32 type; /* request type */
> > + __u32 request_id; /* request id */
> > +#define VDUSE_REQ_FLAGS_NO_REPLY (1 << 0) /* No need to reply */
> > + __u32 flags; /* request flags */
> > + __u32 reserved; /* for future use */
> > + union {
> > + struct vduse_vq_state vq_state; /* virtqueue state */
> > + struct vduse_iova_range iova; /* iova range for updating */
> > + __u32 padding[16]; /* padding */
> > + };
> > +};
> > +
> > +struct vduse_dev_response {
> > + __u32 request_id; /* corresponding request id */
> > +#define VDUSE_REQ_RESULT_OK  0x00
> > +#define VDUSE_REQ_RESULT_FAILED  0x01
> > + __u32 result; /* the result of request */
> > + __u32 reserved[2]; /* for future use */
> > + union {
> > + struct vduse_vq_state vq_state; /* virtqueue state */
> > + __u32 padding[16]; /* padding */
> > + };
> > +};
> > +
> > +/* ioctls */
> > +
> > +struct vduse_dev_config {
> > + char name[VDUSE_NAME_MAX]; /* vduse device name */
> > + __u32 vendor_id; /* virtio vendor id */
> > + __u32 device_id; /* virtio device id */
> > + __u64 features; /* device features */
> > + __u64 bounce_size; /* bounce buffer size for iommu */
> > + __u16 vq_size_max; /* the max size of virtqueue */
>
> The VIRTIO specification allows per-virtqueue sizes. A device can have
> two virtqueues, where the first one allows up to 1024 descriptors and
> the second one allows only 128 descriptors, for example.
>

Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
that now. All virtqueues have the same maximum size.

> This constant seems to impose the constraint that all virtqueues have
> the same maximum size. Is this really necessary?
>

This will be used by vring_create_virtqueue(). We need to specify the
maximum queue size supported by the device.

> > + __u16 padding; /* padding */
> > + __u32 vq_num; /* the number of virtqueues */
> > + __u32 vq_align; /* the allocation alignment of virtqueue's metadata */
>
> I'm not sure what this is?
>

 This will be used by vring_create_virtqueue() too.

> > + __u32 config_size; /* the size of the configuration space */
> > + __u32 reserved[15]; /* for future use */
> > + __u8 config[0]; /* the buffer of the configuration space */
> > +};
> > +
> > +struct vduse_iotlb_entry {
> > + __u64 offset; /* the mmap offset on fd */
> > + __u64 start; /* start of the IOVA range */
> > + __u64 last; /* last of the IOVA range */
>
> Same here, please specify whether this is an open range or a closed
> range.
>

Sure.

> > +#define VDUSE_ACCESS_RO 0x1
> > +#define VDUSE_ACCESS_WO 0x2
> > +#define VDUSE_ACCESS_RW 0x3
> > + __u8 perm; /* access permission of this range */
> > +};
> > +
> > +struct vduse_config_update {
> > + __u32 offset; /* offset from the beginning of configuration space */
> > + __u32 length; /* the length to write to configuration space */
> > + __u8 buffer[0]; /* buffer used to write from */
> > +};
> > +
> > +struct vduse_vq_info {
> > + __u32 index; /* virtqueue index */
> > + __u32 avail_idx; /* virtqueue state (last_avail_idx) */
> > + __u64 desc_addr; /* address of desc area */
> > + __u64 driver_addr; /* address of driver area */
> > + __u64 device_addr; /* address of device area */
> > + __u32 num; /* the size of virtqueue */
> > + __u8 ready; /* ready status of 

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Yongji Xie
On Mon, Jun 28, 2021 at 12:40 PM Jason Wang  wrote:
>
>
> 在 2021/6/25 下午12:19, Yongji Xie 写道:
> >> 2b) for set_status(): simply relay the message to userspace, reply is no
> >> needed. Userspace will use a command to update the status when the
> >> datapath is stop. The the status could be fetched via get_stats().
> >>
> >> 2b looks more spec complaint.
> >>
> > Looks good to me. And I think we can use the reply of the message to
> > update the status instead of introducing a new command.
> >
>
> Just notice this part in virtio_finalize_features():
>
>  virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>  status = dev->config->get_status(dev);
>  if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
>
> So we no reply doesn't work for FEATURES_OK.
>
> So my understanding is:
>
> 1) We must not use noreply for set_status()
> 2) We can use noreply for get_status(), but it requires a new ioctl to
> update the status.
>
> So it looks to me we need synchronize for both get_status() and
> set_status().
>

We should not send messages to userspace in the FEATURES_OK case. So
the synchronization is not necessary.

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-28 Thread Stefan Hajnoczi
On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:
> diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h
> new file mode 100644
> index ..f21b2e51b5c8
> --- /dev/null
> +++ b/include/uapi/linux/vduse.h
> @@ -0,0 +1,143 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _UAPI_VDUSE_H_
> +#define _UAPI_VDUSE_H_
> +
> +#include 
> +
> +#define VDUSE_API_VERSION0
> +
> +#define VDUSE_NAME_MAX   256
> +
> +/* the control messages definition for read/write */
> +
> +enum vduse_req_type {
> + /* Get the state for virtqueue from userspace */
> + VDUSE_GET_VQ_STATE,
> + /* Notify userspace to start the dataplane, no reply */
> + VDUSE_START_DATAPLANE,
> + /* Notify userspace to stop the dataplane, no reply */
> + VDUSE_STOP_DATAPLANE,
> + /* Notify userspace to update the memory mapping in device IOTLB */
> + VDUSE_UPDATE_IOTLB,
> +};
> +
> +struct vduse_vq_state {
> + __u32 index; /* virtqueue index */
> + __u32 avail_idx; /* virtqueue state (last_avail_idx) */
> +};
> +
> +struct vduse_iova_range {
> + __u64 start; /* start of the IOVA range */
> + __u64 last; /* end of the IOVA range */

Please clarify whether this describes a closed range [start, last] or an
open range [start, last).

> +};
> +
> +struct vduse_dev_request {
> + __u32 type; /* request type */
> + __u32 request_id; /* request id */
> +#define VDUSE_REQ_FLAGS_NO_REPLY (1 << 0) /* No need to reply */
> + __u32 flags; /* request flags */
> + __u32 reserved; /* for future use */
> + union {
> + struct vduse_vq_state vq_state; /* virtqueue state */
> + struct vduse_iova_range iova; /* iova range for updating */
> + __u32 padding[16]; /* padding */
> + };
> +};
> +
> +struct vduse_dev_response {
> + __u32 request_id; /* corresponding request id */
> +#define VDUSE_REQ_RESULT_OK  0x00
> +#define VDUSE_REQ_RESULT_FAILED  0x01
> + __u32 result; /* the result of request */
> + __u32 reserved[2]; /* for future use */
> + union {
> + struct vduse_vq_state vq_state; /* virtqueue state */
> + __u32 padding[16]; /* padding */
> + };
> +};
> +
> +/* ioctls */
> +
> +struct vduse_dev_config {
> + char name[VDUSE_NAME_MAX]; /* vduse device name */
> + __u32 vendor_id; /* virtio vendor id */
> + __u32 device_id; /* virtio device id */
> + __u64 features; /* device features */
> + __u64 bounce_size; /* bounce buffer size for iommu */
> + __u16 vq_size_max; /* the max size of virtqueue */

The VIRTIO specification allows per-virtqueue sizes. A device can have
two virtqueues, where the first one allows up to 1024 descriptors and
the second one allows only 128 descriptors, for example.

This constant seems to impose the constraint that all virtqueues have
the same maximum size. Is this really necessary?

> + __u16 padding; /* padding */
> + __u32 vq_num; /* the number of virtqueues */
> + __u32 vq_align; /* the allocation alignment of virtqueue's metadata */

I'm not sure what this is?

> + __u32 config_size; /* the size of the configuration space */
> + __u32 reserved[15]; /* for future use */
> + __u8 config[0]; /* the buffer of the configuration space */
> +};
> +
> +struct vduse_iotlb_entry {
> + __u64 offset; /* the mmap offset on fd */
> + __u64 start; /* start of the IOVA range */
> + __u64 last; /* last of the IOVA range */

Same here, please specify whether this is an open range or a closed
range.

> +#define VDUSE_ACCESS_RO 0x1
> +#define VDUSE_ACCESS_WO 0x2
> +#define VDUSE_ACCESS_RW 0x3
> + __u8 perm; /* access permission of this range */
> +};
> +
> +struct vduse_config_update {
> + __u32 offset; /* offset from the beginning of configuration space */
> + __u32 length; /* the length to write to configuration space */
> + __u8 buffer[0]; /* buffer used to write from */
> +};
> +
> +struct vduse_vq_info {
> + __u32 index; /* virtqueue index */
> + __u32 avail_idx; /* virtqueue state (last_avail_idx) */
> + __u64 desc_addr; /* address of desc area */
> + __u64 driver_addr; /* address of driver area */
> + __u64 device_addr; /* address of device area */
> + __u32 num; /* the size of virtqueue */
> + __u8 ready; /* ready status of virtqueue */
> +};
> +
> +struct vduse_vq_eventfd {
> + __u32 index; /* virtqueue index */
> +#define VDUSE_EVENTFD_DEASSIGN -1
> + int fd; /* eventfd, -1 means de-assigning the eventfd */
> +};
> +
> +#define VDUSE_BASE   0x81
> +
> +/* Get the version of VDUSE API. This is used for future extension */
> +#define VDUSE_GET_API_VERSION_IOR(VDUSE_BASE, 0x00, __u64)
> +
> +/* Set the version of VDUSE API. */
> +#define VDUSE_SET_API_VERSION_IOW(VDUSE_BASE, 0x01, __u64)
> +
> +/* Create a vduse device which is represented by a char device 
> (/dev/vduse/) */
> +#define 

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-27 Thread Jason Wang


在 2021/6/25 下午12:19, Yongji Xie 写道:

2b) for set_status(): simply relay the message to userspace, reply is no
needed. Userspace will use a command to update the status when the
datapath is stop. The the status could be fetched via get_stats().

2b looks more spec complaint.


Looks good to me. And I think we can use the reply of the message to
update the status instead of introducing a new command.



Just notice this part in virtio_finalize_features():

    virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
    status = dev->config->get_status(dev);
    if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {

So we no reply doesn't work for FEATURES_OK.

So my understanding is:

1) We must not use noreply for set_status()
2) We can use noreply for get_status(), but it requires a new ioctl to 
update the status.


So it looks to me we need synchronize for both get_status() and 
set_status().


Thanks


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-24 Thread Yongji Xie
On Fri, Jun 25, 2021 at 11:09 AM Jason Wang  wrote:
>
>
> 在 2021/6/24 下午5:16, Yongji Xie 写道:
> > On Thu, Jun 24, 2021 at 4:14 PM Jason Wang  wrote:
> >>
> >> 在 2021/6/24 下午12:46, Yongji Xie 写道:
>  So we need to deal with both FEATURES_OK and reset, but probably not
>  DRIVER_OK.
> 
> >>> OK, I see. Thanks for the explanation. One more question is how about
> >>> clearing the corresponding status bit in get_status() rather than
> >>> making set_status() fail. Since the spec recommends this way for
> >>> validation which is done in virtio_dev_remove() and
> >>> virtio_finalize_features().
> >>>
> >>> Thanks,
> >>> Yongji
> >>>
> >> I think you can. Or it would be even better that we just don't set the
> >> bit during set_status().
> >>
> > Yes, that's what I mean.
> >
> >> I just realize that in vdpa_reset() we had:
> >>
> >> static inline void vdpa_reset(struct vdpa_device *vdev)
> >> {
> >>   const struct vdpa_config_ops *ops = vdev->config;
> >>
> >>   vdev->features_valid = false;
> >>   ops->set_status(vdev, 0);
> >> }
> >>
> >> We probably need to add the synchronization here. E.g re-read with a
> >> timeout.
> >>
> > Looks like the timeout is already in set_status().
>
>
> Do you mean the VDUSE's implementation?
>

Yes.

>
> >   Do we really need a
> > duplicated one here?
>
>
> 1) this is the timeout at the vDPA layer instead of the VDUSE layer.

OK, I get it.

> 2) it really depends on what's the meaning of the timeout for set_status
> of VDUSE.
>
> Do we want:
>
> 2a) for set_status(): relay the message to userspace and wait for the
> userspace to quiescence the datapath
>
> or
>
> 2b) for set_status(): simply relay the message to userspace, reply is no
> needed. Userspace will use a command to update the status when the
> datapath is stop. The the status could be fetched via get_stats().
>
> 2b looks more spec complaint.
>

Looks good to me. And I think we can use the reply of the message to
update the status instead of introducing a new command.

> > And how to handle failure? Adding a return value
> > to virtio_config_ops->reset() and passing the error to the upper
> > layer?
>
>
> Something like this.
>

OK.

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-24 Thread Jason Wang


在 2021/6/24 下午5:16, Yongji Xie 写道:

On Thu, Jun 24, 2021 at 4:14 PM Jason Wang  wrote:


在 2021/6/24 下午12:46, Yongji Xie 写道:

So we need to deal with both FEATURES_OK and reset, but probably not
DRIVER_OK.


OK, I see. Thanks for the explanation. One more question is how about
clearing the corresponding status bit in get_status() rather than
making set_status() fail. Since the spec recommends this way for
validation which is done in virtio_dev_remove() and
virtio_finalize_features().

Thanks,
Yongji


I think you can. Or it would be even better that we just don't set the
bit during set_status().


Yes, that's what I mean.


I just realize that in vdpa_reset() we had:

static inline void vdpa_reset(struct vdpa_device *vdev)
{
  const struct vdpa_config_ops *ops = vdev->config;

  vdev->features_valid = false;
  ops->set_status(vdev, 0);
}

We probably need to add the synchronization here. E.g re-read with a
timeout.


Looks like the timeout is already in set_status().



Do you mean the VDUSE's implementation?



  Do we really need a
duplicated one here?



1) this is the timeout at the vDPA layer instead of the VDUSE layer.
2) it really depends on what's the meaning of the timeout for set_status 
of VDUSE.


Do we want:

2a) for set_status(): relay the message to userspace and wait for the 
userspace to quiescence the datapath


or

2b) for set_status(): simply relay the message to userspace, reply is no 
needed. Userspace will use a command to update the status when the 
datapath is stop. The the status could be fetched via get_stats().


2b looks more spec complaint.


And how to handle failure? Adding a return value
to virtio_config_ops->reset() and passing the error to the upper
layer?



Something like this.

Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-24 Thread Yongji Xie
On Thu, Jun 24, 2021 at 4:14 PM Jason Wang  wrote:
>
>
> 在 2021/6/24 下午12:46, Yongji Xie 写道:
> >> So we need to deal with both FEATURES_OK and reset, but probably not
> >> DRIVER_OK.
> >>
> > OK, I see. Thanks for the explanation. One more question is how about
> > clearing the corresponding status bit in get_status() rather than
> > making set_status() fail. Since the spec recommends this way for
> > validation which is done in virtio_dev_remove() and
> > virtio_finalize_features().
> >
> > Thanks,
> > Yongji
> >
>
> I think you can. Or it would be even better that we just don't set the
> bit during set_status().
>

Yes, that's what I mean.

> I just realize that in vdpa_reset() we had:
>
> static inline void vdpa_reset(struct vdpa_device *vdev)
> {
>  const struct vdpa_config_ops *ops = vdev->config;
>
>  vdev->features_valid = false;
>  ops->set_status(vdev, 0);
> }
>
> We probably need to add the synchronization here. E.g re-read with a
> timeout.
>

Looks like the timeout is already in set_status(). Do we really need a
duplicated one here? And how to handle failure? Adding a return value
to virtio_config_ops->reset() and passing the error to the upper
layer?

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-24 Thread Jason Wang


在 2021/6/24 下午12:46, Yongji Xie 写道:

So we need to deal with both FEATURES_OK and reset, but probably not
DRIVER_OK.


OK, I see. Thanks for the explanation. One more question is how about
clearing the corresponding status bit in get_status() rather than
making set_status() fail. Since the spec recommends this way for
validation which is done in virtio_dev_remove() and
virtio_finalize_features().

Thanks,
Yongji



I think you can. Or it would be even better that we just don't set the 
bit during set_status().


I just realize that in vdpa_reset() we had:

static inline void vdpa_reset(struct vdpa_device *vdev)
{
    const struct vdpa_config_ops *ops = vdev->config;

    vdev->features_valid = false;
    ops->set_status(vdev, 0);
}

We probably need to add the synchronization here. E.g re-read with a 
timeout.


Thanks

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-23 Thread Yongji Xie
On Thu, Jun 24, 2021 at 11:35 AM Jason Wang  wrote:
>
>
> 在 2021/6/23 下午1:50, Yongji Xie 写道:
> > On Wed, Jun 23, 2021 at 11:31 AM Jason Wang  wrote:
> >>
> >> 在 2021/6/22 下午4:14, Yongji Xie 写道:
> >>> On Tue, Jun 22, 2021 at 3:50 PM Jason Wang  wrote:
>  在 2021/6/22 下午3:22, Yongji Xie 写道:
> >> We need fix a way to propagate the error to the userspace.
> >>
> >> E.g if we want to stop the deivce, we will delay the status reset until
> >> we get respose from the userspace?
> >>
> > I didn't get how to delay the status reset. And should it be a DoS
> > that we want to fix if the userspace doesn't give a response forever?
>  You're right. So let's make set_status() can fail first, then propagate
>  its failure via VHOST_VDPA_SET_STATUS.
> 
> >>> OK. So we only need to propagate the failure in the vhost-vdpa case, 
> >>> right?
> >>
> >> I think not, we need to deal with the reset for virtio as well:
> >>
> >> E.g in register_virtio_devices(), we have:
> >>
> >>   /* We always start by resetting the device, in case a previous
> >>* driver messed it up.  This also tests that code path a
> >> little. */
> >> dev->config->reset(dev);
> >>
> >> We probably need to make reset can fail and then fail the
> >> register_virtio_device() as well.
> >>
> > OK, looks like virtio_add_status() and virtio_device_ready()[1] should
> > be also modified if we need to propagate the failure in the
> > virtio-vdpa case. Or do we only need to care about the reset case?
> >
> > [1] 
> > https://lore.kernel.org/lkml/20210517093428.670-1-xieyon...@bytedance.com/
>
>
> My understanding is DRIVER_OK is not something that needs to be validated:
>
> "
>
> DRIVER_OK (4)
> Indicates that the driver is set up and ready to drive the device.
>
> "
>
> Since the spec doesn't require to re-read the and check if DRIVER_OK is
> set in 3.1.1 Driver Requirements: Device Initialization.
>
> It's more about "telling the device that driver is ready."
>
> But we don have some status bit that requires the synchronization with
> the device.
>
> 1) FEATURES_OK, spec requires to re-read the status bit to check whether
> or it it was set by the device:
>
> "
>
> Re-read device status to ensure the FEATURES_OK bit is still set:
> otherwise, the device does not support our subset of features and the
> device is unusable.
>
> "
>
> This is useful for some device which can only support a subset of the
> features. E.g a device that can only work for packed virtqueue. This
> means the current design of set_features won't work, we need either:
>
> 1a) relay the set_features request to userspace
>
> or
>
> 1b) introduce a mandated_device_features during device creation and
> validate the driver features during the set_features(), and don't set
> FEATURES_OK if they don't match.
>
>
> 2) Some transports (PCI) requires to re-read the status to ensure the
> synchronization.
>
> "
>
> After writing 0 to device_status, the driver MUST wait for a read of
> device_status to return 0 before reinitializing the device.
>
> "
>
> So we need to deal with both FEATURES_OK and reset, but probably not
> DRIVER_OK.
>

OK, I see. Thanks for the explanation. One more question is how about
clearing the corresponding status bit in get_status() rather than
making set_status() fail. Since the spec recommends this way for
validation which is done in virtio_dev_remove() and
virtio_finalize_features().

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-23 Thread Jason Wang


在 2021/6/23 下午1:50, Yongji Xie 写道:

On Wed, Jun 23, 2021 at 11:31 AM Jason Wang  wrote:


在 2021/6/22 下午4:14, Yongji Xie 写道:

On Tue, Jun 22, 2021 at 3:50 PM Jason Wang  wrote:

在 2021/6/22 下午3:22, Yongji Xie 写道:

We need fix a way to propagate the error to the userspace.

E.g if we want to stop the deivce, we will delay the status reset until
we get respose from the userspace?


I didn't get how to delay the status reset. And should it be a DoS
that we want to fix if the userspace doesn't give a response forever?

You're right. So let's make set_status() can fail first, then propagate
its failure via VHOST_VDPA_SET_STATUS.


OK. So we only need to propagate the failure in the vhost-vdpa case, right?


I think not, we need to deal with the reset for virtio as well:

E.g in register_virtio_devices(), we have:

  /* We always start by resetting the device, in case a previous
   * driver messed it up.  This also tests that code path a
little. */
dev->config->reset(dev);

We probably need to make reset can fail and then fail the
register_virtio_device() as well.


OK, looks like virtio_add_status() and virtio_device_ready()[1] should
be also modified if we need to propagate the failure in the
virtio-vdpa case. Or do we only need to care about the reset case?

[1] https://lore.kernel.org/lkml/20210517093428.670-1-xieyon...@bytedance.com/



My understanding is DRIVER_OK is not something that needs to be validated:

"

DRIVER_OK (4)
Indicates that the driver is set up and ready to drive the device.

"

Since the spec doesn't require to re-read the and check if DRIVER_OK is 
set in 3.1.1 Driver Requirements: Device Initialization.


It's more about "telling the device that driver is ready."

But we don have some status bit that requires the synchronization with 
the device.


1) FEATURES_OK, spec requires to re-read the status bit to check whether 
or it it was set by the device:


"

Re-read device status to ensure the FEATURES_OK bit is still set: 
otherwise, the device does not support our subset of features and the 
device is unusable.


"

This is useful for some device which can only support a subset of the 
features. E.g a device that can only work for packed virtqueue. This 
means the current design of set_features won't work, we need either:


1a) relay the set_features request to userspace

or

1b) introduce a mandated_device_features during device creation and 
validate the driver features during the set_features(), and don't set 
FEATURES_OK if they don't match.



2) Some transports (PCI) requires to re-read the status to ensure the 
synchronization.


"

After writing 0 to device_status, the driver MUST wait for a read of 
device_status to return 0 before reinitializing the device.


"

So we need to deal with both FEATURES_OK and reset, but probably not 
DRIVER_OK.


Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Yongji Xie
On Wed, Jun 23, 2021 at 11:31 AM Jason Wang  wrote:
>
>
> 在 2021/6/22 下午4:14, Yongji Xie 写道:
> > On Tue, Jun 22, 2021 at 3:50 PM Jason Wang  wrote:
> >>
> >> 在 2021/6/22 下午3:22, Yongji Xie 写道:
>  We need fix a way to propagate the error to the userspace.
> 
>  E.g if we want to stop the deivce, we will delay the status reset until
>  we get respose from the userspace?
> 
> >>> I didn't get how to delay the status reset. And should it be a DoS
> >>> that we want to fix if the userspace doesn't give a response forever?
> >>
> >> You're right. So let's make set_status() can fail first, then propagate
> >> its failure via VHOST_VDPA_SET_STATUS.
> >>
> > OK. So we only need to propagate the failure in the vhost-vdpa case, right?
>
>
> I think not, we need to deal with the reset for virtio as well:
>
> E.g in register_virtio_devices(), we have:
>
>  /* We always start by resetting the device, in case a previous
>   * driver messed it up.  This also tests that code path a
> little. */
>dev->config->reset(dev);
>
> We probably need to make reset can fail and then fail the
> register_virtio_device() as well.
>

OK, looks like virtio_add_status() and virtio_device_ready()[1] should
be also modified if we need to propagate the failure in the
virtio-vdpa case. Or do we only need to care about the reset case?

[1] https://lore.kernel.org/lkml/20210517093428.670-1-xieyon...@bytedance.com/

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Jason Wang


在 2021/6/22 下午4:14, Yongji Xie 写道:

On Tue, Jun 22, 2021 at 3:50 PM Jason Wang  wrote:


在 2021/6/22 下午3:22, Yongji Xie 写道:

We need fix a way to propagate the error to the userspace.

E.g if we want to stop the deivce, we will delay the status reset until
we get respose from the userspace?


I didn't get how to delay the status reset. And should it be a DoS
that we want to fix if the userspace doesn't give a response forever?


You're right. So let's make set_status() can fail first, then propagate
its failure via VHOST_VDPA_SET_STATUS.


OK. So we only need to propagate the failure in the vhost-vdpa case, right?



I think not, we need to deal with the reset for virtio as well:

E.g in register_virtio_devices(), we have:

    /* We always start by resetting the device, in case a previous
 * driver messed it up.  This also tests that code path a 
little. */

  dev->config->reset(dev);

We probably need to make reset can fail and then fail the 
register_virtio_device() as well.


Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Yongji Xie
On Tue, Jun 22, 2021 at 3:50 PM Jason Wang  wrote:
>
>
> 在 2021/6/22 下午3:22, Yongji Xie 写道:
> >> We need fix a way to propagate the error to the userspace.
> >>
> >> E.g if we want to stop the deivce, we will delay the status reset until
> >> we get respose from the userspace?
> >>
> > I didn't get how to delay the status reset. And should it be a DoS
> > that we want to fix if the userspace doesn't give a response forever?
>
>
> You're right. So let's make set_status() can fail first, then propagate
> its failure via VHOST_VDPA_SET_STATUS.
>

OK. So we only need to propagate the failure in the vhost-vdpa case, right?

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Jason Wang


在 2021/6/22 下午3:22, Yongji Xie 写道:

We need fix a way to propagate the error to the userspace.

E.g if we want to stop the deivce, we will delay the status reset until
we get respose from the userspace?


I didn't get how to delay the status reset. And should it be a DoS
that we want to fix if the userspace doesn't give a response forever?



You're right. So let's make set_status() can fail first, then propagate 
its failure via VHOST_VDPA_SET_STATUS.






+ }
+}
+
+static size_t vduse_vdpa_get_config_size(struct vdpa_device *vdpa)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ return dev->config_size;
+}
+
+static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int 
offset,
+   void *buf, unsigned int len)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ memcpy(buf, dev->config + offset, len);
+}
+
+static void vduse_vdpa_set_config(struct vdpa_device *vdpa, unsigned int 
offset,
+ const void *buf, unsigned int len)
+{
+ /* Now we only support read-only configuration space */
+}
+
+static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ return dev->generation;
+}
+
+static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
+ struct vhost_iotlb *iotlb)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+ int ret;
+
+ ret = vduse_domain_set_map(dev->domain, iotlb);
+ if (ret)
+ return ret;
+
+ ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX);
+ if (ret) {
+ vduse_domain_clear_map(dev->domain, iotlb);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void vduse_vdpa_free(struct vdpa_device *vdpa)
+{
+ struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+ dev->vdev = NULL;
+}
+
+static const struct vdpa_config_ops vduse_vdpa_config_ops = {
+ .set_vq_address = vduse_vdpa_set_vq_address,
+ .kick_vq= vduse_vdpa_kick_vq,
+ .set_vq_cb  = vduse_vdpa_set_vq_cb,
+ .set_vq_num = vduse_vdpa_set_vq_num,
+ .set_vq_ready   = vduse_vdpa_set_vq_ready,
+ .get_vq_ready   = vduse_vdpa_get_vq_ready,
+ .set_vq_state   = vduse_vdpa_set_vq_state,
+ .get_vq_state   = vduse_vdpa_get_vq_state,
+ .get_vq_align   = vduse_vdpa_get_vq_align,
+ .get_features   = vduse_vdpa_get_features,
+ .set_features   = vduse_vdpa_set_features,
+ .set_config_cb  = vduse_vdpa_set_config_cb,
+ .get_vq_num_max = vduse_vdpa_get_vq_num_max,
+ .get_device_id  = vduse_vdpa_get_device_id,
+ .get_vendor_id  = vduse_vdpa_get_vendor_id,
+ .get_status = vduse_vdpa_get_status,
+ .set_status = vduse_vdpa_set_status,
+ .get_config_size= vduse_vdpa_get_config_size,
+ .get_config = vduse_vdpa_get_config,
+ .set_config = vduse_vdpa_set_config,
+ .get_generation = vduse_vdpa_get_generation,
+ .set_map= vduse_vdpa_set_map,
+ .free   = vduse_vdpa_free,
+};
+
+static dma_addr_t vduse_dev_map_page(struct device *dev, struct page *page,
+  unsigned long offset, size_t size,
+  enum dma_data_direction dir,
+  unsigned long attrs)
+{
+ struct vduse_dev *vdev = dev_to_vduse(dev);
+ struct vduse_iova_domain *domain = vdev->domain;
+
+ return vduse_domain_map_page(domain, page, offset, size, dir, attrs);
+}
+
+static void vduse_dev_unmap_page(struct device *dev, dma_addr_t dma_addr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ struct vduse_dev *vdev = dev_to_vduse(dev);
+ struct vduse_iova_domain *domain = vdev->domain;
+
+ return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs);
+}
+
+static void *vduse_dev_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_addr, gfp_t flag,
+ unsigned long attrs)
+{
+ struct vduse_dev *vdev = dev_to_vduse(dev);
+ struct vduse_iova_domain *domain = vdev->domain;
+ unsigned long iova;
+ void *addr;
+
+ *dma_addr = DMA_MAPPING_ERROR;
+ addr = vduse_domain_alloc_coherent(domain, size,
+ (dma_addr_t *), flag, attrs);
+ if (!addr)
+ return NULL;
+
+ *dma_addr = (dma_addr_t)iova;
+
+ return addr;
+}
+
+static void vduse_dev_free_coherent(struct device *dev, size_t size,
+ void *vaddr, dma_addr_t dma_addr,
+ unsigned long attrs)
+{
+ struct vduse_dev *vdev = dev_to_vduse(dev);
+ struct vduse_iova_domain *domain = vdev->domain;

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-22 Thread Yongji Xie
.On Tue, Jun 22, 2021 at 1:07 PM Jason Wang  wrote:
>
>
> 在 2021/6/21 下午6:41, Yongji Xie 写道:
> > On Mon, Jun 21, 2021 at 5:14 PM Jason Wang  wrote:
> >>
> >> 在 2021/6/15 下午10:13, Xie Yongji 写道:
> >>> This VDUSE driver enables implementing vDPA devices in userspace.
> >>> The vDPA device's control path is handled in kernel and the data
> >>> path is handled in userspace.
> >>>
> >>> A message mechnism is used by VDUSE driver to forward some control
> >>> messages such as starting/stopping datapath to userspace. Userspace
> >>> can use read()/write() to receive/reply those control messages.
> >>>
> >>> And some ioctls are introduced to help userspace to implement the
> >>> data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
> >>> descriptors referring to vDPA device's iova regions. Then userspace
> >>> can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
> >>> and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
> >>> and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
> >>> ioctls can be used to inject interrupt and setup the kickfd for
> >>> virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
> >>> configuration space and inject a config interrupt.
> >>>
> >>> Signed-off-by: Xie Yongji 
> >>> ---
> >>>Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
> >>>drivers/vdpa/Kconfig   |   10 +
> >>>drivers/vdpa/Makefile  |1 +
> >>>drivers/vdpa/vdpa_user/Makefile|5 +
> >>>drivers/vdpa/vdpa_user/vduse_dev.c | 1453 
> >>> 
> >>>include/uapi/linux/vduse.h |  143 ++
> >>>6 files changed, 1613 insertions(+)
> >>>create mode 100644 drivers/vdpa/vdpa_user/Makefile
> >>>create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
> >>>create mode 100644 include/uapi/linux/vduse.h
> >>>
> >>> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> >>> b/Documentation/userspace-api/ioctl/ioctl-number.rst
> >>> index 9bfc2b510c64..acd95e9dcfe7 100644
> >>> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> >>> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> >>> @@ -300,6 +300,7 @@ Code  Seq#Include File
> >>>Comments
> >>>'z'   10-4F  drivers/s390/crypto/zcrypt_api.h
> >>> conflict!
> >>>'|'   00-7F  linux/media.h
> >>>0x80  00-1F  linux/fb.h
> >>> +0x81  00-1F  linux/vduse.h
> >>>0x89  00-06  arch/x86/include/asm/sockios.h
> >>>0x89  0B-DF  linux/sockios.h
> >>>0x89  E0-EF  linux/sockios.h 
> >>> SIOCPROTOPRIVATE range
> >>> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> >>> index a503c1b2bfd9..6e23bce6433a 100644
> >>> --- a/drivers/vdpa/Kconfig
> >>> +++ b/drivers/vdpa/Kconfig
> >>> @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
> >>>  vDPA block device simulator which terminates IO request in a
> >>>  memory buffer.
> >>>
> >>> +config VDPA_USER
> >>> + tristate "VDUSE (vDPA Device in Userspace) support"
> >>> + depends on EVENTFD && MMU && HAS_DMA
> >>> + select DMA_OPS
> >>> + select VHOST_IOTLB
> >>> + select IOMMU_IOVA
> >>> + help
> >>> +   With VDUSE it is possible to emulate a vDPA Device
> >>> +   in a userspace program.
> >>> +
> >>>config IFCVF
> >>>tristate "Intel IFC VF vDPA driver"
> >>>depends on PCI_MSI
> >>> diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
> >>> index 67fe7f3d6943..f02ebed33f19 100644
> >>> --- a/drivers/vdpa/Makefile
> >>> +++ b/drivers/vdpa/Makefile
> >>> @@ -1,6 +1,7 @@
> >>># SPDX-License-Identifier: GPL-2.0
> >>>obj-$(CONFIG_VDPA) += vdpa.o
> >>>obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
> >>> +obj-$(CONFIG_VDPA_USER) += vdpa_user/
> >>>obj-$(CONFIG_IFCVF)+= ifcvf/
> >>>obj-$(CONFIG_MLX5_VDPA) += mlx5/
> >>>obj-$(CONFIG_VP_VDPA)+= virtio_pci/
> >>> diff --git a/drivers/vdpa/vdpa_user/Makefile 
> >>> b/drivers/vdpa/vdpa_user/Makefile
> >>> new file mode 100644
> >>> index ..260e0b26af99
> >>> --- /dev/null
> >>> +++ b/drivers/vdpa/vdpa_user/Makefile
> >>> @@ -0,0 +1,5 @@
> >>> +# SPDX-License-Identifier: GPL-2.0
> >>> +
> >>> +vduse-y := vduse_dev.o iova_domain.o
> >>> +
> >>> +obj-$(CONFIG_VDPA_USER) += vduse.o
> >>> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> >>> b/drivers/vdpa/vdpa_user/vduse_dev.c
> >>> new file mode 100644
> >>> index ..5271cbd15e28
> >>> --- /dev/null
> >>> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> >>> @@ -0,0 +1,1453 @@
> >>> +// SPDX-License-Identifier: GPL-2.0-only
> >>> +/*
> >>> + * VDUSE: vDPA Device in Userspace
> >>> + *
> >>> + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All 
> >>> rights reserved.
> >>> + *
> >>> + * Author: Xie Yongji 
> >>> + *
> >>> + */
> >>> +
> >>> 

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-21 Thread Jason Wang


在 2021/6/21 下午6:41, Yongji Xie 写道:

On Mon, Jun 21, 2021 at 5:14 PM Jason Wang  wrote:


在 2021/6/15 下午10:13, Xie Yongji 写道:

This VDUSE driver enables implementing vDPA devices in userspace.
The vDPA device's control path is handled in kernel and the data
path is handled in userspace.

A message mechnism is used by VDUSE driver to forward some control
messages such as starting/stopping datapath to userspace. Userspace
can use read()/write() to receive/reply those control messages.

And some ioctls are introduced to help userspace to implement the
data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
descriptors referring to vDPA device's iova regions. Then userspace
can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
ioctls can be used to inject interrupt and setup the kickfd for
virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
configuration space and inject a config interrupt.

Signed-off-by: Xie Yongji 
---
   Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
   drivers/vdpa/Kconfig   |   10 +
   drivers/vdpa/Makefile  |1 +
   drivers/vdpa/vdpa_user/Makefile|5 +
   drivers/vdpa/vdpa_user/vduse_dev.c | 1453 

   include/uapi/linux/vduse.h |  143 ++
   6 files changed, 1613 insertions(+)
   create mode 100644 drivers/vdpa/vdpa_user/Makefile
   create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
   create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 9bfc2b510c64..acd95e9dcfe7 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#Include File  
 Comments
   'z'   10-4F  drivers/s390/crypto/zcrypt_api.h
conflict!
   '|'   00-7F  linux/media.h
   0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
   0x89  00-06  arch/x86/include/asm/sockios.h
   0x89  0B-DF  linux/sockios.h
   0x89  E0-EF  linux/sockios.h 
SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index a503c1b2bfd9..6e23bce6433a 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
 vDPA block device simulator which terminates IO request in a
 memory buffer.

+config VDPA_USER
+ tristate "VDUSE (vDPA Device in Userspace) support"
+ depends on EVENTFD && MMU && HAS_DMA
+ select DMA_OPS
+ select VHOST_IOTLB
+ select IOMMU_IOVA
+ help
+   With VDUSE it is possible to emulate a vDPA Device
+   in a userspace program.
+
   config IFCVF
   tristate "Intel IFC VF vDPA driver"
   depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
   # SPDX-License-Identifier: GPL-2.0
   obj-$(CONFIG_VDPA) += vdpa.o
   obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
   obj-$(CONFIG_IFCVF)+= ifcvf/
   obj-$(CONFIG_MLX5_VDPA) += mlx5/
   obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..5271cbd15e28
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1453 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024)
+#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
+#define VDUSE_REQUEST_TIMEOUT 30
+
+struct vduse_virtqueue {
+ u16 index;
+ u32 num;
+ u32 avail_idx;
+ u64 desc_addr;
+ u64 driver_addr;
+ u64 device_addr;
+ bool ready;
+ bool kicked;
+ spinlock_t 

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-21 Thread Yongji Xie
On Mon, Jun 21, 2021 at 5:14 PM Jason Wang  wrote:
>
>
> 在 2021/6/15 下午10:13, Xie Yongji 写道:
> > This VDUSE driver enables implementing vDPA devices in userspace.
> > The vDPA device's control path is handled in kernel and the data
> > path is handled in userspace.
> >
> > A message mechnism is used by VDUSE driver to forward some control
> > messages such as starting/stopping datapath to userspace. Userspace
> > can use read()/write() to receive/reply those control messages.
> >
> > And some ioctls are introduced to help userspace to implement the
> > data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
> > descriptors referring to vDPA device's iova regions. Then userspace
> > can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
> > and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
> > and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
> > ioctls can be used to inject interrupt and setup the kickfd for
> > virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
> > configuration space and inject a config interrupt.
> >
> > Signed-off-by: Xie Yongji 
> > ---
> >   Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
> >   drivers/vdpa/Kconfig   |   10 +
> >   drivers/vdpa/Makefile  |1 +
> >   drivers/vdpa/vdpa_user/Makefile|5 +
> >   drivers/vdpa/vdpa_user/vduse_dev.c | 1453 
> > 
> >   include/uapi/linux/vduse.h |  143 ++
> >   6 files changed, 1613 insertions(+)
> >   create mode 100644 drivers/vdpa/vdpa_user/Makefile
> >   create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
> >   create mode 100644 include/uapi/linux/vduse.h
> >
> > diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> > b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > index 9bfc2b510c64..acd95e9dcfe7 100644
> > --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> > +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > @@ -300,6 +300,7 @@ Code  Seq#Include File  
> >  Comments
> >   'z'   10-4F  drivers/s390/crypto/zcrypt_api.h
> > conflict!
> >   '|'   00-7F  linux/media.h
> >   0x80  00-1F  linux/fb.h
> > +0x81  00-1F  linux/vduse.h
> >   0x89  00-06  arch/x86/include/asm/sockios.h
> >   0x89  0B-DF  linux/sockios.h
> >   0x89  E0-EF  linux/sockios.h 
> > SIOCPROTOPRIVATE range
> > diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> > index a503c1b2bfd9..6e23bce6433a 100644
> > --- a/drivers/vdpa/Kconfig
> > +++ b/drivers/vdpa/Kconfig
> > @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
> > vDPA block device simulator which terminates IO request in a
> > memory buffer.
> >
> > +config VDPA_USER
> > + tristate "VDUSE (vDPA Device in Userspace) support"
> > + depends on EVENTFD && MMU && HAS_DMA
> > + select DMA_OPS
> > + select VHOST_IOTLB
> > + select IOMMU_IOVA
> > + help
> > +   With VDUSE it is possible to emulate a vDPA Device
> > +   in a userspace program.
> > +
> >   config IFCVF
> >   tristate "Intel IFC VF vDPA driver"
> >   depends on PCI_MSI
> > diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
> > index 67fe7f3d6943..f02ebed33f19 100644
> > --- a/drivers/vdpa/Makefile
> > +++ b/drivers/vdpa/Makefile
> > @@ -1,6 +1,7 @@
> >   # SPDX-License-Identifier: GPL-2.0
> >   obj-$(CONFIG_VDPA) += vdpa.o
> >   obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
> > +obj-$(CONFIG_VDPA_USER) += vdpa_user/
> >   obj-$(CONFIG_IFCVF)+= ifcvf/
> >   obj-$(CONFIG_MLX5_VDPA) += mlx5/
> >   obj-$(CONFIG_VP_VDPA)+= virtio_pci/
> > diff --git a/drivers/vdpa/vdpa_user/Makefile 
> > b/drivers/vdpa/vdpa_user/Makefile
> > new file mode 100644
> > index ..260e0b26af99
> > --- /dev/null
> > +++ b/drivers/vdpa/vdpa_user/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +vduse-y := vduse_dev.o iova_domain.o
> > +
> > +obj-$(CONFIG_VDPA_USER) += vduse.o
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> > b/drivers/vdpa/vdpa_user/vduse_dev.c
> > new file mode 100644
> > index ..5271cbd15e28
> > --- /dev/null
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -0,0 +1,1453 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * VDUSE: vDPA Device in Userspace
> > + *
> > + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All 
> > rights reserved.
> > + *
> > + * Author: Xie Yongji 
> > + *
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "iova_domain.h"
> > +
> > +#define DRV_AUTHOR   

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-21 Thread Jason Wang


在 2021/6/15 下午10:13, Xie Yongji 写道:

This VDUSE driver enables implementing vDPA devices in userspace.
The vDPA device's control path is handled in kernel and the data
path is handled in userspace.

A message mechnism is used by VDUSE driver to forward some control
messages such as starting/stopping datapath to userspace. Userspace
can use read()/write() to receive/reply those control messages.

And some ioctls are introduced to help userspace to implement the
data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
descriptors referring to vDPA device's iova regions. Then userspace
can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
ioctls can be used to inject interrupt and setup the kickfd for
virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
configuration space and inject a config interrupt.

Signed-off-by: Xie Yongji 
---
  Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
  drivers/vdpa/Kconfig   |   10 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/vdpa_user/Makefile|5 +
  drivers/vdpa/vdpa_user/vduse_dev.c | 1453 
  include/uapi/linux/vduse.h |  143 ++
  6 files changed, 1613 insertions(+)
  create mode 100644 drivers/vdpa/vdpa_user/Makefile
  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
  create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 9bfc2b510c64..acd95e9dcfe7 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#Include File  
 Comments
  'z'   10-4F  drivers/s390/crypto/zcrypt_api.hconflict!
  '|'   00-7F  linux/media.h
  0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
  0x89  00-06  arch/x86/include/asm/sockios.h
  0x89  0B-DF  linux/sockios.h
  0x89  E0-EF  linux/sockios.h 
SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index a503c1b2bfd9..6e23bce6433a 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
  vDPA block device simulator which terminates IO request in a
  memory buffer.
  
+config VDPA_USER

+   tristate "VDUSE (vDPA Device in Userspace) support"
+   depends on EVENTFD && MMU && HAS_DMA
+   select DMA_OPS
+   select VHOST_IOTLB
+   select IOMMU_IOVA
+   help
+ With VDUSE it is possible to emulate a vDPA Device
+ in a userspace program.
+
  config IFCVF
tristate "Intel IFC VF vDPA driver"
depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
  # SPDX-License-Identifier: GPL-2.0
  obj-$(CONFIG_VDPA) += vdpa.o
  obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
  obj-$(CONFIG_IFCVF)+= ifcvf/
  obj-$(CONFIG_MLX5_VDPA) += mlx5/
  obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..5271cbd15e28
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1453 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024)
+#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
+#define VDUSE_REQUEST_TIMEOUT 30
+
+struct vduse_virtqueue {
+   u16 index;
+   u32 num;
+   u32 avail_idx;
+   u64 desc_addr;
+   u64 driver_addr;
+   u64 device_addr;
+   bool ready;
+   bool kicked;
+   spinlock_t kick_lock;
+   spinlock_t irq_lock;
+   struct eventfd_ctx *kickfd;
+