[PATCH] drm/virtio: kick vq outside of the vq lock

2019-07-10 Thread Chia-I Wu
Replace virtqueue_kick by virtqueue_kick_prepare, which requires
serialization, and virtqueue_notify, which does not.  Repurpose the
return values to indicate whether the vq should be notified.

This fixes a bad spinlock contention when the host is qemu.  When
the guest calls virtqueue_notify, the qemu vcpu thread exits the
guest and waits for the qemu iothread to perform the MMIO.  If the
qemu iothread is still processing the prior buffer, and the prior
buffer is cheap for the GPU to process, the iothread will go ahead
and generate an IRQ.  A worker thread in the guest might then start
running virtio_gpu_dequeue_ctrl_func.  If virtqueue_notify was
called with the vq lock held, the worker thread would have to busy
wait inside virtio_gpu_dequeue_ctrl_func.
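The resulting pattern is (a minimal sketch of the virtio core calls;
buffer setup and error handling omitted, lock name as in the ctrl queue):

    bool notify;

    spin_lock(&vgdev->ctrlq.qlock);
    /* add buffers to the vq, then decide whether a notification is
     * needed; virtqueue_kick_prepare() must be serialized against
     * other operations on the vq, so it stays under the lock */
    notify = virtqueue_kick_prepare(vq);
    spin_unlock(&vgdev->ctrlq.qlock);

    /* the notification MMIO traps to the host and can be slow, but
     * needs no serialization, so it runs after the lock is dropped */
    if (notify)
        virtqueue_notify(vq);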

v2: fix scrambled commit message

Signed-off-by: Chia-I Wu 
---
 drivers/gpu/drm/virtio/virtgpu_vq.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 6c1a90717535..e96f88fe5c83 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -291,11 +291,9 @@ static int virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
trace_virtio_gpu_cmd_queue(vq,
(struct virtio_gpu_ctrl_hdr *)vbuf->buf);
 
-   virtqueue_kick(vq);
+   ret = virtqueue_kick_prepare(vq);
}
 
-   if (!ret)
-   ret = vq->num_free;
return ret;
 }
 
@@ -307,6 +305,10 @@ static int virtio_gpu_queue_ctrl_buffer(struct virtio_gpu_device *vgdev,
spin_lock(&vgdev->ctrlq.qlock);
rc = virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
spin_unlock(&vgdev->ctrlq.qlock);
+
+   if (rc > 0)
+   virtqueue_notify(vgdev->ctrlq.vq);
+
return rc;
 }
 
@@ -339,6 +341,10 @@ static int virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev,
virtio_gpu_fence_emit(vgdev, hdr, fence);
rc = virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
spin_unlock(&vgdev->ctrlq.qlock);
+
+   if (rc > 0)
+   virtqueue_notify(vgdev->ctrlq.vq);
+
return rc;
 }
 
@@ -369,13 +375,14 @@ static int virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
trace_virtio_gpu_cmd_queue(vq,
(struct virtio_gpu_ctrl_hdr *)vbuf->buf);
 
-   virtqueue_kick(vq);
+   ret = virtqueue_kick_prepare(vq);
}
 
spin_unlock(&vgdev->cursorq.qlock);
 
-   if (!ret)
-   ret = vq->num_free;
+   if (ret > 0)
+   virtqueue_notify(vq);
+
return ret;
 }
 
-- 
2.22.0.410.gd8fdbe21b5-goog



Re: [PATCH v6 08/18] drm/virtio: rework virtio_gpu_execbuffer_ioctl fencing

2019-07-10 Thread Chia-I Wu
On Thu, Jul 4, 2019 at 11:46 AM Chia-I Wu  wrote:
>
> On Thu, Jul 4, 2019 at 4:25 AM Gerd Hoffmann  wrote:
> >
> >   Hi,
> >
> > > > if (fence)
> > > > virtio_gpu_fence_emit(vgdev, hdr, fence);
> > > > +   if (vbuf->objs) {
> > > > +   virtio_gpu_array_add_fence(vbuf->objs, &fence->f);
> > > > +   virtio_gpu_array_unlock_resv(vbuf->objs);
> > > > +   }
> > > This is with the spinlock held.  Maybe we should move the
> > > virtio_gpu_array_unlock_resv call out of the critical section.
> >
> > That would bring back the race ...
> Right...
> >
> > > I am actually more concerned about virtio_gpu_array_add_fence, but it
> > > is also harder to move.  Should we add a kref to the object array?
> >
> > Yep, refcounting would be the other way to fix the race.
> >
> > > This bothers me because I recently ran into a CPU-bound game with very
> > > bad lock contention here.
> >
> > Hmm.  Any clue where this comes from?  Multiple threads competing for
> > virtio buffers I guess?  Maybe we should have larger virtqueues?
> The game was single-threaded.  I guess it was the game and Xorg
> competing for virtio buffers.  That was also on an older kernel
> without explicit fences.  The userspace had to create dummy resources
> frequently to do VIRTIO_IOCTL_RESOURCE_WAIT.
>
> I think this is fine for now as far as I am concerned.  I can look
> into this more closely after this series lands.
It was virtio_gpu_dequeue_ctrl_func that wanted to grab the lock to
handle the responses.  I sent a patch for it:

  https://patchwork.freedesktop.org/series/63529/

>
>
> >
> > cheers,
> >   Gerd
> >
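(For reference, the kref idea floated above would look roughly like the
sketch below; the ref member and release helper are assumptions, not what
the driver currently has:)

    struct virtio_gpu_object_array {
        struct kref ref;        /* assumed new refcount */
        /* ... existing members ... */
    };

    static void virtio_gpu_array_release(struct kref *ref)
    {
        struct virtio_gpu_object_array *objs =
            container_of(ref, struct virtio_gpu_object_array, ref);

        kfree(objs);
    }

    /* the submitter would take an extra reference before queuing, and
     * both the submitter and the response handler drop theirs when
     * done, so neither has to wait for the other under the vq lock */
    kref_get(&objs->ref);
    /* ... queue the buffer, emit the fence ... */
    kref_put(&objs->ref, virtio_gpu_array_release);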


[PATCH] drm/virtio: kick vq outside of the vq lock

2019-07-10 Thread Chia-I Wu
Replace virtqueue_kick by virtqueue_kick_prepare, which requires
serialization, and virtqueue_notify, which does not.  Repurpose the
return values to indicate whether the vq should be notified.

This fixes a bad lock contention with qemu host.  When the guest calls
virtqueue_notify, the qemu vcpu thread exits the guest and waits
for the qemu iothread to perform the MMIO.  If the qemu iothread is
still processing the prior buffer, and if the prior buffer is cheap
to GPU, the iothread will go ahead and generate an IRQ for the
guest.  A worker thread in the guest will call
virtio_gpu_dequeue_ctrl_func.  If virtqueue_notify was called with
the vq lock held, the worker thread would busy wait inside
virtio_gpu_dequeue_ctrl_func.

Signed-off-by: Chia-I Wu 
---
 drivers/gpu/drm/virtio/virtgpu_vq.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 6c1a90717535..e96f88fe5c83 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -291,11 +291,9 @@ static int virtio_gpu_queue_ctrl_buffer_locked(struct virtio_gpu_device *vgdev,
trace_virtio_gpu_cmd_queue(vq,
(struct virtio_gpu_ctrl_hdr *)vbuf->buf);
 
-   virtqueue_kick(vq);
+   ret = virtqueue_kick_prepare(vq);
}
 
-   if (!ret)
-   ret = vq->num_free;
return ret;
 }
 
@@ -307,6 +305,10 @@ static int virtio_gpu_queue_ctrl_buffer(struct virtio_gpu_device *vgdev,
spin_lock(&vgdev->ctrlq.qlock);
rc = virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
spin_unlock(&vgdev->ctrlq.qlock);
+
+   if (rc > 0)
+   virtqueue_notify(vgdev->ctrlq.vq);
+
return rc;
 }
 
@@ -339,6 +341,10 @@ static int virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev,
virtio_gpu_fence_emit(vgdev, hdr, fence);
rc = virtio_gpu_queue_ctrl_buffer_locked(vgdev, vbuf);
spin_unlock(&vgdev->ctrlq.qlock);
+
+   if (rc > 0)
+   virtqueue_notify(vgdev->ctrlq.vq);
+
return rc;
 }
 
@@ -369,13 +375,14 @@ static int virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
trace_virtio_gpu_cmd_queue(vq,
(struct virtio_gpu_ctrl_hdr *)vbuf->buf);
 
-   virtqueue_kick(vq);
+   ret = virtqueue_kick_prepare(vq);
}
 
spin_unlock(&vgdev->cursorq.qlock);
 
-   if (!ret)
-   ret = vq->num_free;
+   if (ret > 0)
+   virtqueue_notify(vq);
+
return ret;
 }
 
-- 
2.22.0.410.gd8fdbe21b5-goog



[PATCH v2] virtio_pmem: fix sparse warning

2019-07-10 Thread Pankaj Gupta
This patch fixes the sparse warnings below, related to the __virtio32
type in the virtio pmem driver. This was reported by the Intel test
bot on the linux-next tree.

nd_virtio.c:56:28: warning: incorrect type in assignment (different base types)
nd_virtio.c:56:28:expected unsigned int [unsigned] [usertype] type
nd_virtio.c:56:28:got restricted __virtio32
nd_virtio.c:93:59: warning: incorrect type in argument 2 (different base types)
nd_virtio.c:93:59:expected restricted __virtio32 [usertype] val
nd_virtio.c:93:59:got unsigned int [unsigned] [usertype] ret
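(For context, fields tagged __virtio32 are meant to be accessed through
the generic virtio byte-order helpers; a sketch of what the driver side
then looks like, where VIRTIO_PMEM_REQ_TYPE_FLUSH is a made-up constant
name:)

    /* storing into a tagged field: convert from CPU byte order */
    req->type = cpu_to_virtio32(vdev, VIRTIO_PMEM_REQ_TYPE_FLUSH);

    /* reading one back: convert to CPU byte order */
    err = virtio32_to_cpu(vdev, resp->ret);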

Reported-by: kbuild test robot 
Signed-off-by: Pankaj Gupta 
---

This only fixes a warning, so I am submitting it as a separate
patch on top of the virtio pmem series.

 include/uapi/linux/virtio_pmem.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
index efcd72f2d20d..7a7435281362 100644
--- a/include/uapi/linux/virtio_pmem.h
+++ b/include/uapi/linux/virtio_pmem.h
@@ -10,7 +10,7 @@
 #ifndef _UAPI_LINUX_VIRTIO_PMEM_H
 #define _UAPI_LINUX_VIRTIO_PMEM_H
 
-#include 
+#include 
 #include 
 #include 
 
@@ -23,12 +23,12 @@ struct virtio_pmem_config {
 
 struct virtio_pmem_resp {
/* Host return status corresponding to flush request */
-   __u32 ret;
+   __virtio32 ret;
 };
 
 struct virtio_pmem_req {
/* command type */
-   __u32 type;
+   __virtio32 type;
 };
 
 #endif
-- 
2.20.1



[RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock

2019-07-10 Thread Stefano Garzarella
Hi,
as Jason suggested some months ago, I took a closer look at the virtio-net
driver to understand whether we can reuse some parts in the virtio-vsock
driver too, since we have similar challenges (mergeable buffers, page
allocation, small packets, etc.).

Initially, I would add skbuff support to virtio-vsock in order to reuse
the receive_*() functions.
Then I would move receive_[small, big, mergeable]() and
add_recvbuf_[small, big, mergeable]() out of the virtio-net driver, in order
to call them from virtio-vsock as well. I need to do some refactoring (e.g.
leave the XDP part in the virtio-net driver), but I think it is feasible.

The idea is to create a virtio-skb.[h,c] that holds these functions, plus a
new object that stores the needed attributes (e.g. hdr_len) and state (e.g.
some fields of struct receive_queue). This is the idea of virtio-skb.h that
I have in mind:
struct virtskb;

struct sk_buff *virtskb_receive_small(struct virtskb *vs, ...);
struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...);
struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, ...);

int virtskb_add_recvbuf_small(struct virtskb *vs, ...);
int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);

For the Guest->Host path it should be easier, so maybe I can add a
"virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part of the code
of xmit_skb().
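To make this concrete, the state object could carry roughly the following;
the members below are my guesses modeled on struct receive_queue and are by
no means settled:

struct virtskb {
    struct virtqueue *vq;   /* the rx (or tx) virtqueue to operate on */
    unsigned int hdr_len;   /* negotiated virtio-net header length */
    bool mergeable_rx_bufs; /* whether mergeable buffers are in use */
    struct page *pages;     /* page chain used by the "big" receive mode */
};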

Let me know if you have better names in mind, or if I should put these
functions in another place.

I would like to leave the control part completely separate, so, for example,
the two drivers will negotiate the features independently and they will call
the right virtskb_receive_*() function based on the negotiation.

I already started working on it, but before taking further steps and sending
an RFC patch, I would like to hear your opinion.
Do you think that makes sense?
Do you see any issue or a better solution?

Thanks in advance,
Stefano


Re: [PATCH v2 3/6] drm/fb-helper: Instanciate shadow FB if configured in device's mode_config

2019-07-10 Thread Daniel Vetter
On Sun, Jul 07, 2019 at 06:14:50PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> On 07.07.19 at 16:37, Noralf Trønnes wrote:
> > 
> > 
> > On 05.07.2019 at 11:26, Thomas Zimmermann wrote:
> >> Generic framebuffer emulation uses a shadow buffer for framebuffers with
> >> a dirty() function. If drivers want to use the shadow FB without such a
> >> function, they can now set prefer_shadow or prefer_shadow_fbdev in their
> >> mode_config structures. The former flag is exported to userspace, the latter
> >> flag is fbdev-only.
> >>
> >> Signed-off-by: Thomas Zimmermann 
> >> ---
> >>  drivers/gpu/drm/drm_fb_helper.c | 19 ++++++++++++++-----
> >>  include/drm/drm_mode_config.h   |  5 +++++
> >>  2 files changed, 19 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> >> index 7ba6a0255821..56ef169e1814 100644
> >> --- a/drivers/gpu/drm/drm_fb_helper.c
> >> +++ b/drivers/gpu/drm/drm_fb_helper.c
> >> @@ -421,7 +421,9 @@ static void drm_fb_helper_dirty_work(struct work_struct *work)
> >>return;
> >>drm_fb_helper_dirty_blit_real(helper, &clip_copy);
> >>}
> >> -  helper->fb->funcs->dirty(helper->fb, NULL, 0, 0, &clip_copy, 1);
> >> +  if (helper->fb->funcs->dirty)
> >> +  helper->fb->funcs->dirty(helper->fb, NULL, 0, 0,
> >> +   &clip_copy, 1);
> >>  
> >>if (helper->buffer)
> >>drm_client_buffer_vunmap(helper->buffer);
> >> @@ -620,9 +622,6 @@ static void drm_fb_helper_dirty(struct fb_info *info, u32 x, u32 y,
> >>struct drm_clip_rect *clip = &helper->dirty_clip;
> >>unsigned long flags;
> >>  
> >> -  if (!helper->fb->funcs->dirty)
> >> -  return;
> > 
> > drm_fb_helper_dirty() is called unconditionally by
> > drm_fb_helper_sys_imageblit() et al, so we need to check with
> > drm_fbdev_use_shadow_fb() here.
> > 
> >> -
> >>spin_lock_irqsave(&helper->dirty_lock, flags);
> >>clip->x1 = min_t(u32, clip->x1, x);
> >>clip->y1 = min_t(u32, clip->y1, y);
> >> @@ -2166,6 +2165,16 @@ static struct fb_deferred_io drm_fbdev_defio = {
> >>.deferred_io= drm_fb_helper_deferred_io,
> >>  };
> >>  
> >> +static bool drm_fbdev_use_shadow_fb(struct drm_fb_helper *fb_helper)
> >> +{
> >> +  struct drm_device *dev = fb_helper->dev;
> >> +  struct drm_framebuffer *fb = fb_helper->fb;
> >> +
> >> +  return dev->mode_config.prefer_shadow_fbdev |
> >> + dev->mode_config.prefer_shadow |
> > 
> > Use logical OR here
> > 
> >> + !!fb->funcs->dirty;
> > 
> > and you can drop the double NOT here.
> > 
> >> +}
> >> +
> >>  /**
> >>   * drm_fb_helper_generic_probe - Generic fbdev emulation probe helper
> >>   * @fb_helper: fbdev helper structure
> >> @@ -2213,7 +2222,7 @@ int drm_fb_helper_generic_probe(struct drm_fb_helper *fb_helper,
> >>  
> >>drm_fb_helper_fill_info(fbi, fb_helper, sizes);
> >>  
> >> -  if (fb->funcs->dirty) {
> >> +  if (drm_fbdev_use_shadow_fb(fb_helper)) {
> >>struct fb_ops *fbops;
> >>void *shadow;
> >>  
> >> diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
> >> index 759d462d028b..e1c751aca353 100644
> >> --- a/include/drm/drm_mode_config.h
> >> +++ b/include/drm/drm_mode_config.h
> >> @@ -347,6 +347,8 @@ struct drm_mode_config_funcs {
> >>   * @output_poll_work: delayed work for polling in process context
> >>   * @preferred_depth: preferred RBG pixel depth, used by fb helpers
> >>   * @prefer_shadow: hint to userspace to prefer shadow-fb rendering
> >> + * @prefer_shadow_fbdev: hint to framebuffer emulation to prefer shadow-fb \
> >> +  rendering
> > 
> > It's preferred to have the doc together with the struct member.
> 
> I just tried to follow the file's existing style, but OK, I don't mind.

If it annoys you too much, convert all the member docs in the header
comment into the inline style. That's what I usually do when the first
inline kerneldoc comment is warranted (like here), but not yet rolled out.

Or just mix&match if you feel like not doing that, we have lots of that
already :-)
-Daniel

> 
> > it's less likely to be forgotten when things change. And we don't use
> > line continuations when the doc line is too long. Just continue on the
> > next line after an asterisk.
> > 
> >>   * @cursor_width: hint to userspace for max cursor width
> >>   * @cursor_height: hint to userspace for max cursor height
> >>   * @helper_private: mid-layer private data
> >> @@ -852,6 +854,9 @@ struct drm_mode_config {
> >>/* dumb ioctl parameters */
> >>uint32_t preferred_depth, prefer_shadow;
> >>  
> >> +  /* fbdev parameters */
> > 
> > No need for this comment.
> > 
> > Doc can look like this, I've done s/framebuffer/fbdev/:
> > /**
> >  * @prefer_shadow_fbdev:
> >  *
> >  * Hint to fbdev emulation to prefer shadow-fb rendering.
> >  */
> > 
> 
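Putting Noralf's two comments together, drm_fbdev_use_shadow_fb() would
presumably end up as:

    static bool drm_fbdev_use_shadow_fb(struct drm_fb_helper *fb_helper)
    {
        struct drm_device *dev = fb_helper->dev;
        struct drm_framebuffer *fb = fb_helper->fb;

        return dev->mode_config.prefer_shadow_fbdev ||
               dev->mode_config.prefer_shadow ||
               fb->funcs->dirty;
    }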

[PATCH AUTOSEL 5.1 03/11] drm/virtio: move drm_connector_update_edid_property() call

2019-07-10 Thread Sasha Levin
From: Gerd Hoffmann 

[ Upstream commit 41de4be6f6efa4132b29af51158cd672d93f2543 ]

drm_connector_update_edid_property can sleep, we must not
call it while holding a spinlock.  Move the callsite.

Fixes: b4b01b4995fb ("drm/virtio: add edid support")
Reported-by: Max Filippov 
Signed-off-by: Gerd Hoffmann 
Tested-by: Max Filippov 
Tested-by: Cornelia Huck 
Acked-by: Cornelia Huck 
Link: http://patchwork.freedesktop.org/patch/msgid/20190405044602.2334-1-kra...@redhat.com
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/virtio/virtgpu_vq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 6bc2008b0d0d..3ef24f89ef93 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -620,11 +620,11 @@ static void virtio_gpu_cmd_get_edid_cb(struct virtio_gpu_device *vgdev,
output = vgdev->outputs + scanout;
 
new_edid = drm_do_get_edid(&output->conn, virtio_get_edid_block, resp);
+   drm_connector_update_edid_property(&output->conn, new_edid);
 
spin_lock(&vgdev->display_info_lock);
old_edid = output->edid;
output->edid = new_edid;
-   drm_connector_update_edid_property(&output->conn, output->edid);
spin_unlock(&vgdev->display_info_lock);
 
kfree(old_edid);
-- 
2.20.1



[PATCH] virtio_pmem: fix sparse warning

2019-07-10 Thread Pankaj Gupta
This patch fixes the sparse warnings below, related to the __virtio32
type in the virtio pmem driver. This was reported by the Intel test
bot on the linux-next tree.

nd_virtio.c:56:28: warning: incorrect type in assignment (different base types)
nd_virtio.c:56:28:expected unsigned int [unsigned] [usertype] type
nd_virtio.c:56:28:got restricted __virtio32
nd_virtio.c:93:59: warning: incorrect type in argument 2 (different base types)
nd_virtio.c:93:59:expected restricted __virtio32 [usertype] val
nd_virtio.c:93:59:got unsigned int [unsigned] [usertype] ret

Signed-off-by: Pankaj Gupta 
Reported-by: kbuild test robot 
---

This only fixes a warning, so I am submitting it as a separate
patch on top of the virtio pmem series.
 
 include/uapi/linux/virtio_pmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
index efcd72f2d20d..f89129bf1f84 100644
--- a/include/uapi/linux/virtio_pmem.h
+++ b/include/uapi/linux/virtio_pmem.h
@@ -23,12 +23,12 @@ struct virtio_pmem_config {
 
 struct virtio_pmem_resp {
/* Host return status corresponding to flush request */
-   __u32 ret;
+   __virtio32 ret;
 };
 
 struct virtio_pmem_req {
/* command type */
-   __u32 type;
+   __virtio32 type;
 };
 
 #endif
-- 
2.20.1



Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-10 Thread Jason Wang


On 2019/7/10 at 2:22 PM, Tiwei Bie wrote:

On Wed, Jul 10, 2019 at 10:26:10AM +0800, Jason Wang wrote:

On 2019/7/9 at 2:33 PM, Tiwei Bie wrote:

On Tue, Jul 09, 2019 at 10:50:38AM +0800, Jason Wang wrote:

On 2019/7/8 at 2:16 PM, Tiwei Bie wrote:

On Fri, Jul 05, 2019 at 08:49:46AM -0600, Alex Williamson wrote:

On Thu, 4 Jul 2019 14:21:34 +0800
Tiwei Bie  wrote:

On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:

On 2019/7/3 at 9:08 PM, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:

On 2019/7/3 at 7:52 PM, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:

On 2019/7/3 at 5:13 PM, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==========================

A new VFIO device type is introduced - vfio-vhost. This addresses
some comments from here: https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to set up the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in the message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
__u64 request;
__u32 flags;
/* Flag values: */
   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
__u32 size;
union {
__u64 u64;
struct vhost_vring_state state;
struct vhost_vring_addr addr;
} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in the above structure.
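(As an illustration of the intended flow, userspace would write such a
message to CONFIG_REGION with a plain pwrite() on the VFIO device fd; the
helper below is a sketch, where device_fd and config_offset are assumed
to come from the usual VFIO_DEVICE_GET_REGION_INFO query:)

    /* set the size of vring 'index' to 'num', reusing the existing
     * vhost-kernel request code VHOST_SET_VRING_NUM */
    static int vhost_vfio_set_vring_num(int device_fd, off_t config_offset,
                                        unsigned int index, unsigned int num)
    {
        struct vhost_vfio_op op = {
            .request = VHOST_SET_VRING_NUM,
            .flags = 0, /* no reply needed */
            .size = sizeof(op.payload.state),
            .payload.state = { .index = index, .num = num },
        };

        if (pwrite(device_fd, &op, sizeof(op), config_offset) != sizeof(op))
            return -1;
        return 0;
    }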

Still the same comment as on V1: what's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..

I believe either of the following should be better:

- using vhost ioctls, we can start from SET_VRING_KICK/SET_VRING_CALL and
extend them with e.g. a notify region. The advantage is that all existing
userspace programs could be reused without modification (or with minimal
modification). And the vhost API hides lots of details that do not need to
be understood by the application (e.g. in the case of containers).

Do you mean reusing vhost's ioctls on the VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for the mdev device?

Can we simply add them into ioctl of mdev_parent_ops?

Right, either way, these ioctls just need to be added in the ioctl
of the mdev_parent_ops. But another thing we also need to consider
is which file descriptor the userspace will do the ioctl() on. So
I'm wondering, do you mean letting the userspace do the ioctl() on
the VFIO device fd of the mdev device?

Yes.

Got it! I'm not sure what Alex's opinion is on this. If we all
agree with this, I can do it in this way.


Is there any other way btw?

Just a quick thought.. Maybe totally a bad idea. I was thinking
about whether it would be odd to do non-VFIO ioctls on a VFIO
device fd. So I was wondering whether it's possible to allow binding
another mdev driver (e.g. vhost_mdev) to the supported mdev
devices. The new mdev driver, vhost_mdev, can provide similar
ways to let userspace open the mdev device and do the vhost ioctls
on it. To distinguish them from the vfio_mdev-compatible mdev
devices, the device API of the new vhost_mdev-compatible mdev
devices might be e.g. "vhost-net" for net?

So in the VFIO case, the device will be passed through directly. And
in the VHOST case, the device can be used to accelerate the existing
virtualized devices.

How do you think?

VFIO really can't prevent vendor specific ioctls on the device file
descriptor for mdevs, but a) we'd want to be sure the ioctl address
space can't collide with ioctls we'd use for vfio defined purposes and
b) maybe the VFIO user API isn't what you want in the first place if
you intend to mostly/entirely ignore the defined ioctl set and replace
them with your own.  In the case of the latter, you're also not getting
the advantages of the existing VFIO userspace code, so why expose a
VFIO device at all.

Yeah, I totally agree.

I guess the original idea is to reuse the VFIO DMA/IOMMU API for this. Then
we have the chance to reuse VFIO code in qemu for dealing with e.g. the vIOMMU.

Yeah, you are right. We have several choices here:

#1. We expose a VFIO device, so we can reuse the VFIO container/group
  based DMA API and potentially reuse a lot of VFIO code in QEMU.

  But in this case, we have two choices for the VFIO device interface
  (i.e. the interface on top of VFIO device fd):

  A) we may invent a new vhost protocol (as demonstrated by the code
 in this RFC) on VFIO device fd to make it work in VFIO's way,