Re: [RFC PATCH] drm/virtio: Export resource handles via DMA-buf API
Hi, > up later when given a buffer index. But we would still need to make > the DMA-buf itself importable. For virtio-gpu I guess that would mean > returning an sg_table backed by the shadow buffer pages. The virtio-gpu driver in drm-misc-next supports dma-buf exports. cheers, Gerd ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 1/7] mdev: class id support
On 2019/10/16 上午12:38, Alex Williamson wrote: On Fri, 11 Oct 2019 16:15:51 +0800 Jason Wang wrote: diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index b558d4cfd082..724e9b9841d8 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -45,6 +45,12 @@ void mdev_set_drvdata(struct mdev_device *mdev, void *data) } EXPORT_SYMBOL(mdev_set_drvdata); +void mdev_set_class(struct mdev_device *mdev, u16 id) +{ + mdev->class_id = id; +} +EXPORT_SYMBOL(mdev_set_class); + struct device *mdev_dev(struct mdev_device *mdev) { return &mdev->dev; @@ -135,6 +141,7 @@ static int mdev_device_remove_cb(struct device *dev, void *data) * mdev_register_device : Register a device * @dev: device structure representing parent device. * @ops: Parent device operation structure to be registered. + * @id: class id. * * Add device to list of registered parent devices. * Returns a negative value on error, otherwise 0. @@ -324,6 +331,9 @@ int mdev_device_create(struct kobject *kobj, if (ret) goto ops_create_fail; + if (!mdev->class_id) This is a sanity test failure of the parent driver on a privileged path, I think it's fair to print a warning when this occurs rather than only return an errno to the user. In fact, ret is not set to an error value here, so it looks like this fails to create the device but returns success. Thanks, Alex Will fix. Thanks + goto class_id_fail; + ret = device_add(&mdev->dev); if (ret) goto add_fail; @@ -340,6 +350,7 @@ int mdev_device_create(struct kobject *kobj, sysfs_fail: device_del(&mdev->dev); +class_id_fail: add_fail: parent->ops->remove(mdev); ops_create_fail: ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC v1 1/2] vhost: option to fetch descriptors through an independent struct
On 2019/10/16 上午4:20, Michael S. Tsirkin wrote: On Mon, Oct 14, 2019 at 09:43:25AM +0800, Jason Wang wrote: On 2019/10/13 上午4:27, Michael S. Tsirkin wrote: On Sat, Oct 12, 2019 at 03:28:49PM +0800, Jason Wang wrote: On 2019/10/11 下午9:45, Michael S. Tsirkin wrote: The idea is to support multiple ring formats by converting to a format-independent array of descriptors. This costs extra cycles, but we gain in ability to fetch a batch of descriptors in one go, which is good for code cache locality. To simplify benchmarking, I kept the old code around so one can switch back and forth by writing into a module parameter. This will go away in the final submission. This patch causes a minor performance degradation, it's been kept as simple as possible for ease of review. Next patch gets us back the performance by adding batching. Signed-off-by: Michael S. Tsirkin --- drivers/vhost/test.c | 17 ++- drivers/vhost/vhost.c | 299 +- drivers/vhost/vhost.h | 16 +++ 3 files changed, 327 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index 056308008288..39a018a7af2d 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -18,6 +18,9 @@ #include "test.h" #include "vhost.h" +static int newcode = 0; +module_param(newcode, int, 0644); + /* Max number of bytes transferred before requeueing the job. * Using this limit prevents one virtqueue from starving others. */ #define VHOST_TEST_WEIGHT 0x8 @@ -58,10 +61,16 @@ static void handle_vq(struct vhost_test *n) vhost_disable_notify(&n->dev, vq); for (;;) { - head = vhost_get_vq_desc(vq, vq->iov, -ARRAY_SIZE(vq->iov), -&out, &in, -NULL, NULL); + if (newcode) + head = vhost_get_vq_desc_batch(vq, vq->iov, + ARRAY_SIZE(vq->iov), + &out, &in, + NULL, NULL); + else + head = vhost_get_vq_desc(vq, vq->iov, +ARRAY_SIZE(vq->iov), +&out, &in, +NULL, NULL); /* On error, stop handling until the next kick. */ if (unlikely(head < 0)) break; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 36ca2cf419bf..36661d6cb51f 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -301,6 +301,7 @@ static void vhost_vq_reset(struct vhost_dev *dev, struct vhost_virtqueue *vq) { vq->num = 1; + vq->ndescs = 0; vq->desc = NULL; vq->avail = NULL; vq->used = NULL; @@ -369,6 +370,9 @@ static int vhost_worker(void *data) static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq) { + kfree(vq->descs); + vq->descs = NULL; + vq->max_descs = 0; kfree(vq->indirect); vq->indirect = NULL; kfree(vq->log); @@ -385,6 +389,10 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev) for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; + vq->max_descs = dev->iov_limit; + vq->descs = kmalloc_array(vq->max_descs, + sizeof(*vq->descs), + GFP_KERNEL); Is iov_limit too much here? It can obviously increase the footprint. I guess the batching can only be done for descriptor without indirect or next set. Then we may batch 16 or 64. Thanks Yes, next patch only batches up to 64. But we do need iov_limit because guest can pass a long chain of scatter/gather. We already have iovecs in a huge array so this does not look like a big deal. If we ever teach the code to avoid the huge iov arrays by handling huge s/g lists piece by piece, we can make the desc array smaller at the same point. Another possible issue, if we try to batch descriptor chain when we've already batched some descriptors, we may reach the limit then some of the descriptors might need re-read. Or we may need circular index (head, tail) in this case? Thanks We never supported more than IOV_MAX descriptors. And we don't batch more than iov_limit - IOV_MAX. Ok, but what happens when we've already batched 63 descriptors then come a 3 descriptor chain? And it looks to me we need forget the cached descriptor during set_vring_base() Thanks so buffer never overflows. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt'
From: Stefano Garzarella Date: Tue, 15 Oct 2019 17:00:51 +0200 > The 'work' field was introduced with commit 06a8fc78367d0 > ("VSOCK: Introduce virtio_vsock_common.ko") > but it is never used in the code, so we can remove it to save > memory allocated in the per-packet 'struct virtio_vsock_pkt' > > Suggested-by: Michael S. Tsirkin > Signed-off-by: Stefano Garzarella Michael, will you take this? I assume so... Thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC 1/2] vhost: IFC VF hardware operation layer
On Wed, 16 Oct 2019 09:03:17 +0800 Zhu Lingshan wrote: > + IFC_INFO(&dev->dev, "PCI capability mapping:\n" > + "common cfg: %p\n" > + "notify base: %p\n" > + "isr cfg: %p\n" > + "device cfg: %p\n" > + "multiplier: %u\n", > + hw->common_cfg, > + hw->notify_base, > + hw->isr, > + hw->dev_cfg, > + hw->notify_off_multiplier); Since kernel messages go to syslog, syslog does not handle multi-line messages very well. This should be a single line. Also, this is the kind of message that should be at the debug level; something that is useful to the driver developers but not something that needs to be filling up every end users log. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC 1/2] vhost: IFC VF hardware operation layer
On Wed, 16 Oct 2019 09:03:17 +0800 Zhu Lingshan wrote: > +int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *dev) > +{ > + int ret; > + u8 pos; > + struct virtio_pci_cap cap; > + u32 i; > + u16 notify_off; For network code, the preferred declaration style is reverse christmas tree. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PULL] vhost: cleanups and fixes
The pull request you sent on Tue, 15 Oct 2019 17:19:08 -0400: > https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/3b1f00aceb7a67bf079a5a64aa5c6baf78a8f442 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PULL] vhost: cleanups and fixes
The following changes since commit da0c9ea146cbe92b832f1b0f694840ea8eb33cce: Linux 5.4-rc2 (2019-10-06 14:27:30 -0700) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus for you to fetch changes up to 245cdd9fbd396483d501db83047116e2530f245f: vhost/test: stop device before reset (2019-10-13 09:38:27 -0400) virtio: fixes Some minor bugfixes Signed-off-by: Michael S. Tsirkin Michael S. Tsirkin (3): tools/virtio: more stubs tools/virtio: xen stub vhost/test: stop device before reset drivers/vhost/test.c | 2 ++ tools/virtio/crypto/hash.h | 0 tools/virtio/linux/dma-mapping.h | 2 ++ tools/virtio/xen/xen.h | 6 ++ 4 files changed, 10 insertions(+) create mode 100644 tools/virtio/crypto/hash.h create mode 100644 tools/virtio/xen/xen.h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC v1 1/2] vhost: option to fetch descriptors through an independent struct
On Mon, Oct 14, 2019 at 09:43:25AM +0800, Jason Wang wrote: > > On 2019/10/13 上午4:27, Michael S. Tsirkin wrote: > > On Sat, Oct 12, 2019 at 03:28:49PM +0800, Jason Wang wrote: > > > On 2019/10/11 下午9:45, Michael S. Tsirkin wrote: > > > > The idea is to support multiple ring formats by converting > > > > to a format-independent array of descriptors. > > > > > > > > This costs extra cycles, but we gain in ability > > > > to fetch a batch of descriptors in one go, which > > > > is good for code cache locality. > > > > > > > > To simplify benchmarking, I kept the old code > > > > around so one can switch back and forth by > > > > writing into a module parameter. > > > > This will go away in the final submission. > > > > > > > > This patch causes a minor performance degradation, > > > > it's been kept as simple as possible for ease of review. > > > > Next patch gets us back the performance by adding batching. > > > > > > > > Signed-off-by: Michael S. Tsirkin > > > > --- > > > >drivers/vhost/test.c | 17 ++- > > > >drivers/vhost/vhost.c | 299 > > > > +- > > > >drivers/vhost/vhost.h | 16 +++ > > > >3 files changed, 327 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c > > > > index 056308008288..39a018a7af2d 100644 > > > > --- a/drivers/vhost/test.c > > > > +++ b/drivers/vhost/test.c > > > > @@ -18,6 +18,9 @@ > > > >#include "test.h" > > > >#include "vhost.h" > > > > +static int newcode = 0; > > > > +module_param(newcode, int, 0644); > > > > + > > > >/* Max number of bytes transferred before requeueing the job. > > > > * Using this limit prevents one virtqueue from starving others. */ > > > >#define VHOST_TEST_WEIGHT 0x8 > > > > @@ -58,10 +61,16 @@ static void handle_vq(struct vhost_test *n) > > > > vhost_disable_notify(&n->dev, vq); > > > > for (;;) { > > > > - head = vhost_get_vq_desc(vq, vq->iov, > > > > -ARRAY_SIZE(vq->iov), > > > > -&out, &in, > > > > -NULL, NULL); > > > > + if (newcode) > > > > + head = vhost_get_vq_desc_batch(vq, vq->iov, > > > > + > > > > ARRAY_SIZE(vq->iov), > > > > + &out, &in, > > > > + NULL, NULL); > > > > + else > > > > + head = vhost_get_vq_desc(vq, vq->iov, > > > > +ARRAY_SIZE(vq->iov), > > > > +&out, &in, > > > > +NULL, NULL); > > > > /* On error, stop handling until the next kick. */ > > > > if (unlikely(head < 0)) > > > > break; > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > > > index 36ca2cf419bf..36661d6cb51f 100644 > > > > --- a/drivers/vhost/vhost.c > > > > +++ b/drivers/vhost/vhost.c > > > > @@ -301,6 +301,7 @@ static void vhost_vq_reset(struct vhost_dev *dev, > > > >struct vhost_virtqueue *vq) > > > >{ > > > > vq->num = 1; > > > > + vq->ndescs = 0; > > > > vq->desc = NULL; > > > > vq->avail = NULL; > > > > vq->used = NULL; > > > > @@ -369,6 +370,9 @@ static int vhost_worker(void *data) > > > >static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq) > > > >{ > > > > + kfree(vq->descs); > > > > + vq->descs = NULL; > > > > + vq->max_descs = 0; > > > > kfree(vq->indirect); > > > > vq->indirect = NULL; > > > > kfree(vq->log); > > > > @@ -385,6 +389,10 @@ static long vhost_dev_alloc_iovecs(struct > > > > vhost_dev *dev) > > > > for (i = 0; i < dev->nvqs; ++i) { > > > > vq = dev->vqs[i]; > > > > + vq->max_descs = dev->iov_limit; > > > > + vq->descs = kmalloc_array(vq->max_descs, > > > > + sizeof(*vq->descs), > > > > + GFP_KERNEL); > > > > > > Is iov_limit too much here? It can obviously increase the footprint. I > > > guess > > > the batching can only be done for descriptor without indirect or next set. > > > Then we may batch 16 or 64. > > > > > > Thanks > > Yes, next patch only batches up to 64. But we do need iov_limit because > > guest can pass a long chain of scatter/gather. > > We already have iovecs in a huge array so this does not look like > > a big deal. If we ever teach the code to avoid the huge > > iov arrays by handling huge s/g lists piece by piece, > > we can make the desc array smaller at the same point. > > > > Another possible issue, if we try to batch descriptor chain
[PATCH 4/5] virtiofs: Count pending forgets as in_flight forgets
If virtqueue is full, we put forget requests on a list and these forgets are dispatched later using a worker. As of now we don't count these forgets in fsvq->in_flight variable. This means when queue is being drained, we have to have special logic to first drain these pending requests and then wait for fsvq->in_flight to go to zero. By counting pending forgets in fsvq->in_flight, we can get rid of special logic and just wait for in_flight to go to zero. Worker thread will kick and drain all the forgets anyway, leading in_flight to zero. I also need similar logic for normal request queue in next patch where I am about to defer request submission in the worker context if queue is full. This simplifies the code a bit. Also add two helper functions to inc/dec in_flight. Decrement in_flight helper will later used to call completion when in_flight reaches zero. Signed-off-by: Vivek Goyal --- fs/fuse/virtio_fs.c | 44 1 file changed, 20 insertions(+), 24 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index e0fcf3030951..625de45fa471 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -67,6 +67,19 @@ static inline struct fuse_pqueue *vq_to_fpq(struct virtqueue *vq) return &vq_to_fsvq(vq)->fud->pq; } +/* Should be called with fsvq->lock held. */ +static inline void inc_in_flight_req(struct virtio_fs_vq *fsvq) +{ + fsvq->in_flight++; +} + +/* Should be called with fsvq->lock held. */ +static inline void dec_in_flight_req(struct virtio_fs_vq *fsvq) +{ + WARN_ON(fsvq->in_flight <= 0); + fsvq->in_flight--; +} + static void release_virtio_fs_obj(struct kref *ref) { struct virtio_fs *vfs = container_of(ref, struct virtio_fs, refcount); @@ -110,22 +123,6 @@ static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq) flush_delayed_work(&fsvq->dispatch_work); } -static inline void drain_hiprio_queued_reqs(struct virtio_fs_vq *fsvq) -{ - struct virtio_fs_forget *forget; - - spin_lock(&fsvq->lock); - while (1) { - forget = list_first_entry_or_null(&fsvq->queued_reqs, - struct virtio_fs_forget, list); - if (!forget) - break; - list_del(&forget->list); - kfree(forget); - } - spin_unlock(&fsvq->lock); -} - static void virtio_fs_drain_all_queues(struct virtio_fs *fs) { struct virtio_fs_vq *fsvq; @@ -133,9 +130,6 @@ static void virtio_fs_drain_all_queues(struct virtio_fs *fs) for (i = 0; i < fs->nvqs; i++) { fsvq = &fs->vqs[i]; - if (i == VQ_HIPRIO) - drain_hiprio_queued_reqs(fsvq); - virtio_fs_drain_queue(fsvq); } } @@ -254,7 +248,7 @@ static void virtio_fs_hiprio_done_work(struct work_struct *work) while ((req = virtqueue_get_buf(vq, &len)) != NULL) { kfree(req); - fsvq->in_flight--; + dec_in_flight_req(fsvq); } } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq))); spin_unlock(&fsvq->lock); @@ -306,6 +300,7 @@ static void virtio_fs_hiprio_dispatch_work(struct work_struct *work) list_del(&forget->list); if (!fsvq->connected) { + dec_in_flight_req(fsvq); spin_unlock(&fsvq->lock); kfree(forget); continue; @@ -327,13 +322,13 @@ static void virtio_fs_hiprio_dispatch_work(struct work_struct *work) } else { pr_debug("virtio-fs: Could not queue FORGET: err=%d. Dropping it.\n", ret); + dec_in_flight_req(fsvq); kfree(forget); } spin_unlock(&fsvq->lock); return; } - fsvq->in_flight++; notify = virtqueue_kick_prepare(vq); spin_unlock(&fsvq->lock); @@ -472,7 +467,7 @@ static void virtio_fs_requests_done_work(struct work_struct *work) fuse_request_end(fc, req); spin_lock(&fsvq->lock); - fsvq->in_flight--; + dec_in_flight_req(fsvq); spin_unlock(&fsvq->lock); } } @@ -730,6 +725,7 @@ __releases(fiq->lock) list_add_tail(&forget->list, &fsvq->queued_reqs); schedule_delayed_work(&fsvq->dispatch_work, msecs_to_jiffies(1)); + inc_in_flight_req(fsvq); } else { pr_debug("virtio-fs: Could not queue FORGET: err=%d. Dropping it.\n", ret); @@ -739,7 +735,
[PATCH 3/5] virtiofs: Set FR_SENT flag only after request has been sent
FR_SENT flag should be set when request has been sent successfuly sent over virtqueue. This is used by interrupt logic to figure out if interrupt request should be sent or not. Also add it to fqp->processing list after sending it successfully. Signed-off-by: Vivek Goyal --- fs/fuse/virtio_fs.c | 23 ++- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 3b7f7409e77b..e0fcf3030951 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -857,6 +857,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq, unsigned int i; int ret; bool notify; + struct fuse_pqueue *fpq; /* Does the sglist fit on the stack? */ total_sgs = sg_count_fuse_req(req); @@ -911,6 +912,15 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq, goto out; } + /* Request successfuly sent. */ + fpq = &fsvq->fud->pq; + spin_lock(&fpq->lock); + list_add_tail(&req->list, fpq->processing); + spin_unlock(&fpq->lock); + set_bit(FR_SENT, &req->flags); + /* matches barrier in request_wait_answer() */ + smp_mb__after_atomic(); + fsvq->in_flight++; notify = virtqueue_kick_prepare(vq); @@ -939,7 +949,6 @@ __releases(fiq->lock) struct virtio_fs *fs; struct fuse_conn *fc; struct fuse_req *req; - struct fuse_pqueue *fpq; struct virtio_fs_vq *fsvq; int ret; @@ -958,14 +967,6 @@ __releases(fiq->lock) req->in.h.nodeid, req->in.h.len, fuse_len_args(req->args->out_numargs, req->args->out_args)); - fpq = &fs->vqs[queue_id].fud->pq; - spin_lock(&fpq->lock); - list_add_tail(&req->list, fpq->processing); - spin_unlock(&fpq->lock); - set_bit(FR_SENT, &req->flags); - /* matches barrier in request_wait_answer() */ - smp_mb__after_atomic(); - retry: fsvq = &fs->vqs[queue_id]; ret = virtio_fs_enqueue_req(fsvq, req); @@ -978,10 +979,6 @@ __releases(fiq->lock) } req->out.h.error = ret; pr_err("virtio-fs: virtio_fs_enqueue_req() failed %d\n", ret); - spin_lock(&fpq->lock); - clear_bit(FR_SENT, &req->flags); - list_del_init(&req->list); - spin_unlock(&fpq->lock); /* Can't end request in submission context. Use a worker */ spin_lock(&fsvq->lock); -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 0/5] virtiofs: Fix couple of deadlocks
Hi, We have couple of places which can result in deadlock. This patch series fixes these. We can be called with fc->bg_lock (for background requests) while submitting a request. This leads to two constraints. - We can't end requests in submitter's context and call fuse_end_request() as it tries to take fc->bg_lock as well. So queue these requests on a list and use a worker to end these requests. - If virtqueue is full, we can wait with fc->bg_lock held for queue to have space. Worker which is completing the request gets blocked on fc->bg_lock as well. And that means requests are not completing, that means descriptors are not being freed and that means submitter can't make progress. Deadlock. Fix this by punting the requests to a list and retry submission later with the help of a worker. Thanks Vivek Vivek Goyal (5): virtiofs: Do not end request in submission context virtiofs: No need to check fpq->connected state virtiofs: Set FR_SENT flag only after request has been sent virtiofs: Count pending forgets as in_flight forgets virtiofs: Retry request submission from worker context fs/fuse/virtio_fs.c | 165 +--- 1 file changed, 111 insertions(+), 54 deletions(-) -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 1/5] virtiofs: Do not end request in submission context
Submission context can hold some locks which end request code tries to hold again and deadlock can occur. For example, fc->bg_lock. If a background request is being submitted, it might hold fc->bg_lock and if we could not submit request (because device went away) and tried to end request, then deadlock happens. During testing, I also got a warning from deadlock detection code. So put requests on a list and end requests from a worker thread. I got following warning from deadlock detector. [ 603.137138] WARNING: possible recursive locking detected [ 603.137142] [ 603.137144] blogbench/2036 is trying to acquire lock: [ 603.137149] f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse] [ 603.140701] [ 603.140701] but task is already holding lock: [ 603.140703] f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse] [ 603.140713] [ 603.140713] other info that might help us debug this: [ 603.140714] Possible unsafe locking scenario: [ 603.140714] [ 603.140715]CPU0 [ 603.140716] [ 603.140716] lock(&(&fc->bg_lock)->rlock); [ 603.140718] lock(&(&fc->bg_lock)->rlock); [ 603.140719] [ 603.140719] *** DEADLOCK *** Signed-off-by: Vivek Goyal --- fs/fuse/virtio_fs.c | 38 ++ 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 6af3f131e468..24ac6f8bf3f7 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -30,6 +30,7 @@ struct virtio_fs_vq { struct virtqueue *vq; /* protected by ->lock */ struct work_struct done_work; struct list_head queued_reqs; + struct list_head end_reqs; /* End these requests */ struct delayed_work dispatch_work; struct fuse_dev *fud; bool connected; @@ -259,8 +260,27 @@ static void virtio_fs_hiprio_done_work(struct work_struct *work) spin_unlock(&fsvq->lock); } -static void virtio_fs_dummy_dispatch_work(struct work_struct *work) +static void virtio_fs_request_dispatch_work(struct work_struct *work) { + struct fuse_req *req; + struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq, +dispatch_work.work); + struct fuse_conn *fc = fsvq->fud->fc; + + pr_debug("virtio-fs: worker %s called.\n", __func__); + while (1) { + spin_lock(&fsvq->lock); + req = list_first_entry_or_null(&fsvq->end_reqs, struct fuse_req, + list); + if (!req) { + spin_unlock(&fsvq->lock); + return; + } + + list_del_init(&req->list); + spin_unlock(&fsvq->lock); + fuse_request_end(fc, req); + } } static void virtio_fs_hiprio_dispatch_work(struct work_struct *work) @@ -502,6 +522,7 @@ static int virtio_fs_setup_vqs(struct virtio_device *vdev, names[VQ_HIPRIO] = fs->vqs[VQ_HIPRIO].name; INIT_WORK(&fs->vqs[VQ_HIPRIO].done_work, virtio_fs_hiprio_done_work); INIT_LIST_HEAD(&fs->vqs[VQ_HIPRIO].queued_reqs); + INIT_LIST_HEAD(&fs->vqs[VQ_HIPRIO].end_reqs); INIT_DELAYED_WORK(&fs->vqs[VQ_HIPRIO].dispatch_work, virtio_fs_hiprio_dispatch_work); spin_lock_init(&fs->vqs[VQ_HIPRIO].lock); @@ -511,8 +532,9 @@ static int virtio_fs_setup_vqs(struct virtio_device *vdev, spin_lock_init(&fs->vqs[i].lock); INIT_WORK(&fs->vqs[i].done_work, virtio_fs_requests_done_work); INIT_DELAYED_WORK(&fs->vqs[i].dispatch_work, - virtio_fs_dummy_dispatch_work); + virtio_fs_request_dispatch_work); INIT_LIST_HEAD(&fs->vqs[i].queued_reqs); + INIT_LIST_HEAD(&fs->vqs[i].end_reqs); snprintf(fs->vqs[i].name, sizeof(fs->vqs[i].name), "requests.%u", i - VQ_REQUEST); callbacks[i] = virtio_fs_vq_done; @@ -918,6 +940,7 @@ __releases(fiq->lock) struct fuse_conn *fc; struct fuse_req *req; struct fuse_pqueue *fpq; + struct virtio_fs_vq *fsvq; int ret; WARN_ON(list_empty(&fiq->pending)); @@ -951,7 +974,8 @@ __releases(fiq->lock) smp_mb__after_atomic(); retry: - ret = virtio_fs_enqueue_req(&fs->vqs[queue_id], req); + fsvq = &fs->vqs[queue_id]; + ret = virtio_fs_enqueue_req(fsvq, req); if (ret < 0) { if (ret == -ENOMEM || ret == -ENOSPC) { /* Virtqueue full. Retry submission */ @@ -965,7 +989,13 @@ __releases(fiq->lock) clear_bit(FR_SENT, &req->flags); list_del_init(&req->list); spin_unlock(&fpq->lock); - fuse_requ
[PATCH 5/5] virtiofs: Retry request submission from worker context
If regular request queue gets full, currently we sleep for a bit and retrying submission in submitter's context. This assumes submitter is not holding any spin lock. But this assumption is not true for background requests. For background requests, we are called with fc->bg_lock held. This can lead to deadlock where one thread is trying submission with fc->bg_lock held while request completion thread has called fuse_request_end() which tries to acquire fc->bg_lock and gets blocked. As request completion thread gets blocked, it does not make further progress and that means queue does not get empty and submitter can't submit more requests. To solve this issue, retry submission with the help of a worker, instead of retrying in submitter's context. We already do this for hiprio/forget requests. Reported-by: Chirantan Ekbote Signed-off-by: Vivek Goyal --- fs/fuse/virtio_fs.c | 59 ++--- 1 file changed, 50 insertions(+), 9 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 625de45fa471..58e568ef54ef 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -55,6 +55,9 @@ struct virtio_fs_forget { struct list_head list; }; +static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq, +struct fuse_req *req, bool in_flight); + static inline struct virtio_fs_vq *vq_to_fsvq(struct virtqueue *vq) { struct virtio_fs *fs = vq->vdev->priv; @@ -260,6 +263,7 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work) struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq, dispatch_work.work); struct fuse_conn *fc = fsvq->fud->fc; + int ret; pr_debug("virtio-fs: worker %s called.\n", __func__); while (1) { @@ -268,13 +272,43 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work) list); if (!req) { spin_unlock(&fsvq->lock); - return; + break; } list_del_init(&req->list); spin_unlock(&fsvq->lock); fuse_request_end(fc, req); } + + /* Dispatch pending requests */ + while (1) { + spin_lock(&fsvq->lock); + req = list_first_entry_or_null(&fsvq->queued_reqs, + struct fuse_req, list); + if (!req) { + spin_unlock(&fsvq->lock); + return; + } + list_del_init(&req->list); + spin_unlock(&fsvq->lock); + + ret = virtio_fs_enqueue_req(fsvq, req, true); + if (ret < 0) { + if (ret == -ENOMEM || ret == -ENOSPC) { + spin_lock(&fsvq->lock); + list_add_tail(&req->list, &fsvq->queued_reqs); + schedule_delayed_work(&fsvq->dispatch_work, + msecs_to_jiffies(1)); + spin_unlock(&fsvq->lock); + return; + } + req->out.h.error = ret; + dec_in_flight_req(fsvq); + pr_err("virtio-fs: virtio_fs_enqueue_req() failed %d\n", + ret); + fuse_request_end(fc, req); + } + } } static void virtio_fs_hiprio_dispatch_work(struct work_struct *work) @@ -837,7 +871,7 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg, /* Add a request to a virtqueue and kick the device */ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq, -struct fuse_req *req) +struct fuse_req *req, bool in_flight) { /* requests need at least 4 elements */ struct scatterlist *stack_sgs[6]; @@ -917,7 +951,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq, /* matches barrier in request_wait_answer() */ smp_mb__after_atomic(); - inc_in_flight_req(fsvq); + if (!in_flight) + inc_in_flight_req(fsvq); notify = virtqueue_kick_prepare(vq); spin_unlock(&fsvq->lock); @@ -963,15 +998,21 @@ __releases(fiq->lock) req->in.h.nodeid, req->in.h.len, fuse_len_args(req->args->out_numargs, req->args->out_args)); -retry: fsvq = &fs->vqs[queue_id]; - ret = virtio_fs_enqueue_req(fsvq, req); + ret = virtio_fs_enqueue_req(fsvq, req, false); if (ret < 0) { if (ret == -ENOMEM || ret == -ENOSPC) { - /* Virtqueue full. Retry submission */ - /* TODO use completion instead of timeout */ -
[PATCH 2/5] virtiofs: No need to check fpq->connected state
In virtiofs we keep per queue connected state in virtio_fs_vq->connected and use that to end request if queue is not connected. And virtiofs does not even touch fpq->connected state. We probably need to merge these two at some point of time. For now, simplify the code a bit and do not worry about checking state of fpq->connected. Signed-off-by: Vivek Goyal --- fs/fuse/virtio_fs.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 24ac6f8bf3f7..3b7f7409e77b 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -960,13 +960,6 @@ __releases(fiq->lock) fpq = &fs->vqs[queue_id].fud->pq; spin_lock(&fpq->lock); - if (!fpq->connected) { - spin_unlock(&fpq->lock); - req->out.h.error = -ENODEV; - pr_err("virtio-fs: %s disconnected\n", __func__); - fuse_request_end(fc, req); - return; - } list_add_tail(&req->list, fpq->processing); spin_unlock(&fpq->lock); set_bit(FR_SENT, &req->flags); -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 4/7] mdev: introduce device specific ops
On Tue, 15 Oct 2019 20:17:01 +0800 Jason Wang wrote: > On 2019/10/15 下午6:41, Cornelia Huck wrote: > > On Fri, 11 Oct 2019 16:15:54 +0800 > > Jason Wang wrote: > > > >> Currently, except for the create and remove, the rest of > >> mdev_parent_ops is designed for vfio-mdev driver only and may not help > >> for kernel mdev driver. With the help of class id, this patch > >> introduces device specific callbacks inside mdev_device > >> structure. This allows different set of callback to be used by > >> vfio-mdev and virtio-mdev. > >> > >> Signed-off-by: Jason Wang > >> --- > >> .../driver-api/vfio-mediated-device.rst | 22 +--- > >> MAINTAINERS | 1 + > >> drivers/gpu/drm/i915/gvt/kvmgt.c | 18 --- > >> drivers/s390/cio/vfio_ccw_ops.c | 18 --- > >> drivers/s390/crypto/vfio_ap_ops.c | 14 +++-- > >> drivers/vfio/mdev/mdev_core.c | 9 +++- > >> drivers/vfio/mdev/mdev_private.h | 1 + > >> drivers/vfio/mdev/vfio_mdev.c | 37 ++--- > >> include/linux/mdev.h | 42 +++ > >> include/linux/vfio_mdev.h | 52 +++ > >> samples/vfio-mdev/mbochs.c| 20 --- > >> samples/vfio-mdev/mdpy.c | 21 +--- > >> samples/vfio-mdev/mtty.c | 18 --- > >> 13 files changed, 177 insertions(+), 96 deletions(-) > >> create mode 100644 include/linux/vfio_mdev.h > >> > >> diff --git a/Documentation/driver-api/vfio-mediated-device.rst > >> b/Documentation/driver-api/vfio-mediated-device.rst > >> index 2035e48da7b2..553574ebba73 100644 > >> --- a/Documentation/driver-api/vfio-mediated-device.rst > >> +++ b/Documentation/driver-api/vfio-mediated-device.rst > >> @@ -152,11 +152,20 @@ callbacks per mdev parent device, per mdev type, or > >> any other categorization. > >> Vendor drivers are expected to be fully asynchronous in this respect or > >> provide their own internal resource protection.) > >> > >> -The callbacks in the mdev_parent_ops structure are as follows: > >> +In order to support multiple types of device/driver, device needs to > >> +provide both class_id and device_ops through: > > "As multiple types of mediated devices may be supported, the device > > needs to set up the class id and the device specific callbacks via:" > > > > ? > > > >> > >> -* open: open callback of mediated device > >> -* close: close callback of mediated device > >> -* ioctl: ioctl callback of mediated device > >> +void mdev_set_class(struct mdev_device *mdev, u16 id, const void > >> *ops); > >> + > >> +The class_id is used to be paired with ids in id_table in mdev_driver > >> +structure for probing the correct driver. > > "The class id (specified in id) is used to match a device with an mdev > > driver via its id table." > > > > ? > > > >> The device_ops is device > >> +specific callbacks which can be get through mdev_get_dev_ops() > >> +function by mdev bus driver. > > "The device specific callbacks (specified in *ops) are obtainable via > > mdev_get_dev_ops() (for use by the mdev bus driver.)" > > > > ? > > > >> For vfio-mdev device, its device specific > >> +ops are as follows: > > "A vfio-mdev device (class id MDEV_ID_VFIO) uses the following > > device-specific ops:" > > > > ? > > > All you propose is better than what I wrote, will change the docs. > > > > > >> + > >> +* open: open callback of vfio mediated device > >> +* close: close callback of vfio mediated device > >> +* ioctl: ioctl callback of vfio mediated device > >> * read : read emulation callback > >> * write: write emulation callback > >> * mmap: mmap emulation callback > >> @@ -167,9 +176,10 @@ register itself with the mdev core driver:: > >>extern int mdev_register_device(struct device *dev, > >> const struct mdev_parent_ops *ops); > >> > >> -It is also required to specify the class_id through:: > >> +It is also required to specify the class_id and device specific ops > >> through:: > >> > >> - extern int mdev_set_class(struct device *dev, u16 id); > >> + extern int mdev_set_class(struct device *dev, u16 id, > >> +const void *ops); > > Apologies if that has already been discussed, but do we want a 1:1 > > relationship between id and ops, or can different devices with the same > > id register different ops? > > > I think we have a N:1 mapping between id and ops, e.g we want both > virtio-mdev and vhost-mdev use a single set of device ops. The contents of the ops structure is essentially defined by the id, which is why I was leaning towards them being defined together. They are effectively interlocked, the id defines which mdev "endpoint" driver is loaded and that driver requires mdev_get_dev_ops() to return the structure required by the driver. I wish t
Re: [PATCH V3 1/7] mdev: class id support
On Fri, 11 Oct 2019 16:15:51 +0800 Jason Wang wrote: > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c > index b558d4cfd082..724e9b9841d8 100644 > --- a/drivers/vfio/mdev/mdev_core.c > +++ b/drivers/vfio/mdev/mdev_core.c > @@ -45,6 +45,12 @@ void mdev_set_drvdata(struct mdev_device *mdev, void *data) > } > EXPORT_SYMBOL(mdev_set_drvdata); > > +void mdev_set_class(struct mdev_device *mdev, u16 id) > +{ > + mdev->class_id = id; > +} > +EXPORT_SYMBOL(mdev_set_class); > + > struct device *mdev_dev(struct mdev_device *mdev) > { > return &mdev->dev; > @@ -135,6 +141,7 @@ static int mdev_device_remove_cb(struct device *dev, void > *data) > * mdev_register_device : Register a device > * @dev: device structure representing parent device. > * @ops: Parent device operation structure to be registered. > + * @id: class id. > * > * Add device to list of registered parent devices. > * Returns a negative value on error, otherwise 0. > @@ -324,6 +331,9 @@ int mdev_device_create(struct kobject *kobj, > if (ret) > goto ops_create_fail; > > + if (!mdev->class_id) This is a sanity test failure of the parent driver on a privileged path, I think it's fair to print a warning when this occurs rather than only return an errno to the user. In fact, ret is not set to an error value here, so it looks like this fails to create the device but returns success. Thanks, Alex > + goto class_id_fail; > + > ret = device_add(&mdev->dev); > if (ret) > goto add_fail; > @@ -340,6 +350,7 @@ int mdev_device_create(struct kobject *kobj, > > sysfs_fail: > device_del(&mdev->dev); > +class_id_fail: > add_fail: > parent->ops->remove(mdev); > ops_create_fail: ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt'
The 'work' field was introduced with commit 06a8fc78367d0 ("VSOCK: Introduce virtio_vsock_common.ko") but it is never used in the code, so we can remove it to save memory allocated in the per-packet 'struct virtio_vsock_pkt' Suggested-by: Michael S. Tsirkin Signed-off-by: Stefano Garzarella --- include/linux/virtio_vsock.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 4c7781f4b29b..07875ccc7bb5 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -48,7 +48,6 @@ struct virtio_vsock_sock { struct virtio_vsock_pkt { struct virtio_vsock_hdr hdr; - struct work_struct work; struct list_head list; /* socket refcnt not held, only use for cancellation */ struct vsock_sock *vsk; -- 2.21.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
0-copy (was Re: question: asynchronous notification from vhost)
On Tue, Oct 15, 2019 at 11:23:13AM +0200, Guennadi Liakhovetski wrote: > Hi, > > I'm developing a virtualised audio / DSP virtio and vhost driver pair > and I'm currently somewhat stuck trying to figure out how to > asynchronously notify the guest from the vhost driver. I'm using the > vhost_add_used_and_signal() function to return data back to the guest > in the guest context, when the guest initiated an operation, that's > working well. But how do I "kick" the guest from an asynchronous, e.g. > from an interrupt thread context? I think I've solved that by using a vhost work queue. That brings me one step further, possibly, to the last larger problem to solve: the actual data transmission. My preference would be to use zero-copy for that, but so far I only see one example of zero-copy in drivers/vhost - in net.c and only in TX direction. As far as I understand that isn't a principal limitation of the vhost / virtio API, rather it's related to certain networking specifics. Am I right? Is it supposed to work in both firections, or are there problems? Thanks Guennadi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 0/7] mdev based hardware virtio offloading support
On Tue, Oct 15, 2019 at 11:37:17AM +0800, Jason Wang wrote: > > On 2019/10/15 上午1:49, Stefan Hajnoczi wrote: > > On Fri, Oct 11, 2019 at 04:15:50PM +0800, Jason Wang wrote: > > > There are hardware that can do virtio datapath offloading while having > > > its own control path. This path tries to implement a mdev based > > > unified API to support using kernel virtio driver to drive those > > > devices. This is done by introducing a new mdev transport for virtio > > > (virtio_mdev) and register itself as a new kind of mdev driver. Then > > > it provides a unified way for kernel virtio driver to talk with mdev > > > device implementation. > > > > > > Though the series only contains kernel driver support, the goal is to > > > make the transport generic enough to support userspace drivers. This > > > means vhost-mdev[1] could be built on top as well by resuing the > > > transport. > > > > > > A sample driver is also implemented which simulate a virito-net > > > loopback ethernet device on top of vringh + workqueue. This could be > > > used as a reference implementation for real hardware driver. > > > > > > Consider mdev framework only support VFIO device and driver right now, > > > this series also extend it to support other types. This is done > > > through introducing class id to the device and pairing it with > > > id_talbe claimed by the driver. On top, this seris also decouple > > > device specific parents ops out of the common ones. > > I was curious so I took a quick look and posted comments. > > > > I guess this driver runs inside the guest since it registers virtio > > devices? > > > It could run in either guest or host. But the main focus is to run in the > host then we can use virtio drivers in containers. > > > > > > If this is used with physical PCI devices that support datapath > > offloading then how are physical devices presented to the guest without > > SR-IOV? > > > We will do control path meditation through vhost-mdev[1] and vhost-vfio[2]. > Then we will present a full virtio compatible ethernet device for guest. > > SR-IOV is not a must, any mdev device that implements the API defined in > patch 5 can be used by this framework. What I'm trying to understand is: if you want to present a virtio-pci device to the guest (e.g. using vhost-mdev or vhost-vfio), then how is that related to this patch series? Does this mean this patch series is useful mostly for presenting virtio devices to containers or the host? Stefan signature.asc Description: PGP signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 4/7] mdev: introduce device specific ops
On 2019/10/15 下午6:41, Cornelia Huck wrote: On Fri, 11 Oct 2019 16:15:54 +0800 Jason Wang wrote: Currently, except for the create and remove, the rest of mdev_parent_ops is designed for vfio-mdev driver only and may not help for kernel mdev driver. With the help of class id, this patch introduces device specific callbacks inside mdev_device structure. This allows different set of callback to be used by vfio-mdev and virtio-mdev. Signed-off-by: Jason Wang --- .../driver-api/vfio-mediated-device.rst | 22 +--- MAINTAINERS | 1 + drivers/gpu/drm/i915/gvt/kvmgt.c | 18 --- drivers/s390/cio/vfio_ccw_ops.c | 18 --- drivers/s390/crypto/vfio_ap_ops.c | 14 +++-- drivers/vfio/mdev/mdev_core.c | 9 +++- drivers/vfio/mdev/mdev_private.h | 1 + drivers/vfio/mdev/vfio_mdev.c | 37 ++--- include/linux/mdev.h | 42 +++ include/linux/vfio_mdev.h | 52 +++ samples/vfio-mdev/mbochs.c| 20 --- samples/vfio-mdev/mdpy.c | 21 +--- samples/vfio-mdev/mtty.c | 18 --- 13 files changed, 177 insertions(+), 96 deletions(-) create mode 100644 include/linux/vfio_mdev.h diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst index 2035e48da7b2..553574ebba73 100644 --- a/Documentation/driver-api/vfio-mediated-device.rst +++ b/Documentation/driver-api/vfio-mediated-device.rst @@ -152,11 +152,20 @@ callbacks per mdev parent device, per mdev type, or any other categorization. Vendor drivers are expected to be fully asynchronous in this respect or provide their own internal resource protection.) -The callbacks in the mdev_parent_ops structure are as follows: +In order to support multiple types of device/driver, device needs to +provide both class_id and device_ops through: "As multiple types of mediated devices may be supported, the device needs to set up the class id and the device specific callbacks via:" ? -* open: open callback of mediated device -* close: close callback of mediated device -* ioctl: ioctl callback of mediated device +void mdev_set_class(struct mdev_device *mdev, u16 id, const void *ops); + +The class_id is used to be paired with ids in id_table in mdev_driver +structure for probing the correct driver. "The class id (specified in id) is used to match a device with an mdev driver via its id table." ? The device_ops is device +specific callbacks which can be get through mdev_get_dev_ops() +function by mdev bus driver. "The device specific callbacks (specified in *ops) are obtainable via mdev_get_dev_ops() (for use by the mdev bus driver.)" ? For vfio-mdev device, its device specific +ops are as follows: "A vfio-mdev device (class id MDEV_ID_VFIO) uses the following device-specific ops:" ? All you propose is better than what I wrote, will change the docs. + +* open: open callback of vfio mediated device +* close: close callback of vfio mediated device +* ioctl: ioctl callback of vfio mediated device * read : read emulation callback * write: write emulation callback * mmap: mmap emulation callback @@ -167,9 +176,10 @@ register itself with the mdev core driver:: extern int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops); -It is also required to specify the class_id through:: +It is also required to specify the class_id and device specific ops through:: - extern int mdev_set_class(struct device *dev, u16 id); + extern int mdev_set_class(struct device *dev, u16 id, + const void *ops); Apologies if that has already been discussed, but do we want a 1:1 relationship between id and ops, or can different devices with the same id register different ops? I think we have a N:1 mapping between id and ops, e.g we want both virtio-mdev and vhost-mdev use a single set of device ops. Thanks If the former, would it make sense to first register the ops for an id (once), and then have the ->create callback only set the class id (with the core doing the lookup of the ops)? However, the mdev_parent_ops structure is not required in the function call that a driver should use to unregister itself with the mdev core driver:: ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 2/7] mdev: bus uevent support
On 2019/10/15 下午6:27, Cornelia Huck wrote: On Fri, 11 Oct 2019 16:15:52 +0800 Jason Wang wrote: This patch adds bus uevent support for mdev bus in order to allow cooperation with userspace. Signed-off-by: Jason Wang --- drivers/vfio/mdev/mdev_driver.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c index b7c40ce86ee3..319d886ffaf7 100644 --- a/drivers/vfio/mdev/mdev_driver.c +++ b/drivers/vfio/mdev/mdev_driver.c @@ -82,9 +82,17 @@ static int mdev_match(struct device *dev, struct device_driver *drv) return 0; } +static int mdev_uevent(struct device *dev, struct kobj_uevent_env *env) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + return add_uevent_var(env, "MODALIAS=mdev:c%02X", mdev->class_id); +} + struct bus_type mdev_bus_type = { .name = "mdev", .match = mdev_match, + .uevent = mdev_uevent, .probe = mdev_probe, .remove = mdev_remove, }; I'd merge that into the previous patch. Will do. Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 1/7] mdev: class id support
On 2019/10/15 下午6:26, Cornelia Huck wrote: On Fri, 11 Oct 2019 16:15:51 +0800 Jason Wang wrote: Mdev bus only supports vfio driver right now, so it doesn't implement match method. But in the future, we may add drivers other than vfio, the first driver could be virtio-mdev. This means we need to add device class id support in bus match method to pair the mdev device and mdev driver correctly. So this patch adds id_table to mdev_driver and class_id for mdev device with the match method for mdev bus. Signed-off-by: Jason Wang --- Documentation/driver-api/vfio-mediated-device.rst | 7 ++- drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + drivers/s390/cio/vfio_ccw_ops.c | 1 + drivers/s390/crypto/vfio_ap_ops.c | 1 + drivers/vfio/mdev/mdev_core.c | 11 +++ drivers/vfio/mdev/mdev_driver.c | 14 ++ drivers/vfio/mdev/mdev_private.h | 1 + drivers/vfio/mdev/vfio_mdev.c | 6 ++ include/linux/mdev.h | 8 include/linux/mod_devicetable.h | 8 samples/vfio-mdev/mbochs.c| 1 + samples/vfio-mdev/mdpy.c | 1 + samples/vfio-mdev/mtty.c | 1 + 13 files changed, 60 insertions(+), 1 deletion(-) diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst index 25eb7d5b834b..2035e48da7b2 100644 --- a/Documentation/driver-api/vfio-mediated-device.rst +++ b/Documentation/driver-api/vfio-mediated-device.rst @@ -102,12 +102,14 @@ structure to represent a mediated device's driver:: * @probe: called when new device created * @remove: called when device removed * @driver: device driver structure + * @id_table: the ids serviced by this driver */ struct mdev_driver { const char *name; int (*probe) (struct device *dev); void (*remove) (struct device *dev); struct device_driverdriver; +const struct mdev_class_id *id_table; }; A mediated bus driver for mdev should use this structure in the function calls @@ -165,12 +167,15 @@ register itself with the mdev core driver:: extern int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops); +It is also required to specify the class_id through:: + + extern int mdev_set_class(struct device *dev, u16 id); Should the document state explicitly that this should be done in the ->create() callback? Yes, it's better. Also, I think that the class_id might be different for different mdevs (even if the parent is the same) -- should that be mentioned explicitly? Yes, depends on the mdev_supported_type. Thanks + However, the mdev_parent_ops structure is not required in the function call that a driver should use to unregister itself with the mdev core driver:: extern void mdev_unregister_device(struct device *dev); - Mediated Device Management Interface Through sysfs == (...) Looks reasonable to me. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net 0/2] vsock: don't allow half-closed socket in the host transports
On Sat, Oct 12, 2019 at 06:38:46PM -0400, Michael S. Tsirkin wrote: > On Fri, Oct 11, 2019 at 04:34:57PM +0200, Stefano Garzarella wrote: > > On Fri, Oct 11, 2019 at 10:19:13AM -0400, Michael S. Tsirkin wrote: > > > On Fri, Oct 11, 2019 at 03:07:56PM +0200, Stefano Garzarella wrote: > > > > We are implementing a test suite for the VSOCK sockets and we discovered > > > > that vmci_transport never allowed half-closed socket on the host side. > > > > > > > > As Jorgen explained [1] this is due to the implementation of VMCI. > > > > > > > > Since we want to have the same behaviour across all transports, this > > > > series adds a section in the "Implementation notes" to exaplain this > > > > behaviour, and changes the vhost_transport to behave the same way. > > > > > > > > [1] https://patchwork.ozlabs.org/cover/847998/#1831400 > > > > > > Half closed sockets are very useful, and lots of > > > applications use tricks to swap a vsock for a tcp socket, > > > which might as a result break. > > > > Got it! > > > > > > > > If VMCI really cares it can implement an ioctl to > > > allow applications to detect that half closed sockets aren't supported. > > > > > > It does not look like VMCI wants to bother (users do not read > > > kernel implementation notes) so it does not really care. > > > So why do we want to cripple other transports intentionally? > > > > The main reason is that we are developing the test suite and we noticed > > the miss match. Since we want to make sure that applications behave in > > the same way on different transports, we thought we would solve it that > > way. > > > > But what you are saying (also in the reply of the patches) is actually > > quite right. Not being publicized, applications do not expect this behavior, > > so please discard this series. > > > > My problem during the tests, was trying to figure out if half-closed > > sockets were supported or not, so as you say adding an IOCTL or maybe > > better a getsockopt() could solve the problem. > > > > What do you think? > > > > Thanks, > > Stefano > > Sure, why not. The aim is for applications using AF_VSOCK sockets to run on any transport. When the semantics differ between transports it creates a compatibility problem. That said, I do think keeping the standard sockets behavior is reasonable. If applications have problems on VMCI a sockopt may be necessary :(. Stefan signature.asc Description: PGP signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 4/7] mdev: introduce device specific ops
On Fri, 11 Oct 2019 16:15:54 +0800 Jason Wang wrote: > Currently, except for the create and remove, the rest of > mdev_parent_ops is designed for vfio-mdev driver only and may not help > for kernel mdev driver. With the help of class id, this patch > introduces device specific callbacks inside mdev_device > structure. This allows different set of callback to be used by > vfio-mdev and virtio-mdev. > > Signed-off-by: Jason Wang > --- > .../driver-api/vfio-mediated-device.rst | 22 +--- > MAINTAINERS | 1 + > drivers/gpu/drm/i915/gvt/kvmgt.c | 18 --- > drivers/s390/cio/vfio_ccw_ops.c | 18 --- > drivers/s390/crypto/vfio_ap_ops.c | 14 +++-- > drivers/vfio/mdev/mdev_core.c | 9 +++- > drivers/vfio/mdev/mdev_private.h | 1 + > drivers/vfio/mdev/vfio_mdev.c | 37 ++--- > include/linux/mdev.h | 42 +++ > include/linux/vfio_mdev.h | 52 +++ > samples/vfio-mdev/mbochs.c| 20 --- > samples/vfio-mdev/mdpy.c | 21 +--- > samples/vfio-mdev/mtty.c | 18 --- > 13 files changed, 177 insertions(+), 96 deletions(-) > create mode 100644 include/linux/vfio_mdev.h > > diff --git a/Documentation/driver-api/vfio-mediated-device.rst > b/Documentation/driver-api/vfio-mediated-device.rst > index 2035e48da7b2..553574ebba73 100644 > --- a/Documentation/driver-api/vfio-mediated-device.rst > +++ b/Documentation/driver-api/vfio-mediated-device.rst > @@ -152,11 +152,20 @@ callbacks per mdev parent device, per mdev type, or any > other categorization. > Vendor drivers are expected to be fully asynchronous in this respect or > provide their own internal resource protection.) > > -The callbacks in the mdev_parent_ops structure are as follows: > +In order to support multiple types of device/driver, device needs to > +provide both class_id and device_ops through: "As multiple types of mediated devices may be supported, the device needs to set up the class id and the device specific callbacks via:" ? > > -* open: open callback of mediated device > -* close: close callback of mediated device > -* ioctl: ioctl callback of mediated device > +void mdev_set_class(struct mdev_device *mdev, u16 id, const void *ops); > + > +The class_id is used to be paired with ids in id_table in mdev_driver > +structure for probing the correct driver. "The class id (specified in id) is used to match a device with an mdev driver via its id table." ? > The device_ops is device > +specific callbacks which can be get through mdev_get_dev_ops() > +function by mdev bus driver. "The device specific callbacks (specified in *ops) are obtainable via mdev_get_dev_ops() (for use by the mdev bus driver.)" ? > For vfio-mdev device, its device specific > +ops are as follows: "A vfio-mdev device (class id MDEV_ID_VFIO) uses the following device-specific ops:" ? > + > +* open: open callback of vfio mediated device > +* close: close callback of vfio mediated device > +* ioctl: ioctl callback of vfio mediated device > * read : read emulation callback > * write: write emulation callback > * mmap: mmap emulation callback > @@ -167,9 +176,10 @@ register itself with the mdev core driver:: > extern int mdev_register_device(struct device *dev, >const struct mdev_parent_ops *ops); > > -It is also required to specify the class_id through:: > +It is also required to specify the class_id and device specific ops through:: > > - extern int mdev_set_class(struct device *dev, u16 id); > + extern int mdev_set_class(struct device *dev, u16 id, > + const void *ops); Apologies if that has already been discussed, but do we want a 1:1 relationship between id and ops, or can different devices with the same id register different ops? If the former, would it make sense to first register the ops for an id (once), and then have the ->create callback only set the class id (with the core doing the lookup of the ops)? > > However, the mdev_parent_ops structure is not required in the function call > that a driver should use to unregister itself with the mdev core driver:: ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 2/7] mdev: bus uevent support
On Fri, 11 Oct 2019 16:15:52 +0800 Jason Wang wrote: > This patch adds bus uevent support for mdev bus in order to allow > cooperation with userspace. > > Signed-off-by: Jason Wang > --- > drivers/vfio/mdev/mdev_driver.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c > index b7c40ce86ee3..319d886ffaf7 100644 > --- a/drivers/vfio/mdev/mdev_driver.c > +++ b/drivers/vfio/mdev/mdev_driver.c > @@ -82,9 +82,17 @@ static int mdev_match(struct device *dev, struct > device_driver *drv) > return 0; > } > > +static int mdev_uevent(struct device *dev, struct kobj_uevent_env *env) > +{ > + struct mdev_device *mdev = to_mdev_device(dev); > + > + return add_uevent_var(env, "MODALIAS=mdev:c%02X", mdev->class_id); > +} > + > struct bus_type mdev_bus_type = { > .name = "mdev", > .match = mdev_match, > + .uevent = mdev_uevent, > .probe = mdev_probe, > .remove = mdev_remove, > }; I'd merge that into the previous patch. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3 1/7] mdev: class id support
On Fri, 11 Oct 2019 16:15:51 +0800 Jason Wang wrote: > Mdev bus only supports vfio driver right now, so it doesn't implement > match method. But in the future, we may add drivers other than vfio, > the first driver could be virtio-mdev. This means we need to add > device class id support in bus match method to pair the mdev device > and mdev driver correctly. > > So this patch adds id_table to mdev_driver and class_id for mdev > device with the match method for mdev bus. > > Signed-off-by: Jason Wang > --- > Documentation/driver-api/vfio-mediated-device.rst | 7 ++- > drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + > drivers/s390/cio/vfio_ccw_ops.c | 1 + > drivers/s390/crypto/vfio_ap_ops.c | 1 + > drivers/vfio/mdev/mdev_core.c | 11 +++ > drivers/vfio/mdev/mdev_driver.c | 14 ++ > drivers/vfio/mdev/mdev_private.h | 1 + > drivers/vfio/mdev/vfio_mdev.c | 6 ++ > include/linux/mdev.h | 8 > include/linux/mod_devicetable.h | 8 > samples/vfio-mdev/mbochs.c| 1 + > samples/vfio-mdev/mdpy.c | 1 + > samples/vfio-mdev/mtty.c | 1 + > 13 files changed, 60 insertions(+), 1 deletion(-) > > diff --git a/Documentation/driver-api/vfio-mediated-device.rst > b/Documentation/driver-api/vfio-mediated-device.rst > index 25eb7d5b834b..2035e48da7b2 100644 > --- a/Documentation/driver-api/vfio-mediated-device.rst > +++ b/Documentation/driver-api/vfio-mediated-device.rst > @@ -102,12 +102,14 @@ structure to represent a mediated device's driver:: >* @probe: called when new device created >* @remove: called when device removed >* @driver: device driver structure > + * @id_table: the ids serviced by this driver >*/ > struct mdev_driver { >const char *name; >int (*probe) (struct device *dev); >void (*remove) (struct device *dev); >struct device_driverdriver; > + const struct mdev_class_id *id_table; > }; > > A mediated bus driver for mdev should use this structure in the function > calls > @@ -165,12 +167,15 @@ register itself with the mdev core driver:: > extern int mdev_register_device(struct device *dev, >const struct mdev_parent_ops *ops); > > +It is also required to specify the class_id through:: > + > + extern int mdev_set_class(struct device *dev, u16 id); Should the document state explicitly that this should be done in the ->create() callback? Also, I think that the class_id might be different for different mdevs (even if the parent is the same) -- should that be mentioned explicitly? > + > However, the mdev_parent_ops structure is not required in the function call > that a driver should use to unregister itself with the mdev core driver:: > > extern void mdev_unregister_device(struct device *dev); > > - > Mediated Device Management Interface Through sysfs > == > (...) Looks reasonable to me. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
question: asynchronous notification from vhost
Hi, I'm developing a virtualised audio / DSP virtio and vhost driver pair and I'm currently somewhat stuck trying to figure out how to asynchronously notify the guest from the vhost driver. I'm using the vhost_add_used_and_signal() function to return data back to the guest in the guest context, when the guest initiated an operation, that's working well. But how do I "kick" the guest from an asynchronous, e.g. from an interrupt thread context? Thanks Guennadi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio_ring: Use DMA API if memory is encrypted
On Fri, Oct 11, 2019 at 06:25:19PM -0700, Ram Pai wrote: > From: Thiago Jung Bauermann > > Normally, virtio enables DMA API with VIRTIO_F_IOMMU_PLATFORM, which must > be set by both device and guest driver. However, as a hack, when DMA API > returns physical addresses, guest driver can use the DMA API; even though > device does not set VIRTIO_F_IOMMU_PLATFORM and just uses physical > addresses. Sorry, but this is a complete bullshit hack. Driver must always use the DMA API if they do DMA, and if virtio devices use physical addresses that needs to be returned through the platform firmware interfaces for the dma setup. If you don't do that yet (which based on previous informations you don't), you need to fix it, and we can then quirk old implementations that already are out in the field. In other words: we finally need to fix that virtio mess and not pile hacks on top of hacks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/2] dma-mapping: Add dma_addr_is_phys_addr()
On Mon, Oct 14, 2019 at 11:29:24AM +0100, Robin Murphy wrote: >> However, I would like to see the commit message (and maybe the inline >> comments) expanded a bit on what the distinction here is about. Some >> of the text from the next patch would be suitable, about DMA addresses >> usually being in a different address space but not in the case of >> bounce buffering. > > Right, this needs a much tighter definition. "DMA address happens to be a > valid physical address" is true of various IOMMU setups too, but I can't > believe it's meaningful in such cases. > > If what you actually want is "DMA is direct or SWIOTLB" - i.e. "DMA address > is physical address of DMA data (not necessarily the original buffer)" - > wouldn't dma_is_direct() suffice? It would. But drivers have absolutely no business knowing any of this. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH 1/2] dma-mapping: Add dma_addr_is_phys_addr()
On Mon, Oct 14, 2019 at 11:29:24AM +0100, Robin Murphy wrote: > On 14/10/2019 05:51, David Gibson wrote: > >On Fri, Oct 11, 2019 at 06:25:18PM -0700, Ram Pai wrote: > >>From: Thiago Jung Bauermann > >> > >>In order to safely use the DMA API, virtio needs to know whether DMA > >>addresses are in fact physical addresses and for that purpose, > >>dma_addr_is_phys_addr() is introduced. > >> > >>cc: Benjamin Herrenschmidt > >>cc: David Gibson > >>cc: Michael Ellerman > >>cc: Paul Mackerras > >>cc: Michael Roth > >>cc: Alexey Kardashevskiy > >>cc: Paul Burton > >>cc: Robin Murphy > >>cc: Bartlomiej Zolnierkiewicz > >>cc: Marek Szyprowski > >>cc: Christoph Hellwig > >>Suggested-by: Michael S. Tsirkin > >>Signed-off-by: Ram Pai > >>Signed-off-by: Thiago Jung Bauermann > > > >The change itself looks ok, so > > > >Reviewed-by: David Gibson > > > >However, I would like to see the commit message (and maybe the inline > >comments) expanded a bit on what the distinction here is about. Some > >of the text from the next patch would be suitable, about DMA addresses > >usually being in a different address space but not in the case of > >bounce buffering. > > Right, this needs a much tighter definition. "DMA address happens to > be a valid physical address" is true of various IOMMU setups too, > but I can't believe it's meaningful in such cases. The definition by itself is meaningful AFAICT. At its core its just indicating whether DMA addresses are physical addresses or not. However its up to the caller to use it meaningfully. For non-virtio caller, dma_addr_is_phys_addr() by itself may or may not be meaningful. But for a virtio caller, this information when paired with mem_encrypt_active() is meaningful. For IOMMU setups DMA API will get used regardless of "DMA address happens to be a valid physical address" > > If what you actually want is "DMA is direct or SWIOTLB" - i.e. "DMA > address is physical address of DMA data (not necessarily the > original buffer)" - wouldn't dma_is_direct() suffice? This may also work, I think. MST: thoughts? RP ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization