Re: [PATCH] vhost_net: fix double fget()
On Tue, May 17, 2022 at 10:00:03PM +, Al Viro wrote: > On Mon, May 16, 2022 at 04:44:19AM -0400, Michael S. Tsirkin wrote: > > > Signed-off-by: Al Viro > > > Signed-off-by: Jason Wang > > > > Acked-by: Michael S. Tsirkin > > > > and this is stable material I guess. > > It is, except that commit message ought to be cleaned up. Something > along the lines of > > > Fix double fget() in vhost_net_set_backend() > > Descriptor table is a shared resource; two fget() on the same descriptor > may return different struct file references. get_tap_ptr_ring() is > called after we'd found (and pinned) the socket we'll be using and it > tries to find the private tun/tap data structures associated with it. > Redoing the lookup by the same file descriptor we'd used to get the > socket is racy - we need to same struct file. > > Thanks to Jason for spotting a braino in the original variant of patch - > I'd missed the use of fd == -1 for disabling backend, and in that case > we can end up with sock == NULL and sock != oldsock. > > > Does the above sound sane for commit message? And which tree would you > prefer it to go through? I can take it in vfs.git#fixes, or you could > take it into your tree... Acked-by: Michael S. Tsirkin for the new message and merging through your tree. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vhost_net: fix double fget()
On Wed, May 18, 2022 at 6:00 AM Al Viro wrote: > > On Mon, May 16, 2022 at 04:44:19AM -0400, Michael S. Tsirkin wrote: > > > Signed-off-by: Al Viro > > > Signed-off-by: Jason Wang > > > > Acked-by: Michael S. Tsirkin > > > > and this is stable material I guess. > > It is, except that commit message ought to be cleaned up. Something > along the lines of > > > Fix double fget() in vhost_net_set_backend() > > Descriptor table is a shared resource; two fget() on the same descriptor > may return different struct file references. get_tap_ptr_ring() is > called after we'd found (and pinned) the socket we'll be using and it > tries to find the private tun/tap data structures associated with it. > Redoing the lookup by the same file descriptor we'd used to get the > socket is racy - we need to same struct file. > > Thanks to Jason for spotting a braino in the original variant of patch - > I'd missed the use of fd == -1 for disabling backend, and in that case > we can end up with sock == NULL and sock != oldsock. > > > Does the above sound sane for commit message? Yes. > And which tree would you > prefer it to go through? I can take it in vfs.git#fixes, or you could > take it into your tree... > Consider Michael gave an ack, it would be fine if you want to take via your tree. Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V5 9/9] virtio: use WARN_ON() to warn illegal status value
We used to use BUG_ON() in virtio_device_ready() to detect illegal status value, this seems sub-optimal since the value is under the control of the device. Switch to use WARN_ON() instead. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s...@vger.kernel.org Signed-off-by: Jason Wang --- include/linux/virtio_config.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index d4edfd7d91bb..9a36051ceb76 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -255,7 +255,7 @@ void virtio_device_ready(struct virtio_device *dev) { unsigned status = dev->config->get_status(dev); - BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK); + WARN_ON(status & VIRTIO_CONFIG_S_DRIVER_OK); /* * The virtio_synchronize_cbs() makes sure vring_interrupt() -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V5 8/9] virtio: harden vring IRQ
This is a rework on the previous IRQ hardening that is done for virtio-pci where several drawbacks were found and were reverted: 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ that is used by some device such as virtio-blk 2) done only for PCI transport The vq->broken is re-used in this patch for implementing the IRQ hardening. The vq->broken is set to true during both initialization and reset. And the vq->broken is set to false in virtio_device_ready(). Then vring_interrupt() can check and return when vq->broken is true. And in this case, switch to return IRQ_NONE to let the interrupt core aware of such invalid interrupt to prevent IRQ storm. The reason of using a per queue variable instead of a per device one is that we may need it for per queue reset hardening in the future. Note that the hardening is only done for vring interrupt since the config interrupt hardening is already done in commit 22b7050a024d7 ("virtio: defer config changed notifications"). But the method that is used by config interrupt can't be reused by the vring interrupt handler because it uses spinlock to do the synchronization which is expensive. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s...@vger.kernel.org Signed-off-by: Jason Wang --- drivers/s390/virtio/virtio_ccw.c | 4 drivers/virtio/virtio.c| 15 --- drivers/virtio/virtio_mmio.c | 5 + drivers/virtio/virtio_pci_modern_dev.c | 5 + drivers/virtio/virtio_ring.c | 11 +++ include/linux/virtio_config.h | 20 6 files changed, 53 insertions(+), 7 deletions(-) diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index 22d36594bcdd..6f4c83c6c9a7 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device *vdev, u8 status) ccw->flags = 0; ccw->count = sizeof(status); ccw->cda = (__u32)(unsigned long)>dma_area->status; + /* We use ssch for setting the status which is a serializing +* instruction that guarantees the memory writes have +* completed before ssch. +*/ ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS); /* Write failed? We assume status is unchanged. */ if (ret) diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 8dde44ea044a..1053f59d0a25 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev) * */ void virtio_reset_device(struct virtio_device *dev) { + /* +* The below virtio_synchronize_cbs() guarantees that any +* interrupt for this line arriving after +* virtio_synchronize_vqs() has completed is guaranteed to see +* vq->broken as true. +*/ + virtio_break_device(dev); + virtio_synchronize_cbs(dev); + dev->config->reset(dev); } EXPORT_SYMBOL_GPL(virtio_reset_device); @@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev) dev->config_enabled = false; dev->config_change_pending = false; + INIT_LIST_HEAD(>vqs); + spin_lock_init(>vqs_list_lock); + /* We always start by resetting the device, in case a previous * driver messed it up. This also tests that code path a little. */ virtio_reset_device(dev); @@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev) /* Acknowledge that we've seen the device. */ virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE); - INIT_LIST_HEAD(>vqs); - spin_lock_init(>vqs_list_lock); - /* * device_add() causes the bus infrastructure to look for a matching * driver. diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 4a3b66e4e198..11c137526d1d 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 status) /* We should never be setting status to 0. */ BUG_ON(status == 0); + /* +* Per memory-barriers.txt, wmb() is not needed to guarantee +* that the the cache coherent memory writes have completed +* before writing to the MMIO region. +*/ writel(status, vm_dev->base + VIRTIO_MMIO_STATUS); } diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c index 591738ad3d56..91c9c0412730 100644 --- a/drivers/virtio/virtio_pci_modern_dev.c +++ b/drivers/virtio/virtio_pci_modern_dev.c @@ -466,6 +466,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev, { struct virtio_pci_common_cfg __iomem *cfg =
[PATCH V5 7/9] virtio: allow to unbreak virtqueue
This patch allows the new introduced __virtio_break_device() to unbreak the virtqueue. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s...@vger.kernel.org Signed-off-by: Jason Wang --- drivers/virtio/virtio_ring.c | 21 + include/linux/virtio.h | 1 + 2 files changed, 22 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index cfb028ca238e..5b7df7c455f0 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2397,6 +2397,27 @@ void virtio_break_device(struct virtio_device *dev) } EXPORT_SYMBOL_GPL(virtio_break_device); +/* + * This should allow the device to be used by the driver. You may + * need to grab appropriate locks to flush. This should only be used + * in some specific case e.g (probing and restoring). Driver should + * not call this directly. + */ +void __virtio_unbreak_device(struct virtio_device *dev) +{ + struct virtqueue *_vq; + + spin_lock(>vqs_list_lock); + list_for_each_entry(_vq, >vqs, list) { + struct vring_virtqueue *vq = to_vvq(_vq); + + /* Pairs with READ_ONCE() in virtqueue_is_broken(). */ + WRITE_ONCE(vq->broken, false); + } + spin_unlock(>vqs_list_lock); +} +EXPORT_SYMBOL_GPL(__virtio_unbreak_device); + dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 5464f398912a..d8fdf170637c 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -131,6 +131,7 @@ void unregister_virtio_device(struct virtio_device *dev); bool is_virtio_device(struct device *dev); void virtio_break_device(struct virtio_device *dev); +void __virtio_unbreak_device(struct virtio_device *dev); void virtio_config_changed(struct virtio_device *dev); #ifdef CONFIG_PM_SLEEP -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V5 6/9] virtio-ccw: implement synchronize_cbs()
This patch tries to implement the synchronize_cbs() for ccw. For the vring_interrupt() that is called via virtio_airq_handler(), the synchronization is simply done via the airq_info's lock. For the vring_interrupt() that is called via virtio_ccw_int_handler(), a per device rwlock is introduced ans used in the synchronization method. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s...@vger.kernel.org Signed-off-by: Jason Wang --- drivers/s390/virtio/virtio_ccw.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index d35e7a3f7067..22d36594bcdd 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -62,6 +62,7 @@ struct virtio_ccw_device { unsigned int revision; /* Transport revision */ wait_queue_head_t wait_q; spinlock_t lock; + rwlock_t irq_lock; struct mutex io_lock; /* Serializes I/O requests */ struct list_head virtqueues; bool is_thinint; @@ -984,6 +985,27 @@ static const char *virtio_ccw_bus_name(struct virtio_device *vdev) return dev_name(>cdev->dev); } +static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev) +{ + struct virtio_ccw_device *vcdev = to_vc_device(vdev); + struct airq_info *info = vcdev->airq_info; + + if (info) { + /* +* Synchronize with the vring_interrupt() with airq indicator +*/ + write_lock_irq(>lock); + write_unlock_irq(>lock); + } else { + /* +* Synchronize with the vring_interrupt() called by +* virtio_ccw_int_handler(). +*/ + write_lock_irq(>irq_lock); + write_unlock_irq(>irq_lock); + } +} + static const struct virtio_config_ops virtio_ccw_config_ops = { .get_features = virtio_ccw_get_features, .finalize_features = virtio_ccw_finalize_features, @@ -995,6 +1017,7 @@ static const struct virtio_config_ops virtio_ccw_config_ops = { .find_vqs = virtio_ccw_find_vqs, .del_vqs = virtio_ccw_del_vqs, .bus_name = virtio_ccw_bus_name, + .synchronize_cbs = virtio_ccw_synchronize_cbs, }; @@ -1106,6 +1129,8 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev, vcdev->err = -EIO; } virtio_ccw_check_activity(vcdev, activity); + /* Local interrupt should be disabled at this time */ + read_lock(>irq_lock); for_each_set_bit(i, indicators(vcdev), sizeof(*indicators(vcdev)) * BITS_PER_BYTE) { /* The bit clear must happen before the vring kick. */ @@ -1114,6 +1139,7 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev, vq = virtio_ccw_vq_by_ind(vcdev, i); vring_interrupt(0, vq); } + read_unlock(>irq_lock); if (test_bit(0, indicators2(vcdev))) { virtio_config_changed(>vdev); clear_bit(0, indicators2(vcdev)); @@ -1284,6 +1310,7 @@ static int virtio_ccw_online(struct ccw_device *cdev) init_waitqueue_head(>wait_q); INIT_LIST_HEAD(>virtqueues); spin_lock_init(>lock); + rwlock_init(>irq_lock); mutex_init(>io_lock); spin_lock_irqsave(get_ccwdev_lock(cdev), flags); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V5 5/9] virtio-mmio: implement synchronize_cbs()
Simply synchronize the platform irq that is used by us. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s...@vger.kernel.org Reviewed-by: Cornelia Huck Signed-off-by: Jason Wang --- drivers/virtio/virtio_mmio.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 56128b9c46eb..4a3b66e4e198 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -345,6 +345,14 @@ static void vm_del_vqs(struct virtio_device *vdev) free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev); } + +static void vm_synchronize_cbs(struct virtio_device *vdev) +{ + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); + + synchronize_irq(platform_get_irq(vm_dev->pdev, 0)); +} + static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index, void (*callback)(struct virtqueue *vq), const char *name, bool ctx) @@ -541,6 +549,7 @@ static const struct virtio_config_ops virtio_mmio_config_ops = { .finalize_features = vm_finalize_features, .bus_name = vm_bus_name, .get_shm_region = vm_get_shm_region, + .synchronize_cbs = vm_synchronize_cbs, }; -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V5 4/9] virtio-pci: implement synchronize_cbs()
We can simply reuse vp_synchronize_vectors() for .synchronize_cbs(). Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s...@vger.kernel.org Reviewed-by: Cornelia Huck Signed-off-by: Jason Wang --- drivers/virtio/virtio_pci_legacy.c | 1 + drivers/virtio/virtio_pci_modern.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c index 6f4e34ce96b8..207985107150 100644 --- a/drivers/virtio/virtio_pci_legacy.c +++ b/drivers/virtio/virtio_pci_legacy.c @@ -192,6 +192,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = { .reset = vp_reset, .find_vqs = vp_find_vqs, .del_vqs= vp_del_vqs, + .synchronize_cbs = vp_synchronize_vectors, .get_features = vp_get_features, .finalize_features = vp_finalize_features, .bus_name = vp_bus_name, diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index a2671a20ef77..18c2190e3059 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -394,6 +394,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = { .reset = vp_reset, .find_vqs = vp_modern_find_vqs, .del_vqs= vp_del_vqs, + .synchronize_cbs = vp_synchronize_vectors, .get_features = vp_get_features, .finalize_features = vp_finalize_features, .bus_name = vp_bus_name, @@ -411,6 +412,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = { .reset = vp_reset, .find_vqs = vp_modern_find_vqs, .del_vqs= vp_del_vqs, + .synchronize_cbs = vp_synchronize_vectors, .get_features = vp_get_features, .finalize_features = vp_finalize_features, .bus_name = vp_bus_name, -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V5 3/9] virtio: introduce config op to synchronize vring callbacks
This patch introduces new virtio config op to vring callbacks. Transport specific method is required to make sure the write before this function is visible to the vring_interrupt() that is called after the return of this function. For the transport that doesn't provide synchronize_vqs(), use synchornize_rcu() which synchronize with IRQ implicitly as a fallback. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s...@vger.kernel.org Reviewed-by: Cornelia Huck Signed-off-by: Jason Wang --- include/linux/virtio_config.h | 25 + 1 file changed, 25 insertions(+) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index b341dd62aa4d..25be018810a7 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -57,6 +57,11 @@ struct virtio_shm_region { * include a NULL entry for vqs unused by driver * Returns 0 on success or error status * @del_vqs: free virtqueues found by find_vqs(). + * @synchronize_cbs: synchronize with the virtqueue callbacks (optional) + * The function guarantees that all memory operations on the + * queue before it are visible to the vring_interrupt() that is + * called after it. + * vdev: the virtio_device * @get_features: get the array of feature bits for this device. * vdev: the virtio_device * Returns the first 64 feature bits (all we currently need). @@ -89,6 +94,7 @@ struct virtio_config_ops { const char * const names[], const bool *ctx, struct irq_affinity *desc); void (*del_vqs)(struct virtio_device *); + void (*synchronize_cbs)(struct virtio_device *); u64 (*get_features)(struct virtio_device *vdev); int (*finalize_features)(struct virtio_device *vdev); const char *(*bus_name)(struct virtio_device *vdev); @@ -217,6 +223,25 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs, desc); } +/** + * virtio_synchronize_cbs - synchronize with virtqueue callbacks + * @vdev: the device + */ +static inline +void virtio_synchronize_cbs(struct virtio_device *dev) +{ + if (dev->config->synchronize_cbs) { + dev->config->synchronize_cbs(dev); + } else { + /* +* A best effort fallback to synchronize with +* interrupts, preemption and softirq disabled +* regions. See comment above synchronize_rcu(). +*/ + synchronize_rcu(); + } +} + /** * virtio_device_ready - enable vq use in probe function * @vdev: the device -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V5 2/9] virtio: use virtio_reset_device() when possible
This allows us to do common extension without duplicating code. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s...@vger.kernel.org Reviewed-by: Cornelia Huck Signed-off-by: Jason Wang --- drivers/virtio/virtio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 75c8d560bbd3..8dde44ea044a 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -430,7 +430,7 @@ int register_virtio_device(struct virtio_device *dev) /* We always start by resetting the device, in case a previous * driver messed it up. This also tests that code path a little. */ - dev->config->reset(dev); + virtio_reset_device(dev); /* Acknowledge that we've seen the device. */ virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE); @@ -496,7 +496,7 @@ int virtio_device_restore(struct virtio_device *dev) /* We always start by resetting the device, in case a previous * driver messed it up. */ - dev->config->reset(dev); + virtio_reset_device(dev); /* Acknowledge that we've seen the device. */ virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V5 1/9] virtio: use virtio_device_ready() in virtio_device_restore()
From: Stefano Garzarella It will allow us to do extension on virtio_device_ready() without duplicating code. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s...@vger.kernel.org Reviewed-by: Cornelia Huck Signed-off-by: Stefano Garzarella Signed-off-by: Jason Wang --- drivers/virtio/virtio.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 22f15f444f75..75c8d560bbd3 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -526,8 +526,9 @@ int virtio_device_restore(struct virtio_device *dev) goto err; } - /* Finally, tell the device we're all set */ - virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); + /* If restore didn't do it, mark device DRIVER_OK ourselves. */ + if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK)) + virtio_device_ready(dev); virtio_config_enable(dev); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V5 0/9] rework on the IRQ hardening of virtio
Hi All: This is a rework on the IRQ hardening for virtio which is done previously by the following commits are reverted: 9e35276a5344 ("virtio_pci: harden MSI-X interrupts") 080cd7c3ac87 ("virtio-pci: harden INTX interrupts") The reason is that it depends on the IRQF_NO_AUTOEN which may conflict with the assumption of the affinity managed IRQ that is used by some virtio drivers. And what's more, it is only done for virtio-pci but not other transports. In this rework, I try to implement a general virtio solution which borrows the idea of the INTX hardening by re-using per virtqueue boolean vq->broken and toggle it in virtio_device_ready() and virtio_reset_device(). Then we can simply reuse the existing checks in the vring_interrupt() and return early if the driver is not ready. Note that, I only did compile test on ccw and MMIO transport. Please review. Changes since V4: - use spin_lock_irq()/spin_unlock_irq() to synchronize with vring_interrupt() for ccw - use spin_lock()/spin_unlock() to protect vring_interrupt() for non airq - add comment to explain the ordering implications of set_status() for PCI, ccw and MMIO - various tweaks on the comments and changelogs Changes since V3: - Rename synchornize_vqs() to synchronize_cbs() - tweak the comment for synchronize_cbs() - switch to use a dedicated helper __virtio_unbreak_device() and document it should be only used for probing - switch to use rwlock to synchornize the non airq for ccw Changes since V2: - add ccw and MMIO support - rename synchronize_vqs() to synchronize_cbs() - switch to re-use vq->broken instead of introducing new device attributes for the future virtqueue reset support - remove unnecssary READ_ONCE()/WRITE_ONCE() - a new patch to remove device triggerable BUG_ON() - more tweaks on the comments Changes since v1: - Use transport specific irq synchronization method when possible - Drop the module parameter and enable the hardening unconditonally - Tweak the barrier/ordering facilities used in the code - Reanme irq_soft_enabled to driver_ready - Avoid unnecssary IRQ synchornization (e.g during boot) Jason Wang (8): virtio: use virtio_reset_device() when possible virtio: introduce config op to synchronize vring callbacks virtio-pci: implement synchronize_cbs() virtio-mmio: implement synchronize_cbs() virtio-ccw: implement synchronize_cbs() virtio: allow to unbreak virtqueue virtio: harden vring IRQ virtio: use WARN_ON() to warning illegal status value Stefano Garzarella (1): virtio: use virtio_device_ready() in virtio_device_restore() drivers/s390/virtio/virtio_ccw.c | 31 + drivers/virtio/virtio.c| 24 + drivers/virtio/virtio_mmio.c | 14 drivers/virtio/virtio_pci_legacy.c | 1 + drivers/virtio/virtio_pci_modern.c | 2 ++ drivers/virtio/virtio_pci_modern_dev.c | 5 +++ drivers/virtio/virtio_ring.c | 32 +++--- include/linux/virtio.h | 1 + include/linux/virtio_config.h | 47 +- 9 files changed, 145 insertions(+), 12 deletions(-) -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v7 2/5] vdpa: Add support for querying vendor statistics
On 5/17/2022 6:13 AM, Eli Cohen wrote: Allows to read vendor statistics of a vdpa device. The specific statistics data are received from the upstream driver in the form of an (attribute name, attribute value) pairs. An example of statistics for mlx5_vdpa device are: received_desc - number of descriptors received by the virtqueue completed_desc - number of descriptors completed by the virtqueue A descriptor using indirect buffers is still counted as 1. In addition, N chained descriptors are counted correctly N times as one would expect. A new callback was added to vdpa_config_ops which provides the means for the vdpa driver to return statistics results. The interface allows for reading all the supported virtqueues, including the control virtqueue if it exists. Below are some examples taken from mlx5_vdpa which are introduced in the following patch: 1. Read statistics for the virtqueue at index 1 $ vdpa dev vstats show vdpa-a qidx 1 vdpa-a: queue_type tx queue_index 1 received_desc 3844836 completed_desc 3844836 2. Read statistics for the virtqueue at index 32 $ vdpa dev vstats show vdpa-a qidx 32 vdpa-a: queue_type control_vq queue_index 32 received_desc 62 completed_desc 62 3. Read statisitics for the virtqueue at index 0 with json output $ vdpa -j dev vstats show vdpa-a qidx 0 {"vstats":{"vdpa-a":{ "queue_type":"rx","queue_index":0,"name":"received_desc","value":417776,\ "name":"completed_desc","value":417548}}} 4. Read statistics for the virtqueue at index 0 with preety json output $ vdpa -jp dev vstats show vdpa-a qidx 0 { "vstats": { "vdpa-a": { "queue_type": "rx", "queue_index": 0, "name": "received_desc", "value": 417776, "name": "completed_desc", "value": 417548 } } } Signed-off-by: Eli Cohen --- drivers/vdpa/vdpa.c | 160 ++ include/linux/vdpa.h | 3 + include/uapi/linux/vdpa.h | 6 ++ 3 files changed, 169 insertions(+) diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index fac89a0d8178..1b810961ccc3 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -914,6 +914,106 @@ vdpa_dev_config_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, return err; } +static int vdpa_fill_stats_rec(struct vdpa_device *vdev, struct sk_buff *msg, + struct genl_info *info, u32 index) +{ + struct virtio_net_config config = {}; + u64 features; + u16 max_vqp; + u8 status; + int err; + This entire function should perhaps need to take cf_mutex (worth converting to rwsem as well) to guard against concurrent modification via .set_status() or .set_config() from the upper stack. Similar to vdpa_dev_config_fill(). + status = vdev->config->get_status(vdev); + if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { + NL_SET_ERR_MSG_MOD(info->extack, "feature negotiation not complete"); + return -EAGAIN; + } + vdpa_get_config(vdev, 0, , sizeof(config)); This should use vdpa_get_config_unlocked() without lock. Regards, -Siwei + + max_vqp = le16_to_cpu(config.max_virtqueue_pairs); + if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, max_vqp)) + return -EMSGSIZE; + + features = vdev->config->get_driver_features(vdev); + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES, + features, VDPA_ATTR_PAD)) + return -EMSGSIZE; + + if (nla_put_u32(msg, VDPA_ATTR_DEV_QUEUE_INDEX, index)) + return -EMSGSIZE; + + err = vdev->config->get_vendor_vq_stats(vdev, index, msg, info->extack); + if (err) + return err; + + return 0; +} + +static int vendor_stats_fill(struct vdpa_device *vdev, struct sk_buff *msg, +struct genl_info *info, u32 index) +{ + int err; + + if (!vdev->config->get_vendor_vq_stats) + return -EOPNOTSUPP; + + err = vdpa_fill_stats_rec(vdev, msg, info, index); + if (err) + return err; + + return 0; +} + +static int vdpa_dev_vendor_stats_fill(struct vdpa_device *vdev, + struct sk_buff *msg, + struct genl_info *info, u32 index) +{ + u32 device_id; + void *hdr; + int err; + u32 portid = info->snd_portid; + u32 seq = info->snd_seq; + u32 flags = 0; + + hdr = genlmsg_put(msg, portid, seq, _nl_family, flags, + VDPA_CMD_DEV_VSTATS_GET); + if (!hdr) + return -EMSGSIZE; + + if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(>dev))) { + err = -EMSGSIZE; + goto undo_msg; + } + + device_id = vdev->config->get_device_id(vdev); + if (nla_put_u32(msg, VDPA_ATTR_DEV_ID, device_id)) {
Re: [PATCH v7 5/5] vdpa/mlx5: Use readers/writers semaphore instead of mutex
On 5/17/2022 6:13 AM, Eli Cohen wrote: Reading statistics could be done intensively and by several processes concurrently. Reader's lock is sufficient in this case. Change reslock from mutex to a rwsem. Suggested-by: Si-Wei Liu Signed-off-by: Eli Cohen Reviewed-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 41 ++- 1 file changed, 19 insertions(+), 22 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 2b815ef850c8..57cfc64248b7 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -155,7 +155,7 @@ struct mlx5_vdpa_net { * since memory map might change and we need to destroy and create * resources while driver in operational. */ - struct mutex reslock; + struct rw_semaphore reslock; struct mlx5_flow_table *rxft; struct mlx5_fc *rx_counter; struct mlx5_flow_handle *rx_rule_ucast; @@ -1695,7 +1695,7 @@ static void mlx5_cvq_kick_handler(struct work_struct *work) ndev = to_mlx5_vdpa_ndev(mvdev); cvq = >cvq; - mutex_lock(>reslock); + down_write(>reslock); if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) goto out; @@ -1746,7 +1746,7 @@ static void mlx5_cvq_kick_handler(struct work_struct *work) } out: - mutex_unlock(>reslock); + up_write(>reslock); } static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx) @@ -2244,7 +2244,7 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev) struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); int err; - WARN_ON(!mutex_is_locked(>reslock)); + WARN_ON(!rwsem_is_locked(>reslock)); if (ndev->setup) { mlx5_vdpa_warn(mvdev, "setup driver called for already setup driver\n"); @@ -2292,7 +2292,7 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev) static void teardown_driver(struct mlx5_vdpa_net *ndev) { - WARN_ON(!mutex_is_locked(>reslock)); + WARN_ON(!rwsem_is_locked(>reslock)); if (!ndev->setup) return; @@ -2322,7 +2322,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) print_status(mvdev, status, true); - mutex_lock(>reslock); + down_write(>reslock); if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { if (status & VIRTIO_CONFIG_S_DRIVER_OK) { @@ -2338,14 +2338,14 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) } ndev->mvdev.status = status; - mutex_unlock(>reslock); + up_write(>reslock); return; err_setup: mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED; err_clear: - mutex_unlock(>reslock); + up_write(>reslock); } static int mlx5_vdpa_reset(struct vdpa_device *vdev) @@ -2356,7 +2356,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev) print_status(mvdev, 0, true); mlx5_vdpa_info(mvdev, "performing device reset\n"); - mutex_lock(>reslock); + down_write(>reslock); teardown_driver(ndev); clear_vqs_ready(ndev); mlx5_vdpa_destroy_mr(>mvdev); @@ -2371,7 +2371,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev) if (mlx5_vdpa_create_mr(mvdev, NULL)) mlx5_vdpa_warn(mvdev, "create MR failed\n"); } - mutex_unlock(>reslock); + up_write(>reslock); return 0; } @@ -2411,7 +2411,7 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb bool change_map; int err; - mutex_lock(>reslock); + down_write(>reslock); err = mlx5_vdpa_handle_set_map(mvdev, iotlb, _map); if (err) { @@ -2423,7 +2423,7 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb err = mlx5_vdpa_change_map(mvdev, iotlb); err: - mutex_unlock(>reslock); + up_write(>reslock); return err; } @@ -2442,7 +2442,6 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev) mlx5_mpfs_del_mac(pfmdev, ndev->config.mac); } mlx5_vdpa_free_resources(>mvdev); - mutex_destroy(>reslock); kfree(ndev->event_cbs); kfree(ndev->vqs); } @@ -2527,7 +2526,7 @@ static int mlx5_vdpa_get_vendor_vq_stats(struct vdpa_device *vdev, u16 idx, u64 completed_desc; int err = 0; - mutex_lock(>reslock); + down_read(>reslock); if (!is_index_valid(mvdev, idx)) { NL_SET_ERR_MSG_MOD(extack, "virtqueue index is not valid"); err = -EINVAL; @@ -2566,7 +2565,7 @@ static int mlx5_vdpa_get_vendor_vq_stats(struct vdpa_device *vdev, u16 idx, err = 0; out_err: - mutex_unlock(>reslock); + up_read(>reslock); return err; } @@ -2835,18 +2834,18 @@ static int
RE: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device
> From: gre...@linuxfoundation.org > Sent: Tuesday, May 17, 2022 9:54 AM > > On Tue, May 17, 2022 at 12:21:03PM +, Parav Pandit wrote: > > Hi Greg, Yongji, > > > > > From: Yongji Xie > > > Sent: Tuesday, May 17, 2022 3:25 AM > > > > > > On Tue, May 17, 2022 at 4:06 AM Parav Pandit > wrote: > > > > > > > > > > > > > > > > > From: Xie Yongji > > > > > Sent: Monday, May 16, 2022 2:04 AM > > > > > > > > > > Introduce a device object for vdpa management device to control > > > > > its lifecycle. > > > > Why is this needed? > > > > What is the limitation of current approach that requires new > > > > device for > > > mgmtdev? > > > > The primary motivation is missing in the commit log here. > > > > > > > > > > OK, I will add one. This patch actually comes from the discussion: > > > > > > > > > https://www.spinics.net/lists/linux-virtualization/msg56371.html > > > > > > The "vdpa_mgmt_dev" makes things a little confusing. Embedding the > > > device struct in it to control its lifecycle simplifies the logic > > > and makes the driver model clear. > > > > > No. it doesn't. > > > > vdpa_mgmt_dev is really the handle for all the netlink socket messages > targeted. > > It is not really a struct device. > > Why can it not be one? It has lifetime rules that must be followed, so might > as well use the built-in code to handle this, right? > Lifetime rules are followed. That is, during unloading of the driver for the struct device* dev, mgmtdev must be unregistered. Following the exact mirror of creation. I general before deleting the struct device *dev, its upper layer user mgmt. dev must be unregistered. There will be zero user of this struct mgmt_dev, after invocation of current API vdpa_mgmtdev_unregister(). This is the onus on above unregister API. Any cleanup cannot be differed until newly thought dev->release(). > What is wrong with it being a struct device? I am not sure if its fully wrong. First for this struct device, there is no real driver bind to it. Secondly, it is simply an overkill without any visible use. vdpa_mgmt_dev struct has only register/unregister life cycle. Unregistration of it stops and destroy any outstanding operation on it. Any code accessing mgmtdev post unregistration simply won't work. Anything pending cannot be freed in the newly sighted release() callback. Freeing anything in the release is too late because hw resources are destroy after unregister and before release(). Contrary to it, doing so will hide bugs because such struct will live longer now if done as struct device. On a side note, these devices are in 500 to 1K range and expected to grow more. Creating yet another struct device and linking it to its parent would result in additional sysfs links, udev event explosion for no absolute use or doesn't bring any object lifecycle changes. vduse is a special case in the kernel, that has to create a dummy device because there is no real device (like pci) linked to it that can service vdpa commands. > > > We can rename it to vdpa_mgmt_handle, if the _dev suffix is confusing. > > Where is the "management" device if this is not that? > > What does "handle" mean here? It a handle to on which commands are arriving via socket from user space. These commands are serviced by the struct device *dev. DMA operation etc also happen on *dev. You can think of it as unique handle to refer to the kernel like netdev->ifindex on netlink messages. > > > And regarding vduse_dev_release() and existing empty release function, > they can be dynamically allocated. > > This is because they are really the struct device. > > I do not understand this statement, sorry. It was bad sentence, my bad. What I wanted to say is, probably better explained with real patch below. In context of [1], struct vduse_mgmtdev empty release function can be avoided by below inline patch [2]. [1] https://www.spinics.net/lists/linux-virtualization/msg56371.html [2] patch: >From f87d557fe939a1632473dd11526a87301adbab8d Mon Sep 17 00:00:00 2001 From: Parav Pandit Date: Wed, 18 May 2022 01:22:21 +0300 Subject: [PATCH] vduse: Tie vduse mgmtdev and its device vduse management device is not backed by any real device such as PCI. Hence it doesn't have any parent device linked to it. Kernel driver model in [1] suggests avoiding an empty device release callback. Hence tie the mgmtdevice object's life cycle to an allocate dummy struct device instead of having it static. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/core-api/kobject.rst?h=v5.18-rc7#n284 Signed-off-by: Parav Pandit --- drivers/vdpa/vdpa_user/vduse_dev.c | 60 ++ 1 file changed, 37 insertions(+), 23 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index f85d1a08ed87..ebe272575fb8 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1475,16 +1475,12 @@ static char
Re: [PATCH] vhost_net: fix double fget()
On Mon, May 16, 2022 at 04:44:19AM -0400, Michael S. Tsirkin wrote: > > Signed-off-by: Al Viro > > Signed-off-by: Jason Wang > > Acked-by: Michael S. Tsirkin > > and this is stable material I guess. It is, except that commit message ought to be cleaned up. Something along the lines of Fix double fget() in vhost_net_set_backend() Descriptor table is a shared resource; two fget() on the same descriptor may return different struct file references. get_tap_ptr_ring() is called after we'd found (and pinned) the socket we'll be using and it tries to find the private tun/tap data structures associated with it. Redoing the lookup by the same file descriptor we'd used to get the socket is racy - we need to same struct file. Thanks to Jason for spotting a braino in the original variant of patch - I'd missed the use of fd == -1 for disabling backend, and in that case we can end up with sock == NULL and sock != oldsock. Does the above sound sane for commit message? And which tree would you prefer it to go through? I can take it in vfs.git#fixes, or you could take it into your tree... ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 7/8] vhost-test: drop flush after vhost_dev_cleanup
The flush after vhost_dev_cleanup is not needed because: 1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread so the flush call will just return since the worker has not device. 2. It's not needed. The comment about jobs re-queueing themselves does not look correct because handle_vq does not requeue work. Signed-off-by: Mike Christie Acked-by: Jason Wang --- drivers/vhost/test.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index de39151366c5..6c139f18bc54 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -158,9 +158,6 @@ static int vhost_test_release(struct inode *inode, struct file *f) vhost_test_flush(n); vhost_dev_stop(>dev); vhost_dev_cleanup(>dev); - /* We do an extra flush before freeing memory, -* since jobs can re-queue themselves. */ - vhost_test_flush(n); kfree(n->dev.vqs); kfree(n); return 0; -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 5/8] vhost_vsock: simplify vhost_vsock_flush()
From: Andrey Ryabinin vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev) before vhost_work_dev_flush(>dev). This seems pointless as vsock->vqs[i].poll.dev is the same as >dev and several flushes in a row doesn't do anything useful, one is just enough. Signed-off-by: Andrey Ryabinin Reviewed-by: Stefano Garzarella Signed-off-by: Mike Christie Acked-by: Jason Wang --- drivers/vhost/vsock.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index a4c8ae92a0fb..96be63697117 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -705,11 +705,6 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) static void vhost_vsock_flush(struct vhost_vsock *vsock) { - int i; - - for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) - if (vsock->vqs[i].handle_kick) - vhost_work_dev_flush(vsock->vqs[i].poll.dev); vhost_work_dev_flush(>dev); } -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 8/8] vhost: rename vhost_work_dev_flush
This patch renames vhost_work_dev_flush to just vhost_dev_flush to relfect that it flushes everything on the device and that drivers don't know/care that polls are based on vhost_works. Drivers just flush the entire device and polls, and works for vhost-scsi management TMFs and IO net virtqueues, etc all are flushed. Signed-off-by: Mike Christie Acked-by: Jason Wang Reviewed-by: Stefano Garzarella --- drivers/vhost/net.c | 4 ++-- drivers/vhost/scsi.c | 2 +- drivers/vhost/test.c | 2 +- drivers/vhost/vhost.c | 10 +- drivers/vhost/vhost.h | 2 +- drivers/vhost/vsock.c | 2 +- 6 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 047b7b05109a..0e4ff6a08f5f 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1376,7 +1376,7 @@ static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock, static void vhost_net_flush(struct vhost_net *n) { - vhost_work_dev_flush(>dev); + vhost_dev_flush(>dev); if (n->vqs[VHOST_NET_VQ_TX].ubufs) { mutex_lock(>vqs[VHOST_NET_VQ_TX].vq.mutex); n->tx_flush = true; @@ -1566,7 +1566,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) } if (oldsock) { - vhost_work_dev_flush(>dev); + vhost_dev_flush(>dev); sockfd_put(oldsock); } diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 94535c813ef7..ffd9e6c2ffc1 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1436,7 +1436,7 @@ static void vhost_scsi_flush(struct vhost_scsi *vs) kref_put(_inflight[i]->kref, vhost_scsi_done_inflight); /* Flush both the vhost poll and vhost work */ - vhost_work_dev_flush(>dev); + vhost_dev_flush(>dev); /* Wait for all reqs issued before the flush to be finished */ for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index 6c139f18bc54..bc8e7fb1e635 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -146,7 +146,7 @@ static void vhost_test_stop(struct vhost_test *n, void **privatep) static void vhost_test_flush(struct vhost_test *n) { - vhost_work_dev_flush(>dev); + vhost_dev_flush(>dev); } static int vhost_test_release(struct inode *inode, struct file *f) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 9f8de04bb673..716a80c61fa9 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -231,7 +231,7 @@ void vhost_poll_stop(struct vhost_poll *poll) } EXPORT_SYMBOL_GPL(vhost_poll_stop); -void vhost_work_dev_flush(struct vhost_dev *dev) +void vhost_dev_flush(struct vhost_dev *dev) { struct vhost_flush_struct flush; @@ -243,7 +243,7 @@ void vhost_work_dev_flush(struct vhost_dev *dev) wait_for_completion(_event); } } -EXPORT_SYMBOL_GPL(vhost_work_dev_flush); +EXPORT_SYMBOL_GPL(vhost_dev_flush); void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work) { @@ -530,7 +530,7 @@ static int vhost_attach_cgroups(struct vhost_dev *dev) attach.owner = current; vhost_work_init(, vhost_attach_cgroups_work); vhost_work_queue(dev, ); - vhost_work_dev_flush(dev); + vhost_dev_flush(dev); return attach.ret; } @@ -657,7 +657,7 @@ void vhost_dev_stop(struct vhost_dev *dev) vhost_poll_stop(>vqs[i]->poll); } - vhost_work_dev_flush(dev); + vhost_dev_flush(dev); } EXPORT_SYMBOL_GPL(vhost_dev_stop); @@ -1711,7 +1711,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg mutex_unlock(>mutex); if (pollstop && vq->handle_kick) - vhost_work_dev_flush(vq->poll.dev); + vhost_dev_flush(vq->poll.dev); return r; } EXPORT_SYMBOL_GPL(vhost_vring_ioctl); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index aeb8e1ad1496..d02adf1b2bf8 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -45,7 +45,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn, int vhost_poll_start(struct vhost_poll *poll, struct file *file); void vhost_poll_stop(struct vhost_poll *poll); void vhost_poll_queue(struct vhost_poll *poll); -void vhost_work_dev_flush(struct vhost_dev *dev); +void vhost_dev_flush(struct vhost_dev *dev); struct vhost_log { u64 addr; diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 96be63697117..368330417bde 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -705,7 +705,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) static void vhost_vsock_flush(struct vhost_vsock *vsock) { - vhost_work_dev_flush(>dev); + vhost_dev_flush(>dev); } static void vhost_vsock_reset_orphans(struct sock *sk) -- 2.25.1
[PATCH V3 6/8] vhost-scsi: drop flush after vhost_dev_cleanup
The flush after vhost_dev_cleanup is not needed because: 1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread so the flush call will just return since the worker has not device. 2. It's not needed for the re-queue case. vhost_scsi_evt_handle_kick grabs the mutex and if the backend is NULL will return without queueing a work. vhost_scsi_clear_endpoint will set the backend to NULL under the vq->mutex then drops the mutex and does a flush. So we know when vhost_scsi_clear_endpoint has dropped the mutex after clearing the backend no evt related work will be able to requeue. The flush would then make sure any queued evts are run and return. Signed-off-by: Mike Christie Acked-by: Jason Wang --- drivers/vhost/scsi.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 532e204f2b1b..94535c813ef7 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1827,8 +1827,6 @@ static int vhost_scsi_release(struct inode *inode, struct file *f) vhost_scsi_clear_endpoint(vs, ); vhost_dev_stop(>dev); vhost_dev_cleanup(>dev); - /* Jobs can re-queue themselves in evt kick handler. Do extra flush. */ - vhost_scsi_flush(vs); kfree(vs->dev.vqs); kvfree(vs); return 0; -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 3/8] vhost_net: get rid of vhost_net_flush_vq() and extra flush calls
From: Andrey Ryabinin vhost_net_flush_vq() calls vhost_work_dev_flush() twice passing vhost_dev pointer obtained via 'n->poll[index].dev' and 'n->vqs[index].vq.poll.dev'. This is actually the same pointer, initialized in vhost_net_open()/vhost_dev_init()/vhost_poll_init() Remove vhost_net_flush_vq() and call vhost_work_dev_flush() directly. Do the flushes only once instead of several flush calls in a row which seems rather useless. Signed-off-by: Andrey Ryabinin [drop vhost_dev forward declaration in vhost.h] Signed-off-by: Mike Christie Acked-by: Jason Wang Reviewed-by: Stefano Garzarella --- drivers/vhost/net.c | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 4e55ad8c942a..047b7b05109a 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1374,16 +1374,9 @@ static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock, *rx_sock = vhost_net_stop_vq(n, >vqs[VHOST_NET_VQ_RX].vq); } -static void vhost_net_flush_vq(struct vhost_net *n, int index) -{ - vhost_work_dev_flush(n->poll[index].dev); - vhost_work_dev_flush(n->vqs[index].vq.poll.dev); -} - static void vhost_net_flush(struct vhost_net *n) { - vhost_net_flush_vq(n, VHOST_NET_VQ_TX); - vhost_net_flush_vq(n, VHOST_NET_VQ_RX); + vhost_work_dev_flush(>dev); if (n->vqs[VHOST_NET_VQ_TX].ubufs) { mutex_lock(>vqs[VHOST_NET_VQ_TX].vq.mutex); n->tx_flush = true; @@ -1573,7 +1566,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) } if (oldsock) { - vhost_net_flush_vq(n, index); + vhost_work_dev_flush(>dev); sockfd_put(oldsock); } -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 1/8] vhost: get rid of vhost_poll_flush() wrapper
From: Andrey Ryabinin vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush(). It gives wrong impression that we are doing some work over vhost_poll, while in fact it flushes vhost_poll->dev. It only complicate understanding of the code and leads to mistakes like flushing the same vhost_dev several times in a row. Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly. Signed-off-by: Andrey Ryabinin [merge vhost_poll_flush removal from Stefano Garzarella] Signed-off-by: Mike Christie Reviewed-by: Chaitanya Kulkarni Acked-by: Jason Wang Reviewed-by: Stefano Garzarella --- drivers/vhost/net.c | 4 ++-- drivers/vhost/test.c | 2 +- drivers/vhost/vhost.c | 12 ++-- drivers/vhost/vhost.h | 1 - drivers/vhost/vsock.c | 2 +- 5 files changed, 6 insertions(+), 15 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 792ab5f23647..4e55ad8c942a 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1376,8 +1376,8 @@ static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock, static void vhost_net_flush_vq(struct vhost_net *n, int index) { - vhost_poll_flush(n->poll + index); - vhost_poll_flush(>vqs[index].vq.poll); + vhost_work_dev_flush(n->poll[index].dev); + vhost_work_dev_flush(n->vqs[index].vq.poll.dev); } static void vhost_net_flush(struct vhost_net *n) diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index 05740cba1cd8..f0ac9e35f5d6 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -146,7 +146,7 @@ static void vhost_test_stop(struct vhost_test *n, void **privatep) static void vhost_test_flush_vq(struct vhost_test *n, int index) { - vhost_poll_flush(>vqs[index].poll); + vhost_work_dev_flush(n->vqs[index].poll.dev); } static void vhost_test_flush(struct vhost_test *n) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d02173fb290c..1d84a2818c6f 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -245,14 +245,6 @@ void vhost_work_dev_flush(struct vhost_dev *dev) } EXPORT_SYMBOL_GPL(vhost_work_dev_flush); -/* Flush any work that has been scheduled. When calling this, don't hold any - * locks that are also used by the callback. */ -void vhost_poll_flush(struct vhost_poll *poll) -{ - vhost_work_dev_flush(poll->dev); -} -EXPORT_SYMBOL_GPL(vhost_poll_flush); - void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work) { if (!dev->worker) @@ -663,7 +655,7 @@ void vhost_dev_stop(struct vhost_dev *dev) for (i = 0; i < dev->nvqs; ++i) { if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) { vhost_poll_stop(>vqs[i]->poll); - vhost_poll_flush(>vqs[i]->poll); + vhost_work_dev_flush(dev->vqs[i]->poll.dev); } } } @@ -1719,7 +1711,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg mutex_unlock(>mutex); if (pollstop && vq->handle_kick) - vhost_poll_flush(>poll); + vhost_work_dev_flush(vq->poll.dev); return r; } EXPORT_SYMBOL_GPL(vhost_vring_ioctl); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 638bb640d6b4..aeb8e1ad1496 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -44,7 +44,6 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn, __poll_t mask, struct vhost_dev *dev); int vhost_poll_start(struct vhost_poll *poll, struct file *file); void vhost_poll_stop(struct vhost_poll *poll); -void vhost_poll_flush(struct vhost_poll *poll); void vhost_poll_queue(struct vhost_poll *poll); void vhost_work_dev_flush(struct vhost_dev *dev); diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index e6c9d41db1de..a4c8ae92a0fb 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -709,7 +709,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock) for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) if (vsock->vqs[i].handle_kick) - vhost_poll_flush(>vqs[i].poll); + vhost_work_dev_flush(vsock->vqs[i].poll.dev); vhost_work_dev_flush(>dev); } -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 4/8] vhost_test: remove vhost_test_flush_vq()
From: Andrey Ryabinin vhost_test_flush_vq() just a simple wrapper around vhost_work_dev_flush() which seems have no value. It's just easier to call vhost_work_dev_flush() directly. Besides there is no point in obtaining vhost_dev pointer via 'n->vqs[index].poll.dev' while we can just use >dev. It's the same pointers, see vhost_test_open()/vhost_dev_init(). Signed-off-by: Andrey Ryabinin Signed-off-by: Mike Christie Reviewed-by: Chaitanya Kulkarni Acked-by: Jason Wang --- drivers/vhost/test.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index f0ac9e35f5d6..de39151366c5 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -144,14 +144,9 @@ static void vhost_test_stop(struct vhost_test *n, void **privatep) *privatep = vhost_test_stop_vq(n, n->vqs + VHOST_TEST_VQ); } -static void vhost_test_flush_vq(struct vhost_test *n, int index) -{ - vhost_work_dev_flush(n->vqs[index].poll.dev); -} - static void vhost_test_flush(struct vhost_test *n) { - vhost_test_flush_vq(n, VHOST_TEST_VQ); + vhost_work_dev_flush(>dev); } static int vhost_test_release(struct inode *inode, struct file *f) @@ -210,7 +205,7 @@ static long vhost_test_run(struct vhost_test *n, int test) goto err; if (oldpriv) { - vhost_test_flush_vq(n, index); + vhost_test_flush(n); } } @@ -303,7 +298,7 @@ static long vhost_test_set_backend(struct vhost_test *n, unsigned index, int fd) mutex_unlock(>mutex); if (enable) { - vhost_test_flush_vq(n, index); + vhost_test_flush(n); } mutex_unlock(>dev.mutex); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 0/8] vhost flush cleanups
The following patches made over linus's/mst's tree are Andrey Ryabinin's flush cleanups and some from me. They reduce the number of flush calls and remove some bogus ones where we don't even have a worker running anymore or they were based on outdated or incorrect assumptions. V3: - Fix test.c flush use. V2: - Added patch to rename vhost_work_dev_flush to just vhost_dev_flush to handle review comment from Jason about the naming not being so great. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V3 2/8] vhost: flush dev once during vhost_dev_stop
When vhost_work_dev_flush returns all work queued at that time will have completed. There is then no need to flush after every vhost_poll_stop call, and we can move the flush call to after the loop that stops the pollers. Signed-off-by: Mike Christie Acked-by: Jason Wang Reviewed-by: Stefano Garzarella --- drivers/vhost/vhost.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 1d84a2818c6f..9f8de04bb673 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -653,11 +653,11 @@ void vhost_dev_stop(struct vhost_dev *dev) int i; for (i = 0; i < dev->nvqs; ++i) { - if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) { + if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) vhost_poll_stop(>vqs[i]->poll); - vhost_work_dev_flush(dev->vqs[i]->poll.dev); - } } + + vhost_work_dev_flush(dev); } EXPORT_SYMBOL_GPL(vhost_dev_stop); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device
On Tue, May 17, 2022 at 12:21:03PM +, Parav Pandit wrote: > Hi Greg, Yongji, > > > From: Yongji Xie > > Sent: Tuesday, May 17, 2022 3:25 AM > > > > On Tue, May 17, 2022 at 4:06 AM Parav Pandit wrote: > > > > > > > > > > > > > From: Xie Yongji > > > > Sent: Monday, May 16, 2022 2:04 AM > > > > > > > > Introduce a device object for vdpa management device to control its > > > > lifecycle. > > > Why is this needed? > > > What is the limitation of current approach that requires new device for > > mgmtdev? > > > The primary motivation is missing in the commit log here. > > > > > > > OK, I will add one. This patch actually comes from the discussion: > > > > > > https://www.spinics.net/lists/linux-virtualization/msg56371.html > > > > The "vdpa_mgmt_dev" makes things a little confusing. Embedding the > > device struct in it to control its lifecycle simplifies the logic and makes > > the > > driver model clear. > > > No. it doesn't. > > vdpa_mgmt_dev is really the handle for all the netlink socket messages > targeted. > It is not really a struct device. Why can it not be one? It has lifetime rules that must be followed, so might as well use the built-in code to handle this, right? What is wrong with it being a struct device? > We can rename it to vdpa_mgmt_handle, if the _dev suffix is confusing. Where is the "management" device if this is not that? What does "handle" mean here? > And regarding vduse_dev_release() and existing empty release function, they > can be dynamically allocated. > This is because they are really the struct device. I do not understand this statement, sorry. thanks, greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 4/8] vhost_test: remove vhost_test_flush_vq()
On 5/17/22 9:11 AM, Stefano Garzarella wrote: >> >> static int vhost_test_release(struct inode *inode, struct file *f) >> @@ -210,7 +205,7 @@ static long vhost_test_run(struct vhost_test *n, int >> test) >> goto err; >> >> if (oldpriv) { >> - vhost_test_flush_vq(n, index); >> + vhost_test_flush(n, index); > ^ > Should we remove the `index` parameter? You are right. I'll fix up and repost. Thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3] arm64: paravirt: Use RCU read locks to guard stolen_time
On 5/17/22 1:54 AM, Will Deacon wrote: > On Fri, May 13, 2022 at 04:32:53PM -0700, Srivatsa S. Bhat wrote: >> On 5/13/22 10:46 AM, Elliot Berman wrote: >>> From: Prakruthi Deepak Heragu >>> [...] >>> static int stolen_time_cpu_down_prepare(unsigned int cpu) >>> { >>> + struct pvclock_vcpu_stolen_time *kaddr = NULL; >>> struct pv_time_stolen_time_region *reg; >>> >>> reg = this_cpu_ptr(_time_region); >>> if (!reg->kaddr) >>> return 0; >>> >>> - memunmap(reg->kaddr); >>> - memset(reg, 0, sizeof(*reg)); >>> + kaddr = rcu_replace_pointer(reg->kaddr, NULL, true); >>> + synchronize_rcu(); >>> + memunmap(kaddr); >>> >> >> The original code used to memset the stolen time region, but this >> patch seems to drop it. Was that change intentional? > > 'struct pv_time_stolen_time_region' only has one field ('kaddr'), which > we're now clearing with rcu_replace_pointer() so the memset doesn't make > sense. > Ah right, never mind :) Thank you! Regards, Srivatsa ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v1 0/8] virtio/vsock: experimental zerocopy receive
Hi Arseniy, On Thu, May 12, 2022 at 05:04:11AM +, Arseniy Krasnov wrote: INTRODUCTION Hello, this is experimental implementation of virtio vsock zerocopy receive. It was inspired by TCP zerocopy receive by Eric Dumazet. This API uses same idea: call 'mmap()' on socket's descriptor, then every 'getsockopt()' will fill provided vma area with pages of virtio RX buffers. After received data was processed by user, pages must be freed by 'madvise()' call with MADV_DONTNEED flag set(if user won't call 'madvise()', next 'getsockopt()' will fail). Sounds cool, but maybe we would need some socket/net experts here for review. Could we do something similar for the sending path as well? DETAILS Here is how mapping with mapped pages looks exactly: first page mapping contains array of trimmed virtio vsock packet headers (in contains only length of data on the corresponding page and 'flags' field): struct virtio_vsock_usr_hdr { uint32_t length; uint32_t flags; }; Field 'length' allows user to know exact size of payload within each sequence of pages and 'flags' allows user to handle SOCK_SEQPACKET flags(such as message bounds or record bounds). All other pages are data pages from RX queue. Page 0 Page 1 Page N [ hdr1 .. hdrN ][ data ] .. [ data ] || ^ ^ || | | |*---* || || ** Of course, single header could represent array of pages (when packet's buffer is bigger than one page).So here is example of detailed mapping layout for some set of packages. Lets consider that we have the following sequence of packages: 56 bytes, 4096 bytes and 8200 bytes. All pages: 0,1,2,3,4 and 5 will be inserted to user's vma(vma is large enough). Page 0: [[ hdr0 ][ hdr 1 ][ hdr 2 ][ hdr 3 ] ... ] Page 1: [ 56 ] Page 2: [ 4096 ] Page 3: [ 4096 ] Page 4: [ 4096 ] Page 5: [ 8 ] Page 0 contains only array of headers: 'hdr0' has 56 in length field. 'hdr1' has 4096 in length field. 'hdr2' has 8200 in length field. 'hdr3' has 0 in length field(this is end of data marker). Page 1 corresponds to 'hdr0' and has only 56 bytes of data. Page 2 corresponds to 'hdr1' and filled with data. Page 3 corresponds to 'hdr2' and filled with data. Page 4 corresponds to 'hdr2' and filled with data. Page 5 corresponds to 'hdr2' and has only 8 bytes of data. This patchset also changes packets allocation way: today implementation uses only 'kmalloc()' to create data buffer. Problem happens when we try to map such buffers to user's vma - kernel forbids to map slab pages to user's vma(as pages of "not large" 'kmalloc()' allocations are marked with PageSlab flag and "not large" could be bigger than one page). So to avoid this, data buffers now allocated using 'alloc_pages()' call. TESTS This patchset updates 'vsock_test' utility: two tests for new feature were added. First test covers invalid cases. Second checks valid transmission case. Thanks for adding the test! BENCHMARKING For benchmakring I've added small utility 'rx_zerocopy'. It works in client/server mode. When client connects to server, server starts sending exact amount of data to client(amount is set as input argument).Client reads data and waits for next portion of it. Client works in two modes: copy and zero-copy. In copy mode client uses 'read()' call while in zerocopy mode sequence of 'mmap()' /'getsockopt()'/'madvise()' are used. Smaller amount of time for transmission is better. For server, we can set size of tx buffer and for client we can set size of rx buffer or rx mapping size(in zerocopy mode). Usage of this utility is quiet simple: For client mode: ./rx_zerocopy --mode client [--zerocopy] [--rx] For server mode: ./rx_zerocopy --mode server [--mb] [--tx] [--mb] sets number of megabytes to transfer. [--rx] sets size of receive buffer/mapping in pages. [--tx] sets size of transmit buffer in pages. I checked for transmission of 4000mb of data. Here are some results: size of rx/tx buffers in pages *---* |8 |32|64 | 256| 512| *--**--*-*--*--* | zerocopy | 24 | 10.6 | 12.2 | 23.6 |21| secs to *--* process | non-zerocopy | 13 | 16.4 | 24.7 | 27.2 | 23.9 | 4000 mb
Re: [PATCH V2 0/8] vhost flush cleanups
On Sun, May 15, 2022 at 03:29:14PM -0500, Mike Christie wrote: The following patches are Andrey Ryabinin's flush cleanups and some from me. They reduce the number of flush calls and remove some bogus ones where we don't even have a worker running anymore or they were based on outdated or incorrect assumptions. Jason, for the patches you gave me an explicit acked/reviewed tag I added it. For the replies where I answered your review questions and you only replied with an affirimative reply I did not add a tag, because I was not not 100% sure what you wanted to do. These patches will be used in the vhost threading patches which I think will make progress again now that I have Christian's email figured out :) (he had moved to microsoft but I've been using the ubuntu address). I think the patches are ok cleanups in general so I thought they could get merged separately if people agree. V2: - Added patch to rename vhost_work_dev_flush to just vhost_dev_flush to handle review comment from Jason about the naming not being so great. Thank you for this cleanup! I think there is only a small issue to fix in vhost/test.c, otherwise it looks good to me :-) Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 8/8] vhost: rename vhost_work_dev_flush
On Sun, May 15, 2022 at 03:29:22PM -0500, Mike Christie wrote: This patch renames vhost_work_dev_flush to just vhost_dev_flush to relfect that it flushes everything on the device and that drivers don't know/care that polls are based on vhost_works. Drivers just flush the entire device and polls, and works for vhost-scsi management TMFs and IO net virtqueues, etc all are flushed. Signed-off-by: Mike Christie --- drivers/vhost/net.c | 4 ++-- drivers/vhost/scsi.c | 2 +- drivers/vhost/test.c | 2 +- drivers/vhost/vhost.c | 10 +- drivers/vhost/vhost.h | 2 +- drivers/vhost/vsock.c | 2 +- 6 files changed, 11 insertions(+), 11 deletions(-) Reviewed-by: Stefano Garzarella ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 4/8] vhost_test: remove vhost_test_flush_vq()
On Sun, May 15, 2022 at 03:29:18PM -0500, Mike Christie wrote: From: Andrey Ryabinin vhost_test_flush_vq() just a simple wrapper around vhost_work_dev_flush() which seems have no value. It's just easier to call vhost_work_dev_flush() directly. Besides there is no point in obtaining vhost_dev pointer via 'n->vqs[index].poll.dev' while we can just use >dev. It's the same pointers, see vhost_test_open()/vhost_dev_init(). Signed-off-by: Andrey Ryabinin Signed-off-by: Mike Christie --- drivers/vhost/test.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index f0ac9e35f5d6..837148d0a6a8 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -144,14 +144,9 @@ static void vhost_test_stop(struct vhost_test *n, void **privatep) *privatep = vhost_test_stop_vq(n, n->vqs + VHOST_TEST_VQ); } -static void vhost_test_flush_vq(struct vhost_test *n, int index) -{ - vhost_work_dev_flush(n->vqs[index].poll.dev); -} - static void vhost_test_flush(struct vhost_test *n) { - vhost_test_flush_vq(n, VHOST_TEST_VQ); + vhost_work_dev_flush(>dev); } static int vhost_test_release(struct inode *inode, struct file *f) @@ -210,7 +205,7 @@ static long vhost_test_run(struct vhost_test *n, int test) goto err; if (oldpriv) { - vhost_test_flush_vq(n, index); + vhost_test_flush(n, index); ^ Should we remove the `index` parameter? } } @@ -303,7 +298,7 @@ static long vhost_test_set_backend(struct vhost_test *n, unsigned index, int fd) mutex_unlock(>mutex); if (enable) { - vhost_test_flush_vq(n, index); + vhost_test_flush(n); } mutex_unlock(>dev.mutex); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 3/8] vhost_net: get rid of vhost_net_flush_vq() and extra flush calls
On Sun, May 15, 2022 at 03:29:17PM -0500, Mike Christie wrote: From: Andrey Ryabinin vhost_net_flush_vq() calls vhost_work_dev_flush() twice passing vhost_dev pointer obtained via 'n->poll[index].dev' and 'n->vqs[index].vq.poll.dev'. This is actually the same pointer, initialized in vhost_net_open()/vhost_dev_init()/vhost_poll_init() Remove vhost_net_flush_vq() and call vhost_work_dev_flush() directly. Do the flushes only once instead of several flush calls in a row which seems rather useless. Signed-off-by: Andrey Ryabinin [drop vhost_dev forward declaration in vhost.h] Signed-off-by: Mike Christie --- drivers/vhost/net.c | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) Reviewed-by: Stefano Garzarella ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3] arm64: paravirt: Use RCU read locks to guard stolen_time
On Fri, 13 May 2022 10:46:54 -0700, Elliot Berman wrote: > From: Prakruthi Deepak Heragu > > During hotplug, the stolen time data structure is unmapped and memset. > There is a possibility of the timer IRQ being triggered before memset > and stolen time is getting updated as part of this timer IRQ handler. This > causes the below crash in timer handler - > > [...] Applied to arm64 (for-next/fixes), thanks! [1/1] arm64: paravirt: Use RCU read locks to guard stolen_time https://git.kernel.org/arm64/c/19bef63f951e Cheers, -- Will https://fixes.arm64.dev https://next.arm64.dev https://will.arm64.dev ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 2/8] vhost: flush dev once during vhost_dev_stop
On Sun, May 15, 2022 at 03:29:16PM -0500, Mike Christie wrote: When vhost_work_dev_flush returns all work queued at that time will have completed. There is then no need to flush after every vhost_poll_stop call, and we can move the flush call to after the loop that stops the pollers. Signed-off-by: Mike Christie --- drivers/vhost/vhost.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) Reviewed-by: Stefano Garzarella ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V2 1/8] vhost: get rid of vhost_poll_flush() wrapper
On Sun, May 15, 2022 at 03:29:15PM -0500, Mike Christie wrote: From: Andrey Ryabinin vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush(). It gives wrong impression that we are doing some work over vhost_poll, while in fact it flushes vhost_poll->dev. It only complicate understanding of the code and leads to mistakes like flushing the same vhost_dev several times in a row. Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly. Signed-off-by: Andrey Ryabinin [merge vhost_poll_flush removal from Stefano Garzarella] Signed-off-by: Mike Christie --- drivers/vhost/net.c | 4 ++-- drivers/vhost/test.c | 2 +- drivers/vhost/vhost.c | 12 ++-- drivers/vhost/vhost.h | 1 - drivers/vhost/vsock.c | 2 +- 5 files changed, 6 insertions(+), 15 deletions(-) Reviewed-by: Stefano Garzarella ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device
Hi Greg, Yongji, > From: Yongji Xie > Sent: Tuesday, May 17, 2022 3:25 AM > > On Tue, May 17, 2022 at 4:06 AM Parav Pandit wrote: > > > > > > > > > From: Xie Yongji > > > Sent: Monday, May 16, 2022 2:04 AM > > > > > > Introduce a device object for vdpa management device to control its > > > lifecycle. > > Why is this needed? > > What is the limitation of current approach that requires new device for > mgmtdev? > > The primary motivation is missing in the commit log here. > > > > OK, I will add one. This patch actually comes from the discussion: > > > https://www.spinics.net/lists/linux-virtualization/msg56371.html > > The "vdpa_mgmt_dev" makes things a little confusing. Embedding the > device struct in it to control its lifecycle simplifies the logic and makes > the > driver model clear. > No. it doesn't. vdpa_mgmt_dev is really the handle for all the netlink socket messages targeted. It is not really a struct device. We can rename it to vdpa_mgmt_handle, if the _dev suffix is confusing. And regarding vduse_dev_release() and existing empty release function, they can be dynamically allocated. This is because they are really the struct device. > > > And the device name will be used to match > VDPA_ATTR_MGMTDEV_DEV_NAME > > > field of netlink message rather than using parent device name. > > > > > > With this patch applied, drivers should use vdpa_mgmtdev_alloc() or > > > _vdpa_mgmtdev_alloc() to allocate a vDPA management device before > > > calling vdpa_mgmtdev_register(). And some buggy empty release > > > function can also be removed from the driver codes. > > What is the bug, can you please explain? > > I'm not sure if "buggy" is the right word. But the empty release function > should be removed as mentioned in Documentation/core-api/kobject.rst. > Sure. So, struct device in the vdpa_sim and vduse can allocate the struct device using kzalloc() and release callback can free it. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3] arm64: paravirt: Use RCU read locks to guard stolen_time
On Fri, May 13, 2022 at 04:32:53PM -0700, Srivatsa S. Bhat wrote: > On 5/13/22 10:46 AM, Elliot Berman wrote: > > From: Prakruthi Deepak Heragu > > > > During hotplug, the stolen time data structure is unmapped and memset. > > There is a possibility of the timer IRQ being triggered before memset > > and stolen time is getting updated as part of this timer IRQ handler. This > > causes the below crash in timer handler - > > > > [ 3457.473139][C5] Unable to handle kernel paging request at virtual > > address ffc03df05148 > > ... > > [ 3458.154398][C5] Call trace: > > [ 3458.157648][C5] para_steal_clock+0x30/0x50 > > [ 3458.162319][C5] irqtime_account_process_tick+0x30/0x194 > > [ 3458.168148][C5] account_process_tick+0x3c/0x280 > > [ 3458.173274][C5] update_process_times+0x5c/0xf4 > > [ 3458.178311][C5] tick_sched_timer+0x180/0x384 > > [ 3458.183164][C5] __run_hrtimer+0x160/0x57c > > [ 3458.187744][C5] hrtimer_interrupt+0x258/0x684 > > [ 3458.192698][C5] arch_timer_handler_virt+0x5c/0xa0 > > [ 3458.198002][C5] handle_percpu_devid_irq+0xdc/0x414 > > [ 3458.203385][C5] handle_domain_irq+0xa8/0x168 > > [ 3458.208241][C5] gic_handle_irq.34493+0x54/0x244 > > [ 3458.213359][C5] call_on_irq_stack+0x40/0x70 > > [ 3458.218125][C5] do_interrupt_handler+0x60/0x9c > > [ 3458.223156][C5] el1_interrupt+0x34/0x64 > > [ 3458.227560][C5] el1h_64_irq_handler+0x1c/0x2c > > [ 3458.232503][C5] el1h_64_irq+0x7c/0x80 > > [ 3458.236736][C5] free_vmap_area_noflush+0x108/0x39c > > [ 3458.242126][C5] remove_vm_area+0xbc/0x118 > > [ 3458.246714][C5] vm_remove_mappings+0x48/0x2a4 > > [ 3458.251656][C5] __vunmap+0x154/0x278 > > [ 3458.255796][C5] stolen_time_cpu_down_prepare+0xc0/0xd8 > > [ 3458.261542][C5] cpuhp_invoke_callback+0x248/0xc34 > > [ 3458.266842][C5] cpuhp_thread_fun+0x1c4/0x248 > > [ 3458.271696][C5] smpboot_thread_fn+0x1b0/0x400 > > [ 3458.276638][C5] kthread+0x17c/0x1e0 > > [ 3458.280691][C5] ret_from_fork+0x10/0x20 > > > > As a fix, introduce rcu lock to update stolen time structure. > > > > Suggested-by: Will Deacon > > Signed-off-by: Prakruthi Deepak Heragu > > Signed-off-by: Elliot Berman > > --- > > Looks good to me, but one quick question though (see below). > > Reviewed-by: Srivatsa S. Bhat (VMware) Cheers. > > static int stolen_time_cpu_down_prepare(unsigned int cpu) > > { > > + struct pvclock_vcpu_stolen_time *kaddr = NULL; > > struct pv_time_stolen_time_region *reg; > > > > reg = this_cpu_ptr(_time_region); > > if (!reg->kaddr) > > return 0; > > > > - memunmap(reg->kaddr); > > - memset(reg, 0, sizeof(*reg)); > > + kaddr = rcu_replace_pointer(reg->kaddr, NULL, true); > > + synchronize_rcu(); > > + memunmap(kaddr); > > > > The original code used to memset the stolen time region, but this > patch seems to drop it. Was that change intentional? 'struct pv_time_stolen_time_region' only has one field ('kaddr'), which we're now clearing with rcu_replace_pointer() so the memset doesn't make sense. Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device
On Tue, May 17, 2022 at 02:29:38PM +0800, Jason Wang wrote: > On Mon, May 16, 2022 at 7:54 PM Greg KH wrote: > > > > On Mon, May 16, 2022 at 06:34:36AM -0400, Michael S. Tsirkin wrote: > > > On Mon, May 16, 2022 at 06:31:18PM +0800, Yongji Xie wrote: > > > > On Mon, May 16, 2022 at 5:54 PM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > On Mon, May 16, 2022 at 05:31:27PM +0800, Yongji Xie wrote: > > > > > > On Mon, May 16, 2022 at 5:14 PM Jason Wang > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > 在 2022/5/16 14:03, Xie Yongji 写道: > > > > > > > > Introduce a device object for vdpa management device to control > > > > > > > > its lifecycle. And the device name will be used to match > > > > > > > > VDPA_ATTR_MGMTDEV_DEV_NAME field of netlink message rather than > > > > > > > > using parent device name. > > > > > > > > > > > > > > > > With this patch applied, drivers should use vdpa_mgmtdev_alloc() > > > > > > > > or _vdpa_mgmtdev_alloc() to allocate a vDPA management device > > > > > > > > before calling vdpa_mgmtdev_register(). And some buggy empty > > > > > > > > release function can also be removed from the driver codes. > > > > > > > > > > > > > > > > Signed-off-by: Xie Yongji > > > > > > > > --- > > > > > > > > drivers/vdpa/ifcvf/ifcvf_main.c | 11 ++-- > > > > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c| 11 ++-- > > > > > > > > drivers/vdpa/vdpa.c | 92 > > > > > > > > > > > > > > > > drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 39 > > > > > > > > drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 46 +- > > > > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 38 > > > > > > > > include/linux/vdpa.h | 38 +++- > > > > > > > > 7 files changed, 168 insertions(+), 107 deletions(-) > > > > > > > > > > > > > > > > diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c > > > > > > > > b/drivers/vdpa/ifcvf/ifcvf_main.c > > > > > > > > index 4366320fb68d..d4087c37cfdf 100644 > > > > > > > > --- a/drivers/vdpa/ifcvf/ifcvf_main.c > > > > > > > > +++ b/drivers/vdpa/ifcvf/ifcvf_main.c > > > > > > > > @@ -821,10 +821,11 @@ static int ifcvf_probe(struct pci_dev > > > > > > > > *pdev, const struct pci_device_id *id) > > > > > > > > u32 dev_type; > > > > > > > > int ret; > > > > > > > > > > > > > > > > - ifcvf_mgmt_dev = kzalloc(sizeof(struct > > > > > > > > ifcvf_vdpa_mgmt_dev), GFP_KERNEL); > > > > > > > > - if (!ifcvf_mgmt_dev) { > > > > > > > > + ifcvf_mgmt_dev = vdpa_mgmtdev_alloc(struct > > > > > > > > ifcvf_vdpa_mgmt_dev, > > > > > > > > + mdev, dev_name(dev), > > > > > > > > dev); > > > > > > > > > > > > > > > > > > > > > Just wonder if it's better to make vDPA device a child of the mgmt > > > > > > > device instead of the PCI device? > > > > > > > > > > > > > > (Currently we use PCI device as the parent of the vDPA device, or > > > > > > > at > > > > > > > least we can do this for the simulator which doesn't have a > > > > > > > parent?) > > > > > > > > > > > > > > > > > > > Make sense. I think we can do it for all vDPA drivers. Make sure the > > > > > > parent of the vDPA device is the vDPA management device. > > > > > > > > > > > > Thanks, > > > > > > Yongji > > > > > > > > > > > > > > > that's an ABI change though isn't it? parent is exposed in sysfs, > > > > > right? > > > > > > > > > > > > > Hmm...yes. So it looks like we can't change it, right? > > > > > > > > Thanks, > > > > Yongji > > > > > > Afraid so. a way to find the pci device already exists I think, right? > > > > You can change it, any tools should be going through the bus/device > > links, not walking the sysfs tree directly, right? That's what those > > symlinks are for, in order to properly be able to enumerate different > > device types. > > > > A specific topology in sysfs should not ever be assumed to be static > > over time, that's not an accurate representation of how the kernel > > works. > > > > So try it and see? Odds are there are no tools that even care about > > these devices, right? Or is there? > > I think there's no tool that depends on the sysfs tree now. The > management layer is only expected to talk to the management device via > vdpa(8) which is integrated in iproute2. Great, then change away! ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH v2 0/5] TUN/VirtioNet USO features support.
On Thu, May 12, 2022 at 7:33 PM Andrew Melnychenko wrote: > > Added new offloads for TUN devices TUN_F_USO4 and TUN_F_USO6. > Technically they enable NETIF_F_GSO_UDP_L4 > (and only if USO4 & USO6 are set simultaneously). > It allows to transmission of large UDP packets. > > Different features USO4 and USO6 are required for qemu where Windows guests > can > enable disable USO receives for IPv4 and IPv6 separately. > On the other side, Linux can't really differentiate USO4 and USO6, for now. > For now, to enable USO for TUN it requires enabling USO4 and USO6 together. > In the future, there would be a mechanism to control UDP_L4 GSO separately. > > Test it WIP Qemu https://github.com/daynix/qemu/tree/Dev_USOv2 > > New types for VirtioNet already on mailing: > https://lists.oasis-open.org/archives/virtio-comment/202110/msg00010.html > > Also, there is a known issue with transmitting packages between two guests. Could you explain this more? It looks like a bug. (Or any pointer to the discussion) Thanks > Without hacks with skb's GSO - packages are still segmented on the host's > postrouting. > > Andrew (5): > uapi/linux/if_tun.h: Added new offload types for USO4/6. > driver/net/tun: Added features for USO. > uapi/linux/virtio_net.h: Added USO types. > linux/virtio_net.h: Support USO offload in vnet header. > drivers/net/virtio_net.c: Added USO support. > > drivers/net/tap.c | 10 -- > drivers/net/tun.c | 8 +++- > drivers/net/virtio_net.c| 19 +++ > include/linux/virtio_net.h | 9 + > include/uapi/linux/if_tun.h | 2 ++ > include/uapi/linux/virtio_net.h | 4 > 6 files changed, 45 insertions(+), 7 deletions(-) > > -- > 2.35.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device
On Mon, May 16, 2022 at 7:54 PM Greg KH wrote: > > On Mon, May 16, 2022 at 06:34:36AM -0400, Michael S. Tsirkin wrote: > > On Mon, May 16, 2022 at 06:31:18PM +0800, Yongji Xie wrote: > > > On Mon, May 16, 2022 at 5:54 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Mon, May 16, 2022 at 05:31:27PM +0800, Yongji Xie wrote: > > > > > On Mon, May 16, 2022 at 5:14 PM Jason Wang > > > > > wrote: > > > > > > > > > > > > > > > > > > 在 2022/5/16 14:03, Xie Yongji 写道: > > > > > > > Introduce a device object for vdpa management device to control > > > > > > > its lifecycle. And the device name will be used to match > > > > > > > VDPA_ATTR_MGMTDEV_DEV_NAME field of netlink message rather than > > > > > > > using parent device name. > > > > > > > > > > > > > > With this patch applied, drivers should use vdpa_mgmtdev_alloc() > > > > > > > or _vdpa_mgmtdev_alloc() to allocate a vDPA management device > > > > > > > before calling vdpa_mgmtdev_register(). And some buggy empty > > > > > > > release function can also be removed from the driver codes. > > > > > > > > > > > > > > Signed-off-by: Xie Yongji > > > > > > > --- > > > > > > > drivers/vdpa/ifcvf/ifcvf_main.c | 11 ++-- > > > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c| 11 ++-- > > > > > > > drivers/vdpa/vdpa.c | 92 > > > > > > > > > > > > > > drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 39 > > > > > > > drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 46 +- > > > > > > > drivers/vdpa/vdpa_user/vduse_dev.c | 38 > > > > > > > include/linux/vdpa.h | 38 +++- > > > > > > > 7 files changed, 168 insertions(+), 107 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c > > > > > > > b/drivers/vdpa/ifcvf/ifcvf_main.c > > > > > > > index 4366320fb68d..d4087c37cfdf 100644 > > > > > > > --- a/drivers/vdpa/ifcvf/ifcvf_main.c > > > > > > > +++ b/drivers/vdpa/ifcvf/ifcvf_main.c > > > > > > > @@ -821,10 +821,11 @@ static int ifcvf_probe(struct pci_dev > > > > > > > *pdev, const struct pci_device_id *id) > > > > > > > u32 dev_type; > > > > > > > int ret; > > > > > > > > > > > > > > - ifcvf_mgmt_dev = kzalloc(sizeof(struct > > > > > > > ifcvf_vdpa_mgmt_dev), GFP_KERNEL); > > > > > > > - if (!ifcvf_mgmt_dev) { > > > > > > > + ifcvf_mgmt_dev = vdpa_mgmtdev_alloc(struct > > > > > > > ifcvf_vdpa_mgmt_dev, > > > > > > > + mdev, dev_name(dev), > > > > > > > dev); > > > > > > > > > > > > > > > > > > Just wonder if it's better to make vDPA device a child of the mgmt > > > > > > device instead of the PCI device? > > > > > > > > > > > > (Currently we use PCI device as the parent of the vDPA device, or at > > > > > > least we can do this for the simulator which doesn't have a parent?) > > > > > > > > > > > > > > > > Make sense. I think we can do it for all vDPA drivers. Make sure the > > > > > parent of the vDPA device is the vDPA management device. > > > > > > > > > > Thanks, > > > > > Yongji > > > > > > > > > > > > that's an ABI change though isn't it? parent is exposed in sysfs, > > > > right? > > > > > > > > > > Hmm...yes. So it looks like we can't change it, right? > > > > > > Thanks, > > > Yongji > > > > Afraid so. a way to find the pci device already exists I think, right? > > You can change it, any tools should be going through the bus/device > links, not walking the sysfs tree directly, right? That's what those > symlinks are for, in order to properly be able to enumerate different > device types. > > A specific topology in sysfs should not ever be assumed to be static > over time, that's not an accurate representation of how the kernel > works. > > So try it and see? Odds are there are no tools that even care about > these devices, right? Or is there? I think there's no tool that depends on the sysfs tree now. The management layer is only expected to talk to the management device via vdpa(8) which is integrated in iproute2. Thanks > > thanks, > > greg k-h > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization