Re: [PATCH] vhost_net: fix double fget()

2022-05-17 Thread Michael S. Tsirkin
On Tue, May 17, 2022 at 10:00:03PM +, Al Viro wrote:
> On Mon, May 16, 2022 at 04:44:19AM -0400, Michael S. Tsirkin wrote:
> > > Signed-off-by: Al Viro 
> > > Signed-off-by: Jason Wang 
> > 
> > Acked-by: Michael S. Tsirkin 
> > 
> > and this is stable material I guess.
> 
> It is, except that commit message ought to be cleaned up.  Something
> along the lines of
> 
> 
> Fix double fget() in vhost_net_set_backend()
> 
> Descriptor table is a shared resource; two fget() on the same descriptor
> may return different struct file references.  get_tap_ptr_ring() is
> called after we'd found (and pinned) the socket we'll be using and it
> tries to find the private tun/tap data structures associated with it.
> Redoing the lookup by the same file descriptor we'd used to get the
> socket is racy - we need to same struct file.
> 
> Thanks to Jason for spotting a braino in the original variant of patch -
> I'd missed the use of fd == -1 for disabling backend, and in that case
> we can end up with sock == NULL and sock != oldsock.
> 
> 
> Does the above sound sane for commit message?  And which tree would you
> prefer it to go through?  I can take it in vfs.git#fixes, or you could
> take it into your tree...

Acked-by: Michael S. Tsirkin 
for the new message and merging through your tree.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vhost_net: fix double fget()

2022-05-17 Thread Jason Wang
On Wed, May 18, 2022 at 6:00 AM Al Viro  wrote:
>
> On Mon, May 16, 2022 at 04:44:19AM -0400, Michael S. Tsirkin wrote:
> > > Signed-off-by: Al Viro 
> > > Signed-off-by: Jason Wang 
> >
> > Acked-by: Michael S. Tsirkin 
> >
> > and this is stable material I guess.
>
> It is, except that commit message ought to be cleaned up.  Something
> along the lines of
>
> 
> Fix double fget() in vhost_net_set_backend()
>
> Descriptor table is a shared resource; two fget() on the same descriptor
> may return different struct file references.  get_tap_ptr_ring() is
> called after we'd found (and pinned) the socket we'll be using and it
> tries to find the private tun/tap data structures associated with it.
> Redoing the lookup by the same file descriptor we'd used to get the
> socket is racy - we need to same struct file.
>
> Thanks to Jason for spotting a braino in the original variant of patch -
> I'd missed the use of fd == -1 for disabling backend, and in that case
> we can end up with sock == NULL and sock != oldsock.
> 
>
> Does the above sound sane for commit message?

Yes.

> And which tree would you
> prefer it to go through?  I can take it in vfs.git#fixes, or you could
> take it into your tree...
>

Consider Michael gave an ack, it would be fine if you want to take via
your tree.

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V5 9/9] virtio: use WARN_ON() to warn illegal status value

2022-05-17 Thread Jason Wang
We used to use BUG_ON() in virtio_device_ready() to detect illegal
status value, this seems sub-optimal since the value is under the
control of the device. Switch to use WARN_ON() instead.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Cc: Halil Pasic 
Cc: Cornelia Huck 
Cc: Vineeth Vijayan 
Cc: Peter Oberparleiter 
Cc: linux-s...@vger.kernel.org
Signed-off-by: Jason Wang 
---
 include/linux/virtio_config.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index d4edfd7d91bb..9a36051ceb76 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -255,7 +255,7 @@ void virtio_device_ready(struct virtio_device *dev)
 {
unsigned status = dev->config->get_status(dev);
 
-   BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
+   WARN_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
 
/*
 * The virtio_synchronize_cbs() makes sure vring_interrupt()
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V5 8/9] virtio: harden vring IRQ

2022-05-17 Thread Jason Wang
This is a rework on the previous IRQ hardening that is done for
virtio-pci where several drawbacks were found and were reverted:

1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
   that is used by some device such as virtio-blk
2) done only for PCI transport

The vq->broken is re-used in this patch for implementing the IRQ
hardening. The vq->broken is set to true during both initialization
and reset. And the vq->broken is set to false in
virtio_device_ready(). Then vring_interrupt() can check and return
when vq->broken is true. And in this case, switch to return IRQ_NONE
to let the interrupt core aware of such invalid interrupt to prevent
IRQ storm.

The reason of using a per queue variable instead of a per device one
is that we may need it for per queue reset hardening in the future.

Note that the hardening is only done for vring interrupt since the
config interrupt hardening is already done in commit 22b7050a024d7
("virtio: defer config changed notifications"). But the method that is
used by config interrupt can't be reused by the vring interrupt
handler because it uses spinlock to do the synchronization which is
expensive.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Cc: Halil Pasic 
Cc: Cornelia Huck 
Cc: Vineeth Vijayan 
Cc: Peter Oberparleiter 
Cc: linux-s...@vger.kernel.org
Signed-off-by: Jason Wang 
---
 drivers/s390/virtio/virtio_ccw.c   |  4 
 drivers/virtio/virtio.c| 15 ---
 drivers/virtio/virtio_mmio.c   |  5 +
 drivers/virtio/virtio_pci_modern_dev.c |  5 +
 drivers/virtio/virtio_ring.c   | 11 +++
 include/linux/virtio_config.h  | 20 
 6 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 22d36594bcdd..6f4c83c6c9a7 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -971,6 +971,10 @@ static void virtio_ccw_set_status(struct virtio_device 
*vdev, u8 status)
ccw->flags = 0;
ccw->count = sizeof(status);
ccw->cda = (__u32)(unsigned long)>dma_area->status;
+   /* We use ssch for setting the status which is a serializing
+* instruction that guarantees the memory writes have
+* completed before ssch.
+*/
ret = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_WRITE_STATUS);
/* Write failed? We assume status is unchanged. */
if (ret)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 8dde44ea044a..1053f59d0a25 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -220,6 +220,15 @@ static int virtio_features_ok(struct virtio_device *dev)
  * */
 void virtio_reset_device(struct virtio_device *dev)
 {
+   /*
+* The below virtio_synchronize_cbs() guarantees that any
+* interrupt for this line arriving after
+* virtio_synchronize_vqs() has completed is guaranteed to see
+* vq->broken as true.
+*/
+   virtio_break_device(dev);
+   virtio_synchronize_cbs(dev);
+
dev->config->reset(dev);
 }
 EXPORT_SYMBOL_GPL(virtio_reset_device);
@@ -428,6 +437,9 @@ int register_virtio_device(struct virtio_device *dev)
dev->config_enabled = false;
dev->config_change_pending = false;
 
+   INIT_LIST_HEAD(>vqs);
+   spin_lock_init(>vqs_list_lock);
+
/* We always start by resetting the device, in case a previous
 * driver messed it up.  This also tests that code path a little. */
virtio_reset_device(dev);
@@ -435,9 +447,6 @@ int register_virtio_device(struct virtio_device *dev)
/* Acknowledge that we've seen the device. */
virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 
-   INIT_LIST_HEAD(>vqs);
-   spin_lock_init(>vqs_list_lock);
-
/*
 * device_add() causes the bus infrastructure to look for a matching
 * driver.
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 4a3b66e4e198..11c137526d1d 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -253,6 +253,11 @@ static void vm_set_status(struct virtio_device *vdev, u8 
status)
/* We should never be setting status to 0. */
BUG_ON(status == 0);
 
+   /*
+* Per memory-barriers.txt, wmb() is not needed to guarantee
+* that the the cache coherent memory writes have completed
+* before writing to the MMIO region.
+*/
writel(status, vm_dev->base + VIRTIO_MMIO_STATUS);
 }
 
diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
b/drivers/virtio/virtio_pci_modern_dev.c
index 591738ad3d56..91c9c0412730 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -466,6 +466,11 @@ void vp_modern_set_status(struct virtio_pci_modern_device 
*mdev,
 {
struct virtio_pci_common_cfg __iomem *cfg = 

[PATCH V5 7/9] virtio: allow to unbreak virtqueue

2022-05-17 Thread Jason Wang
This patch allows the new introduced __virtio_break_device() to
unbreak the virtqueue.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Cc: Halil Pasic 
Cc: Cornelia Huck 
Cc: Vineeth Vijayan 
Cc: Peter Oberparleiter 
Cc: linux-s...@vger.kernel.org
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 21 +
 include/linux/virtio.h   |  1 +
 2 files changed, 22 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cfb028ca238e..5b7df7c455f0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2397,6 +2397,27 @@ void virtio_break_device(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_break_device);
 
+/*
+ * This should allow the device to be used by the driver. You may
+ * need to grab appropriate locks to flush. This should only be used
+ * in some specific case e.g (probing and restoring). Driver should
+ * not call this directly.
+ */
+void __virtio_unbreak_device(struct virtio_device *dev)
+{
+   struct virtqueue *_vq;
+
+   spin_lock(>vqs_list_lock);
+   list_for_each_entry(_vq, >vqs, list) {
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   /* Pairs with READ_ONCE() in virtqueue_is_broken(). */
+   WRITE_ONCE(vq->broken, false);
+   }
+   spin_unlock(>vqs_list_lock);
+}
+EXPORT_SYMBOL_GPL(__virtio_unbreak_device);
+
 dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 5464f398912a..d8fdf170637c 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -131,6 +131,7 @@ void unregister_virtio_device(struct virtio_device *dev);
 bool is_virtio_device(struct device *dev);
 
 void virtio_break_device(struct virtio_device *dev);
+void __virtio_unbreak_device(struct virtio_device *dev);
 
 void virtio_config_changed(struct virtio_device *dev);
 #ifdef CONFIG_PM_SLEEP
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V5 6/9] virtio-ccw: implement synchronize_cbs()

2022-05-17 Thread Jason Wang
This patch tries to implement the synchronize_cbs() for ccw. For the
vring_interrupt() that is called via virtio_airq_handler(), the
synchronization is simply done via the airq_info's lock. For the
vring_interrupt() that is called via virtio_ccw_int_handler(), a per
device rwlock is introduced ans used in the synchronization method.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Cc: Halil Pasic 
Cc: Cornelia Huck 
Cc: Vineeth Vijayan 
Cc: Peter Oberparleiter 
Cc: linux-s...@vger.kernel.org
Signed-off-by: Jason Wang 
---
 drivers/s390/virtio/virtio_ccw.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index d35e7a3f7067..22d36594bcdd 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -62,6 +62,7 @@ struct virtio_ccw_device {
unsigned int revision; /* Transport revision */
wait_queue_head_t wait_q;
spinlock_t lock;
+   rwlock_t irq_lock;
struct mutex io_lock; /* Serializes I/O requests */
struct list_head virtqueues;
bool is_thinint;
@@ -984,6 +985,27 @@ static const char *virtio_ccw_bus_name(struct 
virtio_device *vdev)
return dev_name(>cdev->dev);
 }
 
+static void virtio_ccw_synchronize_cbs(struct virtio_device *vdev)
+{
+   struct virtio_ccw_device *vcdev = to_vc_device(vdev);
+   struct airq_info *info = vcdev->airq_info;
+
+   if (info) {
+   /*
+* Synchronize with the vring_interrupt() with airq indicator
+*/
+   write_lock_irq(>lock);
+   write_unlock_irq(>lock);
+   } else {
+   /*
+* Synchronize with the vring_interrupt() called by
+* virtio_ccw_int_handler().
+*/
+   write_lock_irq(>irq_lock);
+   write_unlock_irq(>irq_lock);
+   }
+}
+
 static const struct virtio_config_ops virtio_ccw_config_ops = {
.get_features = virtio_ccw_get_features,
.finalize_features = virtio_ccw_finalize_features,
@@ -995,6 +1017,7 @@ static const struct virtio_config_ops 
virtio_ccw_config_ops = {
.find_vqs = virtio_ccw_find_vqs,
.del_vqs = virtio_ccw_del_vqs,
.bus_name = virtio_ccw_bus_name,
+   .synchronize_cbs = virtio_ccw_synchronize_cbs,
 };
 
 
@@ -1106,6 +1129,8 @@ static void virtio_ccw_int_handler(struct ccw_device 
*cdev,
vcdev->err = -EIO;
}
virtio_ccw_check_activity(vcdev, activity);
+   /* Local interrupt should be disabled at this time */
+   read_lock(>irq_lock);
for_each_set_bit(i, indicators(vcdev),
 sizeof(*indicators(vcdev)) * BITS_PER_BYTE) {
/* The bit clear must happen before the vring kick. */
@@ -1114,6 +1139,7 @@ static void virtio_ccw_int_handler(struct ccw_device 
*cdev,
vq = virtio_ccw_vq_by_ind(vcdev, i);
vring_interrupt(0, vq);
}
+   read_unlock(>irq_lock);
if (test_bit(0, indicators2(vcdev))) {
virtio_config_changed(>vdev);
clear_bit(0, indicators2(vcdev));
@@ -1284,6 +1310,7 @@ static int virtio_ccw_online(struct ccw_device *cdev)
init_waitqueue_head(>wait_q);
INIT_LIST_HEAD(>virtqueues);
spin_lock_init(>lock);
+   rwlock_init(>irq_lock);
mutex_init(>io_lock);
 
spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V5 5/9] virtio-mmio: implement synchronize_cbs()

2022-05-17 Thread Jason Wang
Simply synchronize the platform irq that is used by us.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Cc: Halil Pasic 
Cc: Cornelia Huck 
Cc: Vineeth Vijayan 
Cc: Peter Oberparleiter 
Cc: linux-s...@vger.kernel.org
Reviewed-by: Cornelia Huck 
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio_mmio.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 56128b9c46eb..4a3b66e4e198 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -345,6 +345,14 @@ static void vm_del_vqs(struct virtio_device *vdev)
free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev);
 }
 
+
+static void vm_synchronize_cbs(struct virtio_device *vdev)
+{
+   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+   synchronize_irq(platform_get_irq(vm_dev->pdev, 0));
+}
+
 static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned 
index,
  void (*callback)(struct virtqueue *vq),
  const char *name, bool ctx)
@@ -541,6 +549,7 @@ static const struct virtio_config_ops 
virtio_mmio_config_ops = {
.finalize_features = vm_finalize_features,
.bus_name   = vm_bus_name,
.get_shm_region = vm_get_shm_region,
+   .synchronize_cbs = vm_synchronize_cbs,
 };
 
 
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V5 4/9] virtio-pci: implement synchronize_cbs()

2022-05-17 Thread Jason Wang
We can simply reuse vp_synchronize_vectors() for .synchronize_cbs().

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Cc: Halil Pasic 
Cc: Cornelia Huck 
Cc: Vineeth Vijayan 
Cc: Peter Oberparleiter 
Cc: linux-s...@vger.kernel.org
Reviewed-by: Cornelia Huck 
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio_pci_legacy.c | 1 +
 drivers/virtio/virtio_pci_modern.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/virtio/virtio_pci_legacy.c 
b/drivers/virtio/virtio_pci_legacy.c
index 6f4e34ce96b8..207985107150 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -192,6 +192,7 @@ static const struct virtio_config_ops virtio_pci_config_ops 
= {
.reset  = vp_reset,
.find_vqs   = vp_find_vqs,
.del_vqs= vp_del_vqs,
+   .synchronize_cbs = vp_synchronize_vectors,
.get_features   = vp_get_features,
.finalize_features = vp_finalize_features,
.bus_name   = vp_bus_name,
diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index a2671a20ef77..18c2190e3059 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -394,6 +394,7 @@ static const struct virtio_config_ops 
virtio_pci_config_nodev_ops = {
.reset  = vp_reset,
.find_vqs   = vp_modern_find_vqs,
.del_vqs= vp_del_vqs,
+   .synchronize_cbs = vp_synchronize_vectors,
.get_features   = vp_get_features,
.finalize_features = vp_finalize_features,
.bus_name   = vp_bus_name,
@@ -411,6 +412,7 @@ static const struct virtio_config_ops virtio_pci_config_ops 
= {
.reset  = vp_reset,
.find_vqs   = vp_modern_find_vqs,
.del_vqs= vp_del_vqs,
+   .synchronize_cbs = vp_synchronize_vectors,
.get_features   = vp_get_features,
.finalize_features = vp_finalize_features,
.bus_name   = vp_bus_name,
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V5 3/9] virtio: introduce config op to synchronize vring callbacks

2022-05-17 Thread Jason Wang
This patch introduces new virtio config op to vring
callbacks. Transport specific method is required to make sure the
write before this function is visible to the vring_interrupt() that is
called after the return of this function. For the transport that
doesn't provide synchronize_vqs(), use synchornize_rcu() which
synchronize with IRQ implicitly as a fallback.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Cc: Halil Pasic 
Cc: Cornelia Huck 
Cc: Vineeth Vijayan 
Cc: Peter Oberparleiter 
Cc: linux-s...@vger.kernel.org
Reviewed-by: Cornelia Huck 
Signed-off-by: Jason Wang 
---
 include/linux/virtio_config.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index b341dd62aa4d..25be018810a7 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -57,6 +57,11 @@ struct virtio_shm_region {
  * include a NULL entry for vqs unused by driver
  * Returns 0 on success or error status
  * @del_vqs: free virtqueues found by find_vqs().
+ * @synchronize_cbs: synchronize with the virtqueue callbacks (optional)
+ *  The function guarantees that all memory operations on the
+ *  queue before it are visible to the vring_interrupt() that is
+ *  called after it.
+ *  vdev: the virtio_device
  * @get_features: get the array of feature bits for this device.
  * vdev: the virtio_device
  * Returns the first 64 feature bits (all we currently need).
@@ -89,6 +94,7 @@ struct virtio_config_ops {
const char * const names[], const bool *ctx,
struct irq_affinity *desc);
void (*del_vqs)(struct virtio_device *);
+   void (*synchronize_cbs)(struct virtio_device *);
u64 (*get_features)(struct virtio_device *vdev);
int (*finalize_features)(struct virtio_device *vdev);
const char *(*bus_name)(struct virtio_device *vdev);
@@ -217,6 +223,25 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, 
unsigned nvqs,
  desc);
 }
 
+/**
+ * virtio_synchronize_cbs - synchronize with virtqueue callbacks
+ * @vdev: the device
+ */
+static inline
+void virtio_synchronize_cbs(struct virtio_device *dev)
+{
+   if (dev->config->synchronize_cbs) {
+   dev->config->synchronize_cbs(dev);
+   } else {
+   /*
+* A best effort fallback to synchronize with
+* interrupts, preemption and softirq disabled
+* regions. See comment above synchronize_rcu().
+*/
+   synchronize_rcu();
+   }
+}
+
 /**
  * virtio_device_ready - enable vq use in probe function
  * @vdev: the device
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V5 2/9] virtio: use virtio_reset_device() when possible

2022-05-17 Thread Jason Wang
This allows us to do common extension without duplicating code.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Cc: Halil Pasic 
Cc: Cornelia Huck 
Cc: Vineeth Vijayan 
Cc: Peter Oberparleiter 
Cc: linux-s...@vger.kernel.org
Reviewed-by: Cornelia Huck 
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 75c8d560bbd3..8dde44ea044a 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -430,7 +430,7 @@ int register_virtio_device(struct virtio_device *dev)
 
/* We always start by resetting the device, in case a previous
 * driver messed it up.  This also tests that code path a little. */
-   dev->config->reset(dev);
+   virtio_reset_device(dev);
 
/* Acknowledge that we've seen the device. */
virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
@@ -496,7 +496,7 @@ int virtio_device_restore(struct virtio_device *dev)
 
/* We always start by resetting the device, in case a previous
 * driver messed it up. */
-   dev->config->reset(dev);
+   virtio_reset_device(dev);
 
/* Acknowledge that we've seen the device. */
virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V5 1/9] virtio: use virtio_device_ready() in virtio_device_restore()

2022-05-17 Thread Jason Wang
From: Stefano Garzarella 

It will allow us to do extension on virtio_device_ready() without
duplicating code.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: "Paul E. McKenney" 
Cc: Marc Zyngier 
Cc: Halil Pasic 
Cc: Cornelia Huck 
Cc: Vineeth Vijayan 
Cc: Peter Oberparleiter 
Cc: linux-s...@vger.kernel.org
Reviewed-by: Cornelia Huck 
Signed-off-by: Stefano Garzarella 
Signed-off-by: Jason Wang 
---
 drivers/virtio/virtio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 22f15f444f75..75c8d560bbd3 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -526,8 +526,9 @@ int virtio_device_restore(struct virtio_device *dev)
goto err;
}
 
-   /* Finally, tell the device we're all set */
-   virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+   /* If restore didn't do it, mark device DRIVER_OK ourselves. */
+   if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
+   virtio_device_ready(dev);
 
virtio_config_enable(dev);
 
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V5 0/9] rework on the IRQ hardening of virtio

2022-05-17 Thread Jason Wang
Hi All:

This is a rework on the IRQ hardening for virtio which is done
previously by the following commits are reverted:

9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
080cd7c3ac87 ("virtio-pci: harden INTX interrupts")

The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
with the assumption of the affinity managed IRQ that is used by some
virtio drivers. And what's more, it is only done for virtio-pci but
not other transports.

In this rework, I try to implement a general virtio solution which
borrows the idea of the INTX hardening by re-using per virtqueue
boolean vq->broken and toggle it in virtio_device_ready() and
virtio_reset_device(). Then we can simply reuse the existing checks in
the vring_interrupt() and return early if the driver is not ready.

Note that, I only did compile test on ccw and MMIO transport.

Please review.

Changes since V4:

- use spin_lock_irq()/spin_unlock_irq() to synchronize with
  vring_interrupt() for ccw
- use spin_lock()/spin_unlock() to protect vring_interrupt() for non
  airq
- add comment to explain the ordering implications of set_status() for
  PCI, ccw and MMIO
- various tweaks on the comments and changelogs

Changes since V3:

- Rename synchornize_vqs() to synchronize_cbs()
- tweak the comment for synchronize_cbs()
- switch to use a dedicated helper __virtio_unbreak_device() and
  document it should be only used for probing
- switch to use rwlock to synchornize the non airq for ccw

Changes since V2:

- add ccw and MMIO support
- rename synchronize_vqs() to synchronize_cbs()
- switch to re-use vq->broken instead of introducing new device
  attributes for the future virtqueue reset support
  - remove unnecssary READ_ONCE()/WRITE_ONCE()
  - a new patch to remove device triggerable BUG_ON()
  - more tweaks on the comments

Changes since v1:

- Use transport specific irq synchronization method when possible
- Drop the module parameter and enable the hardening unconditonally
- Tweak the barrier/ordering facilities used in the code
- Reanme irq_soft_enabled to driver_ready
- Avoid unnecssary IRQ synchornization (e.g during boot)

Jason Wang (8):
  virtio: use virtio_reset_device() when possible
  virtio: introduce config op to synchronize vring callbacks
  virtio-pci: implement synchronize_cbs()
  virtio-mmio: implement synchronize_cbs()
  virtio-ccw: implement synchronize_cbs()
  virtio: allow to unbreak virtqueue
  virtio: harden vring IRQ
  virtio: use WARN_ON() to warning illegal status value

Stefano Garzarella (1):
  virtio: use virtio_device_ready() in virtio_device_restore()

 drivers/s390/virtio/virtio_ccw.c   | 31 +
 drivers/virtio/virtio.c| 24 +
 drivers/virtio/virtio_mmio.c   | 14 
 drivers/virtio/virtio_pci_legacy.c |  1 +
 drivers/virtio/virtio_pci_modern.c |  2 ++
 drivers/virtio/virtio_pci_modern_dev.c |  5 +++
 drivers/virtio/virtio_ring.c   | 32 +++---
 include/linux/virtio.h |  1 +
 include/linux/virtio_config.h  | 47 +-
 9 files changed, 145 insertions(+), 12 deletions(-)

-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v7 2/5] vdpa: Add support for querying vendor statistics

2022-05-17 Thread Si-Wei Liu




On 5/17/2022 6:13 AM, Eli Cohen wrote:

Allows to read vendor statistics of a vdpa device. The specific
statistics data are received from the upstream driver in the form of an
(attribute name, attribute value) pairs.

An example of statistics for mlx5_vdpa device are:

received_desc - number of descriptors received by the virtqueue
completed_desc - number of descriptors completed by the virtqueue

A descriptor using indirect buffers is still counted as 1. In addition,
N chained descriptors are counted correctly N times as one would expect.

A new callback was added to vdpa_config_ops which provides the means for
the vdpa driver to return statistics results.

The interface allows for reading all the supported virtqueues, including
the control virtqueue if it exists.

Below are some examples taken from mlx5_vdpa which are introduced in the
following patch:

1. Read statistics for the virtqueue at index 1

$ vdpa dev vstats show vdpa-a qidx 1
vdpa-a:
queue_type tx queue_index 1 received_desc 3844836 completed_desc 3844836

2. Read statistics for the virtqueue at index 32
$ vdpa dev vstats show vdpa-a qidx 32
vdpa-a:
queue_type control_vq queue_index 32 received_desc 62 completed_desc 62

3. Read statisitics for the virtqueue at index 0 with json output
$ vdpa -j dev vstats show vdpa-a qidx 0
{"vstats":{"vdpa-a":{
"queue_type":"rx","queue_index":0,"name":"received_desc","value":417776,\
  "name":"completed_desc","value":417548}}}

4. Read statistics for the virtqueue at index 0 with preety json output
$ vdpa -jp dev vstats show vdpa-a qidx 0
{
 "vstats": {
 "vdpa-a": {

 "queue_type": "rx",
 "queue_index": 0,
 "name": "received_desc",
 "value": 417776,
 "name": "completed_desc",
 "value": 417548
 }
 }
}

Signed-off-by: Eli Cohen 
---
  drivers/vdpa/vdpa.c   | 160 ++
  include/linux/vdpa.h  |   3 +
  include/uapi/linux/vdpa.h |   6 ++
  3 files changed, 169 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index fac89a0d8178..1b810961ccc3 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -914,6 +914,106 @@ vdpa_dev_config_fill(struct vdpa_device *vdev, struct 
sk_buff *msg, u32 portid,
return err;
  }
  
+static int vdpa_fill_stats_rec(struct vdpa_device *vdev, struct sk_buff *msg,

+  struct genl_info *info, u32 index)
+{
+   struct virtio_net_config config = {};
+   u64 features;
+   u16 max_vqp;
+   u8 status;
+   int err;
+
This entire function should perhaps need to take cf_mutex (worth 
converting to rwsem as well) to guard against concurrent modification 
via .set_status() or .set_config() from the upper stack. Similar to 
vdpa_dev_config_fill().



+   status = vdev->config->get_status(vdev);
+   if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
+   NL_SET_ERR_MSG_MOD(info->extack, "feature negotiation not 
complete");
+   return -EAGAIN;
+   }
+   vdpa_get_config(vdev, 0, , sizeof(config));

This should use vdpa_get_config_unlocked() without lock.

Regards,
-Siwei


+
+   max_vqp = le16_to_cpu(config.max_virtqueue_pairs);
+   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, max_vqp))
+   return -EMSGSIZE;
+
+   features = vdev->config->get_driver_features(vdev);
+   if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES,
+ features, VDPA_ATTR_PAD))
+   return -EMSGSIZE;
+
+   if (nla_put_u32(msg, VDPA_ATTR_DEV_QUEUE_INDEX, index))
+   return -EMSGSIZE;
+
+   err = vdev->config->get_vendor_vq_stats(vdev, index, msg, info->extack);
+   if (err)
+   return err;
+
+   return 0;
+}
+
+static int vendor_stats_fill(struct vdpa_device *vdev, struct sk_buff *msg,
+struct genl_info *info, u32 index)
+{
+   int err;
+
+   if (!vdev->config->get_vendor_vq_stats)
+   return -EOPNOTSUPP;
+
+   err = vdpa_fill_stats_rec(vdev, msg, info, index);
+   if (err)
+   return err;
+
+   return 0;
+}
+
+static int vdpa_dev_vendor_stats_fill(struct vdpa_device *vdev,
+ struct sk_buff *msg,
+ struct genl_info *info, u32 index)
+{
+   u32 device_id;
+   void *hdr;
+   int err;
+   u32 portid = info->snd_portid;
+   u32 seq = info->snd_seq;
+   u32 flags = 0;
+
+   hdr = genlmsg_put(msg, portid, seq, _nl_family, flags,
+ VDPA_CMD_DEV_VSTATS_GET);
+   if (!hdr)
+   return -EMSGSIZE;
+
+   if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(>dev))) {
+   err = -EMSGSIZE;
+   goto undo_msg;
+   }
+
+   device_id = vdev->config->get_device_id(vdev);
+   if (nla_put_u32(msg, VDPA_ATTR_DEV_ID, device_id)) {

Re: [PATCH v7 5/5] vdpa/mlx5: Use readers/writers semaphore instead of mutex

2022-05-17 Thread Si-Wei Liu




On 5/17/2022 6:13 AM, Eli Cohen wrote:

Reading statistics could be done intensively and by several processes
concurrently. Reader's lock is sufficient in this case.

Change reslock from mutex to a rwsem.

Suggested-by: Si-Wei Liu 
Signed-off-by: Eli Cohen 

Reviewed-by: Si-Wei Liu 

---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 41 ++-
  1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 2b815ef850c8..57cfc64248b7 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -155,7 +155,7 @@ struct mlx5_vdpa_net {
 * since memory map might change and we need to destroy and create
 * resources while driver in operational.
 */
-   struct mutex reslock;
+   struct rw_semaphore reslock;
struct mlx5_flow_table *rxft;
struct mlx5_fc *rx_counter;
struct mlx5_flow_handle *rx_rule_ucast;
@@ -1695,7 +1695,7 @@ static void mlx5_cvq_kick_handler(struct work_struct 
*work)
ndev = to_mlx5_vdpa_ndev(mvdev);
cvq = >cvq;
  
-	mutex_lock(>reslock);

+   down_write(>reslock);
  
  	if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK))

goto out;
@@ -1746,7 +1746,7 @@ static void mlx5_cvq_kick_handler(struct work_struct 
*work)
}
  
  out:

-   mutex_unlock(>reslock);
+   up_write(>reslock);
  }
  
  static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)

@@ -2244,7 +2244,7 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
int err;
  
-	WARN_ON(!mutex_is_locked(>reslock));

+   WARN_ON(!rwsem_is_locked(>reslock));
  
  	if (ndev->setup) {

mlx5_vdpa_warn(mvdev, "setup driver called for already setup 
driver\n");
@@ -2292,7 +2292,7 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
  static void teardown_driver(struct mlx5_vdpa_net *ndev)
  {
  
-	WARN_ON(!mutex_is_locked(>reslock));

+   WARN_ON(!rwsem_is_locked(>reslock));
  
  	if (!ndev->setup)

return;
@@ -2322,7 +2322,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
  
  	print_status(mvdev, status, true);
  
-	mutex_lock(>reslock);

+   down_write(>reslock);
  
  	if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {

if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
@@ -2338,14 +2338,14 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
}
  
  	ndev->mvdev.status = status;

-   mutex_unlock(>reslock);
+   up_write(>reslock);
return;
  
  err_setup:

mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
  err_clear:
-   mutex_unlock(>reslock);
+   up_write(>reslock);
  }
  
  static int mlx5_vdpa_reset(struct vdpa_device *vdev)

@@ -2356,7 +2356,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
print_status(mvdev, 0, true);
mlx5_vdpa_info(mvdev, "performing device reset\n");
  
-	mutex_lock(>reslock);

+   down_write(>reslock);
teardown_driver(ndev);
clear_vqs_ready(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
@@ -2371,7 +2371,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
if (mlx5_vdpa_create_mr(mvdev, NULL))
mlx5_vdpa_warn(mvdev, "create MR failed\n");
}
-   mutex_unlock(>reslock);
+   up_write(>reslock);
  
  	return 0;

  }
@@ -2411,7 +2411,7 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, 
struct vhost_iotlb *iotlb
bool change_map;
int err;
  
-	mutex_lock(>reslock);

+   down_write(>reslock);
  
  	err = mlx5_vdpa_handle_set_map(mvdev, iotlb, _map);

if (err) {
@@ -2423,7 +2423,7 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, 
struct vhost_iotlb *iotlb
err = mlx5_vdpa_change_map(mvdev, iotlb);
  
  err:

-   mutex_unlock(>reslock);
+   up_write(>reslock);
return err;
  }
  
@@ -2442,7 +2442,6 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)

mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
}
mlx5_vdpa_free_resources(>mvdev);
-   mutex_destroy(>reslock);
kfree(ndev->event_cbs);
kfree(ndev->vqs);
  }
@@ -2527,7 +2526,7 @@ static int mlx5_vdpa_get_vendor_vq_stats(struct 
vdpa_device *vdev, u16 idx,
u64 completed_desc;
int err = 0;
  
-	mutex_lock(>reslock);

+   down_read(>reslock);
if (!is_index_valid(mvdev, idx)) {
NL_SET_ERR_MSG_MOD(extack, "virtqueue index is not valid");
err = -EINVAL;
@@ -2566,7 +2565,7 @@ static int mlx5_vdpa_get_vendor_vq_stats(struct 
vdpa_device *vdev, u16 idx,
  
  	err = 0;

  out_err:
-   mutex_unlock(>reslock);
+   up_read(>reslock);
return err;
  }
  
@@ -2835,18 +2834,18 @@ static int 

RE: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device

2022-05-17 Thread Parav Pandit via Virtualization


> From: gre...@linuxfoundation.org 
> Sent: Tuesday, May 17, 2022 9:54 AM
> 
> On Tue, May 17, 2022 at 12:21:03PM +, Parav Pandit wrote:
> > Hi Greg, Yongji,
> >
> > > From: Yongji Xie 
> > > Sent: Tuesday, May 17, 2022 3:25 AM
> > >
> > > On Tue, May 17, 2022 at 4:06 AM Parav Pandit 
> wrote:
> > > >
> > > >
> > > >
> > > > > From: Xie Yongji 
> > > > > Sent: Monday, May 16, 2022 2:04 AM
> > > > >
> > > > > Introduce a device object for vdpa management device to control
> > > > > its lifecycle.
> > > > Why is this needed?
> > > > What is the limitation of current approach that requires new
> > > > device for
> > > mgmtdev?
> > > > The primary motivation is missing in the commit log here.
> > > >
> > >
> > > OK, I will add one. This patch actually comes from the discussion:
> > >
> > >
> > > https://www.spinics.net/lists/linux-virtualization/msg56371.html
> > >
> > > The "vdpa_mgmt_dev" makes things a little confusing. Embedding the
> > > device struct in it to control its lifecycle simplifies the logic
> > > and makes the driver model clear.
> > >
> > No. it doesn't.
> >
> > vdpa_mgmt_dev is really the handle for all the netlink socket messages
> targeted.
> > It is not really a struct device.
> 
> Why can it not be one?  It has lifetime rules that must be followed, so might
> as well use the built-in code to handle this, right?
> 
Lifetime rules are followed.
That is, during unloading of the driver for the struct device* dev, mgmtdev 
must be unregistered.
Following the exact mirror of creation.
I general before deleting the struct device *dev, its upper layer user mgmt. 
dev must be unregistered.

There will be zero user of this struct mgmt_dev, after invocation of current 
API vdpa_mgmtdev_unregister().
This is the onus on above unregister API.

Any cleanup cannot be differed until newly thought dev->release().

> What is wrong with it being a struct device?

I am not sure if its fully wrong.
First for this struct device, there is no real driver bind to it.

Secondly, it is simply an overkill without any visible use.
vdpa_mgmt_dev struct has only register/unregister life cycle.
Unregistration of it stops and destroy any outstanding operation on it.
Any code accessing mgmtdev post unregistration simply won't work.

Anything pending cannot be freed in the newly sighted release() callback.
Freeing anything in the release is too late because hw resources are destroy 
after unregister and before release().

Contrary to it, doing so will hide bugs because such struct will live longer 
now if done as struct device.

On a side note, these devices are in 500 to 1K range and expected to grow more.
Creating yet another struct device and linking it to its parent would result in 
additional sysfs links, udev event explosion for no absolute use or doesn't 
bring any object lifecycle changes.

vduse is a special case in the kernel, that has to create a dummy device 
because there is no real device (like pci) linked to it that can service vdpa 
commands.

> 
> > We can rename it to vdpa_mgmt_handle, if the _dev suffix is confusing.
> 
> Where is the "management" device if this is not that?
> 
> What does "handle" mean here?
It a handle to on which commands are arriving via socket from user space.
These commands are serviced by the struct device *dev.
DMA operation etc also happen on *dev.
You can think of it as unique handle to refer to the kernel like 
netdev->ifindex on netlink messages.

> 
> > And regarding vduse_dev_release() and existing empty release function,
> they can be dynamically allocated.
> > This is because they are really the struct device.
> 
> I do not understand this statement, sorry.

It was bad sentence, my bad.

What I wanted to say is, probably better explained with real patch below.
In context of [1], struct vduse_mgmtdev empty release function can be avoided 
by below inline patch [2].

[1] https://www.spinics.net/lists/linux-virtualization/msg56371.html

[2] patch:

>From f87d557fe939a1632473dd11526a87301adbab8d Mon Sep 17 00:00:00 2001
From: Parav Pandit 
Date: Wed, 18 May 2022 01:22:21 +0300
Subject: [PATCH] vduse: Tie vduse mgmtdev and its device

vduse management device is not backed by any real device such as PCI. Hence it
doesn't have any parent device linked to it.

Kernel driver model in [1] suggests avoiding an empty device release callback.

Hence tie the mgmtdevice object's life cycle to an allocate dummy struct
device instead of having it static.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/core-api/kobject.rst?h=v5.18-rc7#n284

Signed-off-by: Parav Pandit 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 60 ++
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
index f85d1a08ed87..ebe272575fb8 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1475,16 +1475,12 @@ static char 

Re: [PATCH] vhost_net: fix double fget()

2022-05-17 Thread Al Viro
On Mon, May 16, 2022 at 04:44:19AM -0400, Michael S. Tsirkin wrote:
> > Signed-off-by: Al Viro 
> > Signed-off-by: Jason Wang 
> 
> Acked-by: Michael S. Tsirkin 
> 
> and this is stable material I guess.

It is, except that commit message ought to be cleaned up.  Something
along the lines of


Fix double fget() in vhost_net_set_backend()

Descriptor table is a shared resource; two fget() on the same descriptor
may return different struct file references.  get_tap_ptr_ring() is
called after we'd found (and pinned) the socket we'll be using and it
tries to find the private tun/tap data structures associated with it.
Redoing the lookup by the same file descriptor we'd used to get the
socket is racy - we need to same struct file.

Thanks to Jason for spotting a braino in the original variant of patch -
I'd missed the use of fd == -1 for disabling backend, and in that case
we can end up with sock == NULL and sock != oldsock.


Does the above sound sane for commit message?  And which tree would you
prefer it to go through?  I can take it in vfs.git#fixes, or you could
take it into your tree...
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 7/8] vhost-test: drop flush after vhost_dev_cleanup

2022-05-17 Thread Mike Christie
The flush after vhost_dev_cleanup is not needed because:

1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread
so the flush call will just return since the worker has not device.

2. It's not needed. The comment about jobs re-queueing themselves does
not look correct because handle_vq does not requeue work.

Signed-off-by: Mike Christie 
Acked-by: Jason Wang 
---
 drivers/vhost/test.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index de39151366c5..6c139f18bc54 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -158,9 +158,6 @@ static int vhost_test_release(struct inode *inode, struct 
file *f)
vhost_test_flush(n);
vhost_dev_stop(>dev);
vhost_dev_cleanup(>dev);
-   /* We do an extra flush before freeing memory,
-* since jobs can re-queue themselves. */
-   vhost_test_flush(n);
kfree(n->dev.vqs);
kfree(n);
return 0;
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 5/8] vhost_vsock: simplify vhost_vsock_flush()

2022-05-17 Thread Mike Christie
From: Andrey Ryabinin 

vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev)
before vhost_work_dev_flush(>dev). This seems pointless
as vsock->vqs[i].poll.dev is the same as >dev and several flushes
in a row doesn't do anything useful, one is just enough.

Signed-off-by: Andrey Ryabinin 
Reviewed-by: Stefano Garzarella 
Signed-off-by: Mike Christie 
Acked-by: Jason Wang 
---
 drivers/vhost/vsock.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a4c8ae92a0fb..96be63697117 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -705,11 +705,6 @@ static int vhost_vsock_dev_open(struct inode *inode, 
struct file *file)
 
 static void vhost_vsock_flush(struct vhost_vsock *vsock)
 {
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
-   if (vsock->vqs[i].handle_kick)
-   vhost_work_dev_flush(vsock->vqs[i].poll.dev);
vhost_work_dev_flush(>dev);
 }
 
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 8/8] vhost: rename vhost_work_dev_flush

2022-05-17 Thread Mike Christie
This patch renames vhost_work_dev_flush to just vhost_dev_flush to
relfect that it flushes everything on the device and that drivers
don't know/care that polls are based on vhost_works. Drivers just
flush the entire device and polls, and works for vhost-scsi
management TMFs and IO net virtqueues, etc all are flushed.

Signed-off-by: Mike Christie 
Acked-by: Jason Wang 
Reviewed-by: Stefano Garzarella 
---
 drivers/vhost/net.c   |  4 ++--
 drivers/vhost/scsi.c  |  2 +-
 drivers/vhost/test.c  |  2 +-
 drivers/vhost/vhost.c | 10 +-
 drivers/vhost/vhost.h |  2 +-
 drivers/vhost/vsock.c |  2 +-
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 047b7b05109a..0e4ff6a08f5f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1376,7 +1376,7 @@ static void vhost_net_stop(struct vhost_net *n, struct 
socket **tx_sock,
 
 static void vhost_net_flush(struct vhost_net *n)
 {
-   vhost_work_dev_flush(>dev);
+   vhost_dev_flush(>dev);
if (n->vqs[VHOST_NET_VQ_TX].ubufs) {
mutex_lock(>vqs[VHOST_NET_VQ_TX].vq.mutex);
n->tx_flush = true;
@@ -1566,7 +1566,7 @@ static long vhost_net_set_backend(struct vhost_net *n, 
unsigned index, int fd)
}
 
if (oldsock) {
-   vhost_work_dev_flush(>dev);
+   vhost_dev_flush(>dev);
sockfd_put(oldsock);
}
 
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 94535c813ef7..ffd9e6c2ffc1 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1436,7 +1436,7 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
kref_put(_inflight[i]->kref, vhost_scsi_done_inflight);
 
/* Flush both the vhost poll and vhost work */
-   vhost_work_dev_flush(>dev);
+   vhost_dev_flush(>dev);
 
/* Wait for all reqs issued before the flush to be finished */
for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 6c139f18bc54..bc8e7fb1e635 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -146,7 +146,7 @@ static void vhost_test_stop(struct vhost_test *n, void 
**privatep)
 
 static void vhost_test_flush(struct vhost_test *n)
 {
-   vhost_work_dev_flush(>dev);
+   vhost_dev_flush(>dev);
 }
 
 static int vhost_test_release(struct inode *inode, struct file *f)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9f8de04bb673..716a80c61fa9 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -231,7 +231,7 @@ void vhost_poll_stop(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_stop);
 
-void vhost_work_dev_flush(struct vhost_dev *dev)
+void vhost_dev_flush(struct vhost_dev *dev)
 {
struct vhost_flush_struct flush;
 
@@ -243,7 +243,7 @@ void vhost_work_dev_flush(struct vhost_dev *dev)
wait_for_completion(_event);
}
 }
-EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
+EXPORT_SYMBOL_GPL(vhost_dev_flush);
 
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
@@ -530,7 +530,7 @@ static int vhost_attach_cgroups(struct vhost_dev *dev)
attach.owner = current;
vhost_work_init(, vhost_attach_cgroups_work);
vhost_work_queue(dev, );
-   vhost_work_dev_flush(dev);
+   vhost_dev_flush(dev);
return attach.ret;
 }
 
@@ -657,7 +657,7 @@ void vhost_dev_stop(struct vhost_dev *dev)
vhost_poll_stop(>vqs[i]->poll);
}
 
-   vhost_work_dev_flush(dev);
+   vhost_dev_flush(dev);
 }
 EXPORT_SYMBOL_GPL(vhost_dev_stop);
 
@@ -1711,7 +1711,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *arg
mutex_unlock(>mutex);
 
if (pollstop && vq->handle_kick)
-   vhost_work_dev_flush(vq->poll.dev);
+   vhost_dev_flush(vq->poll.dev);
return r;
 }
 EXPORT_SYMBOL_GPL(vhost_vring_ioctl);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index aeb8e1ad1496..d02adf1b2bf8 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -45,7 +45,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t 
fn,
 int vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_queue(struct vhost_poll *poll);
-void vhost_work_dev_flush(struct vhost_dev *dev);
+void vhost_dev_flush(struct vhost_dev *dev);
 
 struct vhost_log {
u64 addr;
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 96be63697117..368330417bde 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -705,7 +705,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct 
file *file)
 
 static void vhost_vsock_flush(struct vhost_vsock *vsock)
 {
-   vhost_work_dev_flush(>dev);
+   vhost_dev_flush(>dev);
 }
 
 static void vhost_vsock_reset_orphans(struct sock *sk)
-- 
2.25.1


[PATCH V3 6/8] vhost-scsi: drop flush after vhost_dev_cleanup

2022-05-17 Thread Mike Christie
The flush after vhost_dev_cleanup is not needed because:

1. It doesn't do anything. vhost_dev_cleanup will stop the worker thread
so the flush call will just return since the worker has not device.

2. It's not needed for the re-queue case. vhost_scsi_evt_handle_kick grabs
the mutex and if the backend is NULL will return without queueing a work.
vhost_scsi_clear_endpoint will set the backend to NULL under the vq->mutex
then drops the mutex and does a flush. So we know when
vhost_scsi_clear_endpoint has dropped the mutex after clearing the backend
no evt related work will be able to requeue. The flush would then make sure
any queued evts are run and return.

Signed-off-by: Mike Christie 
Acked-by: Jason Wang 
---
 drivers/vhost/scsi.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 532e204f2b1b..94535c813ef7 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1827,8 +1827,6 @@ static int vhost_scsi_release(struct inode *inode, struct 
file *f)
vhost_scsi_clear_endpoint(vs, );
vhost_dev_stop(>dev);
vhost_dev_cleanup(>dev);
-   /* Jobs can re-queue themselves in evt kick handler. Do extra flush. */
-   vhost_scsi_flush(vs);
kfree(vs->dev.vqs);
kvfree(vs);
return 0;
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 3/8] vhost_net: get rid of vhost_net_flush_vq() and extra flush calls

2022-05-17 Thread Mike Christie
From: Andrey Ryabinin 

vhost_net_flush_vq() calls vhost_work_dev_flush() twice passing
vhost_dev pointer obtained via 'n->poll[index].dev' and
'n->vqs[index].vq.poll.dev'. This is actually the same pointer,
initialized in vhost_net_open()/vhost_dev_init()/vhost_poll_init()

Remove vhost_net_flush_vq() and call vhost_work_dev_flush() directly.
Do the flushes only once instead of several flush calls in a row
which seems rather useless.

Signed-off-by: Andrey Ryabinin 
[drop vhost_dev forward declaration in vhost.h]
Signed-off-by: Mike Christie 
Acked-by: Jason Wang 
Reviewed-by: Stefano Garzarella 
---
 drivers/vhost/net.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 4e55ad8c942a..047b7b05109a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1374,16 +1374,9 @@ static void vhost_net_stop(struct vhost_net *n, struct 
socket **tx_sock,
*rx_sock = vhost_net_stop_vq(n, >vqs[VHOST_NET_VQ_RX].vq);
 }
 
-static void vhost_net_flush_vq(struct vhost_net *n, int index)
-{
-   vhost_work_dev_flush(n->poll[index].dev);
-   vhost_work_dev_flush(n->vqs[index].vq.poll.dev);
-}
-
 static void vhost_net_flush(struct vhost_net *n)
 {
-   vhost_net_flush_vq(n, VHOST_NET_VQ_TX);
-   vhost_net_flush_vq(n, VHOST_NET_VQ_RX);
+   vhost_work_dev_flush(>dev);
if (n->vqs[VHOST_NET_VQ_TX].ubufs) {
mutex_lock(>vqs[VHOST_NET_VQ_TX].vq.mutex);
n->tx_flush = true;
@@ -1573,7 +1566,7 @@ static long vhost_net_set_backend(struct vhost_net *n, 
unsigned index, int fd)
}
 
if (oldsock) {
-   vhost_net_flush_vq(n, index);
+   vhost_work_dev_flush(>dev);
sockfd_put(oldsock);
}
 
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 1/8] vhost: get rid of vhost_poll_flush() wrapper

2022-05-17 Thread Mike Christie
From: Andrey Ryabinin 

vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
It gives wrong impression that we are doing some work over vhost_poll,
while in fact it flushes vhost_poll->dev.
It only complicate understanding of the code and leads to mistakes
like flushing the same vhost_dev several times in a row.

Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.

Signed-off-by: Andrey Ryabinin 
[merge vhost_poll_flush removal from Stefano Garzarella]
Signed-off-by: Mike Christie 
Reviewed-by: Chaitanya Kulkarni 
Acked-by: Jason Wang 
Reviewed-by: Stefano Garzarella 
---
 drivers/vhost/net.c   |  4 ++--
 drivers/vhost/test.c  |  2 +-
 drivers/vhost/vhost.c | 12 ++--
 drivers/vhost/vhost.h |  1 -
 drivers/vhost/vsock.c |  2 +-
 5 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 792ab5f23647..4e55ad8c942a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1376,8 +1376,8 @@ static void vhost_net_stop(struct vhost_net *n, struct 
socket **tx_sock,
 
 static void vhost_net_flush_vq(struct vhost_net *n, int index)
 {
-   vhost_poll_flush(n->poll + index);
-   vhost_poll_flush(>vqs[index].vq.poll);
+   vhost_work_dev_flush(n->poll[index].dev);
+   vhost_work_dev_flush(n->vqs[index].vq.poll.dev);
 }
 
 static void vhost_net_flush(struct vhost_net *n)
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 05740cba1cd8..f0ac9e35f5d6 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -146,7 +146,7 @@ static void vhost_test_stop(struct vhost_test *n, void 
**privatep)
 
 static void vhost_test_flush_vq(struct vhost_test *n, int index)
 {
-   vhost_poll_flush(>vqs[index].poll);
+   vhost_work_dev_flush(n->vqs[index].poll.dev);
 }
 
 static void vhost_test_flush(struct vhost_test *n)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d02173fb290c..1d84a2818c6f 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -245,14 +245,6 @@ void vhost_work_dev_flush(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_work_dev_flush);
 
-/* Flush any work that has been scheduled. When calling this, don't hold any
- * locks that are also used by the callback. */
-void vhost_poll_flush(struct vhost_poll *poll)
-{
-   vhost_work_dev_flush(poll->dev);
-}
-EXPORT_SYMBOL_GPL(vhost_poll_flush);
-
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
if (!dev->worker)
@@ -663,7 +655,7 @@ void vhost_dev_stop(struct vhost_dev *dev)
for (i = 0; i < dev->nvqs; ++i) {
if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
vhost_poll_stop(>vqs[i]->poll);
-   vhost_poll_flush(>vqs[i]->poll);
+   vhost_work_dev_flush(dev->vqs[i]->poll.dev);
}
}
 }
@@ -1719,7 +1711,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *arg
mutex_unlock(>mutex);
 
if (pollstop && vq->handle_kick)
-   vhost_poll_flush(>poll);
+   vhost_work_dev_flush(vq->poll.dev);
return r;
 }
 EXPORT_SYMBOL_GPL(vhost_vring_ioctl);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 638bb640d6b4..aeb8e1ad1496 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -44,7 +44,6 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t 
fn,
 __poll_t mask, struct vhost_dev *dev);
 int vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
-void vhost_poll_flush(struct vhost_poll *poll);
 void vhost_poll_queue(struct vhost_poll *poll);
 void vhost_work_dev_flush(struct vhost_dev *dev);
 
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e6c9d41db1de..a4c8ae92a0fb 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -709,7 +709,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
 
for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++)
if (vsock->vqs[i].handle_kick)
-   vhost_poll_flush(>vqs[i].poll);
+   vhost_work_dev_flush(vsock->vqs[i].poll.dev);
vhost_work_dev_flush(>dev);
 }
 
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 4/8] vhost_test: remove vhost_test_flush_vq()

2022-05-17 Thread Mike Christie
From: Andrey Ryabinin 

vhost_test_flush_vq() just a simple wrapper around vhost_work_dev_flush()
which seems have no value. It's just easier to call vhost_work_dev_flush()
directly. Besides there is no point in obtaining vhost_dev pointer
via 'n->vqs[index].poll.dev' while we can just use >dev.
It's the same pointers, see vhost_test_open()/vhost_dev_init().

Signed-off-by: Andrey Ryabinin 
Signed-off-by: Mike Christie 
Reviewed-by: Chaitanya Kulkarni 
Acked-by: Jason Wang 
---
 drivers/vhost/test.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index f0ac9e35f5d6..de39151366c5 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -144,14 +144,9 @@ static void vhost_test_stop(struct vhost_test *n, void 
**privatep)
*privatep = vhost_test_stop_vq(n, n->vqs + VHOST_TEST_VQ);
 }
 
-static void vhost_test_flush_vq(struct vhost_test *n, int index)
-{
-   vhost_work_dev_flush(n->vqs[index].poll.dev);
-}
-
 static void vhost_test_flush(struct vhost_test *n)
 {
-   vhost_test_flush_vq(n, VHOST_TEST_VQ);
+   vhost_work_dev_flush(>dev);
 }
 
 static int vhost_test_release(struct inode *inode, struct file *f)
@@ -210,7 +205,7 @@ static long vhost_test_run(struct vhost_test *n, int test)
goto err;
 
if (oldpriv) {
-   vhost_test_flush_vq(n, index);
+   vhost_test_flush(n);
}
}
 
@@ -303,7 +298,7 @@ static long vhost_test_set_backend(struct vhost_test *n, 
unsigned index, int fd)
mutex_unlock(>mutex);
 
if (enable) {
-   vhost_test_flush_vq(n, index);
+   vhost_test_flush(n);
}
 
mutex_unlock(>dev.mutex);
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 0/8] vhost flush cleanups

2022-05-17 Thread Mike Christie
The following patches made over linus's/mst's tree are Andrey Ryabinin's
flush cleanups and some from me. They reduce the number of flush calls
and remove some bogus ones where we don't even have a worker running
anymore or they were based on outdated or incorrect assumptions.

V3:
- Fix test.c flush use.

V2:
- Added patch to rename vhost_work_dev_flush to just vhost_dev_flush
to handle review comment from Jason about the naming not being so
great.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V3 2/8] vhost: flush dev once during vhost_dev_stop

2022-05-17 Thread Mike Christie
When vhost_work_dev_flush returns all work queued at that time will have
completed. There is then no need to flush after every vhost_poll_stop
call, and we can move the flush call to after the loop that stops the
pollers.

Signed-off-by: Mike Christie 
Acked-by: Jason Wang 
Reviewed-by: Stefano Garzarella 
---
 drivers/vhost/vhost.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 1d84a2818c6f..9f8de04bb673 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -653,11 +653,11 @@ void vhost_dev_stop(struct vhost_dev *dev)
int i;
 
for (i = 0; i < dev->nvqs; ++i) {
-   if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
+   if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick)
vhost_poll_stop(>vqs[i]->poll);
-   vhost_work_dev_flush(dev->vqs[i]->poll.dev);
-   }
}
+
+   vhost_work_dev_flush(dev);
 }
 EXPORT_SYMBOL_GPL(vhost_dev_stop);
 
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device

2022-05-17 Thread gre...@linuxfoundation.org
On Tue, May 17, 2022 at 12:21:03PM +, Parav Pandit wrote:
> Hi Greg, Yongji,
> 
> > From: Yongji Xie 
> > Sent: Tuesday, May 17, 2022 3:25 AM
> > 
> > On Tue, May 17, 2022 at 4:06 AM Parav Pandit  wrote:
> > >
> > >
> > >
> > > > From: Xie Yongji 
> > > > Sent: Monday, May 16, 2022 2:04 AM
> > > >
> > > > Introduce a device object for vdpa management device to control its
> > > > lifecycle.
> > > Why is this needed?
> > > What is the limitation of current approach that requires new device for
> > mgmtdev?
> > > The primary motivation is missing in the commit log here.
> > >
> > 
> > OK, I will add one. This patch actually comes from the discussion:
> > 
> > 
> > https://www.spinics.net/lists/linux-virtualization/msg56371.html
> > 
> > The "vdpa_mgmt_dev" makes things a little confusing. Embedding the
> > device struct in it to control its lifecycle simplifies the logic and makes 
> > the
> > driver model clear.
> > 
> No. it doesn't.
> 
> vdpa_mgmt_dev is really the handle for all the netlink socket messages 
> targeted.
> It is not really a struct device.

Why can it not be one?  It has lifetime rules that must be followed, so
might as well use the built-in code to handle this, right?

What is wrong with it being a struct device?

> We can rename it to vdpa_mgmt_handle, if the _dev suffix is confusing.

Where is the "management" device if this is not that?

What does "handle" mean here?

> And regarding vduse_dev_release() and existing empty release function, they 
> can be dynamically allocated.
> This is because they are really the struct device.

I do not understand this statement, sorry.

thanks,

greg k-h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 4/8] vhost_test: remove vhost_test_flush_vq()

2022-05-17 Thread Mike Christie
On 5/17/22 9:11 AM, Stefano Garzarella wrote:
>>
>> static int vhost_test_release(struct inode *inode, struct file *f)
>> @@ -210,7 +205,7 @@ static long vhost_test_run(struct vhost_test *n, int 
>> test)
>>     goto err;
>>
>>     if (oldpriv) {
>> -    vhost_test_flush_vq(n, index);
>> +    vhost_test_flush(n, index);
>     ^
> Should we remove the `index` parameter?

You are right. I'll fix up and repost. Thanks.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3] arm64: paravirt: Use RCU read locks to guard stolen_time

2022-05-17 Thread Srivatsa S. Bhat
On 5/17/22 1:54 AM, Will Deacon wrote:
> On Fri, May 13, 2022 at 04:32:53PM -0700, Srivatsa S. Bhat wrote:
>> On 5/13/22 10:46 AM, Elliot Berman wrote:
>>> From: Prakruthi Deepak Heragu 
>>>

[...]

>>>  static int stolen_time_cpu_down_prepare(unsigned int cpu)
>>>  {
>>> +   struct pvclock_vcpu_stolen_time *kaddr = NULL;
>>> struct pv_time_stolen_time_region *reg;
>>>  
>>> reg = this_cpu_ptr(_time_region);
>>> if (!reg->kaddr)
>>> return 0;
>>>  
>>> -   memunmap(reg->kaddr);
>>> -   memset(reg, 0, sizeof(*reg));
>>> +   kaddr = rcu_replace_pointer(reg->kaddr, NULL, true);
>>> +   synchronize_rcu();
>>> +   memunmap(kaddr);
>>>  
>>
>> The original code used to memset the stolen time region, but this
>> patch seems to drop it. Was that change intentional?
> 
> 'struct pv_time_stolen_time_region' only has one field ('kaddr'), which
> we're now clearing with rcu_replace_pointer() so the memset doesn't make
> sense.
> 

Ah right, never mind :)

Thank you!

Regards,
Srivatsa
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH v1 0/8] virtio/vsock: experimental zerocopy receive

2022-05-17 Thread Stefano Garzarella

Hi Arseniy,

On Thu, May 12, 2022 at 05:04:11AM +, Arseniy Krasnov wrote:

 INTRODUCTION

Hello, this is experimental implementation of virtio vsock zerocopy
receive. It was inspired by TCP zerocopy receive by Eric Dumazet. This API uses
same idea: call 'mmap()' on socket's descriptor, then every 'getsockopt()' will
fill provided vma area with pages of virtio RX buffers. After received data was
processed by user, pages must be freed by 'madvise()'  call with MADV_DONTNEED
flag set(if user won't call 'madvise()', next 'getsockopt()' will fail).


Sounds cool, but maybe we would need some socket/net experts here for 
review.


Could we do something similar for the sending path as well?



DETAILS

Here is how mapping with mapped pages looks exactly: first page mapping
contains array of trimmed virtio vsock packet headers (in contains only length
of data on the corresponding page and 'flags' field):

struct virtio_vsock_usr_hdr {
uint32_t length;
uint32_t flags;
};

Field  'length' allows user to know exact size of payload within each sequence
of pages and 'flags' allows user to handle SOCK_SEQPACKET flags(such as message
bounds or record bounds). All other pages are data pages from RX queue.

Page 0  Page 1  Page N

[ hdr1 .. hdrN ][ data ] .. [ data ]
  ||   ^   ^
  ||   |   |
  |*---*
  ||
  ||
  **

Of course, single header could represent array of pages (when packet's
buffer is bigger than one page).So here is example of detailed mapping layout
for some set of packages. Lets consider that we have the following sequence  of
packages: 56 bytes, 4096 bytes and 8200 bytes. All pages: 0,1,2,3,4 and 5 will
be inserted to user's vma(vma is large enough).

Page 0: [[ hdr0 ][ hdr 1 ][ hdr 2 ][ hdr 3 ] ... ]
Page 1: [ 56 ]
Page 2: [ 4096 ]
Page 3: [ 4096 ]
Page 4: [ 4096 ]
Page 5: [ 8 ]

Page 0 contains only array of headers:
'hdr0' has 56 in length field.
'hdr1' has 4096 in length field.
'hdr2' has 8200 in length field.
'hdr3' has 0 in length field(this is end of data marker).

Page 1 corresponds to 'hdr0' and has only 56 bytes of data.
Page 2 corresponds to 'hdr1' and filled with data.
Page 3 corresponds to 'hdr2' and filled with data.
Page 4 corresponds to 'hdr2' and filled with data.
Page 5 corresponds to 'hdr2' and has only 8 bytes of data.

This patchset also changes packets allocation way: today implementation
uses only 'kmalloc()' to create data buffer. Problem happens when we try to map
such buffers to user's vma - kernel forbids to map slab pages to user's vma(as
pages of "not large" 'kmalloc()' allocations are marked with PageSlab flag and
"not large" could be bigger than one page). So to avoid this, data buffers now
allocated using 'alloc_pages()' call.

  TESTS

This patchset updates 'vsock_test' utility: two tests for new feature
were added. First test covers invalid cases. Second checks valid transmission
case.


Thanks for adding the test!



   BENCHMARKING

For benchmakring I've added small utility 'rx_zerocopy'. It works in
client/server mode. When client connects to server, server starts sending exact
amount of data to client(amount is set as input argument).Client reads data and
waits for next portion of it. Client works in two modes: copy and zero-copy. In
copy mode client uses 'read()' call while in zerocopy mode sequence of 'mmap()'
/'getsockopt()'/'madvise()' are used. Smaller amount of time for transmission
is better. For server, we can set size of tx buffer and for client we can set
size of rx buffer or rx mapping size(in zerocopy mode). Usage of this utility
is quiet simple:

For client mode:

./rx_zerocopy --mode client [--zerocopy] [--rx]

For server mode:

./rx_zerocopy --mode server [--mb] [--tx]

[--mb] sets number of megabytes to transfer.
[--rx] sets size of receive buffer/mapping in pages.
[--tx] sets size of transmit buffer in pages.

I checked for transmission of 4000mb of data. Here are some results:

  size of rx/tx buffers in pages
  *---*
  |8   |32|64   |   256|   512|
*--**--*-*--*--*
|   zerocopy   |   24   |   10.6   |  12.2   |   23.6   |21| secs to
*--* process
| non-zerocopy |   13   |   16.4   |  24.7   |   27.2   |   23.9   | 4000 mb

Re: [PATCH V2 0/8] vhost flush cleanups

2022-05-17 Thread Stefano Garzarella

On Sun, May 15, 2022 at 03:29:14PM -0500, Mike Christie wrote:

The following patches are Andrey Ryabinin's flush cleanups and some
from me. They reduce the number of flush calls and remove some bogus
ones where we don't even have a worker running anymore or they were
based on outdated or incorrect assumptions.

Jason, for the patches you gave me an explicit acked/reviewed tag I
added it. For the replies where I answered your review questions and
you only replied with an affirimative reply I did not add a tag,
because I was not not 100% sure what you wanted to do.

These patches will be used in the vhost threading patches which I think
will make progress again now that I have Christian's email figured out
:) (he had moved to microsoft but I've been using the ubuntu address).
I think the patches are ok cleanups in general so I thought they could
get merged separately if people agree.

V2:
- Added patch to rename vhost_work_dev_flush to just vhost_dev_flush
to handle review comment from Jason about the naming not being so
great.




Thank you for this cleanup!

I think there is only a small issue to fix in vhost/test.c, otherwise it 
looks good to me :-)


Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 8/8] vhost: rename vhost_work_dev_flush

2022-05-17 Thread Stefano Garzarella

On Sun, May 15, 2022 at 03:29:22PM -0500, Mike Christie wrote:

This patch renames vhost_work_dev_flush to just vhost_dev_flush to
relfect that it flushes everything on the device and that drivers
don't know/care that polls are based on vhost_works. Drivers just
flush the entire device and polls, and works for vhost-scsi
management TMFs and IO net virtqueues, etc all are flushed.

Signed-off-by: Mike Christie 
---
drivers/vhost/net.c   |  4 ++--
drivers/vhost/scsi.c  |  2 +-
drivers/vhost/test.c  |  2 +-
drivers/vhost/vhost.c | 10 +-
drivers/vhost/vhost.h |  2 +-
drivers/vhost/vsock.c |  2 +-
6 files changed, 11 insertions(+), 11 deletions(-)


Reviewed-by: Stefano Garzarella 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 4/8] vhost_test: remove vhost_test_flush_vq()

2022-05-17 Thread Stefano Garzarella

On Sun, May 15, 2022 at 03:29:18PM -0500, Mike Christie wrote:

From: Andrey Ryabinin 

vhost_test_flush_vq() just a simple wrapper around vhost_work_dev_flush()
which seems have no value. It's just easier to call vhost_work_dev_flush()
directly. Besides there is no point in obtaining vhost_dev pointer
via 'n->vqs[index].poll.dev' while we can just use >dev.
It's the same pointers, see vhost_test_open()/vhost_dev_init().

Signed-off-by: Andrey Ryabinin 
Signed-off-by: Mike Christie 
---
drivers/vhost/test.c | 11 +++
1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index f0ac9e35f5d6..837148d0a6a8 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -144,14 +144,9 @@ static void vhost_test_stop(struct vhost_test *n, void 
**privatep)
*privatep = vhost_test_stop_vq(n, n->vqs + VHOST_TEST_VQ);
}

-static void vhost_test_flush_vq(struct vhost_test *n, int index)
-{
-   vhost_work_dev_flush(n->vqs[index].poll.dev);
-}
-
static void vhost_test_flush(struct vhost_test *n)
{
-   vhost_test_flush_vq(n, VHOST_TEST_VQ);
+   vhost_work_dev_flush(>dev);
}

static int vhost_test_release(struct inode *inode, struct file *f)
@@ -210,7 +205,7 @@ static long vhost_test_run(struct vhost_test *n, int test)
goto err;

if (oldpriv) {
-   vhost_test_flush_vq(n, index);
+   vhost_test_flush(n, index);

^
Should we remove the `index` parameter?


}
}

@@ -303,7 +298,7 @@ static long vhost_test_set_backend(struct vhost_test *n, 
unsigned index, int fd)
mutex_unlock(>mutex);

if (enable) {
-   vhost_test_flush_vq(n, index);
+   vhost_test_flush(n);
}

mutex_unlock(>dev.mutex);
--
2.25.1



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 3/8] vhost_net: get rid of vhost_net_flush_vq() and extra flush calls

2022-05-17 Thread Stefano Garzarella

On Sun, May 15, 2022 at 03:29:17PM -0500, Mike Christie wrote:

From: Andrey Ryabinin 

vhost_net_flush_vq() calls vhost_work_dev_flush() twice passing
vhost_dev pointer obtained via 'n->poll[index].dev' and
'n->vqs[index].vq.poll.dev'. This is actually the same pointer,
initialized in vhost_net_open()/vhost_dev_init()/vhost_poll_init()

Remove vhost_net_flush_vq() and call vhost_work_dev_flush() directly.
Do the flushes only once instead of several flush calls in a row
which seems rather useless.

Signed-off-by: Andrey Ryabinin 
[drop vhost_dev forward declaration in vhost.h]
Signed-off-by: Mike Christie 
---
drivers/vhost/net.c | 11 ++-
1 file changed, 2 insertions(+), 9 deletions(-)


Reviewed-by: Stefano Garzarella 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3] arm64: paravirt: Use RCU read locks to guard stolen_time

2022-05-17 Thread Will Deacon
On Fri, 13 May 2022 10:46:54 -0700, Elliot Berman wrote:
> From: Prakruthi Deepak Heragu 
> 
> During hotplug, the stolen time data structure is unmapped and memset.
> There is a possibility of the timer IRQ being triggered before memset
> and stolen time is getting updated as part of this timer IRQ handler. This
> causes the below crash in timer handler -
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: paravirt: Use RCU read locks to guard stolen_time
  https://git.kernel.org/arm64/c/19bef63f951e

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 2/8] vhost: flush dev once during vhost_dev_stop

2022-05-17 Thread Stefano Garzarella

On Sun, May 15, 2022 at 03:29:16PM -0500, Mike Christie wrote:

When vhost_work_dev_flush returns all work queued at that time will have
completed. There is then no need to flush after every vhost_poll_stop
call, and we can move the flush call to after the loop that stops the
pollers.

Signed-off-by: Mike Christie 
---
drivers/vhost/vhost.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)


Reviewed-by: Stefano Garzarella 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V2 1/8] vhost: get rid of vhost_poll_flush() wrapper

2022-05-17 Thread Stefano Garzarella

On Sun, May 15, 2022 at 03:29:15PM -0500, Mike Christie wrote:

From: Andrey Ryabinin 

vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
It gives wrong impression that we are doing some work over vhost_poll,
while in fact it flushes vhost_poll->dev.
It only complicate understanding of the code and leads to mistakes
like flushing the same vhost_dev several times in a row.

Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.

Signed-off-by: Andrey Ryabinin 
[merge vhost_poll_flush removal from Stefano Garzarella]
Signed-off-by: Mike Christie 
---
drivers/vhost/net.c   |  4 ++--
drivers/vhost/test.c  |  2 +-
drivers/vhost/vhost.c | 12 ++--
drivers/vhost/vhost.h |  1 -
drivers/vhost/vsock.c |  2 +-
5 files changed, 6 insertions(+), 15 deletions(-)


Reviewed-by: Stefano Garzarella 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device

2022-05-17 Thread Parav Pandit via Virtualization
Hi Greg, Yongji,

> From: Yongji Xie 
> Sent: Tuesday, May 17, 2022 3:25 AM
> 
> On Tue, May 17, 2022 at 4:06 AM Parav Pandit  wrote:
> >
> >
> >
> > > From: Xie Yongji 
> > > Sent: Monday, May 16, 2022 2:04 AM
> > >
> > > Introduce a device object for vdpa management device to control its
> > > lifecycle.
> > Why is this needed?
> > What is the limitation of current approach that requires new device for
> mgmtdev?
> > The primary motivation is missing in the commit log here.
> >
> 
> OK, I will add one. This patch actually comes from the discussion:
> 
> 
> https://www.spinics.net/lists/linux-virtualization/msg56371.html
> 
> The "vdpa_mgmt_dev" makes things a little confusing. Embedding the
> device struct in it to control its lifecycle simplifies the logic and makes 
> the
> driver model clear.
> 
No. it doesn't.

vdpa_mgmt_dev is really the handle for all the netlink socket messages targeted.
It is not really a struct device.
We can rename it to vdpa_mgmt_handle, if the _dev suffix is confusing.

And regarding vduse_dev_release() and existing empty release function, they can 
be dynamically allocated.
This is because they are really the struct device.


> > > And the device name will be used to match
> VDPA_ATTR_MGMTDEV_DEV_NAME
> > > field of netlink message rather than using parent device name.
> > >
> > > With this patch applied, drivers should use vdpa_mgmtdev_alloc() or
> > > _vdpa_mgmtdev_alloc() to allocate a vDPA management device before
> > > calling vdpa_mgmtdev_register(). And some buggy empty release
> > > function can also be removed from the driver codes.
> > What is the bug, can you please explain?
> 
> I'm not sure if "buggy" is the right word. But the empty release function
> should be removed as mentioned in Documentation/core-api/kobject.rst.
> 
Sure. So, struct device in the vdpa_sim and vduse can allocate the struct 
device using kzalloc() and release callback can free it.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3] arm64: paravirt: Use RCU read locks to guard stolen_time

2022-05-17 Thread Will Deacon
On Fri, May 13, 2022 at 04:32:53PM -0700, Srivatsa S. Bhat wrote:
> On 5/13/22 10:46 AM, Elliot Berman wrote:
> > From: Prakruthi Deepak Heragu 
> > 
> > During hotplug, the stolen time data structure is unmapped and memset.
> > There is a possibility of the timer IRQ being triggered before memset
> > and stolen time is getting updated as part of this timer IRQ handler. This
> > causes the below crash in timer handler -
> > 
> >   [ 3457.473139][C5] Unable to handle kernel paging request at virtual 
> > address ffc03df05148
> >   ...
> >   [ 3458.154398][C5] Call trace:
> >   [ 3458.157648][C5]  para_steal_clock+0x30/0x50
> >   [ 3458.162319][C5]  irqtime_account_process_tick+0x30/0x194
> >   [ 3458.168148][C5]  account_process_tick+0x3c/0x280
> >   [ 3458.173274][C5]  update_process_times+0x5c/0xf4
> >   [ 3458.178311][C5]  tick_sched_timer+0x180/0x384
> >   [ 3458.183164][C5]  __run_hrtimer+0x160/0x57c
> >   [ 3458.187744][C5]  hrtimer_interrupt+0x258/0x684
> >   [ 3458.192698][C5]  arch_timer_handler_virt+0x5c/0xa0
> >   [ 3458.198002][C5]  handle_percpu_devid_irq+0xdc/0x414
> >   [ 3458.203385][C5]  handle_domain_irq+0xa8/0x168
> >   [ 3458.208241][C5]  gic_handle_irq.34493+0x54/0x244
> >   [ 3458.213359][C5]  call_on_irq_stack+0x40/0x70
> >   [ 3458.218125][C5]  do_interrupt_handler+0x60/0x9c
> >   [ 3458.223156][C5]  el1_interrupt+0x34/0x64
> >   [ 3458.227560][C5]  el1h_64_irq_handler+0x1c/0x2c
> >   [ 3458.232503][C5]  el1h_64_irq+0x7c/0x80
> >   [ 3458.236736][C5]  free_vmap_area_noflush+0x108/0x39c
> >   [ 3458.242126][C5]  remove_vm_area+0xbc/0x118
> >   [ 3458.246714][C5]  vm_remove_mappings+0x48/0x2a4
> >   [ 3458.251656][C5]  __vunmap+0x154/0x278
> >   [ 3458.255796][C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
> >   [ 3458.261542][C5]  cpuhp_invoke_callback+0x248/0xc34
> >   [ 3458.266842][C5]  cpuhp_thread_fun+0x1c4/0x248
> >   [ 3458.271696][C5]  smpboot_thread_fn+0x1b0/0x400
> >   [ 3458.276638][C5]  kthread+0x17c/0x1e0
> >   [ 3458.280691][C5]  ret_from_fork+0x10/0x20
> > 
> > As a fix, introduce rcu lock to update stolen time structure.
> > 
> > Suggested-by: Will Deacon 
> > Signed-off-by: Prakruthi Deepak Heragu 
> > Signed-off-by: Elliot Berman 
> > ---
> 
> Looks good to me, but one quick question though (see below).
> 
> Reviewed-by: Srivatsa S. Bhat (VMware) 

Cheers.

> >  static int stolen_time_cpu_down_prepare(unsigned int cpu)
> >  {
> > +   struct pvclock_vcpu_stolen_time *kaddr = NULL;
> > struct pv_time_stolen_time_region *reg;
> >  
> > reg = this_cpu_ptr(_time_region);
> > if (!reg->kaddr)
> > return 0;
> >  
> > -   memunmap(reg->kaddr);
> > -   memset(reg, 0, sizeof(*reg));
> > +   kaddr = rcu_replace_pointer(reg->kaddr, NULL, true);
> > +   synchronize_rcu();
> > +   memunmap(kaddr);
> >  
> 
> The original code used to memset the stolen time region, but this
> patch seems to drop it. Was that change intentional?

'struct pv_time_stolen_time_region' only has one field ('kaddr'), which
we're now clearing with rcu_replace_pointer() so the memset doesn't make
sense.

Will
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device

2022-05-17 Thread Greg KH
On Tue, May 17, 2022 at 02:29:38PM +0800, Jason Wang wrote:
> On Mon, May 16, 2022 at 7:54 PM Greg KH  wrote:
> >
> > On Mon, May 16, 2022 at 06:34:36AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, May 16, 2022 at 06:31:18PM +0800, Yongji Xie wrote:
> > > > On Mon, May 16, 2022 at 5:54 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, May 16, 2022 at 05:31:27PM +0800, Yongji Xie wrote:
> > > > > > On Mon, May 16, 2022 at 5:14 PM Jason Wang  
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > 在 2022/5/16 14:03, Xie Yongji 写道:
> > > > > > > > Introduce a device object for vdpa management device to control
> > > > > > > > its lifecycle. And the device name will be used to match
> > > > > > > > VDPA_ATTR_MGMTDEV_DEV_NAME field of netlink message rather than
> > > > > > > > using parent device name.
> > > > > > > >
> > > > > > > > With this patch applied, drivers should use vdpa_mgmtdev_alloc()
> > > > > > > > or _vdpa_mgmtdev_alloc() to allocate a vDPA management device
> > > > > > > > before calling vdpa_mgmtdev_register(). And some buggy empty
> > > > > > > > release function can also be removed from the driver codes.
> > > > > > > >
> > > > > > > > Signed-off-by: Xie Yongji 
> > > > > > > > ---
> > > > > > > >   drivers/vdpa/ifcvf/ifcvf_main.c  | 11 ++--
> > > > > > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c| 11 ++--
> > > > > > > >   drivers/vdpa/vdpa.c  | 92 
> > > > > > > > 
> > > > > > > >   drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 39 
> > > > > > > >   drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 46 +-
> > > > > > > >   drivers/vdpa/vdpa_user/vduse_dev.c   | 38 
> > > > > > > >   include/linux/vdpa.h | 38 +++-
> > > > > > > >   7 files changed, 168 insertions(+), 107 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c 
> > > > > > > > b/drivers/vdpa/ifcvf/ifcvf_main.c
> > > > > > > > index 4366320fb68d..d4087c37cfdf 100644
> > > > > > > > --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> > > > > > > > +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> > > > > > > > @@ -821,10 +821,11 @@ static int ifcvf_probe(struct pci_dev 
> > > > > > > > *pdev, const struct pci_device_id *id)
> > > > > > > >   u32 dev_type;
> > > > > > > >   int ret;
> > > > > > > >
> > > > > > > > - ifcvf_mgmt_dev = kzalloc(sizeof(struct 
> > > > > > > > ifcvf_vdpa_mgmt_dev), GFP_KERNEL);
> > > > > > > > - if (!ifcvf_mgmt_dev) {
> > > > > > > > + ifcvf_mgmt_dev = vdpa_mgmtdev_alloc(struct 
> > > > > > > > ifcvf_vdpa_mgmt_dev,
> > > > > > > > + mdev, dev_name(dev), 
> > > > > > > > dev);
> > > > > > >
> > > > > > >
> > > > > > > Just wonder if it's better to make vDPA device a child of the mgmt
> > > > > > > device instead of the PCI device?
> > > > > > >
> > > > > > > (Currently we use PCI device as the parent of the vDPA device, or 
> > > > > > > at
> > > > > > > least we can do this for the simulator which doesn't have a 
> > > > > > > parent?)
> > > > > > >
> > > > > >
> > > > > > Make sense. I think we can do it for all vDPA drivers. Make sure the
> > > > > > parent of the vDPA device is the vDPA management device.
> > > > > >
> > > > > > Thanks,
> > > > > > Yongji
> > > > >
> > > > >
> > > > > that's an ABI change though isn't it? parent is exposed in sysfs,
> > > > > right?
> > > > >
> > > >
> > > > Hmm...yes. So it looks like we can't change it, right?
> > > >
> > > > Thanks,
> > > > Yongji
> > >
> > > Afraid so. a way to find the pci device already exists I think, right?
> >
> > You can change it, any tools should be going through the bus/device
> > links, not walking the sysfs tree directly, right?  That's what those
> > symlinks are for, in order to properly be able to enumerate different
> > device types.
> >
> > A specific topology in sysfs should not ever be assumed to be static
> > over time, that's not an accurate representation of how the kernel
> > works.
> >
> > So try it and see?  Odds are there are no tools that even care about
> > these devices, right?  Or is there?
> 
> I think there's no tool that depends on the sysfs tree now. The
> management layer is only expected to talk to the management device via
> vdpa(8) which is integrated in iproute2.

Great, then change away!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH v2 0/5] TUN/VirtioNet USO features support.

2022-05-17 Thread Jason Wang
On Thu, May 12, 2022 at 7:33 PM Andrew Melnychenko  wrote:
>
> Added new offloads for TUN devices TUN_F_USO4 and TUN_F_USO6.
> Technically they enable NETIF_F_GSO_UDP_L4
> (and only if USO4 & USO6 are set simultaneously).
> It allows to transmission of large UDP packets.
>
> Different features USO4 and USO6 are required for qemu where Windows guests 
> can
> enable disable USO receives for IPv4 and IPv6 separately.
> On the other side, Linux can't really differentiate USO4 and USO6, for now.
> For now, to enable USO for TUN it requires enabling USO4 and USO6 together.
> In the future, there would be a mechanism to control UDP_L4 GSO separately.
>
> Test it WIP Qemu https://github.com/daynix/qemu/tree/Dev_USOv2
>
> New types for VirtioNet already on mailing:
> https://lists.oasis-open.org/archives/virtio-comment/202110/msg00010.html
>
> Also, there is a known issue with transmitting packages between two guests.

Could you explain this more? It looks like a bug. (Or any pointer to
the discussion)

Thanks

> Without hacks with skb's GSO - packages are still segmented on the host's 
> postrouting.
>
> Andrew (5):
>   uapi/linux/if_tun.h: Added new offload types for USO4/6.
>   driver/net/tun: Added features for USO.
>   uapi/linux/virtio_net.h: Added USO types.
>   linux/virtio_net.h: Support USO offload in vnet header.
>   drivers/net/virtio_net.c: Added USO support.
>
>  drivers/net/tap.c   | 10 --
>  drivers/net/tun.c   |  8 +++-
>  drivers/net/virtio_net.c| 19 +++
>  include/linux/virtio_net.h  |  9 +
>  include/uapi/linux/if_tun.h |  2 ++
>  include/uapi/linux/virtio_net.h |  4 
>  6 files changed, 45 insertions(+), 7 deletions(-)
>
> --
> 2.35.1
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/3] vdpa: Add a device object for vdpa management device

2022-05-17 Thread Jason Wang
On Mon, May 16, 2022 at 7:54 PM Greg KH  wrote:
>
> On Mon, May 16, 2022 at 06:34:36AM -0400, Michael S. Tsirkin wrote:
> > On Mon, May 16, 2022 at 06:31:18PM +0800, Yongji Xie wrote:
> > > On Mon, May 16, 2022 at 5:54 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, May 16, 2022 at 05:31:27PM +0800, Yongji Xie wrote:
> > > > > On Mon, May 16, 2022 at 5:14 PM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > 在 2022/5/16 14:03, Xie Yongji 写道:
> > > > > > > Introduce a device object for vdpa management device to control
> > > > > > > its lifecycle. And the device name will be used to match
> > > > > > > VDPA_ATTR_MGMTDEV_DEV_NAME field of netlink message rather than
> > > > > > > using parent device name.
> > > > > > >
> > > > > > > With this patch applied, drivers should use vdpa_mgmtdev_alloc()
> > > > > > > or _vdpa_mgmtdev_alloc() to allocate a vDPA management device
> > > > > > > before calling vdpa_mgmtdev_register(). And some buggy empty
> > > > > > > release function can also be removed from the driver codes.
> > > > > > >
> > > > > > > Signed-off-by: Xie Yongji 
> > > > > > > ---
> > > > > > >   drivers/vdpa/ifcvf/ifcvf_main.c  | 11 ++--
> > > > > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c| 11 ++--
> > > > > > >   drivers/vdpa/vdpa.c  | 92 
> > > > > > > 
> > > > > > >   drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 39 
> > > > > > >   drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 46 +-
> > > > > > >   drivers/vdpa/vdpa_user/vduse_dev.c   | 38 
> > > > > > >   include/linux/vdpa.h | 38 +++-
> > > > > > >   7 files changed, 168 insertions(+), 107 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c 
> > > > > > > b/drivers/vdpa/ifcvf/ifcvf_main.c
> > > > > > > index 4366320fb68d..d4087c37cfdf 100644
> > > > > > > --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> > > > > > > +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> > > > > > > @@ -821,10 +821,11 @@ static int ifcvf_probe(struct pci_dev 
> > > > > > > *pdev, const struct pci_device_id *id)
> > > > > > >   u32 dev_type;
> > > > > > >   int ret;
> > > > > > >
> > > > > > > - ifcvf_mgmt_dev = kzalloc(sizeof(struct 
> > > > > > > ifcvf_vdpa_mgmt_dev), GFP_KERNEL);
> > > > > > > - if (!ifcvf_mgmt_dev) {
> > > > > > > + ifcvf_mgmt_dev = vdpa_mgmtdev_alloc(struct 
> > > > > > > ifcvf_vdpa_mgmt_dev,
> > > > > > > + mdev, dev_name(dev), 
> > > > > > > dev);
> > > > > >
> > > > > >
> > > > > > Just wonder if it's better to make vDPA device a child of the mgmt
> > > > > > device instead of the PCI device?
> > > > > >
> > > > > > (Currently we use PCI device as the parent of the vDPA device, or at
> > > > > > least we can do this for the simulator which doesn't have a parent?)
> > > > > >
> > > > >
> > > > > Make sense. I think we can do it for all vDPA drivers. Make sure the
> > > > > parent of the vDPA device is the vDPA management device.
> > > > >
> > > > > Thanks,
> > > > > Yongji
> > > >
> > > >
> > > > that's an ABI change though isn't it? parent is exposed in sysfs,
> > > > right?
> > > >
> > >
> > > Hmm...yes. So it looks like we can't change it, right?
> > >
> > > Thanks,
> > > Yongji
> >
> > Afraid so. a way to find the pci device already exists I think, right?
>
> You can change it, any tools should be going through the bus/device
> links, not walking the sysfs tree directly, right?  That's what those
> symlinks are for, in order to properly be able to enumerate different
> device types.
>
> A specific topology in sysfs should not ever be assumed to be static
> over time, that's not an accurate representation of how the kernel
> works.
>
> So try it and see?  Odds are there are no tools that even care about
> these devices, right?  Or is there?

I think there's no tool that depends on the sysfs tree now. The
management layer is only expected to talk to the management device via
vdpa(8) which is integrated in iproute2.

Thanks

>
> thanks,
>
> greg k-h
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization