Re: [PATCH] vduse: Cleanup the old kernel states after reset failure

2021-09-06 Thread Jason Wang
On Mon, Sep 6, 2021 at 10:22 PM Xie Yongji  wrote:
>
> We should cleanup the old kernel states e.g. interrupt callback
> no matter whether the userspace handle the reset correctly or not
> since virtio-vdpa can't handle the reset failure now.
>
> Otherwise, the old state might be used after reset which might
> break something, e.g. the old interrupt callback might be triggered
> by userspace after reset, which can break the virtio device driver.
>
> Signed-off-by: Xie Yongji 
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 59a93e5b967a..61695521066c 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -665,13 +665,11 @@ static void vduse_vdpa_set_config(struct vdpa_device 
> *vdpa, unsigned int offset,
>  static int vduse_vdpa_reset(struct vdpa_device *vdpa)
>  {
> struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> -
> -   if (vduse_dev_set_status(dev, 0))
> -   return -EIO;
> +   int ret = vduse_dev_set_status(dev, 0);
>
> vduse_dev_reset(dev);
>
> -   return 0;
> +   return ret;
>  }
>
>  static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
> --
> 2.11.0
>

Acked-by: Jason Wang 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] virtiofs: miscellaneous fixes

2021-09-06 Thread JeffleXu
ping ...

On 8/12/21 1:46 PM, Jeffle Xu wrote:
> Some fixes or optimization for virtiofs, which are authored by Liu Bo.
> 
> Liu Bo (2):
>   virtio-fs: disable atomic_o_trunc if no_open is enabled
>   virtiofs: reduce lock contention on fpq->lock
> 
>  fs/fuse/file.c  | 11 +--
>  fs/fuse/virtio_fs.c |  3 ---
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 

-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

2021-09-06 Thread Stefan Hajnoczi
On Tue, Aug 24, 2021 at 07:31:29AM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:
> > While investigating an unhandled irq from vring_interrupt() with virtiofs I
> > stumbled onto a possible race that also affects virtio_gpu. This theory is
> > based on code inspection and hopefully you can point out something that 
> > makes
> > this a non-issue in practice :).
> > 
> > The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
> > taken with no used (completed) buffers in the virtqueue. The kernel disables
> > the irq and the driver is no longer receives irqs when this happens:
> > 
> >   irq 77: nobody cared (try booting with the "irqpoll" option)
> >   ...
> >   handlers:
> >   [] vring_interrupt
> >   Disabling IRQ #77
> > 
> > Consider the following:
> > 
> > 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
> >function is scheduled to dequeue used buffers:
> >vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
> > 
> > 2. The device adds more used requests and just before...
> > 
> > 3. ...virtio_fs_requests_done_work() empties the virtqueue with
> >virtqueue_get_buf().
> > 
> > 4. The device raises the irq and vring_interrupt() is called after
> >virtio_fs_requests_done_work emptied the virtqueue:
> > 
> >irqreturn_t vring_interrupt(int irq, void *_vq)
> >{
> >struct vring_virtqueue *vq = to_vvq(_vq);
> > 
> >if (!more_used(vq)) {
> >pr_debug("virtqueue interrupt with no work for %p\n", vq);
> >return IRQ_NONE;
> >
> > 
> > I have included a patch that switches virtiofs from spin_lock() to
> > spin_lock_irqsave() to prevent vring_interrupt() from running while the
> > virtqueue is processed from a work function.
> > 
> > virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
> > virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
> > I think this can result in the same false unhandled irq problem as virtiofs.
> > 
> > This race condition could in theory affect all drivers. The VIRTIO
> > specification says:
> > 
> >   Neither of these notification suppression methods are reliable, as they 
> > are
> >   not synchronized with the device, but they serve as useful optimizations.
> > 
> > If virtqueue_disable_cb() is just a hint and might not disable virtqueue 
> > irqs
> > then virtio_net and other drivers have a problem because because an irq 
> > could
> > be raised while the driver is dequeuing used buffers. I think we haven't 
> > seen
> > this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
> > devices might cache the value and not disable notifications for some time...
> > 
> > Have I missed something?
> > 
> > The virtiofs patch I attached is being stress tested to see if the unhandled
> > irqs still occur.
> > 
> > Stefan Hajnoczi (1):
> >   fuse: disable local irqs when processing vq completions
> > 
> >  fs/fuse/virtio_fs.c | 15 ++-
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> Fundamentally it is not a problem to have an unhandled IRQ
> once in a while. It's only a problem if this happens time
> after time.
> 
> 
> Does the kernel you are testing include
> commit 8d622d21d24803408b256d96463eac4574dcf067
> Author: Michael S. Tsirkin 
> Date:   Tue Apr 13 01:19:16 2021 -0400
> 
> virtio: fix up virtio_disable_cb
> 
> ?
> 
> If not it's worth checking whether the latest kernel still
> has the issue.

A new kernel with your patch doesn't have this issue.

Please disregard the patch I posted, your patch seems to be enough.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/1] virtio-blk: avoid preallocating big SGL for data

2021-09-06 Thread Stefan Hajnoczi
On Wed, Sep 01, 2021 at 04:14:34PM +0300, Max Gurtovoy wrote:
> No need to pre-allocate a big buffer for the IO SGL anymore. If a device
> has lots of deep queues, preallocation for the sg list can consume
> substantial amounts of memory. For HW virtio-blk device, nr_hw_queues
> can be 64 or 128 and each queue's depth might be 128. This means the
> resulting preallocation for the data SGLs is big.
> 
> Switch to runtime allocation for SGL for lists longer than 2 entries.
> This is the approach used by NVMe drivers so it should be reasonable for
> virtio block as well. Runtime SGL allocation has always been the case
> for the legacy I/O path so this is nothing new.
> 
> The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't
> support SG_CHAIN, use only runtime allocation for the SGL.
> 
> Re-organize the setup of the IO request to fit the new sg chain
> mechanism.
> 
> No performance degradation was seen (fio libaio engine with 16 jobs and
> 128 iodepth):
> 
> IO size  IOPs Rand Read (before/after) IOPs Rand Write 
> (before/after)
>  -
> --
> 512B  318K/316K329K/325K
> 
> 4KB   323K/321K353K/349K
> 
> 16KB  199K/208K250K/275K
> 
> 128KB 36K/36.1K39.2K/41.7K

I ran fio randread benchmarks with 4k, 16k, 64k, and 128k at iodepth 1,
8, and 64 on two vCPUs. The results look fine, there is no significant
regression.

iodepth=1 and iodepth=64 are very consistent. For some reason the
iodepth=8 has significant variance but I don't think it's the fault of
this patch.

Fio results and the Jupyter notebook export are available here (check
out benchmark.html to see the graphs):

https://gitlab.com/stefanha/virt-playbooks/-/tree/virtio-blk-sgl-allocation-benchmark/notebook

Guest:
- Fedora 34
- Linux v5.14
- 2 vCPUs (pinned), 4 GB RAM (single host NUMA node)
- 1 IOThread (pinned)
- virtio-blk aio=native,cache=none,format=raw
- QEMU 6.1.0

Host:
- RHEL 8.3
- Linux 4.18.0-240.22.1.el8_3.x86_64
- Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
- Intel Optane DC P4800X

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry

2021-09-06 Thread Jorgen Hansen


> On 6 Sep 2021, at 11:11, Stefano Garzarella  wrote:
> 
> Add a new entry for VM Sockets (AF_VSOCK) that covers vsock core,
> tests, and headers. Move some general vsock stuff from virtio-vsock
> entry into this new more general vsock entry.
> 
> I've been reviewing and contributing for the last few years,
> so I'm available to help maintain this code.
> 
> Cc: Dexuan Cui 
> Cc: Jorgen Hansen 
> Cc: Stefan Hajnoczi 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Stefano Garzarella 
> ---
> 
> Dexuan, Jorgen, Stefan, would you like to co-maintain or
> be added as a reviewer?

Hi Stefano,

Please add me as a maintainer as well. I’ll try to help more out.

Thanks,
Jorgen

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry

2021-09-06 Thread Stefan Hajnoczi
On Mon, Sep 06, 2021 at 11:11:59AM +0200, Stefano Garzarella wrote:
> Add a new entry for VM Sockets (AF_VSOCK) that covers vsock core,
> tests, and headers. Move some general vsock stuff from virtio-vsock
> entry into this new more general vsock entry.
> 
> I've been reviewing and contributing for the last few years,
> so I'm available to help maintain this code.
> 
> Cc: Dexuan Cui 
> Cc: Jorgen Hansen 
> Cc: Stefan Hajnoczi 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Stefano Garzarella 
> ---
> 
> Dexuan, Jorgen, Stefan, would you like to co-maintain or
> be added as a reviewer?
> 
> Thanks,
> Stefano
> ---
>  MAINTAINERS | 20 +---
>  1 file changed, 13 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry

2021-09-06 Thread Stefan Hajnoczi
On Mon, Sep 06, 2021 at 11:11:59AM +0200, Stefano Garzarella wrote:
> Add a new entry for VM Sockets (AF_VSOCK) that covers vsock core,
> tests, and headers. Move some general vsock stuff from virtio-vsock
> entry into this new more general vsock entry.
> 
> I've been reviewing and contributing for the last few years,
> so I'm available to help maintain this code.
> 
> Cc: Dexuan Cui 
> Cc: Jorgen Hansen 
> Cc: Stefan Hajnoczi 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Stefano Garzarella 
> ---
> 
> Dexuan, Jorgen, Stefan, would you like to co-maintain or
> be added as a reviewer?

Please skip me for now. I'm available if you take an extended period of
time off and other special situations but don't have enough time to play
an active role.

Thanks,
Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 1/1] virtio-blk: add num_io_queues module parameter

2021-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2021 at 01:31:32AM +0300, Max Gurtovoy wrote:
> 
> On 9/5/2021 7:02 PM, Michael S. Tsirkin wrote:
> > On Thu, Sep 02, 2021 at 02:45:52PM +0100, Stefan Hajnoczi wrote:
> > > On Tue, Aug 31, 2021 at 04:50:35PM +0300, Max Gurtovoy wrote:
> > > > Sometimes a user would like to control the amount of IO queues to be
> > > > created for a block device. For example, for limiting the memory
> > > > footprint of virtio-blk devices.
> > > > 
> > > > Signed-off-by: Max Gurtovoy 
> > > > ---
> > > > 
> > > > changes from v1:
> > > >   - use param_set_uint_minmax (from Christoph)
> > > >   - added "Should > 0" to module description
> > > > 
> > > > Note: This commit apply on top of Jens's branch for-5.15/drivers
> > > > ---
> > > >   drivers/block/virtio_blk.c | 20 +++-
> > > >   1 file changed, 19 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > index 4b49df2dfd23..9332fc4e9b31 100644
> > > > --- a/drivers/block/virtio_blk.c
> > > > +++ b/drivers/block/virtio_blk.c
> > > > @@ -24,6 +24,22 @@
> > > >   /* The maximum number of sg elements that fit into a virtqueue */
> > > >   #define VIRTIO_BLK_MAX_SG_ELEMS 32768
> > > > +static int virtblk_queue_count_set(const char *val,
> > > > +   const struct kernel_param *kp)
> > > > +{
> > > > +   return param_set_uint_minmax(val, kp, 1, nr_cpu_ids);
> > > > +}


Hmm which tree is this for?

> > > > +
> > > > +static const struct kernel_param_ops queue_count_ops = {
> > > > +   .set = virtblk_queue_count_set,
> > > > +   .get = param_get_uint,
> > > > +};
> > > > +
> > > > +static unsigned int num_io_queues;
> > > > +module_param_cb(num_io_queues, &queue_count_ops, &num_io_queues, 0644);
> > > > +MODULE_PARM_DESC(num_io_queues,
> > > > +"Number of IO virt queues to use for blk device. 
> > > > Should > 0");



better:

+MODULE_PARM_DESC(num_io_request_queues,
+"Limit number of IO request virt queues to use for each 
device. 0 for now limit");


> > > > +
> > > >   static int major;
> > > >   static DEFINE_IDA(vd_index_ida);
> > > > @@ -501,7 +517,9 @@ static int init_vq(struct virtio_blk *vblk)
> > > > if (err)
> > > > num_vqs = 1;
> > > > -   num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
> > > > +   num_vqs = min_t(unsigned int,
> > > > +   min_not_zero(num_io_queues, nr_cpu_ids),
> > > > +   num_vqs);
> > > If you respin, please consider calling them request queues. That's the
> > > terminology from the VIRTIO spec and it's nice to keep it consistent.
> > > But the purpose of num_io_queues is clear, so:
> > > 
> > > Reviewed-by: Stefan Hajnoczi 
> > I did this:
> > +static unsigned int num_io_request_queues;
> > +module_param_cb(num_io_request_queues, &queue_count_ops, 
> > &num_io_request_queues, 0644);
> > +MODULE_PARM_DESC(num_io_request_queues,
> > +"Limit number of IO request virt queues to use for each 
> > device. 0 for now limit");
> 
> The parameter is writable and can be changed and then new devices might be
> probed with new value.
> 
> It can't be zero in the code. we can change param_set_uint_minmax args and
> say that 0 says nr_cpus.
> 
> I'm ok with the renaming but I prefer to stick to the description we gave in
> V3 of this patch (and maybe enable value of 0 as mentioned above).

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v13 05/13] vdpa: Add reset callback in vdpa_config_ops

2021-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2021 at 04:45:55PM +0800, Yongji Xie wrote:
> On Mon, Sep 6, 2021 at 4:01 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Sep 06, 2021 at 03:06:44PM +0800, Yongji Xie wrote:
> > > On Mon, Sep 6, 2021 at 2:37 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Mon, Sep 06, 2021 at 02:09:25PM +0800, Yongji Xie wrote:
> > > > > On Mon, Sep 6, 2021 at 1:56 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Aug 31, 2021 at 06:36:26PM +0800, Xie Yongji wrote:
> > > > > > > This adds a new callback to support device specific reset
> > > > > > > behavior. The vdpa bus driver will call the reset function
> > > > > > > instead of setting status to zero during resetting.
> > > > > > >
> > > > > > > Signed-off-by: Xie Yongji 
> > > > > >
> > > > > >
> > > > > > This does gloss over a significant change though:
> > > > > >
> > > > > >
> > > > > > > ---
> > > > > > > @@ -348,12 +352,12 @@ static inline struct device 
> > > > > > > *vdpa_get_dma_dev(struct vdpa_device *vdev)
> > > > > > >   return vdev->dma_dev;
> > > > > > >  }
> > > > > > >
> > > > > > > -static inline void vdpa_reset(struct vdpa_device *vdev)
> > > > > > > +static inline int vdpa_reset(struct vdpa_device *vdev)
> > > > > > >  {
> > > > > > >   const struct vdpa_config_ops *ops = vdev->config;
> > > > > > >
> > > > > > >   vdev->features_valid = false;
> > > > > > > - ops->set_status(vdev, 0);
> > > > > > > + return ops->reset(vdev);
> > > > > > >  }
> > > > > > >
> > > > > > >  static inline int vdpa_set_features(struct vdpa_device *vdev, 
> > > > > > > u64 features)
> > > > > >
> > > > > >
> > > > > > Unfortunately this breaks virtio_vdpa:
> > > > > >
> > > > > >
> > > > > > static void virtio_vdpa_reset(struct virtio_device *vdev)
> > > > > > {
> > > > > > struct vdpa_device *vdpa = vd_get_vdpa(vdev);
> > > > > >
> > > > > > vdpa_reset(vdpa);
> > > > > > }
> > > > > >
> > > > > >
> > > > > > and there's no easy way to fix this, kernel can't recover
> > > > > > from a reset failure e.g. during driver unbind.
> > > > > >
> > > > >
> > > > > Yes, but it should be safe with the protection of software IOTLB even
> > > > > if the reset() fails during driver unbind.
> > > > >
> > > > > Thanks,
> > > > > Yongji
> > > >
> > > > Hmm. I don't see it.
> > > > What exactly will happen? What prevents device from poking at
> > > > memory after reset? Note that dma unmap in e.g. del_vqs happens
> > > > too late.
> > >
> > > But I didn't see any problems with touching the memory for virtqueues.
> >
> > Drivers make the assumption that after reset returns no new
> > buffers will be consumed. For example a bunch of drivers
> > call virtqueue_detach_unused_buf.
> 
> I'm not sure if I get your point. But it looks like
> virtqueue_detach_unused_buf() will check the driver's metadata first
> rather than read the memory from virtqueue.
> 
> > I can't say whether block makes this assumption anywhere.
> > Needs careful auditing.
> >
> > > The memory should not be freed after dma unmap?
> >
> > But unmap does not happen until after the reset.
> >
> 
> I mean the memory is totally allocated and controlled by the VDUSE
> driver. The VDUSE driver will not return them to the buddy system
> unless userspace unmap it.

Right. But what stops VDUSE from poking at memory after
reset failed?



> >
> > > And the memory for the bounce buffer should also be safe to be
> > > accessed by userspace in this case.
> > >
> > > > And what about e.g. interrupts?
> > > > E.g. we have this:
> > > >
> > > > /* Virtqueues are stopped, nothing can use vblk->vdev anymore. 
> > > > */
> > > > vblk->vdev = NULL;
> > > >
> > > > and this is no longer true at this point.
> > > >
> > >
> > > You're right. But I didn't see where the interrupt handler will use
> > > the vblk->vdev.
> >
> > static void virtblk_done(struct virtqueue *vq)
> > {
> > struct virtio_blk *vblk = vq->vdev->priv;
> >
> > vq->vdev is the same as vblk->vdev.
> >
> 
> We will test the vq->ready (will be set to false in del_vqs()) before
> injecting an interrupt in the VDUSE driver. So it should be OK?

Maybe not ...  It's not designed for such asynchronous access, so e.g.
there's no locking or memory ordering around accesses.


> >
> > > So it seems to be not too late to fix it:
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > index 5c25ff6483ad..ea41a7389a26 100644
> > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > @@ -665,13 +665,13 @@ static void vduse_vdpa_set_config(struct
> > > vdpa_device *vdpa, unsigned int offset,
> > >  static int vduse_vdpa_reset(struct vdpa_device *vdpa)
> > >  {
> > > struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> > > +   int ret;
> > >
> > > -   if (vduse_dev_set_status(dev, 0))
> > > -   return -EIO;
> > > +   ret = vduse_dev_set_status(dev, 0);
> > >
> > > vduse_dev_r

[PATCH] MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry

2021-09-06 Thread Stefano Garzarella
Add a new entry for VM Sockets (AF_VSOCK) that covers vsock core,
tests, and headers. Move some general vsock stuff from virtio-vsock
entry into this new more general vsock entry.

I've been reviewing and contributing for the last few years,
so I'm available to help maintain this code.

Cc: Dexuan Cui 
Cc: Jorgen Hansen 
Cc: Stefan Hajnoczi 
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Stefano Garzarella 
---

Dexuan, Jorgen, Stefan, would you like to co-maintain or
be added as a reviewer?

Thanks,
Stefano
---
 MAINTAINERS | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4278b389218e..12d535eabf95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19763,18 +19763,11 @@ L:k...@vger.kernel.org
 L: virtualization@lists.linux-foundation.org
 L: net...@vger.kernel.org
 S: Maintained
-F: drivers/net/vsockmon.c
 F: drivers/vhost/vsock.c
 F: include/linux/virtio_vsock.h
 F: include/uapi/linux/virtio_vsock.h
-F: include/uapi/linux/vm_sockets_diag.h
-F: include/uapi/linux/vsockmon.h
-F: net/vmw_vsock/af_vsock_tap.c
-F: net/vmw_vsock/diag.c
 F: net/vmw_vsock/virtio_transport.c
 F: net/vmw_vsock/virtio_transport_common.c
-F: net/vmw_vsock/vsock_loopback.c
-F: tools/testing/vsock/
 
 VIRTIO BLOCK AND SCSI DRIVERS
 M: "Michael S. Tsirkin" 
@@ -19970,6 +19963,19 @@ F: drivers/staging/vme/
 F: drivers/vme/
 F: include/linux/vme*
 
+VM SOCKETS (AF_VSOCK)
+M: Stefano Garzarella 
+L: virtualization@lists.linux-foundation.org
+L: net...@vger.kernel.org
+S: Maintained
+F: drivers/net/vsockmon.c
+F: include/net/af_vsock.h
+F: include/uapi/linux/vm_sockets.h
+F: include/uapi/linux/vm_sockets_diag.h
+F: include/uapi/linux/vsockmon.h
+F: net/vmw_vsock/
+F: tools/testing/vsock/
+
 VMWARE BALLOON DRIVER
 M: Nadav Amit 
 M: "VMware, Inc." 
-- 
2.31.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next v5 0/6] virtio/vsock: introduce MSG_EOR flag for SEQPACKET

2021-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2021 at 09:33:15AM +0200, Stefano Garzarella wrote:
> On Sun, Sep 05, 2021 at 04:18:52PM -0400, Michael S. Tsirkin wrote:
> > On Sun, Sep 05, 2021 at 07:21:10PM +0300, Arseny Krasnov wrote:
> > > 
> > > On 05.09.2021 19:19, Michael S. Tsirkin wrote:
> > > > On Sun, Sep 05, 2021 at 07:02:44PM +0300, Arseny Krasnov wrote:
> > > >> On 05.09.2021 18:55, Michael S. Tsirkin wrote:
> > > >>> On Fri, Sep 03, 2021 at 03:30:13PM +0300, Arseny Krasnov wrote:
> > >   This patchset implements support of MSG_EOR bit for SEQPACKET
> > >  AF_VSOCK sockets over virtio transport.
> > >   First we need to define 'messages' and 'records' like this:
> > >  Message is result of sending calls: 'write()', 'send()', 'sendmsg()'
> > >  etc. It has fixed maximum length, and it bounds are visible using
> > >  return from receive calls: 'read()', 'recv()', 'recvmsg()' etc.
> > >  Current implementation based on message definition above.
> > >   Record has unlimited length, it consists of multiple message,
> > >  and bounds of record are visible via MSG_EOR flag returned from
> > >  'recvmsg()' call. Sender passes MSG_EOR to sending system call and
> > >  receiver will see MSG_EOR when corresponding message will be 
> > >  processed.
> > >   Idea of patchset comes from POSIX: it says that SEQPACKET
> > >  supports record boundaries which are visible for receiver using
> > >  MSG_EOR bit. So, it looks like MSG_EOR is enough thing for SEQPACKET
> > >  and we don't need to maintain boundaries of corresponding send -
> > >  receive system calls. But, for 'sendXXX()' and 'recXXX()' POSIX says,
> > >  that all these calls operates with messages, e.g. 'sendXXX()' sends
> > >  message, while 'recXXX()' reads messages and for SEQPACKET, 
> > >  'recXXX()'
> > >  must read one entire message from socket, dropping all out of size
> > >  bytes. Thus, both message boundaries and MSG_EOR bit must be 
> > >  supported
> > >  to follow POSIX rules.
> > >   To support MSG_EOR new bit was added along with existing
> > >  'VIRTIO_VSOCK_SEQ_EOR': 'VIRTIO_VSOCK_SEQ_EOM'(end-of-message) - now 
> > >  it
> > >  works in the same way as 'VIRTIO_VSOCK_SEQ_EOR'. But 
> > >  'VIRTIO_VSOCK_SEQ_EOR'
> > >  is used to mark 'MSG_EOR' bit passed from userspace.
> > >   This patchset includes simple test for MSG_EOR.
> > > >>> I'm prepared to merge this for this window,
> > > >>> but I'm not sure who's supposed to ack the net/vmw_vsock/af_vsock.c
> > > >>> bits. It's a harmless variable renaming so maybe it does not matter.
> > > >>>
> > > >>> The rest is virtio stuff so I guess my tree is ok.
> > > >>>
> > > >>> Objections, anyone?
> > > >> https://lkml.org/lkml/2021/9/3/76 this is v4. It is same as v5 in 
> > > >> af_vsock.c changes.
> > > >>
> > > >> It has Reviewed by from Stefano Garzarella.
> > > > Is Stefano the maintainer for af_vsock then?
> > > > I wasn't sure.
> 
> I'm maintaining virtio-vsock stuff, but I'm reviewing most of the af_vsock
> patches. We don't have an entry for it in MAINTAINERS, maybe we should.

Yea, please add that. And the test I guess?
It's now Dave and while he's great as we all know,
reducing the load on him is a good thing to do.

> > > Ack, let's wait for maintainer's comment
> > 
> > 
> > The specific patch is a trivial variable renaming so
> > I parked this in my tree for now, will merge unless I
> > hear any objections in the next couple of days.
> 
> I agree, I think your tree is fine, since this series is mostly about
> virtio-vsock.
> 
> Thanks,
> Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v13 05/13] vdpa: Add reset callback in vdpa_config_ops

2021-09-06 Thread Michael S. Tsirkin
On Mon, Sep 06, 2021 at 03:06:44PM +0800, Yongji Xie wrote:
> On Mon, Sep 6, 2021 at 2:37 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Sep 06, 2021 at 02:09:25PM +0800, Yongji Xie wrote:
> > > On Mon, Sep 6, 2021 at 1:56 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Tue, Aug 31, 2021 at 06:36:26PM +0800, Xie Yongji wrote:
> > > > > This adds a new callback to support device specific reset
> > > > > behavior. The vdpa bus driver will call the reset function
> > > > > instead of setting status to zero during resetting.
> > > > >
> > > > > Signed-off-by: Xie Yongji 
> > > >
> > > >
> > > > This does gloss over a significant change though:
> > > >
> > > >
> > > > > ---
> > > > > @@ -348,12 +352,12 @@ static inline struct device 
> > > > > *vdpa_get_dma_dev(struct vdpa_device *vdev)
> > > > >   return vdev->dma_dev;
> > > > >  }
> > > > >
> > > > > -static inline void vdpa_reset(struct vdpa_device *vdev)
> > > > > +static inline int vdpa_reset(struct vdpa_device *vdev)
> > > > >  {
> > > > >   const struct vdpa_config_ops *ops = vdev->config;
> > > > >
> > > > >   vdev->features_valid = false;
> > > > > - ops->set_status(vdev, 0);
> > > > > + return ops->reset(vdev);
> > > > >  }
> > > > >
> > > > >  static inline int vdpa_set_features(struct vdpa_device *vdev, u64 
> > > > > features)
> > > >
> > > >
> > > > Unfortunately this breaks virtio_vdpa:
> > > >
> > > >
> > > > static void virtio_vdpa_reset(struct virtio_device *vdev)
> > > > {
> > > > struct vdpa_device *vdpa = vd_get_vdpa(vdev);
> > > >
> > > > vdpa_reset(vdpa);
> > > > }
> > > >
> > > >
> > > > and there's no easy way to fix this, kernel can't recover
> > > > from a reset failure e.g. during driver unbind.
> > > >
> > >
> > > Yes, but it should be safe with the protection of software IOTLB even
> > > if the reset() fails during driver unbind.
> > >
> > > Thanks,
> > > Yongji
> >
> > Hmm. I don't see it.
> > What exactly will happen? What prevents device from poking at
> > memory after reset? Note that dma unmap in e.g. del_vqs happens
> > too late.
> 
> But I didn't see any problems with touching the memory for virtqueues.

Drivers make the assumption that after reset returns no new
buffers will be consumed. For example a bunch of drivers
call virtqueue_detach_unused_buf.
I can't say whether block makes this assumption anywhere.
Needs careful auditing.

> The memory should not be freed after dma unmap?

But unmap does not happen until after the reset.


> And the memory for the bounce buffer should also be safe to be
> accessed by userspace in this case.
> 
> > And what about e.g. interrupts?
> > E.g. we have this:
> >
> > /* Virtqueues are stopped, nothing can use vblk->vdev anymore. */
> > vblk->vdev = NULL;
> >
> > and this is no longer true at this point.
> >
> 
> You're right. But I didn't see where the interrupt handler will use
> the vblk->vdev.

static void virtblk_done(struct virtqueue *vq)
{
struct virtio_blk *vblk = vq->vdev->priv;

vq->vdev is the same as vblk->vdev.


> So it seems to be not too late to fix it:
> 
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c
> b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 5c25ff6483ad..ea41a7389a26 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -665,13 +665,13 @@ static void vduse_vdpa_set_config(struct
> vdpa_device *vdpa, unsigned int offset,
>  static int vduse_vdpa_reset(struct vdpa_device *vdpa)
>  {
> struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +   int ret;
> 
> -   if (vduse_dev_set_status(dev, 0))
> -   return -EIO;
> +   ret = vduse_dev_set_status(dev, 0);
> 
> vduse_dev_reset(dev);
> 
> -   return 0;
> +   return ret;
>  }
> 
>  static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa)
> 
> Thanks,
> Yongji

Needs some comments to explain why it's done like this.

BTW device is generally wedged at this point right?
E.g. if reset during initialization fails, userspace
will still get the reset at some later point and be
confused ...

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next v5 0/6] virtio/vsock: introduce MSG_EOR flag for SEQPACKET

2021-09-06 Thread Stefano Garzarella

On Sun, Sep 05, 2021 at 04:18:52PM -0400, Michael S. Tsirkin wrote:

On Sun, Sep 05, 2021 at 07:21:10PM +0300, Arseny Krasnov wrote:


On 05.09.2021 19:19, Michael S. Tsirkin wrote:
> On Sun, Sep 05, 2021 at 07:02:44PM +0300, Arseny Krasnov wrote:
>> On 05.09.2021 18:55, Michael S. Tsirkin wrote:
>>> On Fri, Sep 03, 2021 at 03:30:13PM +0300, Arseny Krasnov wrote:
This patchset implements support of MSG_EOR bit for SEQPACKET
 AF_VSOCK sockets over virtio transport.
First we need to define 'messages' and 'records' like this:
 Message is result of sending calls: 'write()', 'send()', 'sendmsg()'
 etc. It has fixed maximum length, and it bounds are visible using
 return from receive calls: 'read()', 'recv()', 'recvmsg()' etc.
 Current implementation based on message definition above.
Record has unlimited length, it consists of multiple message,
 and bounds of record are visible via MSG_EOR flag returned from
 'recvmsg()' call. Sender passes MSG_EOR to sending system call and
 receiver will see MSG_EOR when corresponding message will be processed.
Idea of patchset comes from POSIX: it says that SEQPACKET
 supports record boundaries which are visible for receiver using
 MSG_EOR bit. So, it looks like MSG_EOR is enough thing for SEQPACKET
 and we don't need to maintain boundaries of corresponding send -
 receive system calls. But, for 'sendXXX()' and 'recXXX()' POSIX says,
 that all these calls operates with messages, e.g. 'sendXXX()' sends
 message, while 'recXXX()' reads messages and for SEQPACKET, 'recXXX()'
 must read one entire message from socket, dropping all out of size
 bytes. Thus, both message boundaries and MSG_EOR bit must be supported
 to follow POSIX rules.
To support MSG_EOR new bit was added along with existing
 'VIRTIO_VSOCK_SEQ_EOR': 'VIRTIO_VSOCK_SEQ_EOM'(end-of-message) - now it
 works in the same way as 'VIRTIO_VSOCK_SEQ_EOR'. But 'VIRTIO_VSOCK_SEQ_EOR'
 is used to mark 'MSG_EOR' bit passed from userspace.
This patchset includes simple test for MSG_EOR.
>>> I'm prepared to merge this for this window,
>>> but I'm not sure who's supposed to ack the net/vmw_vsock/af_vsock.c
>>> bits. It's a harmless variable renaming so maybe it does not matter.
>>>
>>> The rest is virtio stuff so I guess my tree is ok.
>>>
>>> Objections, anyone?
>> https://lkml.org/lkml/2021/9/3/76 this is v4. It is same as v5 in af_vsock.c 
changes.
>>
>> It has Reviewed by from Stefano Garzarella.
> Is Stefano the maintainer for af_vsock then?
> I wasn't sure.


I'm maintaining virtio-vsock stuff, but I'm reviewing most of the 
af_vsock patches. We don't have an entry for it in MAINTAINERS, maybe we 
should.



Ack, let's wait for maintainer's comment



The specific patch is a trivial variable renaming so
I parked this in my tree for now, will merge unless I
hear any objections in the next couple of days.


I agree, I think your tree is fine, since this series is mostly about 
virtio-vsock.


Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization