Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-21 Thread Raphael Norwitz



> On Nov 21, 2022, at 5:11 AM, Stefano Garzarella  wrote:
> 
> Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> enabled VIRTIO_F_RING_RESET by default for all virtio devices.
> 
> This feature is not currently emulated by QEMU, so for vhost and
> vhost-user devices we need to make sure it is supported by the offloaded
> device emulation (in-kernel or in another process).
> To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
> passed to vhost_get_features(). This way it will be masked if the device
> does not support it.
> 
> This issue was initially discovered with vhost-vsock and vhost-user-vsock,
> and then also tested with vhost-user-rng which confirmed the same issue.
> They fail when sending features through VHOST_SET_FEATURES ioctl or
> VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
> by the guest (Linux >= v6.0), but not supported by the device.
> 
> Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
> Signed-off-by: Stefano Garzarella 

Looks good. For vhost-user-blk and vhost-user-scsi:

Acked-by: Raphael Norwitz 
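For context, the fix works by listing VIRTIO_F_RING_RESET in the feature-bit
array a front-end hands to vhost_get_features(), which clears every listed bit
the backend did not acknowledge. A minimal sketch of that pattern (the feature
list below is illustrative, not the exact hunk from the patch):

    /* Feature bits the backend is allowed to veto; the new entry is
     * VIRTIO_F_RING_RESET so it gets masked when unsupported. */
    static const int example_feature_bits[] = {
        VIRTIO_RING_F_INDIRECT_DESC,
        VIRTIO_RING_F_EVENT_IDX,
        VIRTIO_F_RING_RESET,
        VHOST_INVALID_FEATURE_BIT
    };

    static uint64_t example_get_features(struct vhost_dev *hdev,
                                         uint64_t requested_features)
    {
        /* Any bit listed above that the backend did not ack is cleared
         * from the returned feature set. */
        return vhost_get_features(hdev, example_feature_bits,
                                  requested_features);
    }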




Re: [PATCH for-7.2] vhost: enable vrings in vhost_dev_start() for vhost-user devices

2022-11-23 Thread Raphael Norwitz


> On Nov 23, 2022, at 8:16 AM, Stefano Garzarella  wrote:
> 
> Commit 02b61f38d3 ("hw/virtio: incorporate backend features in features")
> properly negotiates VHOST_USER_F_PROTOCOL_FEATURES with the vhost-user
> backend, but we forgot to enable vrings as specified in
> docs/interop/vhost-user.rst:
> 
>If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
>ring starts directly in the enabled state.
> 
>If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
>initialized in a disabled state and is enabled by
>``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
> 
> Some vhost-user front-ends already did this by calling
> vhost_ops.vhost_set_vring_enable() directly:
> - backends/cryptodev-vhost.c
> - hw/net/virtio-net.c
> - hw/virtio/vhost-user-gpio.c

To simplify, why not rather change these devices to use the new semantics?

> 
> But most didn't do that, so we would leave the vrings disabled and some
> backends would not work. We observed this issue with the rust version of
> virtiofsd [1], which uses the event loop [2] provided by the
> vhost-user-backend crate where requests are not processed if vring is
> not enabled.
> 
> Let's fix this issue by enabling the vrings in vhost_dev_start() for
> vhost-user front-ends that don't already do this directly. Same thing
> also in vhost_dev_stop() where we disable vrings.
> 
> [1] https://gitlab.com/virtio-fs/virtiofsd
> [2] 
> https://github.com/rust-vmm/vhost/blob/240fc2966/crates/vhost-user-backend/src/event_loop.rs#L217
> Fixes: 02b61f38d3 ("hw/virtio: incorporate backend features in features")
> Reported-by: German Maglione 
> Tested-by: German Maglione 
> Signed-off-by: Stefano Garzarella 

Looks good for vhost-user-blk/vhost-user-scsi.

Acked-by: Raphael Norwitz 
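For reference, the new flag splits call sites into two groups: front-ends that
already toggle the rings themselves (cryptodev, virtio-net, vhost-user-gpio)
keep passing false, as in the cryptodev hunk quoted below, while the other
vhost-user front-ends ask the core to do it. A rough sketch of the intended
call pattern, with hypothetical wrapper names:

    /* Hypothetical vhost-user front-end that has no ring-enable logic of
     * its own: let vhost_dev_start()/vhost_dev_stop() handle it. */
    static int example_start(struct vhost_dev *hdev, VirtIODevice *vdev)
    {
        /* vrings == true: the core also enables the rings (via
         * VHOST_USER_SET_VRING_ENABLE) once the device is started. */
        return vhost_dev_start(hdev, vdev, true);
    }

    static void example_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
    {
        /* vrings == true: the rings are disabled as part of the stop path. */
        vhost_dev_stop(hdev, vdev, true);
    }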

> ---
> include/hw/virtio/vhost.h  |  6 +++--
> backends/cryptodev-vhost.c |  4 ++--
> backends/vhost-user.c  |  4 ++--
> hw/block/vhost-user-blk.c  |  4 ++--
> hw/net/vhost_net.c |  8 +++
> hw/scsi/vhost-scsi-common.c|  4 ++--
> hw/virtio/vhost-user-fs.c  |  4 ++--
> hw/virtio/vhost-user-gpio.c|  4 ++--
> hw/virtio/vhost-user-i2c.c |  4 ++--
> hw/virtio/vhost-user-rng.c |  4 ++--
> hw/virtio/vhost-vsock-common.c |  4 ++--
> hw/virtio/vhost.c  | 44 ++
> hw/virtio/trace-events |  4 ++--
> 13 files changed, 67 insertions(+), 31 deletions(-)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 353252ac3e..67a6807fac 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -184,24 +184,26 @@ static inline bool vhost_dev_is_started(struct 
> vhost_dev *hdev)
>  * vhost_dev_start() - start the vhost device
>  * @hdev: common vhost_dev structure
>  * @vdev: the VirtIODevice structure
> + * @vrings: true to have vrings enabled in this call
>  *
>  * Starts the vhost device. From this point VirtIO feature negotiation
>  * can start and the device can start processing VirtIO transactions.
>  *
>  * Return: 0 on success, < 0 on error.
>  */
> -int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
> +int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings);
> 
> /**
>  * vhost_dev_stop() - stop the vhost device
>  * @hdev: common vhost_dev structure
>  * @vdev: the VirtIODevice structure
> + * @vrings: true to have vrings disabled in this call
>  *
>  * Stop the vhost device. After the device is stopped the notifiers
>  * can be disabled (@vhost_dev_disable_notifiers) and the device can
>  * be torn down (@vhost_dev_cleanup).
>  */
> -void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
> +void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings);
> 
> /**
>  * DOC: vhost device configuration handling
> diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
> index bc13e466b4..572f87b3be 100644
> --- a/backends/cryptodev-vhost.c
> +++ b/backends/cryptodev-vhost.c
> @@ -94,7 +94,7 @@ cryptodev_vhost_start_one(CryptoDevBackendVhost *crypto,
> goto fail_notifiers;
> }
> 
> -r = vhost_dev_start(&crypto->dev, dev);
> +r = vhost_dev_start(&crypto->dev, dev, false);
> if (r < 0) {
> goto fail_start;
> }
> @@ -111,7 +111,7 @@ static void
> cryptodev_vhost_stop_one(CryptoDevBackendVhost *crypto,
>  VirtIODevice *dev)
> {
> -vhost_dev_stop(&crypto->dev, dev);
> +vhost_dev_stop(&crypto->dev, dev, false);
> vhost_dev_disable_notifiers(&crypto->dev, dev);
> }
> 
> di

Re: [PATCH for-7.2] vhost: enable vrings in vhost_dev_start() for vhost-user devices

2022-11-24 Thread Raphael Norwitz



> On Nov 24, 2022, at 2:54 AM, Stefano Garzarella  wrote:
> 
> On Thu, Nov 24, 2022 at 01:50:19AM -0500, Michael S. Tsirkin wrote:
>> On Thu, Nov 24, 2022 at 12:19:25AM +, Raphael Norwitz wrote:
>>> 
>>> > On Nov 23, 2022, at 8:16 AM, Stefano Garzarella  
>>> > wrote:
>>> >
>>> > Commit 02b61f38d3 ("hw/virtio: incorporate backend features in features")
>>> > properly negotiates VHOST_USER_F_PROTOCOL_FEATURES with the vhost-user
>>> > backend, but we forgot to enable vrings as specified in
>>> > docs/interop/vhost-user.rst:
>>> >
>>> >If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
>>> >ring starts directly in the enabled state.
>>> >
>>> >If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
>>> >initialized in a disabled state and is enabled by
>>> >``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
>>> >
>>> > Some vhost-user front-ends already did this by calling
>>> > vhost_ops.vhost_set_vring_enable() directly:
>>> > - backends/cryptodev-vhost.c
>>> > - hw/net/virtio-net.c
>>> > - hw/virtio/vhost-user-gpio.c
>>> 
>>> To simplify, why not rather change these devices to use the new semantics?
> 
> Maybe the only one I wouldn't be scared of is vhost-user-gpio, but for 
> example vhost-net would require a lot of changes, maybe better after the 
> release.
> 
> For example, we could do like vhost-vdpa and call SET_VRING_ENABLE in the 
> VhostOps.vhost_dev_start callback of vhost-user.c, but I think it's too risky 
> to do that now.
> 
>> 
>> Granted this is already scary enough for this release.
> 
> Yeah, I tried to touch as little as possible but I'm scared too, I just 
> haven't found a better solution for now :-(
> 

Sure - no need to force a more disruptive change in right before the release.
If anything, it can be simplified later.
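For reference, the vhost-vdpa-style alternative mentioned above would mean
toggling the rings from the backend's VhostOps.vhost_dev_start callback rather
than from each front-end; a purely hypothetical sketch of that shape (not code
from any posted patch):

    /* Hypothetical vhost-user counterpart of vhost_vdpa_dev_start(): enable
     * or disable every ring from the backend callback itself. */
    static int vhost_user_dev_start_sketch(struct vhost_dev *dev, bool started)
    {
        /* Only meaningful when VHOST_USER_F_PROTOCOL_FEATURES was negotiated;
         * without it the rings already start in the enabled state. */
        return vhost_user_set_vring_enable(dev, started);
    }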

>> 
>>> >
>>> > But most didn't do that, so we would leave the vrings disabled and some
>>> > backends would not work. We observed this issue with the rust version of
>>> > virtiofsd [1], which uses the event loop [2] provided by the
>>> > vhost-user-backend crate where requests are not processed if vring is
>>> > not enabled.
>>> >
>>> > Let's fix this issue by enabling the vrings in vhost_dev_start() for
>>> > vhost-user front-ends that don't already do this directly. Same thing
>>> > also in vhost_dev_stop() where we disable vrings.
>>> >
>>> > [1] https://gitlab.com/virtio-fs/virtiofsd
>>> > [2] https://github.com/rust-vmm/vhost/blob/240fc2966/crates/vhost-user-backend/src/event_loop.rs#L217
>>> > Fixes: 02b61f38d3 ("hw/virtio: incorporate backend features in features")
>>> > Reported-by: German Maglione 
>>> > Tested-by: German Maglione 
>>> > Signed-off-by: Stefano Garzarella 
>>> 
>>> Looks good for vhost-user-blk/vhost-user-scsi.
>>> 
>>> Acked-by: Raphael Norwitz 
> 
> Thanks for the review!
> Stefano




Re: [PATCH v3 7/7] hw/virtio: generalise CHR_EVENT_CLOSED handling

2022-11-28 Thread Raphael Norwitz
> On Nov 28, 2022, at 11:41 AM, Alex Bennée  wrote:
> 
> ..and use for both virtio-user-blk and virtio-user-gpio. This avoids
> the circular close by deferring shutdown due to disconnection until a
> later point. virtio-user-blk already had this mechanism in place so

The mechanism was originally copied from virtio-net so we should probably fix 
it there too. AFAICT calling vhost_user_async_close() should work in 
net_vhost_user_event().

Otherwise the code looks good modulo a few nits. Happy to see the duplicated 
logic is being generalized.
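For reference, with the helper in place a front-end's CHR_EVENT_CLOSED arm
collapses to one call, which is also roughly what net_vhost_user_event() would
need to adopt; a hypothetical sketch of the pattern (device type and callback
names are placeholders, not real symbols):

    static void example_chr_event(void *opaque, QEMUChrEvent event)
    {
        DeviceState *dev = opaque;
        ExampleVhostUserDev *s = EXAMPLE_VHOST_USER_DEV(dev);  /* placeholder */

        switch (event) {
        case CHR_EVENT_OPENED:
            if (example_connect(dev) < 0) {
                qemu_chr_fe_disconnect(&s->chardev);
            }
            break;
        case CHR_EVENT_CLOSED:
            /* Defer teardown to a bottom half instead of tearing the
             * vhost_dev down from inside the chardev callback. */
            vhost_user_async_close(dev, &s->chardev, &s->dev,
                                   example_disconnect);
            break;
        default:
            break;
        }
    }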


> generalise it as a vhost-user helper function and use for both blk and
> gpio devices.
> 
> While we are at it we also fix up vhost-user-gpio to re-establish the
> event handler after close down so we can reconnect later.
> 
> Signed-off-by: Alex Bennée 
> ---
> include/hw/virtio/vhost-user.h | 18 +
> hw/block/vhost-user-blk.c  | 41 +++-
> hw/virtio/vhost-user-gpio.c| 11 +-
> hw/virtio/vhost-user.c | 71 ++
> 4 files changed, 104 insertions(+), 37 deletions(-)
> 
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index c6e693cd3f..191216a74f 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -68,4 +68,22 @@ bool vhost_user_init(VhostUserState *user, CharBackend 
> *chr, Error **errp);
>  */
> void vhost_user_cleanup(VhostUserState *user);
> 
> +/**
> + * vhost_user_async_close() - cleanup vhost-user post connection drop
> + * @d: DeviceState for the associated device (passed to callback)
> + * @chardev: the CharBackend associated with the connection
> + * @vhost: the common vhost device
> + * @cb: the user callback function to complete the clean-up
> + *
> + * This function is used to handle the shutdown of a vhost-user
> + * connection to a backend. We handle this centrally to make sure we
> + * do all the steps and handle potential races due to VM shutdowns.
> + * Once the connection is disabled we call a backhalf to ensure
> + */
> +typedef void (*vu_async_close_fn)(DeviceState *cb);
> +
> +void vhost_user_async_close(DeviceState *d,
> +CharBackend *chardev, struct vhost_dev *vhost,
> +vu_async_close_fn cb);
> +
> #endif
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 1177064631..aff4d2b8cb 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -369,17 +369,10 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
> vhost_user_blk_stop(vdev);
> 
> vhost_dev_cleanup(&s->dev);
> -}
> 
> -static void vhost_user_blk_chr_closed_bh(void *opaque)
> -{
> -DeviceState *dev = opaque;
> -VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> -VHostUserBlk *s = VHOST_USER_BLK(vdev);
> -
> -vhost_user_blk_disconnect(dev);
> +/* Re-instate the event handler for new connections */
> qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> - NULL, opaque, NULL, true);
> + NULL, dev, NULL, true);
> }
> 
> static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> @@ -398,33 +391,9 @@ static void vhost_user_blk_event(void *opaque, 
> QEMUChrEvent event)
> }
> break;
> case CHR_EVENT_CLOSED:
> -if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> -/*
> - * A close event may happen during a read/write, but vhost
> - * code assumes the vhost_dev remains setup, so delay the
> - * stop & clear.
> - */
> -AioContext *ctx = qemu_get_current_aio_context();
> -
> -qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
> -NULL, NULL, false);
> -aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, 
> opaque);
> -
> -/*
> - * Move vhost device to the stopped state. The vhost-user device
> - * will be clean up and disconnected in BH. This can be useful in
> - * the vhost migration code. If disconnect was caught there is an
> - * option for the general vhost code to get the dev state without
> - * knowing its type (in this case vhost-user).
> - *
> - * FIXME: this is sketchy to be reaching into vhost_dev
> - * now because we are forcing something that implies we
> - * have executed vhost_dev_stop() but that won't happen
> - * until vhost_user_blk_stop() gets called from the bh.
> - * Really this state check should be tracked locally.
> - */
> -s->dev.started = false;
> -}
> +/* defer close until later to avoid circular close */
> +vhost_user_async_close(dev, &s->chardev, &s->dev,
> +   vhost_user_blk_disconnect);
> break;
> case CHR_EVENT_BREAK:
> case CHR_EVEN

Re: [PATCH v3 7/7] hw/virtio: generalise CHR_EVENT_CLOSED handling

2022-11-29 Thread Raphael Norwitz
> On Nov 29, 2022, at 12:30 AM, Michael S. Tsirkin  wrote:
> 
> On Tue, Nov 29, 2022 at 05:18:58AM +, Raphael Norwitz wrote:
>>> On Nov 28, 2022, at 11:41 AM, Alex Bennée  wrote:
>>> 
>>> ..and use for both virtio-user-blk and virtio-user-gpio. This avoids
>>> the circular close by deferring shutdown due to disconnection until a
>>> later point. virtio-user-blk already had this mechanism in place so
>> 
>> The mechanism was originally copied from virtio-net so we should probably 
>> fix it there too. AFAICT calling vhost_user_async_close() should work in 
>> net_vhost_user_event().
>> 
>> Otherwise the code looks good modulo a few nits. Happy to see the duplicated 
>> logic is being generalized.
> 
> If you do, separate patch pls and does not have to block this series.

If the series is urgent my comments can be addressed later.

Reviewed-by: Raphael Norwitz 

> 
>> 
>>> generalise it as a vhost-user helper function and use for both blk and
>>> gpio devices.
>>> 
>>> While we are at it we also fix up vhost-user-gpio to re-establish the
>>> event handler after close down so we can reconnect later.
>>> 
>>> Signed-off-by: Alex Bennée 
>>> ---
>>> include/hw/virtio/vhost-user.h | 18 +
>>> hw/block/vhost-user-blk.c  | 41 +++-
>>> hw/virtio/vhost-user-gpio.c| 11 +-
>>> hw/virtio/vhost-user.c | 71 ++
>>> 4 files changed, 104 insertions(+), 37 deletions(-)
>>> 
>>> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
>>> index c6e693cd3f..191216a74f 100644
>>> --- a/include/hw/virtio/vhost-user.h
>>> +++ b/include/hw/virtio/vhost-user.h
>>> @@ -68,4 +68,22 @@ bool vhost_user_init(VhostUserState *user, CharBackend 
>>> *chr, Error **errp);
>>> */
>>> void vhost_user_cleanup(VhostUserState *user);
>>> 
>>> +/**
>>> + * vhost_user_async_close() - cleanup vhost-user post connection drop
>>> + * @d: DeviceState for the associated device (passed to callback)
>>> + * @chardev: the CharBackend associated with the connection
>>> + * @vhost: the common vhost device
>>> + * @cb: the user callback function to complete the clean-up
>>> + *
>>> + * This function is used to handle the shutdown of a vhost-user
>>> + * connection to a backend. We handle this centrally to make sure we
>>> + * do all the steps and handle potential races due to VM shutdowns.
>>> + * Once the connection is disabled we call a backhalf to ensure
>>> + */
>>> +typedef void (*vu_async_close_fn)(DeviceState *cb);
>>> +
>>> +void vhost_user_async_close(DeviceState *d,
>>> +CharBackend *chardev, struct vhost_dev *vhost,
>>> +vu_async_close_fn cb);
>>> +
>>> #endif
>>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
>>> index 1177064631..aff4d2b8cb 100644
>>> --- a/hw/block/vhost-user-blk.c
>>> +++ b/hw/block/vhost-user-blk.c
>>> @@ -369,17 +369,10 @@ static void vhost_user_blk_disconnect(DeviceState 
>>> *dev)
>>>vhost_user_blk_stop(vdev);
>>> 
>>>vhost_dev_cleanup(&s->dev);
>>> -}
>>> 
>>> -static void vhost_user_blk_chr_closed_bh(void *opaque)
>>> -{
>>> -DeviceState *dev = opaque;
>>> -VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>> -VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>> -
>>> -vhost_user_blk_disconnect(dev);
>>> +/* Re-instate the event handler for new connections */
>>>qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
>>> - NULL, opaque, NULL, true);
>>> + NULL, dev, NULL, true);
>>> }
>>> 
>>> static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>>> @@ -398,33 +391,9 @@ static void vhost_user_blk_event(void *opaque, 
>>> QEMUChrEvent event)
>>>}
>>>break;
>>>case CHR_EVENT_CLOSED:
>>> -if (!runstate_check(RUN_STATE_SHUTDOWN)) {
>>> -/*
>>> - * A close event may happen during a read/write, but vhost
>>> - * code assumes the vhost_dev remains setup, so delay the
>>> - * stop & clear.
>>> - */
>>> -AioContext *ctx = qemu_get_curren

Re: [PATCH] contrib/vhost-user-blk: Clean up deallocation of VuVirtqElement

2022-07-27 Thread Raphael Norwitz
On Tue, Jul 26, 2022 at 03:57:42PM +0100, Peter Maydell wrote:
> On Fri, 1 Jul 2022 at 06:41, Markus Armbruster  wrote:
> > Could we use a contrib/README with an explanation what "contrib" means,
> > and how to build and use the stuff there?
> 
> I would rather we got rid of contrib/ entirely. Our git repo
> should contain things we care about enough to really support
> and believe in, in which case they should be in top level
> directories matching what they are (eg tools/). If we don't
> believe in these things enough to really support them, then
> we should drop them, and let those who do care maintain them
> as out-of-tree tools if they like.
>

I can't speak for a lot of stuff in contrib/ but I find the vhost-user
backends like vhost-user-blk and vhost-user-scsi helpful for testing and
development. I would like to keep maintaining those two at least.

> subprojects/ is similarly vague.
> 

Again, I can't say much for the other stuff in subprojects/, but libvhost-user
is clearly important. Maybe we could move libvhost-user to another directory
and put the libvhost-user-based backends there too?

> thanks
> -- PMM


Re: [PATCH v4 11/22] hw/virtio: move vhd->started check into helper and add FIXME

2022-08-07 Thread Raphael Norwitz
On Tue, Aug 02, 2022 at 10:49:59AM +0100, Alex Bennée wrote:
> The `started` field is manipulated internally within the vhost code
> except for one place, vhost-user-blk via f5b22d06fb (vhost: recheck
> dev state in the vhost_migration_log routine). Mark that as a FIXME
> because it introduces a potential race. I think the referenced fix
> should be tracking its state locally.

I don't think we can track the state locally. As described in the commit
message for f5b22d06fb, the state is used by vhost code in the
vhost_migration_log() function so we probably need something at the
vhost level. I do agree we shouldn't re-use vdev->started.

Maybe we should add another 'active' variable in vhost_dev? I'm happy
to send a patch for that.

Until we agree on a better solution I'm happy with the FIXME.

Reviewed-by: Raphael Norwitz 
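A rough sketch of the 'active' idea, purely hypothetical since no such field
exists in QEMU today:

    /* Hypothetical layout only -- not current QEMU code. */
    struct vhost_dev_sketch {
        /* ... existing fields ... */
        bool started;   /* owned by vhost_dev_start()/vhost_dev_stop() */
        bool active;    /* proposed: cleared early on backend disconnect and
                         * checked by vhost_migration_log(), so the disconnect
                         * path no longer has to force 'started = false' */
    };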

> 
> Signed-off-by: Alex Bennée 
> ---
>  include/hw/virtio/vhost.h  | 12 
>  hw/block/vhost-user-blk.c  | 10 --
>  hw/scsi/vhost-scsi.c   |  4 ++--
>  hw/scsi/vhost-user-scsi.c  |  2 +-
>  hw/virtio/vhost-user-fs.c  |  3 ++-
>  hw/virtio/vhost-user-i2c.c |  4 ++--
>  hw/virtio/vhost-user-rng.c |  4 ++--
>  hw/virtio/vhost-user-vsock.c   |  2 +-
>  hw/virtio/vhost-vsock-common.c |  3 ++-
>  hw/virtio/vhost-vsock.c|  2 +-
>  10 files changed, 33 insertions(+), 13 deletions(-)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 586c5457e2..61b957e927 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -94,6 +94,7 @@ struct vhost_dev {
>  uint64_t protocol_features;
>  uint64_t max_queues;
>  uint64_t backend_cap;
> +/* @started: is the vhost device started? */
>  bool started;
>  bool log_enabled;
>  uint64_t log_size;
> @@ -165,6 +166,17 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
> VirtIODevice *vdev);
>   */
>  void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>  
> +/**
> + * vhost_dev_is_started() - report status of vhost device
> + * @hdev: common vhost_dev structure
> + *
> + * Return the started status of the vhost device
> + */
> +static inline bool vhost_dev_is_started(struct vhost_dev *hdev)
> +{
> +return hdev->started;
> +}
> +
>  /**
>   * vhost_dev_start() - start the vhost device
>   * @hdev: common vhost_dev structure
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 9117222456..2bba42478d 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -229,7 +229,7 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, 
> uint8_t status)
>  return;
>  }
>  
> -if (s->dev.started == should_start) {
> +if (vhost_dev_is_started(&s->dev) == should_start) {
>  return;
>  }
>  
> @@ -286,7 +286,7 @@ static void vhost_user_blk_handle_output(VirtIODevice 
> *vdev, VirtQueue *vq)
>  return;
>  }
>  
> -if (s->dev.started) {
> +if (vhost_dev_is_started(&s->dev)) {
>  return;
>  }
>  
> @@ -415,6 +415,12 @@ static void vhost_user_blk_event(void *opaque, 
> QEMUChrEvent event)
>   * the vhost migration code. If disconnect was caught there is an
>   * option for the general vhost code to get the dev state without
>   * knowing its type (in this case vhost-user).
> + *
> + * FIXME: this is sketchy to be reaching into vhost_dev
> + * now because we are forcing something that implies we
> + * have executed vhost_dev_stop() but that won't happen
> + * until vhost_user_blk_stop() gets called from the bh.
> + * Really this state check should be tracked locally.
>   */
>  s->dev.started = false;
>  }
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index 3059068175..bdf337a7a2 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -120,7 +120,7 @@ static void vhost_scsi_set_status(VirtIODevice *vdev, 
> uint8_t val)
>  start = false;
>  }
>  
> -if (vsc->dev.started == start) {
> +if (vhost_dev_is_started(&vsc->dev) == start) {
>  return;
>  }
>  
> @@ -147,7 +147,7 @@ static int vhost_scsi_pre_save(void *opaque)
>  
>  /* At this point, backend must be stopped, otherwise
>   * it might keep writing to memory. */
> -assert(!vsc->dev.started);
> +assert(!vhost_dev_is_started(&vsc->dev));
>  
>  return 0;
>  }
> diff --git a/hw/scsi/vhost-user

Re: [PATCH] vhost-user-scsi: support reconnect to backend

2023-07-24 Thread Raphael Norwitz
Very excited to see this. High level looks good modulo a few small things.

My major concern is around existing vhost-user-scsi backends which don’t 
support VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. IMO we should hide the reconnect 
behavior behind a VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD check. We may want to do 
the same for vhost-user-blk.

The question is then what happens if the check is false. IIUC without an 
inflight FD, if a device processes requests out of order, it’s not safe to 
continue execution on reconnect, as there’s no way for the backend to know how 
to replay IO. Should we permanently wedge the device or have QEMU fail out? May 
be nice to have a toggle for this.
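A sketch of the kind of gating meant above (hypothetical helper, using the
protocol-feature check pattern the vhost-user code already uses elsewhere):

    /* Hypothetical gating: only attempt transparent reconnect when the
     * backend exposes an inflight region we can replay requests from. */
    static bool example_reconnect_supported(struct vhost_dev *dev)
    {
        return virtio_has_feature(dev->protocol_features,
                                  VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
    }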

> On Jul 21, 2023, at 6:51 AM, Li Feng  wrote:
> 
> If the backend crashes and restarts, the device is broken.
> This patch adds reconnect for vhost-user-scsi.
> 
> Tested with spdk backend.
> 
> Signed-off-by: Li Feng 
> ---
> hw/block/vhost-user-blk.c   |   2 -
> hw/scsi/vhost-scsi-common.c |  27 ++---
> hw/scsi/vhost-user-scsi.c   | 163 +---
> include/hw/virtio/vhost-user-scsi.h |   3 +
> include/hw/virtio/vhost.h   |   2 +
> 5 files changed, 165 insertions(+), 32 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index eecf3f7a81..f250c740b5 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -32,8 +32,6 @@
> #include "sysemu/sysemu.h"
> #include "sysemu/runstate.h"
> 
> -#define REALIZE_CONNECTION_RETRIES 3
> -
> static const int user_feature_bits[] = {
> VIRTIO_BLK_F_SIZE_MAX,
> VIRTIO_BLK_F_SEG_MAX,
> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c

Why can’t all the vhost-scsi-common stuff be moved to a separate change?

Especially the stuff introduced for vhost-user-blk in 
1b0063b3048af65dfaae6422a572c87db8575a92 should be moved out.

> index a06f01af26..08801886b8 100644
> --- a/hw/scsi/vhost-scsi-common.c
> +++ b/hw/scsi/vhost-scsi-common.c
> @@ -52,16 +52,22 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> 
> vsc->dev.acked_features = vdev->guest_features;
> 
> -assert(vsc->inflight == NULL);
> -vsc->inflight = g_new0(struct vhost_inflight, 1);
> -ret = vhost_dev_get_inflight(&vsc->dev,
> - vs->conf.virtqueue_size,
> - vsc->inflight);
> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
> if (ret < 0) {
> -error_report("Error get inflight: %d", -ret);
> +error_report("Error setting inflight format: %d", -ret);
> goto err_guest_notifiers;
> }
> 
> +if (!vsc->inflight->addr) {
> +ret = vhost_dev_get_inflight(&vsc->dev,
> +vs->conf.virtqueue_size,
> +vsc->inflight);
> +if (ret < 0) {
> +error_report("Error get inflight: %d", -ret);
> +goto err_guest_notifiers;
> +}
> +}
> +
> ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
> if (ret < 0) {
> error_report("Error set inflight: %d", -ret);
> @@ -85,9 +91,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> return ret;
> 
> err_guest_notifiers:
> -g_free(vsc->inflight);
> -vsc->inflight = NULL;
> -
> k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
> err_host_notifiers:
> vhost_dev_disable_notifiers(&vsc->dev, vdev);
> @@ -111,12 +114,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
> }
> assert(ret >= 0);
> 

In the vhost-scsi (kernel backend) path, what will clean up vsc->inflight now?

> -if (vsc->inflight) {
> -vhost_dev_free_inflight(vsc->inflight);
> -g_free(vsc->inflight);
> -vsc->inflight = NULL;
> -}
> -
> vhost_dev_disable_notifiers(&vsc->dev, vdev);
> }
> 
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index ee99b19e7a..e0e88b0c42 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -89,14 +89,126 @@ static void vhost_dummy_handle_output(VirtIODevice 
> *vdev, VirtQueue *vq)
> {
> }
> 
> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +int ret = 0;
> +
> +if (s->connected) {
> +return 0;
> +}
> +s->connected = true;
> +
> +vsc->dev.num_queues = vs->conf.num_queues;
> +vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
> +vsc->dev.vqs = s->vhost_vqs;
> +vsc->dev.vq_index = 0;
> +vsc->dev.backend_features = 0;
> +
> +ret = vhost_dev_init(&vsc->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 
> 0,
> + errp);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +/* restore vhost state */

Should this

Re: [PATCH] vhost-user-scsi: support reconnect to backend

2023-07-27 Thread Raphael Norwitz


> On Jul 25, 2023, at 6:19 AM, Li Feng  wrote:
> 
> Thanks for your comments.
> 
>> 2023年7月25日 上午1:21,Raphael Norwitz  写道:
>> 
>> Very excited to see this. High level looks good modulo a few small things.
>> 
>> My major concern is around existing vhost-user-scsi backends which don’t 
>> support VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. IMO we should hide the 
>> reconnect behavior behind a VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD check. We 
>> may want to do the same for vhost-user-blk.
>> 
>> The question is then what happens if the check is false. IIUC without an 
>> inflight FD, if a device processes requests out of order, it’s not safe to 
>> continue execution on reconnect, as there’s no way for the backend to know 
>> how to replay IO. Should we permanently wedge the device or have QEMU fail 
>> out? May be nice to have a toggle for this.
> 
> Based on what MST said, is there anything else I need to do?

I don’t think so.

>> 
>>> On Jul 21, 2023, at 6:51 AM, Li Feng  wrote:
>>> 
>>> If the backend crashes and restarts, the device is broken.
>>> This patch adds reconnect for vhost-user-scsi.
>>> 
>>> Tested with spdk backend.
>>> 
>>> Signed-off-by: Li Feng 
>>> ---
>>> hw/block/vhost-user-blk.c   |   2 -
>>> hw/scsi/vhost-scsi-common.c |  27 ++---
>>> hw/scsi/vhost-user-scsi.c   | 163 +---
>>> include/hw/virtio/vhost-user-scsi.h |   3 +
>>> include/hw/virtio/vhost.h   |   2 +
>>> 5 files changed, 165 insertions(+), 32 deletions(-)
>>> 
>>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
>>> index eecf3f7a81..f250c740b5 100644
>>> --- a/hw/block/vhost-user-blk.c
>>> +++ b/hw/block/vhost-user-blk.c
>>> @@ -32,8 +32,6 @@
>>> #include "sysemu/sysemu.h"
>>> #include "sysemu/runstate.h"
>>> 
>>> -#define REALIZE_CONNECTION_RETRIES 3
>>> -
>>> static const int user_feature_bits[] = {
>>>VIRTIO_BLK_F_SIZE_MAX,
>>>VIRTIO_BLK_F_SEG_MAX,
>>> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
>> 
>> Why can’t all the vhost-scsi-common stuff be moved to a separate change?
> 
> I will move this code to separate patch.
>> 
>> Especially the stuff introduced for vhost-user-blk in 
>> 1b0063b3048af65dfaae6422a572c87db8575a92 should be moved out.
> OK.
> 
>> 
>>> index a06f01af26..08801886b8 100644
>>> --- a/hw/scsi/vhost-scsi-common.c
>>> +++ b/hw/scsi/vhost-scsi-common.c
>>> @@ -52,16 +52,22 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>> 
>>>vsc->dev.acked_features = vdev->guest_features;
>>> 
>>> -assert(vsc->inflight == NULL);
>>> -vsc->inflight = g_new0(struct vhost_inflight, 1);
>>> -ret = vhost_dev_get_inflight(&vsc->dev,
>>> - vs->conf.virtqueue_size,
>>> - vsc->inflight);
>>> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
>>>if (ret < 0) {
>>> -error_report("Error get inflight: %d", -ret);
>>> +error_report("Error setting inflight format: %d", -ret);
>>>goto err_guest_notifiers;
>>>}
>>> 
>>> +if (!vsc->inflight->addr) {
>>> +ret = vhost_dev_get_inflight(&vsc->dev,
>>> +vs->conf.virtqueue_size,
>>> +vsc->inflight);
>>> +if (ret < 0) {
>>> +error_report("Error get inflight: %d", -ret);
>>> +goto err_guest_notifiers;
>>> +}
>>> +}
>>> +
>>>ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>>>if (ret < 0) {
>>>error_report("Error set inflight: %d", -ret);
>>> @@ -85,9 +91,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>>return ret;
>>> 
>>> err_guest_notifiers:
>>> -g_free(vsc->inflight);
>>> -vsc->inflight = NULL;
>>> -
>>>k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
>>> err_host_notifiers:
>>>vhost_dev_disable_notifiers(&vsc->dev, vdev);
>>> @@ -111,12 +114,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
>>>}
>>>

Re: [PATCH v2 1/4] vhost: fix the fd leak

2023-07-30 Thread Raphael Norwitz



> On Jul 25, 2023, at 6:42 AM, Li Feng  wrote:
> 
> When the vhost-user reconnect to the backend, the notifer should be
> cleanup. Otherwise, the fd resource will be exhausted.
> 
> Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
> 
> Signed-off-by: Li Feng 

Reviewed-by: Raphael Norwitz 
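For context, the leaked resource is the eventfd behind the masked config
notifier: as I understand it, the notifier is initialized again on each
(re)start, so every reconnect needs the matching cleanup this patch adds. A
minimal sketch of the init/cleanup pairing (illustrative, not the exact QEMU
call sites):

    EventNotifier masked_config_notifier;

    int example_setup(void)
    {
        /* Allocates an eventfd under the hood. */
        return event_notifier_init(&masked_config_notifier, 0);
    }

    void example_teardown(void)
    {
        event_notifier_test_and_clear(&masked_config_notifier);
        /* Without this, each reconnect leaks one file descriptor. */
        event_notifier_cleanup(&masked_config_notifier);
    }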

> ---
> hw/virtio/vhost.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index abf0d03c8d..e2f6ffb446 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2044,6 +2044,8 @@ void vhost_dev_stop(struct vhost_dev *hdev, 
> VirtIODevice *vdev, bool vrings)
> event_notifier_test_and_clear(
> &hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier);
> event_notifier_test_and_clear(&vdev->config_notifier);
> +event_notifier_cleanup(
> +&hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier);
> 
> trace_vhost_dev_stop(hdev, vdev->name, vrings);
> 
> -- 
> 2.41.0
> 




Re: [PATCH] vhost-user-scsi: support reconnect to backend

2023-07-30 Thread Raphael Norwitz


> On Jul 28, 2023, at 3:48 AM, Li Feng  wrote:
> 
> Thanks for your reply.
> 
>> 2023年7月28日 上午5:21,Raphael Norwitz  写道:
>> 
>> 
>> 
>>> On Jul 25, 2023, at 6:19 AM, Li Feng  wrote:
>>> 
>>> Thanks for your comments.
>>> 
>>>> 2023年7月25日 上午1:21,Raphael Norwitz  写道:
>>>> 
>>>> Very excited to see this. High level looks good modulo a few small things.
>>>> 
>>>> My major concern is around existing vhost-user-scsi backends which don’t 
>>>> support VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. IMO we should hide the 
>>>> reconnect behavior behind a VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD check. We 
>>>> may want to do the same for vhost-user-blk.
>>>> 
>>>> The question is then what happens if the check is false. IIUC without an 
>>>> inflight FD, if a device processes requests out of order, it’s not safe to 
>>>> continue execution on reconnect, as there’s no way for the backend to know 
>>>> how to replay IO. Should we permanently wedge the device or have QEMU fail 
>>>> out? May be nice to have a toggle for this.
>>> 
>>> Based on what MST said, is there anything else I need to do?
>> 
>> I don’t think so.
>> 
>>>> 
>>>>> On Jul 21, 2023, at 6:51 AM, Li Feng  wrote:
>>>>> 
>>>>> If the backend crashes and restarts, the device is broken.
>>>>> This patch adds reconnect for vhost-user-scsi.
>>>>> 
>>>>> Tested with spdk backend.
>>>>> 
>>>>> Signed-off-by: Li Feng 
>>>>> ---
>>>>> hw/block/vhost-user-blk.c   |   2 -
>>>>> hw/scsi/vhost-scsi-common.c |  27 ++---
>>>>> hw/scsi/vhost-user-scsi.c   | 163 +---
>>>>> include/hw/virtio/vhost-user-scsi.h |   3 +
>>>>> include/hw/virtio/vhost.h   |   2 +
>>>>> 5 files changed, 165 insertions(+), 32 deletions(-)
>>>>> 
>>>>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
>>>>> index eecf3f7a81..f250c740b5 100644
>>>>> --- a/hw/block/vhost-user-blk.c
>>>>> +++ b/hw/block/vhost-user-blk.c
>>>>> @@ -32,8 +32,6 @@
>>>>> #include "sysemu/sysemu.h"
>>>>> #include "sysemu/runstate.h"
>>>>> 
>>>>> -#define REALIZE_CONNECTION_RETRIES 3
>>>>> -
>>>>> static const int user_feature_bits[] = {
>>>>>   VIRTIO_BLK_F_SIZE_MAX,
>>>>>   VIRTIO_BLK_F_SEG_MAX,
>>>>> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
>>>> 
>>>> Why can’t all the vhost-scsi-common stuff be moved to a separate change?
>>> 
>>> I will move this code to separate patch.
>>>> 
>>>> Especially the stuff introduced for vhost-user-blk in 
>>>> 1b0063b3048af65dfaae6422a572c87db8575a92 should be moved out.
>>> OK.
>>> 
>>>> 
>>>>> index a06f01af26..08801886b8 100644
>>>>> --- a/hw/scsi/vhost-scsi-common.c
>>>>> +++ b/hw/scsi/vhost-scsi-common.c
>>>>> @@ -52,16 +52,22 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>>>> 
>>>>>   vsc->dev.acked_features = vdev->guest_features;
>>>>> 
>>>>> -assert(vsc->inflight == NULL);
>>>>> -vsc->inflight = g_new0(struct vhost_inflight, 1);
>>>>> -ret = vhost_dev_get_inflight(&vsc->dev,
>>>>> - vs->conf.virtqueue_size,
>>>>> - vsc->inflight);
>>>>> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
>>>>>   if (ret < 0) {
>>>>> -error_report("Error get inflight: %d", -ret);
>>>>> +error_report("Error setting inflight format: %d", -ret);
>>>>>   goto err_guest_notifiers;
>>>>>   }
>>>>> 
>>>>> +if (!vsc->inflight->addr) {
>>>>> +ret = vhost_dev_get_inflight(&vsc->dev,
>>>>> +vs->conf.virtqueue_size,
>>>>> +vsc->inflight);
>>>>> +if (ret < 0) {
>>>>> +error_report("Error get infli

Re: [PATCH v2 3/4] vhost: move and rename the conn retry times

2023-07-30 Thread Raphael Norwitz


> On Jul 25, 2023, at 6:42 AM, Li Feng  wrote:
> 
> Multile devices need this macro, move it to a common header.
> 
> Signed-off-by: Li Feng 

Reviewed-by: Raphael Norwitz 

> ---
> hw/block/vhost-user-blk.c   | 4 +---
> hw/virtio/vhost-user-gpio.c | 3 +--
> include/hw/virtio/vhost.h   | 2 ++
> 3 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index eecf3f7a81..3c69fa47d5 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -32,8 +32,6 @@
> #include "sysemu/sysemu.h"
> #include "sysemu/runstate.h"
> 
> -#define REALIZE_CONNECTION_RETRIES 3
> -
> static const int user_feature_bits[] = {
> VIRTIO_BLK_F_SIZE_MAX,
> VIRTIO_BLK_F_SEG_MAX,
> @@ -482,7 +480,7 @@ static void vhost_user_blk_device_realize(DeviceState 
> *dev, Error **errp)
> s->inflight = g_new0(struct vhost_inflight, 1);
> s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
> 
> -retries = REALIZE_CONNECTION_RETRIES;
> +retries = VU_REALIZE_CONN_RETRIES;
> assert(!*errp);
> do {
> if (*errp) {
> diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
> index 3b013f2d0f..d9979aa5db 100644
> --- a/hw/virtio/vhost-user-gpio.c
> +++ b/hw/virtio/vhost-user-gpio.c
> @@ -15,7 +15,6 @@
> #include "standard-headers/linux/virtio_ids.h"
> #include "trace.h"
> 
> -#define REALIZE_CONNECTION_RETRIES 3
> #define VHOST_NVQS 2
> 
> /* Features required from VirtIO */
> @@ -359,7 +358,7 @@ static void vu_gpio_device_realize(DeviceState *dev, 
> Error **errp)
> qemu_chr_fe_set_handlers(&gpio->chardev, NULL, NULL, vu_gpio_event, NULL,
>  dev, NULL, true);
> 
> -retries = REALIZE_CONNECTION_RETRIES;
> +retries = VU_REALIZE_CONN_RETRIES;
> g_assert(!*errp);
> do {
> if (*errp) {
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 6a173cb9fa..ca3131b1af 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -8,6 +8,8 @@
> #define VHOST_F_DEVICE_IOTLB 63
> #define VHOST_USER_F_PROTOCOL_FEATURES 30
> 
> +#define VU_REALIZE_CONN_RETRIES 3
> +
> /* Generic structures common for any vhost based device. */
> 
> struct vhost_inflight {
> -- 
> 2.41.0
> 




Re: [PATCH v2 2/4] vhost-user-common: send get_inflight_fd once

2023-07-30 Thread Raphael Norwitz

> On Jul 28, 2023, at 3:49 AM, Li Feng  wrote:
> 
> 
> 
>> 2023年7月28日 下午2:04,Michael S. Tsirkin  写道:
>> 
>> On Tue, Jul 25, 2023 at 06:42:45PM +0800, Li Feng wrote:
>>> Get_inflight_fd is sent only once. When reconnecting to the backend,
>>> qemu sent set_inflight_fd to the backend.
>> 
>> I don't understand what you are trying to say here.
>> Should be:
>> Currently ABCD. This is wrong/unnecessary because EFG. This patch HIJ.
> 
> Thanks, I will reorganize the commit message in v3.
>> 
>>> Signed-off-by: Li Feng 
>>> ---
>>> hw/scsi/vhost-scsi-common.c | 37 ++---
>>> 1 file changed, 18 insertions(+), 19 deletions(-)
>>> 
>>> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
>>> index a06f01af26..664adb15b4 100644
>>> --- a/hw/scsi/vhost-scsi-common.c
>>> +++ b/hw/scsi/vhost-scsi-common.c
>>> @@ -52,20 +52,28 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>> 
>>> vsc->dev.acked_features = vdev->guest_features;
>>> 
>>> -assert(vsc->inflight == NULL);
>>> -vsc->inflight = g_new0(struct vhost_inflight, 1);
>>> -ret = vhost_dev_get_inflight(&vsc->dev,
>>> - vs->conf.virtqueue_size,
>>> - vsc->inflight);
>>> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
>>> if (ret < 0) {
>>> -error_report("Error get inflight: %d", -ret);
>>> +error_report("Error setting inflight format: %d", -ret);
>>> goto err_guest_notifiers;
>>> }
>>> 
>>> -ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>>> -if (ret < 0) {
>>> -error_report("Error set inflight: %d", -ret);
>>> -goto err_guest_notifiers;
>>> +if (vsc->inflight) {
>>> +if (!vsc->inflight->addr) {
>>> +ret = vhost_dev_get_inflight(&vsc->dev,
>>> +vs->conf.virtqueue_size,
>>> +vsc->inflight);
>>> +if (ret < 0) {
>>> +error_report("Error get inflight: %d", -ret);
>> 
>> As long as you are fixing this - should be "getting inflight”.
> I will fix it in v3.
>> 
>>> +goto err_guest_notifiers;
>>> +}
>>> +}
>>> +

Looks like you reworked this a bit to avoid a potential crash if vsc->inflight
is NULL.

Should we fix it for vhost-user-blk too?

>>> +ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>>> +if (ret < 0) {
>>> +error_report("Error set inflight: %d", -ret);
>>> +goto err_guest_notifiers;
>>> +}
>>> }
>>> 
>>> ret = vhost_dev_start(&vsc->dev, vdev, true);
>>> @@ -85,9 +93,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>> return ret;
>>> 
>>> err_guest_notifiers:
>>> -g_free(vsc->inflight);
>>> -vsc->inflight = NULL;
>>> -
>>> k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
>>> err_host_notifiers:
>>> vhost_dev_disable_notifiers(&vsc->dev, vdev);
>>> @@ -111,12 +116,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
>>> }
>>> assert(ret >= 0);
>>> 

As I said before, I think this introduces a leak.

>>> -if (vsc->inflight) {
>>> -vhost_dev_free_inflight(vsc->inflight);
>>> -g_free(vsc->inflight);
>>> -vsc->inflight = NULL;
>>> -}
>>> -
>>> vhost_dev_disable_notifiers(&vsc->dev, vdev);
>>> }
>>> 
>>> -- 
>>> 2.41.0



Re: [PATCH v2 4/4] vhost-user-scsi: support reconnect to backend

2023-07-30 Thread Raphael Norwitz
I don’t think we should be changing any vhost-scsi-common code here. I’d rather
implement vhost_user_scsi_start/stop() wrappers around
vhost_scsi_common_start/stop() and check started_vu there.

Otherwise I think this is looking good.

Glad to see you caught the vhost_user_scsi_handle_output and implemented it
like vhost-user-blk. Can it go in a separate change?
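Concretely, the suggested wrappers would look roughly like this, with
started_vu as a new vhost-user-only flag; this is essentially the shape v3 of
the series adopts later in this thread:

    static int vhost_user_scsi_start(VHostUserSCSI *s)
    {
        VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
        int ret = vhost_scsi_common_start(vsc);

        /* Remember whether the common start actually succeeded ... */
        s->started_vu = (ret >= 0);
        return ret;
    }

    static void vhost_user_scsi_stop(VHostUserSCSI *s)
    {
        /* ... so the stop path can bail out if it never did. */
        if (!s->started_vu) {
            return;
        }
        s->started_vu = false;
        vhost_scsi_common_stop(VHOST_SCSI_COMMON(s));
    }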

> On Jul 25, 2023, at 6:42 AM, Li Feng  wrote:
> 
> If the backend crashes and restarts, the device is broken.
> This patch adds reconnect for vhost-user-scsi.
> 
> Tested with spdk backend.
> 
> Signed-off-by: Li Feng 
> ---
> hw/scsi/vhost-scsi-common.c   |   6 +
> hw/scsi/vhost-user-scsi.c | 220 +++---
> include/hw/virtio/vhost-scsi-common.h |   3 +
> include/hw/virtio/vhost-user-scsi.h   |   3 +
> 4 files changed, 211 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
> index 664adb15b4..3fde477eee 100644
> --- a/hw/scsi/vhost-scsi-common.c
> +++ b/hw/scsi/vhost-scsi-common.c
> @@ -81,6 +81,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> error_report("Error start vhost dev");
> goto err_guest_notifiers;
> }
> +vsc->started_vu = true;
> 
> /* guest_notifier_mask/pending not used yet, so just unmask
>  * everything here.  virtio-pci will do the right thing by
> @@ -106,6 +107,11 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
> VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> int ret = 0;
> 
> +if (!vsc->started_vu) {
> +return;
> +}
> +vsc->started_vu = false;
> +
> vhost_dev_stop(&vsc->dev, vdev, true);
> 
> if (k->set_guest_notifiers) {
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index ee99b19e7a..bd32dcf999 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -46,20 +46,25 @@ enum VhostUserProtocolFeature {
> static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
> {
> VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> -bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +bool should_start = virtio_device_should_start(vdev, status);
> +int ret;
> 
> -if (vhost_dev_is_started(&vsc->dev) == start) {
> +if (!s->connected) {
> return;
> }
> 
> -if (start) {
> -int ret;
> +if (vhost_dev_is_started(&vsc->dev) == should_start) {
> +return;
> +}
> 
> +if (should_start) {
> ret = vhost_scsi_common_start(vsc);
> if (ret < 0) {
> error_report("unable to start vhost-user-scsi: %s", 
> strerror(-ret));
> -exit(1);
> +qemu_chr_fe_disconnect(&vs->conf.chardev);
> }
> } else {
> vhost_scsi_common_stop(vsc);
> @@ -85,8 +90,160 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
> }
> }
> 
> -static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +static void vhost_user_scsi_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> {
> +VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +
> +Error *local_err = NULL;
> +int i, ret;
> +
> +if (!vdev->start_on_kick) {
> +return;
> +}
> +
> +if (!s->connected) {
> +return;
> +}
> +
> +if (vhost_dev_is_started(&vsc->dev)) {
> +return;
> +}
> +
> +/*
> + * Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> + * vhost here instead of waiting for .set_status().
> + */
> +ret = vhost_scsi_common_start(vsc);
> +if (ret < 0) {
> +error_reportf_err(local_err, "vhost-user-blk: vhost start failed: ");
> +qemu_chr_fe_disconnect(&vs->conf.chardev);
> +return;
> +}
> +
> +/* Kick right away to begin processing requests already in vring */
> +for (i = 0; i < vsc->dev.nvqs; i++) {
> +VirtQueue *kick_vq = virtio_get_queue(vdev, i);
> +
> +if (!virtio_queue_get_desc_addr(vdev, i)) {
> +continue;
> +}
> +event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
> +}
> +}
> +
> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +int ret = 0;
> +
> +if (s->connected) {
> +return 0;
> +}
> +s->connected = true;
> +
> +vsc->dev.num_queues = vs->conf.num_queues;
> +vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs-

Re: [PATCH v2 2/4] vhost-user-common: send get_inflight_fd once

2023-07-31 Thread Raphael Norwitz


> On Jul 31, 2023, at 7:38 AM, Li Feng  wrote:
> 
> 
> 
>> 2023年7月31日 06:13,Raphael Norwitz  写道:
>> 
>>> 
>>> On Jul 28, 2023, at 3:49 AM, Li Feng  wrote:
>>> 
>>> 
>>> 
>>>> 2023年7月28日 下午2:04,Michael S. Tsirkin  写道:
>>>> 
>>>> On Tue, Jul 25, 2023 at 06:42:45PM +0800, Li Feng wrote:
>>>>> Get_inflight_fd is sent only once. When reconnecting to the backend,
>>>>> qemu sent set_inflight_fd to the backend.
>>>> 
>>>> I don't understand what you are trying to say here.
>>>> Should be:
>>>> Currently ABCD. This is wrong/unnecessary because EFG. This patch HIJ.
>>> 
>>> Thanks, I will reorganize the commit message in v3.
>>>> 
>>>>> Signed-off-by: Li Feng 
>>>>> ---
>>>>> hw/scsi/vhost-scsi-common.c | 37 ++---
>>>>> 1 file changed, 18 insertions(+), 19 deletions(-)
>>>>> 
>>>>> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
>>>>> index a06f01af26..664adb15b4 100644
>>>>> --- a/hw/scsi/vhost-scsi-common.c
>>>>> +++ b/hw/scsi/vhost-scsi-common.c
>>>>> @@ -52,20 +52,28 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>>>> 
>>>>>vsc->dev.acked_features = vdev->guest_features;
>>>>> 
>>>>> -assert(vsc->inflight == NULL);
>>>>> -vsc->inflight = g_new0(struct vhost_inflight, 1);
>>>>> -ret = vhost_dev_get_inflight(&vsc->dev,
>>>>> - vs->conf.virtqueue_size,
>>>>> - vsc->inflight);
>>>>> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
>>>>>if (ret < 0) {
>>>>> -error_report("Error get inflight: %d", -ret);
>>>>> +error_report("Error setting inflight format: %d", -ret);
>>>>>goto err_guest_notifiers;
>>>>>}
>>>>> 
>>>>> -ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>>>>> -if (ret < 0) {
>>>>> -error_report("Error set inflight: %d", -ret);
>>>>> -goto err_guest_notifiers;
>>>>> +if (vsc->inflight) {
>>>>> +if (!vsc->inflight->addr) {
>>>>> +ret = vhost_dev_get_inflight(&vsc->dev,
>>>>> +vs->conf.virtqueue_size,
>>>>> +vsc->inflight);
>>>>> +if (ret < 0) {
>>>>> +error_report("Error get inflight: %d", -ret);
>>>> 
>>>> As long as you are fixing this - should be "getting inflight”.
>>> I will fix it in v3.
>>>> 
>>>>> +goto err_guest_notifiers;
>>>>> +}
>>>>> +}
>>>>> +
>> 
>> Looks like you reworked this a bit so to avoid a potential crash if 
>> vsc->inflight is NULL
>> 
>> Should we fix it for vhost-user-blk too?
>> 
> This check is mainly for the vhost-scsi code, which doesn’t need to allocate
> the inflight memory.
> 
> The vhost-user-blk doesn’t need this check, because there isn't a vhost-blk
> device that reuses the code.
> 

Makes sense.

>>>>> +ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>>>>> +if (ret < 0) {
>>>>> +error_report("Error set inflight: %d", -ret);
>>>>> +goto err_guest_notifiers;
>>>>> +}
>>>>>}
>>>>> 
>>>>>ret = vhost_dev_start(&vsc->dev, vdev, true);
>>>>> @@ -85,9 +93,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>>>>return ret;
>>>>> 
>>>>> err_guest_notifiers:
>>>>> -g_free(vsc->inflight);
>>>>> -vsc->inflight = NULL;
>>>>> -
>>>>>k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
>>>>> err_host_notifiers:
>>>>>vhost_dev_disable_notifiers(&vsc->dev, vdev);
>>>>> @@ -111,12 +116,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
>>>>>}
>>>>>assert(ret >= 0);
>>>>> 
>> 
>> As I said before, I think this introduces a leak.
> I have answered in the previous mail.
> 

On re-review I agree it's fine since vsc->inflight isn't set.

>> 
>>>>> -if (vsc->inflight) {
>>>>> -vhost_dev_free_inflight(vsc->inflight);
>>>>> -g_free(vsc->inflight);
>>>>> -vsc->inflight = NULL;
>>>>> -}
>>>>> -
>>>>>vhost_dev_disable_notifiers(&vsc->dev, vdev);
>>>>> }
>>>>> 
>>>>> -- 
>>>>> 2.41.0



Re: [PATCH v3 2/5] vhost-user-common: send get_inflight_fd once

2023-07-31 Thread Raphael Norwitz



> On Jul 31, 2023, at 8:10 AM, Li Feng  wrote:
> 
> Currently the get_inflight_fd will be sent every time the device is started, 
> and
> the backend will allocate shared memory to save the inflight state. If the
> backend finds that it receives the second get_inflight_fd, it will release the
> previous shared memory, which breaks inflight working logic.
> 
> This patch is a preparation for the following patches.
> 
> Signed-off-by: Li Feng 

Reviewed-by: Raphael Norwitz 

> ---
> hw/scsi/vhost-scsi-common.c | 37 ++---
> 1 file changed, 18 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
> index a06f01af26..a61cd0e907 100644
> --- a/hw/scsi/vhost-scsi-common.c
> +++ b/hw/scsi/vhost-scsi-common.c
> @@ -52,20 +52,28 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> 
> vsc->dev.acked_features = vdev->guest_features;
> 
> -assert(vsc->inflight == NULL);
> -vsc->inflight = g_new0(struct vhost_inflight, 1);
> -ret = vhost_dev_get_inflight(&vsc->dev,
> - vs->conf.virtqueue_size,
> - vsc->inflight);
> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
> if (ret < 0) {
> -error_report("Error get inflight: %d", -ret);
> +error_report("Error setting inflight format: %d", -ret);
> goto err_guest_notifiers;
> }
> 
> -ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
> -if (ret < 0) {
> -error_report("Error set inflight: %d", -ret);
> -goto err_guest_notifiers;
> +if (vsc->inflight) {
> +if (!vsc->inflight->addr) {
> +ret = vhost_dev_get_inflight(&vsc->dev,
> +vs->conf.virtqueue_size,
> +vsc->inflight);
> +if (ret < 0) {
> +error_report("Error getting inflight: %d", -ret);
> +goto err_guest_notifiers;
> +}
> +}
> +
> +ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
> +if (ret < 0) {
> +error_report("Error setting inflight: %d", -ret);
> +goto err_guest_notifiers;
> +}
> }
> 
> ret = vhost_dev_start(&vsc->dev, vdev, true);
> @@ -85,9 +93,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> return ret;
> 
> err_guest_notifiers:
> -g_free(vsc->inflight);
> -vsc->inflight = NULL;
> -
> k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
> err_host_notifiers:
> vhost_dev_disable_notifiers(&vsc->dev, vdev);
> @@ -111,12 +116,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
> }
> assert(ret >= 0);
> 
> -if (vsc->inflight) {
> -vhost_dev_free_inflight(vsc->inflight);
> -g_free(vsc->inflight);
> -vsc->inflight = NULL;
> -}
> -
> vhost_dev_disable_notifiers(&vsc->dev, vdev);
> }
> 
> -- 
> 2.41.0
> 




Re: [PATCH v3 4/5] vhost-user-scsi: support reconnect to backend

2023-07-31 Thread Raphael Norwitz



> On Jul 31, 2023, at 8:10 AM, Li Feng  wrote:
> 
> If the backend crashes and restarts, the device is broken.
> This patch adds reconnect for vhost-user-scsi.
> 
> Tested with spdk backend.
> 
> Signed-off-by: Li Feng 

Reviewed-by: Raphael Norwitz 

> ---
> hw/scsi/vhost-user-scsi.c   | 199 +---
> include/hw/virtio/vhost-user-scsi.h |   4 +
> 2 files changed, 184 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index ee99b19e7a..5bf012461b 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -43,26 +43,54 @@ enum VhostUserProtocolFeature {
> VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
> };
> 
> +static int vhost_user_scsi_start(VHostUserSCSI *s)
> +{
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +int ret;
> +
> +ret = vhost_scsi_common_start(vsc);
> +s->started_vu = (ret < 0 ? false : true);
> +
> +return ret;
> +}
> +
> +static void vhost_user_scsi_stop(VHostUserSCSI *s)
> +{
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +
> +if (!s->started_vu) {
> +return;
> +}
> +s->started_vu = false;
> +
> +vhost_scsi_common_stop(vsc);
> +}
> +
> static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
> {
> VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> -bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +bool should_start = virtio_device_should_start(vdev, status);
> +int ret;
> 
> -if (vhost_dev_is_started(&vsc->dev) == start) {
> +if (!s->connected) {
> return;
> }
> 
> -if (start) {
> -int ret;
> +if (vhost_dev_is_started(&vsc->dev) == should_start) {
> +return;
> +}
> 
> -ret = vhost_scsi_common_start(vsc);
> +if (should_start) {
> +ret = vhost_user_scsi_start(s);
> if (ret < 0) {
> error_report("unable to start vhost-user-scsi: %s", 
> strerror(-ret));
> -exit(1);
> +qemu_chr_fe_disconnect(&vs->conf.chardev);
> }
> } else {
> -vhost_scsi_common_stop(vsc);
> +vhost_user_scsi_stop(s);
> }
> }
> 
> @@ -89,14 +117,126 @@ static void vhost_dummy_handle_output(VirtIODevice 
> *vdev, VirtQueue *vq)
> {
> }
> 
> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +int ret = 0;
> +
> +if (s->connected) {
> +return 0;
> +}
> +s->connected = true;
> +
> +vsc->dev.num_queues = vs->conf.num_queues;
> +vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
> +vsc->dev.vqs = s->vhost_vqs;
> +vsc->dev.vq_index = 0;
> +vsc->dev.backend_features = 0;
> +
> +ret = vhost_dev_init(&vsc->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 
> 0,
> + errp);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +/* restore vhost state */
> +if (virtio_device_started(vdev, vdev->status)) {
> +ret = vhost_user_scsi_start(s);
> +if (ret < 0) {
> +return ret;
> +}
> +}
> +
> +return 0;
> +}
> +
> +static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event);
> +
> +static void vhost_user_scsi_disconnect(DeviceState *dev)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +
> +if (!s->connected) {
> +return;
> +}
> +s->connected = false;
> +
> +vhost_user_scsi_stop(s);
> +
> +vhost_dev_cleanup(&vsc->dev);
> +
> +/* Re-instate the event handler for new connections */
> +qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL,
> + vhost_user_scsi_event, NULL, dev, NULL, true);
> +}
> +
> +static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event)
> +{
> +DeviceState *dev = opaque;
> +VirtIO

Re: [PATCH v3 5/5] vhost-user-scsi: start vhost when guest kicks

2023-07-31 Thread Raphael Norwitz



> On Jul 31, 2023, at 8:10 AM, Li Feng  wrote:
> 
> Let's keep the same behavior as vhost-user-blk.
> 
> Some old guests kick virtqueue before setting VIRTIO_CONFIG_S_DRIVER_OK.
> 
> Signed-off-by: Li Feng 

Reviewed-by: Raphael Norwitz 

> ---
> hw/scsi/vhost-user-scsi.c | 48 +++
> 1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index 5bf012461b..a7fa8e8df2 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -113,8 +113,48 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
> }
> }
> 
> -static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +static void vhost_user_scsi_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> {
> +VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +
> +Error *local_err = NULL;
> +int i, ret;
> +
> +if (!vdev->start_on_kick) {
> +return;
> +}
> +
> +if (!s->connected) {
> +return;
> +}
> +
> +if (vhost_dev_is_started(&vsc->dev)) {
> +return;
> +}
> +
> +/*
> + * Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> + * vhost here instead of waiting for .set_status().
> + */
> +ret = vhost_user_scsi_start(s);
> +if (ret < 0) {
> +error_reportf_err(local_err, "vhost-user-scsi: vhost start failed: 
> ");
> +qemu_chr_fe_disconnect(&vs->conf.chardev);
> +return;
> +}
> +
> +/* Kick right away to begin processing requests already in vring */
> +for (i = 0; i < vsc->dev.nvqs; i++) {
> +VirtQueue *kick_vq = virtio_get_queue(vdev, i);
> +
> +if (!virtio_queue_get_desc_addr(vdev, i)) {
> +continue;
> +}
> +event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
> +}
> }
> 
> static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
> @@ -243,9 +283,9 @@ static void vhost_user_scsi_realize(DeviceState *dev, 
> Error **errp)
> return;
> }
> 
> -virtio_scsi_common_realize(dev, vhost_dummy_handle_output,
> -   vhost_dummy_handle_output,
> -   vhost_dummy_handle_output, &err);
> +virtio_scsi_common_realize(dev, vhost_user_scsi_handle_output,
> +   vhost_user_scsi_handle_output,
> +   vhost_user_scsi_handle_output, &err);
> if (err != NULL) {
> error_propagate(errp, err);
> return;
> -- 
> 2.41.0
> 




Re: [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation

2024-02-03 Thread Raphael Norwitz
As a heads up, I've left Nutanix and updated it in MAINTAINERS. Will
be updating it again shortly so tagging these with my new work email.

On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> We barely had mmap_offset set in the past. With virtio-mem and
> dynamic-memslots that will change.
>
> In vu_add_mem_reg() and vu_set_mem_table_exec_postcopy(), we are
> performing pointer arithmetics, which is wrong. Let's simply
> use dev_region->mmap_addr instead of "void *mmap_addr".
>
> Fixes: ec94c8e621de ("Support adding individual regions in libvhost-user")
> Fixes: 9bb38019942c ("vhost+postcopy: Send address back to qemu")
> Cc: Raphael Norwitz 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index a3b158c671..7e515ed15d 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -800,8 +800,8 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>   * Return the address to QEMU so that it can translate the ufd
>   * fault addresses back.
>   */
> -msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> - dev_region->mmap_offset);
> +msg_region->userspace_addr = dev_region->mmap_addr +
> + dev_region->mmap_offset;
>
>  /* Send the message back to qemu with the addresses filled in. */
>  vmsg->fd_num = 0;
> @@ -969,8 +969,8 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg 
> *vmsg)
>  /* Return the address to QEMU so that it can translate the ufd
>   * fault addresses back.
>   */
> -msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> - dev_region->mmap_offset);
> +msg_region->userspace_addr = dev_region->mmap_addr +
> + dev_region->mmap_offset;
>  close(vmsg->fds[i]);
>  }
>
> --
> 2.43.0
>
>



Re: [PATCH v1 02/15] libvhost-user: Dynamically allocate memory for memory slots

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> Let's prepare for increasing VHOST_USER_MAX_RAM_SLOTS by dynamically
> allocating dev->regions. We don't have any ABI guarantees (not
> dynamically linked), so we can simply change the layout of VuDev.
>
> Let's zero out the memory, just as we used to do.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 11 +++
>  subprojects/libvhost-user/libvhost-user.h |  2 +-
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 7e515ed15d..8a5a7a2295 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -2171,6 +2171,8 @@ vu_deinit(VuDev *dev)
>
>  free(dev->vq);
>  dev->vq = NULL;
> +free(dev->regions);
> +dev->regions = NULL;
>  }
>
>  bool
> @@ -2205,9 +2207,18 @@ vu_init(VuDev *dev,
>  dev->backend_fd = -1;
>  dev->max_queues = max_queues;
>
> +dev->regions = malloc(VHOST_USER_MAX_RAM_SLOTS * 
> sizeof(dev->regions[0]));
> +if (!dev->regions) {
> +DPRINT("%s: failed to malloc mem regions\n", __func__);
> +return false;
> +}
> +memset(dev->regions, 0, VHOST_USER_MAX_RAM_SLOTS * 
> sizeof(dev->regions[0]));
> +
>  dev->vq = malloc(max_queues * sizeof(dev->vq[0]));
>  if (!dev->vq) {
>  DPRINT("%s: failed to malloc virtqueues\n", __func__);
> +free(dev->regions);
> +dev->regions = NULL;
>  return false;
>  }
>
> diff --git a/subprojects/libvhost-user/libvhost-user.h 
> b/subprojects/libvhost-user/libvhost-user.h
> index c2352904f0..c882b4e3a2 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -398,7 +398,7 @@ typedef struct VuDevInflightInfo {
>  struct VuDev {
>  int sock;
>  uint32_t nregions;
> -VuDevRegion regions[VHOST_USER_MAX_RAM_SLOTS];
> +VuDevRegion *regions;
>  VuVirtq *vq;
>  VuDevInflightInfo inflight_info;
>  int log_call_fd;
> --
> 2.43.0
>
>



Re: [PATCH v1 03/15] libvhost-user: Bump up VHOST_USER_MAX_RAM_SLOTS to 509

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> Let's support up to 509 mem slots, just like vhost in the kernel usually
> does and the rust vhost-user implementation recently [1] started doing.
> This is required to properly support memory hotplug, either using
> multiple DIMMs (ACPI supports up to 256) or using virtio-mem.
>
> The 509 used to be the KVM limit, it supported 512, but 3 were
> used for internal purposes. Currently, KVM supports more than 512, but
> it usually doesn't make use of more than ~260 (i.e., 256 DIMMs + boot
> memory), except when other memory devices like PCI devices with BARs are
> used. So, 509 seems to work well for vhost in the kernel.
>
> Details can be found in the QEMU change that made virtio-mem consume
> up to 256 mem slots across all virtio-mem devices. [2]
>
> 509 mem slots implies 509 VMAs/mappings in the worst case (even though,
> in practice with virtio-mem we won't be seeing more than ~260 in most
> setups).
>
> With max_map_count under Linux defaulting to 64k, 509 mem slots
> still correspond to less than 1% of the maximum number of mappings.
> There are plenty left for the application to consume.
>
> [1] https://github.com/rust-vmm/vhost/pull/224
> [2] https://lore.kernel.org/all/20230926185738.277351-1-da...@redhat.com/
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.h | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.h 
> b/subprojects/libvhost-user/libvhost-user.h
> index c882b4e3a2..deb40e77b3 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -31,10 +31,12 @@
>  #define VHOST_MEMORY_BASELINE_NREGIONS 8
>
>  /*
> - * Set a reasonable maximum number of ram slots, which will be supported by
> - * any architecture.
> + * vhost in the kernel usually supports 509 mem slots. 509 used to be the
> + * KVM limit, it supported 512, but 3 were used for internal purposes. This
> + * limit is sufficient to support many DIMMs and virtio-mem in
> + * "dynamic-memslots" mode.
>   */
> -#define VHOST_USER_MAX_RAM_SLOTS 32
> +#define VHOST_USER_MAX_RAM_SLOTS 509
>
>  #define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
>
> --
> 2.43.0
>
>



Re: [PATCH v1 04/15] libvhost-user: Factor out removing all mem regions

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> Let's factor it out. Note that the check for MAP_FAILED was wrong as
> we never set mmap_addr if mmap() failed. We'll remove the NULL check
> separately.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 34 ---
>  1 file changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 8a5a7a2295..d5b3468e43 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -240,6 +240,22 @@ qva_to_va(VuDev *dev, uint64_t qemu_addr)
>  return NULL;
>  }
>
> +static void
> +vu_remove_all_mem_regs(VuDev *dev)
> +{
> +unsigned int i;
> +
> +for (i = 0; i < dev->nregions; i++) {
> +VuDevRegion *r = &dev->regions[i];
> +void *ma = (void *)(uintptr_t)r->mmap_addr;
> +
> +if (ma) {
> +munmap(ma, r->size + r->mmap_offset);
> +}
> +}
> +dev->nregions = 0;
> +}
> +
>  static void
>  vmsg_close_fds(VhostUserMsg *vmsg)
>  {
> @@ -1003,14 +1019,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>  unsigned int i;
>  VhostUserMemory m = vmsg->payload.memory, *memory = &m;
>
> -for (i = 0; i < dev->nregions; i++) {
> -VuDevRegion *r = &dev->regions[i];
> -void *ma = (void *) (uintptr_t) r->mmap_addr;
> -
> -if (ma) {
> -munmap(ma, r->size + r->mmap_offset);
> -}
> -}
> +vu_remove_all_mem_regs(dev);
>  dev->nregions = memory->nregions;
>
>  if (dev->postcopy_listening) {
> @@ -2112,14 +2121,7 @@ vu_deinit(VuDev *dev)
>  {
>  unsigned int i;
>
> -for (i = 0; i < dev->nregions; i++) {
> -VuDevRegion *r = &dev->regions[i];
> -void *m = (void *) (uintptr_t) r->mmap_addr;
> -if (m != MAP_FAILED) {
> -munmap(m, r->size + r->mmap_offset);
> -}
> -}
> -dev->nregions = 0;
> +vu_remove_all_mem_regs(dev);
>
>  for (i = 0; i < dev->max_queues; i++) {
>  VuVirtq *vq = &dev->vq[i];
> --
> 2.43.0
>
>



Re: [PATCH v1 05/15] libvhost-user: Merge vu_set_mem_table_exec_postcopy() into vu_set_mem_table_exec()

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> Let's reduce some code duplication and prepare for further changes.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 119 +++---
>  1 file changed, 39 insertions(+), 80 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index d5b3468e43..d9e2214ad2 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -937,95 +937,23 @@ vu_get_shared_object(VuDev *dev, VhostUserMsg *vmsg)
>  }
>
>  static bool
> -vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
> +vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>  {
> -unsigned int i;
>  VhostUserMemory m = vmsg->payload.memory, *memory = &m;
> -dev->nregions = memory->nregions;
> -
> -DPRINT("Nregions: %u\n", memory->nregions);
> -for (i = 0; i < dev->nregions; i++) {
> -void *mmap_addr;
> -VhostUserMemoryRegion *msg_region = &memory->regions[i];
> -VuDevRegion *dev_region = &dev->regions[i];
> -
> -DPRINT("Region %d\n", i);
> -DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> -   msg_region->guest_phys_addr);
> -DPRINT("memory_size: 0x%016"PRIx64"\n",
> -   msg_region->memory_size);
> -DPRINT("userspace_addr   0x%016"PRIx64"\n",
> -   msg_region->userspace_addr);
> -DPRINT("mmap_offset  0x%016"PRIx64"\n",
> -   msg_region->mmap_offset);
> -
> -dev_region->gpa = msg_region->guest_phys_addr;
> -dev_region->size = msg_region->memory_size;
> -dev_region->qva = msg_region->userspace_addr;
> -dev_region->mmap_offset = msg_region->mmap_offset;
> +int prot = PROT_READ | PROT_WRITE;
> +unsigned int i;
>
> -/* We don't use offset argument of mmap() since the
> - * mapped address has to be page aligned, and we use huge
> - * pages.
> +if (dev->postcopy_listening) {
> +/*
>   * In postcopy we're using PROT_NONE here to catch anyone
>   * accessing it before we userfault
>   */
> -mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_NONE, MAP_SHARED | MAP_NORESERVE,
> - vmsg->fds[i], 0);
> -
> -if (mmap_addr == MAP_FAILED) {
> -vu_panic(dev, "region mmap error: %s", strerror(errno));
> -} else {
> -dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> -DPRINT("mmap_addr:   0x%016"PRIx64"\n",
> -   dev_region->mmap_addr);
> -}
> -
> -/* Return the address to QEMU so that it can translate the ufd
> - * fault addresses back.
> - */
> -msg_region->userspace_addr = dev_region->mmap_addr +
> - dev_region->mmap_offset;
> -close(vmsg->fds[i]);
> -}
> -
> -/* Send the message back to qemu with the addresses filled in */
> -vmsg->fd_num = 0;
> -if (!vu_send_reply(dev, dev->sock, vmsg)) {
> -vu_panic(dev, "failed to respond to set-mem-table for postcopy");
> -return false;
> -}
> -
> -/* Wait for QEMU to confirm that it's registered the handler for the
> - * faults.
> - */
> -if (!dev->read_msg(dev, dev->sock, vmsg) ||
> -vmsg->size != sizeof(vmsg->payload.u64) ||
> -vmsg->payload.u64 != 0) {
> -vu_panic(dev, "failed to receive valid ack for postcopy 
> set-mem-table");
> -return false;
> +prot = PROT_NONE;
>  }
>
> -/* OK, now we can go and register the memory and generate faults */
> -(void)generate_faults(dev);
> -
> -return false;
> -}
> -
> -static bool
> -vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> -{
> -unsigned int i;
> -VhostUserMemory m = vmsg->payload.memory, *memory = &m;
> -
>  vu_remove_all_mem_regs(dev);
>  dev->nregions = memory->nregions;
>
> -if (dev->postcopy_listening) {
> -return vu_set_mem_table_exec_postcopy(dev, vmsg);
> -}
> -
>  DPRINT("Nregions: %u\n", memory->nregions

Re: [PATCH v1 06/15] libvhost-user: Factor out adding a memory region

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> Let's factor it out, reducing quite some code duplication and preparing
> for further changes.
>
> If we fail to mmap a region and panic, we now simply don't add that
> (broken) region.
>
> Note that we now increment dev->nregions as we are successfully
> adding memory regions, and don't increment dev->nregions if anything went
> wrong.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 168 --
>  1 file changed, 60 insertions(+), 108 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index d9e2214ad2..a2baefe84b 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -256,6 +256,61 @@ vu_remove_all_mem_regs(VuDev *dev)
>  dev->nregions = 0;
>  }
>
> +static void
> +_vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> +{
> +int prot = PROT_READ | PROT_WRITE;
> +VuDevRegion *r;
> +void *mmap_addr;
> +
> +DPRINT("Adding region %d\n", dev->nregions);
> +DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> +   msg_region->guest_phys_addr);
> +DPRINT("memory_size: 0x%016"PRIx64"\n",
> +   msg_region->memory_size);
> +DPRINT("userspace_addr   0x%016"PRIx64"\n",
> +   msg_region->userspace_addr);
> +DPRINT("mmap_offset  0x%016"PRIx64"\n",
> +   msg_region->mmap_offset);
> +
> +if (dev->postcopy_listening) {
> +/*
> + * In postcopy we're using PROT_NONE here to catch anyone
> + * accessing it before we userfault
> + */
> +prot = PROT_NONE;
> +}
> +
> +/*
> + * We don't use offset argument of mmap() since the mapped address has
> + * to be page aligned, and we use huge pages.
> + */
> +mmap_addr = mmap(0, msg_region->memory_size + msg_region->mmap_offset,
> + prot, MAP_SHARED | MAP_NORESERVE, fd, 0);
> +if (mmap_addr == MAP_FAILED) {
> +vu_panic(dev, "region mmap error: %s", strerror(errno));
> +return;
> +}
> +DPRINT("mmap_addr:   0x%016"PRIx64"\n",
> +   (uint64_t)(uintptr_t)mmap_addr);
> +
> +r = &dev->regions[dev->nregions];
> +r->gpa = msg_region->guest_phys_addr;
> +r->size = msg_region->memory_size;
> +r->qva = msg_region->userspace_addr;
> +r->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> +r->mmap_offset = msg_region->mmap_offset;
> +dev->nregions++;
> +
> +if (dev->postcopy_listening) {
> +/*
> + * Return the address to QEMU so that it can translate the ufd
> + * fault addresses back.
> + */
> +msg_region->userspace_addr = r->mmap_addr + r->mmap_offset;
> +}
> +}
> +
>  static void
>  vmsg_close_fds(VhostUserMsg *vmsg)
>  {
> @@ -727,10 +782,7 @@ generate_faults(VuDev *dev) {
>  static bool
>  vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  int i;
> -bool track_ramblocks = dev->postcopy_listening;
>  VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
> -VuDevRegion *dev_region = &dev->regions[dev->nregions];
> -void *mmap_addr;
>
>  if (vmsg->fd_num != 1) {
>  vmsg_close_fds(vmsg);
> @@ -760,69 +812,20 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>   * we know all the postcopy client bases have been received, and we
>   * should start generating faults.
>   */
> -if (track_ramblocks &&
> +if (dev->postcopy_listening &&
>  vmsg->size == sizeof(vmsg->payload.u64) &&
>  vmsg->payload.u64 == 0) {
>  (void)generate_faults(dev);
>  return false;
>  }
>
> -DPRINT("Adding region: %u\n", dev->nregions);
> -DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> -   msg_region->guest_phys_addr);
> -DPRINT("memory_size: 0x%016"PRIx64"\n",
> -   msg_region->memory_size);
> -DPRINT("userspace_addr   0x%016"PRIx64"\n",
> -   msg_region->userspace_addr);
> -DPRINT("mmap_offset  0x%016"PRIx64"\n",
> -   msg_region->mmap_offset);
>

Re: [PATCH v1 07/15] libvhost-user: No need to check for NULL when unmapping

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> We never add a memory region if mmap() failed. Therefore, no need to check
> for NULL.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index a2baefe84b..f99c888b48 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -247,11 +247,8 @@ vu_remove_all_mem_regs(VuDev *dev)
>
>  for (i = 0; i < dev->nregions; i++) {
>  VuDevRegion *r = &dev->regions[i];
> -void *ma = (void *)(uintptr_t)r->mmap_addr;
>
> -if (ma) {
> -munmap(ma, r->size + r->mmap_offset);
> -}
> +munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
>  }
>  dev->nregions = 0;
>  }
> @@ -888,11 +885,8 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  for (i = 0; i < dev->nregions; i++) {
>  if (reg_equal(&dev->regions[i], msg_region)) {
>  VuDevRegion *r = &dev->regions[i];
> -void *ma = (void *) (uintptr_t) r->mmap_addr;
>
> -if (ma) {
> -munmap(ma, r->size + r->mmap_offset);
> -}
> +munmap((void *)(uintptr_t)r->mmap_addr, r->size + 
> r->mmap_offset);
>
>  /*
>   * Shift all affected entries by 1 to close the hole at index i 
> and
> --
> 2.43.0
>
>



Re: [PATCH v1 08/15] libvhost-user: Don't zero out memory for memory regions

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand  wrote:
>
> dev->nregions always covers only valid entries. Stop zeroing out other
> array elements that are unused.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index f99c888b48..e1a1b9df88 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -888,13 +888,9 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>
>  munmap((void *)(uintptr_t)r->mmap_addr, r->size + 
> r->mmap_offset);
>
> -/*
> - * Shift all affected entries by 1 to close the hole at index i 
> and
> - * zero out the last entry.
> - */
> +/* Shift all affected entries by 1 to close the hole at index. */
>  memmove(dev->regions + i, dev->regions + i + 1,
>  sizeof(VuDevRegion) * (dev->nregions - i - 1));
> -memset(dev->regions + dev->nregions - 1, 0, sizeof(VuDevRegion));
>  DPRINT("Successfully removed a region\n");
>  dev->nregions--;
>  i--;
> @@ -2119,7 +2115,6 @@ vu_init(VuDev *dev,
>  DPRINT("%s: failed to malloc mem regions\n", __func__);
>  return false;
>  }
> -memset(dev->regions, 0, VHOST_USER_MAX_RAM_SLOTS * 
> sizeof(dev->regions[0]));
>
>  dev->vq = malloc(max_queues * sizeof(dev->vq[0]));
>  if (!dev->vq) {
> --
> 2.43.0
>
>



Re: [PATCH v1 09/15] libvhost-user: Don't search for duplicates when removing memory regions

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand  wrote:
>
> We cannot have duplicate memory regions, something would be deeply
> flawed elsewhere. Let's just stop the search once we found an entry.
>
> We'll add more sanity checks when adding memory regions later.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index e1a1b9df88..22154b217f 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -896,8 +896,7 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  i--;
>
>  found = true;
> -
> -/* Continue the search for eventual duplicates. */
> +break;
>  }
>  }
>
> --
> 2.43.0
>
>



Re: [PATCH v1 10/15] libvhost-user: Factor out search for memory region by GPA and simplify

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
>
> Memory regions cannot overlap, and if we ever hit that case something
> would be really flawed.
>
> For example, when vhost code in QEMU decides to increase the size of memory
> regions to cover full huge pages, it makes sure to never create overlaps,
> and if there would be overlaps, it would bail out.
>
> QEMU commits 48d7c9757749 ("vhost: Merge sections added to temporary
> list"), c1ece84e7c93 ("vhost: Huge page align and merge") and
> e7b94a84b6cb ("vhost: Allow adjoining regions") added and clarified that
> handling and how overlaps are impossible.
>
> Consequently, each GPA can belong to at most one memory region, and
> everything else doesn't make sense. Let's factor out our search to prepare
> for further changes.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 79 +--
>  1 file changed, 45 insertions(+), 34 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 22154b217f..d036b54ed0 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -195,30 +195,47 @@ vu_panic(VuDev *dev, const char *msg, ...)
>   */
>  }
>
> +/* Search for a memory region that covers this guest physical address. */
> +static VuDevRegion *
> +vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
> +{
> +unsigned int i;
> +
> +/*
> + * Memory regions cannot overlap in guest physical address space. Each
> + * GPA belongs to exactly one memory region, so there can only be one
> + * match.
> + */
> +for (i = 0; i < dev->nregions; i++) {
> +VuDevRegion *cur = &dev->regions[i];
> +
> +if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
> +return cur;
> +}
> +}
> +return NULL;
> +}
> +
>  /* Translate guest physical address to our virtual address.  */
>  void *
>  vu_gpa_to_va(VuDev *dev, uint64_t *plen, uint64_t guest_addr)
>  {
> -unsigned int i;
> +VuDevRegion *r;
>
>  if (*plen == 0) {
>  return NULL;
>  }
>
> -/* Find matching memory region.  */
> -for (i = 0; i < dev->nregions; i++) {
> -VuDevRegion *r = &dev->regions[i];
> -
> -if ((guest_addr >= r->gpa) && (guest_addr < (r->gpa + r->size))) {
> -if ((guest_addr + *plen) > (r->gpa + r->size)) {
> -*plen = r->gpa + r->size - guest_addr;
> -}
> -return (void *)(uintptr_t)
> -guest_addr - r->gpa + r->mmap_addr + r->mmap_offset;
> -}
> +r = vu_gpa_to_mem_region(dev, guest_addr);
> +if (!r) {
> +return NULL;
>  }
>
> -return NULL;
> +if ((guest_addr + *plen) > (r->gpa + r->size)) {
> +*plen = r->gpa + r->size - guest_addr;
> +}
> +return (void *)(uintptr_t)guest_addr - r->gpa + r->mmap_addr +
> +   r->mmap_offset;
>  }
>
>  /* Translate qemu virtual address to our virtual address.  */
> @@ -854,8 +871,8 @@ static inline bool reg_equal(VuDevRegion *vudev_reg,
>  static bool
>  vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
> -unsigned int i;
> -bool found = false;
> +unsigned int idx;
> +VuDevRegion *r;
>
>  if (vmsg->fd_num > 1) {
>  vmsg_close_fds(vmsg);
> @@ -882,28 +899,22 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>  DPRINT("mmap_offset  0x%016"PRIx64"\n",
> msg_region->mmap_offset);
>
> -for (i = 0; i < dev->nregions; i++) {
> -if (reg_equal(&dev->regions[i], msg_region)) {
> -VuDevRegion *r = &dev->regions[i];
> -
> -munmap((void *)(uintptr_t)r->mmap_addr, r->size + 
> r->mmap_offset);
> -
> -/* Shift all affected entries by 1 to close the hole at index. */
> -memmove(dev->regions + i, dev->regions + i + 1,
> -sizeof(VuDevRegion) * (dev->nregions - i - 1));
> -DPRINT("Successfully removed a region\n");
> -dev->nregions--;
> -i--;
> -
> -found = true;
> -break;
> -}
> -}
> -
> -if (!found) {
> +r = vu_gpa_to_mem_region(d

Re: [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()

2024-02-03 Thread Raphael Norwitz
One comment on this one.

On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand  wrote:
>
> Let's speed up GPA to memory region / virtual address lookup. Store the
> memory regions ordered by guest physical addresses, and use binary
> search for address translation, as well as when adding/removing memory
> regions.
>
> Most importantly, this will speed up GPA->VA address translation when we
> have many memslots.
>
> Signed-off-by: David Hildenbrand 
> ---
>  subprojects/libvhost-user/libvhost-user.c | 49 +--
>  1 file changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index d036b54ed0..75e47b7bb3 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -199,19 +199,30 @@ vu_panic(VuDev *dev, const char *msg, ...)
>  static VuDevRegion *
>  vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
>  {
> -unsigned int i;
> +int low = 0;
> +int high = dev->nregions - 1;
>
>  /*
>   * Memory regions cannot overlap in guest physical address space. Each
>   * GPA belongs to exactly one memory region, so there can only be one
>   * match.
> + *
> + * We store our memory regions ordered by GPA and can simply perform a
> + * binary search.
>   */
> -for (i = 0; i < dev->nregions; i++) {
> -VuDevRegion *cur = &dev->regions[i];
> +while (low <= high) {
> +unsigned int mid = low + (high - low) / 2;
> +VuDevRegion *cur = &dev->regions[mid];
>
>  if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
>  return cur;
>  }
> +if (guest_addr >= cur->gpa + cur->size) {
> +low = mid + 1;
> +}
> +if (guest_addr < cur->gpa) {
> +high = mid - 1;
> +}
>  }
>  return NULL;
>  }
> @@ -273,9 +284,14 @@ vu_remove_all_mem_regs(VuDev *dev)
>  static void
>  _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
>  {
> +const uint64_t start_gpa = msg_region->guest_phys_addr;
> +const uint64_t end_gpa = start_gpa + msg_region->memory_size;
>  int prot = PROT_READ | PROT_WRITE;
>  VuDevRegion *r;
>  void *mmap_addr;
> +int low = 0;
> +int high = dev->nregions - 1;
> +unsigned int idx;
>
>  DPRINT("Adding region %d\n", dev->nregions);
>  DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> @@ -295,6 +311,29 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion 
> *msg_region, int fd)
>  prot = PROT_NONE;
>  }
>
> +/*
> + * We will add memory regions into the array sorted by GPA. Perform a
> + * binary search to locate the insertion point: it will be at the low
> + * index.
> + */
> +while (low <= high) {
> +unsigned int mid = low + (high - low)  / 2;
> +VuDevRegion *cur = &dev->regions[mid];
> +
> +/* Overlap of GPA addresses. */

Looks like this check will only catch the case where the new region is fully
contained within an existing region. I think we need to check whether either
the start or the end of the new region falls inside the existing range, i.e.:

if ((start_gpa > cur->gpa && start_gpa < cur->gpa + cur->size) ||
    (end_gpa > cur->gpa && end_gpa < cur->gpa + cur->size))


> +if (start_gpa < cur->gpa + cur->size && cur->gpa < end_gpa) {
> +vu_panic(dev, "regions with overlapping guest physical 
> addresses");
> +return;
> +}
> +if (start_gpa >= cur->gpa + cur->size) {
> +low = mid + 1;
> +}
> +if (start_gpa < cur->gpa) {
> +high = mid - 1;
> +}
> +}
> +idx = low;
> +
>  /*
>   * We don't use offset argument of mmap() since the mapped address has
>   * to be page aligned, and we use huge pages.
> @@ -308,7 +347,9 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion 
> *msg_region, int fd)
>  DPRINT("mmap_addr:   0x%016"PRIx64"\n",
> (uint64_t)(uintptr_t)mmap_addr);
>
> -r = &dev->regions[dev->nregions];
> +/* Shift all affected entries by 1 to open a hole at idx. */
> +r = &dev->regions[idx];
> +memmove(r + 1, r, sizeof(VuDevRegion) * (dev->nregions - idx));
>  r->gpa = msg_region->guest_phys_addr;
>  r->size = msg_region->memory_size;
>  r->qva = msg_region->userspace_addr;
> --
> 2.43.0
>
>



Re: [PATCH v1 12/15] libvhost-user: Use most of mmap_offset as fd_offset

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> In the past, QEMU would create memory regions that could partially cover
> hugetlb pages, making mmap() fail if we would use the mmap_offset as an
> fd_offset. For that reason, we never used the mmap_offset as an offset into
> the fd and instead always mapped the fd from the very start.
>
> However, that can easily result in us mmap'ing a lot of unnecessary
> parts of an fd, possibly repeatedly.
>
> QEMU nowadays does not create memory regions that partially cover huge
> pages -- it never really worked with postcopy. QEMU handles merging of
> regions that partially cover huge pages (due to holes in boot memory) since
> 2018 in c1ece84e7c93 ("vhost: Huge page align and merge").
>
> Let's be a bit careful and not unconditionally convert the
> mmap_offset into an fd_offset. Instead, let's simply detect the hugetlb
> size and pass as much as we can as fd_offset, making sure that we call
> mmap() with a properly aligned offset.
>
> With QEMU and a virtio-mem device that is fully plugged (50GiB using 50
> memslots) the qemu-storage daemon process consumes in the VA space
> 1281GiB before this change and 58GiB after this change.
>
> Example debug output:
>    Vhost user message 
>   Request: VHOST_USER_ADD_MEM_REG (37)
>   Flags:   0x9
>   Size:40
>   Fds: 59
>   Adding region 50
>   guest_phys_addr: 0x000d8000
>   memory_size: 0x4000
>   userspace_addr   0x7f54ebffe000
>   mmap_offset  0x000c
>   fd_offset:   0x000c
>   new mmap_offset: 0x
>   mmap_addr:   0x7f7ecc00
>   Successfully added new region
>    Vhost user message 
>   Request: VHOST_USER_ADD_MEM_REG (37)
>   Flags:   0x9
>   Size:40
>   Fds: 59
>   Adding region 51
>   guest_phys_addr: 0x000dc000
>   memory_size: 0x4000
>   userspace_addr   0x7f552bffe000
>   mmap_offset  0x000c4000
>   fd_offset:   0x000c4000
>   new mmap_offset: 0x
>   mmap_addr:   0x7f7e8c00
>   Successfully added new region
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 50 ---
>  1 file changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 75e47b7bb3..7d8293dc84 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -43,6 +43,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
>  #ifdef __NR_userfaultfd
>  #include 
> @@ -281,12 +283,36 @@ vu_remove_all_mem_regs(VuDev *dev)
>  dev->nregions = 0;
>  }
>
> +static size_t
> +get_fd_pagesize(int fd)
> +{
> +static size_t pagesize;
> +#if defined(__linux__)
> +struct statfs fs;
> +int ret;
> +
> +do {
> +ret = fstatfs(fd, &fs);
> +} while (ret != 0 && errno == EINTR);
> +
> +if (!ret && fs.f_type == HUGETLBFS_MAGIC) {
> +return fs.f_bsize;
> +}
> +#endif
> +
> +if (!pagesize) {
> +pagesize = getpagesize();
> +}
> +return pagesize;
> +}
> +
>  static void
>  _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
>  {
>  const uint64_t start_gpa = msg_region->guest_phys_addr;
>  const uint64_t end_gpa = start_gpa + msg_region->memory_size;
>  int prot = PROT_READ | PROT_WRITE;
> +uint64_t mmap_offset, fd_offset;
>  VuDevRegion *r;
>  void *mmap_addr;
>  int low = 0;
> @@ -335,11 +361,25 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion 
> *msg_region, int fd)
>  idx = low;
>
>  /*
> - * We don't use offset argument of mmap() since the mapped address has
> - * to be page aligned, and we use huge pages.
> + * Convert most of msg_region->mmap_offset to fd_offset. In almost all
> + * cases, this will leave us with mmap_offset == 0, mmap()'ing only
> + * what we really need. Only if a memory region would partially cover
> + * hugetlb pages, we'd get mmap_offset != 0, which usually doesn't happen
> + * anymore (i.e., modern QEMU).
> + *
> + * Note that mmap() with hugetlb would fail if the offset into the file
> + * is not aligned to the huge page size.
>   */
> -mmap_addr = mmap(0, msg_region->memo

Re: [PATCH v1 13/15] libvhost-user: Factor out vq usability check

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> Let's factor it out to prepare for further changes.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 24 +++
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 7d8293dc84..febeb2eb89 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -283,6 +283,12 @@ vu_remove_all_mem_regs(VuDev *dev)
>  dev->nregions = 0;
>  }
>
> +static bool
> +vu_is_vq_usable(VuDev *dev, VuVirtq *vq)
> +{
> +return likely(!dev->broken) && likely(vq->vring.avail);
> +}
> +
>  static size_t
>  get_fd_pagesize(int fd)
>  {
> @@ -2378,8 +2384,7 @@ vu_queue_get_avail_bytes(VuDev *dev, VuVirtq *vq, 
> unsigned int *in_bytes,
>  idx = vq->last_avail_idx;
>
>  total_bufs = in_total = out_total = 0;
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  goto done;
>  }
>
> @@ -2494,8 +2499,7 @@ vu_queue_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned 
> int in_bytes,
>  bool
>  vu_queue_empty(VuDev *dev, VuVirtq *vq)
>  {
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return true;
>  }
>
> @@ -2534,8 +2538,7 @@ vring_notify(VuDev *dev, VuVirtq *vq)
>
>  static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
>  {
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return;
>  }
>
> @@ -2860,8 +2863,7 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
>  unsigned int head;
>  VuVirtqElement *elem;
>
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return NULL;
>  }
>
> @@ -3018,8 +3020,7 @@ vu_queue_fill(VuDev *dev, VuVirtq *vq,
>  {
>  struct vring_used_elem uelem;
>
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return;
>  }
>
> @@ -3048,8 +3049,7 @@ vu_queue_flush(VuDev *dev, VuVirtq *vq, unsigned int 
> count)
>  {
>  uint16_t old, new;
>
> -if (unlikely(dev->broken) ||
> -unlikely(!vq->vring.avail)) {
> +if (!vu_is_vq_usable(dev, vq)) {
>  return;
>  }
>
> --
> 2.43.0
>
>



Re: [PATCH v1 14/15] libvhost-user: Dynamically remap rings after (temporarily?) removing memory regions

2024-02-03 Thread Raphael Norwitz
Someone else with more knowledge of the VQ mapping code should also review.

On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand  wrote:
>
> Currently, we try to remap all rings whenever we add a single new memory
> region. That doesn't quite make sense, because we already map rings when
> setting the ring address, and panic if that goes wrong. Likely, that
> handling was simply copied from set_mem_table code, where we actually
> have to remap all rings.
>
> Remapping all rings might require us to walk quite a lot of memory
> regions to perform the address translations. Ideally, we'd simply remove
> that remapping.
>
> However, let's be a bit careful. There might be some weird corner cases
> where we might temporarily remove a single memory region (e.g., resize
> it), that would have worked for now. Further, a ring might be located on
> hotplugged memory, and as the VM reboots, we might unplug that memory, to
> hotplug memory before resetting the ring addresses.
>
> So let's unmap affected rings as we remove a memory region, and try
> dynamically mapping the ring again when required.
>
> Signed-off-by: David Hildenbrand 

Acked-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 107 --
>  1 file changed, 78 insertions(+), 29 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index febeb2eb89..738e84ab63 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -283,10 +283,75 @@ vu_remove_all_mem_regs(VuDev *dev)
>  dev->nregions = 0;
>  }
>
> +static bool
> +map_ring(VuDev *dev, VuVirtq *vq)
> +{
> +vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
> +vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
> +vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
> +
> +DPRINT("Setting virtq addresses:\n");
> +DPRINT("vring_desc  at %p\n", vq->vring.desc);
> +DPRINT("vring_used  at %p\n", vq->vring.used);
> +DPRINT("vring_avail at %p\n", vq->vring.avail);
> +
> +return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
> +}
> +
>  static bool

Consider changing the function name to indicate that it may actually map a vq?

Maybe vu_maybe_map_vq()?

>  vu_is_vq_usable(VuDev *dev, VuVirtq *vq)
>  {
> -return likely(!dev->broken) && likely(vq->vring.avail);
> +if (unlikely(dev->broken)) {
> +return false;
> +}
> +
> +if (likely(vq->vring.avail)) {
> +return true;
> +}
> +
> +/*
> + * In corner cases, we might temporarily remove a memory region that
> + * mapped a ring. When removing a memory region we make sure to
> + * unmap any rings that would be impacted. Let's try to remap if we
> + * already succeeded mapping this ring once.
> + */
> +if (!vq->vra.desc_user_addr || !vq->vra.used_user_addr ||
> +!vq->vra.avail_user_addr) {
> +return false;
> +}
> +if (map_ring(dev, vq)) {
> +vu_panic(dev, "remapping queue on access");
> +return false;
> +}
> +return true;
> +}
> +
> +static void
> +unmap_rings(VuDev *dev, VuDevRegion *r)
> +{
> +int i;
> +
> +for (i = 0; i < dev->max_queues; i++) {
> +VuVirtq *vq = &dev->vq[i];
> +const uintptr_t desc = (uintptr_t)vq->vring.desc;
> +const uintptr_t used = (uintptr_t)vq->vring.used;
> +const uintptr_t avail = (uintptr_t)vq->vring.avail;
> +
> +if (desc < r->mmap_addr || desc >= r->mmap_addr + r->size) {
> +continue;
> +}
> +if (used < r->mmap_addr || used >= r->mmap_addr + r->size) {
> +continue;
> +}
> +if (avail < r->mmap_addr || avail >= r->mmap_addr + r->size) {
> +continue;
> +}
> +
> +DPRINT("Unmapping rings of queue %d\n", i);
> +vq->vring.desc = NULL;
> +vq->vring.used = NULL;
> +vq->vring.avail = NULL;
> +}
>  }
>
>  static size_t
> @@ -784,21 +849,6 @@ vu_reset_device_exec(VuDev *dev, VhostUserMsg *vmsg)
>  return false;
>  }
>
> -static bool
> -map_ring(VuDev *dev, VuVirtq *vq)
> -{
> -vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
> -vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
> -vq->vring.avail = qva_to_va(dev, vq->

Re: [PATCH v1 15/15] libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP

2024-02-03 Thread Raphael Norwitz
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand  wrote:
>
> We already use MADV_NORESERVE to deal with sparse memory regions. Let's
> also set madvise(MADV_DONTDUMP), otherwise a crash of the process can
> result in us allocating all memory in the mmap'ed region for dumping
> purposes.
>
> This change implies that the mmap'ed rings won't be included in a
> coredump. If ever required for debugging purposes, we could mark only
> the mapped rings MADV_DODUMP.
>
> Ignore errors during madvise() for now.
>
> Signed-off-by: David Hildenbrand 

Reviewed-by: Raphael Norwitz 

> ---
>  subprojects/libvhost-user/libvhost-user.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 738e84ab63..26c289518c 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -458,6 +458,12 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion 
> *msg_region, int fd)
>  DPRINT("mmap_addr:   0x%016"PRIx64"\n",
> (uint64_t)(uintptr_t)mmap_addr);
>
> +#if defined(__linux__)
> +/* Don't include all guest memory in a coredump. */
> +madvise(mmap_addr, msg_region->memory_size + mmap_offset,
> +MADV_DONTDUMP);
> +#endif
> +
>  /* Shift all affected entries by 1 to open a hole at idx. */
>  r = &dev->regions[idx];
>  memmove(r + 1, r, sizeof(VuDevRegion) * (dev->nregions - idx));
> --
> 2.43.0
>
>



[PATCH] MAINTAINERS: Switch to my Enfabrica email

2024-02-03 Thread Raphael Norwitz
I'd prefer to use my new work email so this change updates MAINTAINERS
with it.

Signed-off-by: Raphael Norwitz 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2f9741b898..a12b58abe2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2556,7 +2556,7 @@ F: include/hw/virtio/virtio-gpu.h
 F: docs/system/devices/virtio-gpu.rst
 
 vhost-user-blk
-M: Raphael Norwitz 
+M: Raphael Norwitz 
 S: Maintained
 F: contrib/vhost-user-blk/
 F: contrib/vhost-user-scsi/
-- 
2.43.0




Re: [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation

2024-02-04 Thread Raphael Norwitz
On Sun, Feb 4, 2024 at 9:36 AM David Hildenbrand  wrote:
>
> On 04.02.24 02:35, Raphael Norwitz wrote:
> > As a heads up, I've left Nutanix and updated it in MAINTAINERS. Will
> > be updating it again shortly so tagging these with my new work email.
> >
>
> Thanks for the fast review! The mail server already complained to me :)
>
> Maybe consider adding yourself as reviewer for vhost as well? (which
> covers libvhost-user), I took your mail address from git history, not
> get_maintainers.pl.

I don't expect I'll have much time to review code outside of
vhost-user-blk/vhost-user-scsi, but happy to add an entry if it helps
folks tag me on relevant patches.
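
Concretely, that would just be a designated-reviewer line added under the
existing vhost section (which, as you note, covers libvhost-user), along the
lines of the sketch below, with the address elided like the rest of this thread:

R: Raphael Norwitz 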

>
> > On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand  wrote:
> >>
> >> We barely had mmap_offset set in the past. With virtio-mem and
> >> dynamic-memslots that will change.
> >>
> >> In vu_add_mem_reg() and vu_set_mem_table_exec_postcopy(), we are
> >> performing pointer arithmetics, which is wrong. Let's simply
> >> use dev_region->mmap_addr instead of "void *mmap_addr".
> >>
> >> Fixes: ec94c8e621de ("Support adding individual regions in libvhost-user")
> >> Fixes: 9bb38019942c ("vhost+postcopy: Send address back to qemu")
> >> Cc: Raphael Norwitz 
> >> Signed-off-by: David Hildenbrand 
> >
> > Reviewed-by: Raphael Norwitz 
>
>
> --
> Cheers,
>
> David / dhildenb
>



Re: [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()

2024-02-04 Thread Raphael Norwitz
On Sun, Feb 4, 2024 at 9:51 AM David Hildenbrand  wrote:
>
> On 04.02.24 03:10, Raphael Norwitz wrote:
> > One comment on this one.
> >
> > On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand  wrote:
> >>
> >> Let's speed up GPA to memory region / virtual address lookup. Store the
> >> memory regions ordered by guest physical addresses, and use binary
> >> search for address translation, as well as when adding/removing memory
> >> regions.
> >>
> >> Most importantly, this will speed up GPA->VA address translation when we
> >> have many memslots.
> >>
> >> Signed-off-by: David Hildenbrand 
> >> ---
> >>   subprojects/libvhost-user/libvhost-user.c | 49 +--
> >>   1 file changed, 45 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> >> b/subprojects/libvhost-user/libvhost-user.c
> >> index d036b54ed0..75e47b7bb3 100644
> >> --- a/subprojects/libvhost-user/libvhost-user.c
> >> +++ b/subprojects/libvhost-user/libvhost-user.c
> >> @@ -199,19 +199,30 @@ vu_panic(VuDev *dev, const char *msg, ...)
> >>   static VuDevRegion *
> >>   vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
> >>   {
> >> -unsigned int i;
> >> +int low = 0;
> >> +int high = dev->nregions - 1;
> >>
> >>   /*
> >>* Memory regions cannot overlap in guest physical address space. 
> >> Each
> >>* GPA belongs to exactly one memory region, so there can only be one
> >>* match.
> >> + *
> >> + * We store our memory regions ordered by GPA and can simply perform a
> >> + * binary search.
> >>*/
> >> -for (i = 0; i < dev->nregions; i++) {
> >> -VuDevRegion *cur = &dev->regions[i];
> >> +while (low <= high) {
> >> +unsigned int mid = low + (high - low) / 2;
> >> +VuDevRegion *cur = &dev->regions[mid];
> >>
> >>   if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) 
> >> {
> >>   return cur;
> >>   }
> >> +if (guest_addr >= cur->gpa + cur->size) {
> >> +low = mid + 1;
> >> +}
> >> +if (guest_addr < cur->gpa) {
> >> +high = mid - 1;
> >> +}
> >>   }
> >>   return NULL;
> >>   }
> >> @@ -273,9 +284,14 @@ vu_remove_all_mem_regs(VuDev *dev)
> >>   static void
> >>   _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> >>   {
> >> +const uint64_t start_gpa = msg_region->guest_phys_addr;
> >> +const uint64_t end_gpa = start_gpa + msg_region->memory_size;
> >>   int prot = PROT_READ | PROT_WRITE;
> >>   VuDevRegion *r;
> >>   void *mmap_addr;
> >> +int low = 0;
> >> +int high = dev->nregions - 1;
> >> +unsigned int idx;
> >>
> >>   DPRINT("Adding region %d\n", dev->nregions);
> >>   DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> >> @@ -295,6 +311,29 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion 
> >> *msg_region, int fd)
> >>   prot = PROT_NONE;
> >>   }
> >>
> >> +/*
> >> + * We will add memory regions into the array sorted by GPA. Perform a
> >> + * binary search to locate the insertion point: it will be at the low
> >> + * index.
> >> + */
> >> +while (low <= high) {
> >> +unsigned int mid = low + (high - low)  / 2;
> >> +VuDevRegion *cur = &dev->regions[mid];
> >> +
> >> +/* Overlap of GPA addresses. */
> >
> > Looks like this check will only catch if the new region is fully
> > contained within an existing region. I think we need to check whether
> > either start or end region are in the range, i.e.:
>
> That check should cover all cases of overlaps, not just fully contained.
>
> See the QEMU implementation of range_overlaps_range() that contains a
> similar logic:
>
> return !(range2->upb < range1->lob || range1->upb < range2->lob);
>
> !(range2->upb < range1->lob || range1->upb < range2->lob);
> =  !(range2->upb < range1->lob) && !(range1->upb < range2->lob)
> =   range2->upb >= range1->lob && range1->upb >= range2->lob
> =   range1->lob <= range2->upb && range2->lob <= range1->upb
>
> In QEMU, upb is inclusive, if it were exclusive (like we have here):
>
> =   range1->lob < range2->upb && range2->lob < range1->upb
>
> Which is what we have here with:
>
> range1->lob = start_gpa
> range1->upb = end_gpa
> range2->lob = cur->gpa
> range2->upb = cur->gpa + cur->size
>
> Also if you are interested, see
>
> https://stackoverflow.com/questions/3269434/whats-the-most-efficient-way-to-test-if-two-ranges-overlap
>
> Thanks!
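
Plugging in made-up numbers to convince myself: take an existing region with
cur->gpa = 0x10000 and cur->size = 0x10000, and a new region with
start_gpa = 0x18000 and end_gpa = 0x28000, which only partially overlaps the
tail of the existing one. The patch's check still fires, since
start_gpa < cur->gpa + cur->size (0x18000 < 0x20000) and
cur->gpa < end_gpa (0x10000 < 0x28000), so partial overlaps are caught as
well, not just full containment.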

Got it, thanks for the full explanation. With that:

Reviewed-by: Raphael Norwitz 

>
> --
> Cheers,
>
> David / dhildenb
>



Re: [PATCH 1/2] libvhost-user: Fix pointer arithmetic in indirect read

2024-04-18 Thread Raphael Norwitz
The change looks right to me. As written, the old code advances the destination
pointer by read_len descriptors (read_len * sizeof(struct vring_desc) bytes)
rather than read_len bytes, so it skips ahead of the data it just copied. The
intent is to gather the scattered chunks into a single contiguous local copy of
the descriptor table.
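
To make the off-by-sizeof concrete, a tiny standalone sketch (the 32-byte
read_len is a made-up value, and this is not the libvhost-user code itself):

#include <stdint.h>
#include <stdio.h>

/* Same layout as the 16-byte vring descriptor. */
struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

int main(void)
{
    size_t read_len = 32; /* bytes copied from one guest memory chunk */

    /* Old code: "desc += read_len" on a struct vring_desc * moves the
     * destination by read_len descriptors, i.e. read_len * 16 bytes. */
    printf("struct pointer advances %zu bytes\n",
           read_len * sizeof(struct vring_desc));

    /* Fixed code: a uint8_t * cursor advances by exactly read_len bytes. */
    printf("byte cursor advances %zu bytes\n", read_len);
    return 0;
}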

I agree the variable rename should go in as a separate change.

On Thu, Apr 18, 2024 at 6:56 AM Daniel P. Berrangé  wrote:
>
> On Sat, Jan 13, 2024 at 04:27:40AM +0300, Temir Zharaspayev wrote:
> > When zero-copy usage of indirect descriptors buffer table isn't
> > possible, library gather scattered memory chunks in a local copy.
> > This commit fixes the issue with pointer arithmetic for the local copy
> > buffer.
> >
> > Signed-off-by: Temir Zharaspayev 
> > ---
> >  subprojects/libvhost-user/libvhost-user.c | 11 ++-
> >  1 file changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/subprojects/libvhost-user/libvhost-user.c 
> > b/subprojects/libvhost-user/libvhost-user.c
> > index 6684057370..e952c098a3 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -2307,7 +2307,7 @@ static int
> >  virtqueue_read_indirect_desc(VuDev *dev, struct vring_desc *desc,
> >   uint64_t addr, size_t len)
> >  {
> > -struct vring_desc *ori_desc;
> > +uint8_t *src_cursor, *dst_cursor;
> >  uint64_t read_len;
> >
> >  if (len > (VIRTQUEUE_MAX_SIZE * sizeof(struct vring_desc))) {
> > @@ -2318,17 +2318,18 @@ virtqueue_read_indirect_desc(VuDev *dev, struct 
> > vring_desc *desc,
> >  return -1;
> >  }
> >
> > +dst_cursor = (uint8_t *) desc;

Nit: no space after the cast, i.e. (uint8_t *)desc.

> >  while (len) {
> >  read_len = len;
> > -ori_desc = vu_gpa_to_va(dev, &read_len, addr);
> > -if (!ori_desc) {
> > +src_cursor = vu_gpa_to_va(dev, &read_len, addr);
> > +if (!src_cursor) {
> >  return -1;
> >  }
> >
> > -memcpy(desc, ori_desc, read_len);
> > +memcpy(dst_cursor, src_cursor, read_len);
> >  len -= read_len;
> >  addr += read_len;
> > -desc += read_len;
> > +dst_cursor += read_len;
>
> The ori_desc -> src_cursor changes don't look to have any functional
> effect. Having that change present obscures the functional part of
> the patch, which is this line. FWIW, it is generally preferrable to
> not mix functional and non-functional changes in the same patch
>
> It now interprets 'read_len' as the number of bytes to increment the
> address by, rather than incrementing by the number of elements of
> size 'sizeof(struct vring_desc)'.
>
> I don't know enough about this area of QEMU code to say which
> semantics were desired, so I'll defer to the Michael as maintainer
> to give a formal review.
>
>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>
>



[Qemu-devel] [PATCH] vhost-user-scsi: prevent using uninitialized vqs

2019-06-11 Thread Raphael Norwitz
Of the 3 virtqueues, SeaBIOS only sets cmd, leaving ctrl
and event without a physical address. This can cause
vhost_verify_ring_part_mapping to return ENOMEM, causing
the following logs:

qemu-system-x86_64: Unable to map available ring for ring 0
qemu-system-x86_64: Verify ring failure on region 0

QEMU commit e6cc11d64fc998c11a4dfcde8fda3fc33a74d844
already resolved the issue for vhost-scsi devices, but
the fix was never applied to vhost-user-scsi devices.

Signed-off-by: Raphael Norwitz 
---
 hw/scsi/vhost-user-scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a9fd8ea..e4aae95 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -91,7 +91,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error 
**errp)
 }
 
 vsc->dev.nvqs = 2 + vs->conf.num_queues;
-vsc->dev.vqs = g_new(struct vhost_virtqueue, vsc->dev.nvqs);
+vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
 vsc->dev.vq_index = 0;
 vsc->dev.backend_features = 0;
 vqs = vsc->dev.vqs;
-- 
1.9.4




Re: [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation

2020-02-19 Thread Raphael Norwitz
On Mon, Feb 10, 2020 at 11:04:28AM -0500, Michael S. Tsirkin wrote:
> 
> On Sun, Feb 09, 2020 at 12:14:42PM -0500, Raphael Norwitz wrote:
> > On Thu, Feb 06, 2020 at 03:33:13AM -0500, Michael S. Tsirkin wrote:
> > > 
> > > On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> > > > 
> > > > Changes since V1:
> > > > * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
> > > >   to prevent corruption
> > > > * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
> > > >   startup and cache the returned value so that QEMU does not need to
> > > >   query the backend every time vhost_backend_memslots_limit is 
> > > > called.
> > > 
> > > I'm a bit confused about what happens on reconnect.
> > > Can you clarify pls?
> > > 
> > From what I can see, backends which support reconnect call vhost_dev_init,
> > which then calls vhost_user_backend_init(), as vhost-user-blk does here:
> > https://github.com/qemu/qemu/blob/master/hw/block/vhost-user-blk.c#L315. The
> > ram slots limit is fetched in vhost_user_backend_init() so every time the
> > device reconnects the limit should be refetched. 
> 
> Right. Point is, we might have validated using an old limit.
> Reconnect needs to verify limit did not change or at least
> did not decrease.
> 
> -- 
> MST
Good point - I did not consider this case. Could we keep the slots limit in
the VhostUserState instead?

Say vhost_user_init() initializes the limit inside the VhostUserState to 0. Then
vhost_user_backend_init() checks whether this limit is 0. If so, this is the
initial connection: qemu fetches the limit from the backend, ensures the
returned value is nonzero, and stores it in the VhostUserState. If not, qemu
knows this is a reconnect and queries the backend's slots limit again. If the
returned value does not equal the limit cached in the VhostUserState,
vhost_user_backend_init() returns an error.
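
To make that concrete, here is a minimal sketch of the check I have in mind
(not actual QEMU code: the function name and the cached_slots field are made
up, and the value would come from the VHOST_USER_GET_MAX_MEMSLOTS reply):

#include <stdint.h>

/*
 * cached_slots would live in VhostUserState and start out as 0.
 * backend_slots is what the backend just reported.
 * Returns 0 on success, -1 if the (re)connection should be rejected.
 */
int vhost_user_check_slots(uint64_t *cached_slots, uint64_t backend_slots)
{
    if (backend_slots == 0) {
        return -1;                      /* backend must report a nonzero limit */
    }
    if (*cached_slots == 0) {
        *cached_slots = backend_slots;  /* initial connection: cache the limit */
        return 0;
    }
    /* reconnect: reject a backend whose limit changed */
    return *cached_slots == backend_slots ? 0 : -1;
}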

Thoughts?



Re: [PATCH 1/2] vhost-user: add VHOST_USER_RESET_DEVICE to reset devices

2019-12-12 Thread Raphael Norwitz
On Wed, Nov 06, 2019 at 06:36:01AM -0500, Michael S. Tsirkin wrote:
> 
> On Tue, Oct 29, 2019 at 05:38:02PM -0400, Raphael Norwitz wrote:
> > Add a VHOST_USER_RESET_DEVICE message which will reset the vhost user
> > backend. Disabling all rings, and resetting all internal state, ready
> > for the backend to be reinitialized.
> > 
> > A backend has to report it supports this features with the
> > VHOST_USER_PROTOCOL_F_RESET_DEVICE protocol feature bit. If it does
> > so, the new message is used instead of sending a RESET_OWNER which has
> > had inconsistent implementations.
> > 
> > Signed-off-by: David Vrabel 
> > Signed-off-by: Raphael Norwitz 

Ping on this. 

> 
> Looks ok, pls ping me after the release to apply this.
> > ---
> >  docs/interop/vhost-user.rst | 15 +++
> >  hw/virtio/vhost-user.c  |  8 +++-
> >  2 files changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index 7827b71..d213d4a 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -785,6 +785,7 @@ Protocol features
> >#define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
> >#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
> >#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
> > +  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE   13
> >  
> >  Master message types
> >  
> > @@ -1190,6 +1191,20 @@ Master message types
> >ancillary data. The GPU protocol is used to inform the master of
> >rendering state and updates. See vhost-user-gpu.rst for details.
> >  
> > +``VHOST_USER_RESET_DEVICE``
> > +  :id: 34
> > +  :equivalent ioctl: N/A
> > +  :master payload: N/A
> > +  :slave payload: N/A
> > +
> > +  Ask the vhost user backend to disable all rings and reset all
> > +  internal device state to the initial state, ready to be
> > +  reinitialized. The backend retains ownership of the device
> > +  throughout the reset operation.
> > +
> > +  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
> > +  feature is set by the backend.
> > +
> >  Slave message types
> >  ---
> >  
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 02a9b25..d27a10f 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -58,6 +58,7 @@ enum VhostUserProtocolFeature {
> >  VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
> >  VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
> >  VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
> > +VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
> >  VHOST_USER_PROTOCOL_F_MAX
> >  };
> >  
> > @@ -98,6 +99,7 @@ typedef enum VhostUserRequest {
> >  VHOST_USER_GET_INFLIGHT_FD = 31,
> >  VHOST_USER_SET_INFLIGHT_FD = 32,
> >  VHOST_USER_GPU_SET_SOCKET = 33,
> > +VHOST_USER_RESET_DEVICE = 34,
> >  VHOST_USER_MAX
> >  } VhostUserRequest;
> >  
> > @@ -890,10 +892,14 @@ static int vhost_user_set_owner(struct vhost_dev *dev)
> >  static int vhost_user_reset_device(struct vhost_dev *dev)
> >  {
> >  VhostUserMsg msg = {
> > -.hdr.request = VHOST_USER_RESET_OWNER,
> >  .hdr.flags = VHOST_USER_VERSION,
> >  };
> >  
> > +msg.hdr.request = virtio_has_feature(dev->protocol_features,
> > + 
> > VHOST_USER_PROTOCOL_F_RESET_DEVICE)
> > +? VHOST_USER_RESET_DEVICE
> > +: VHOST_USER_RESET_OWNER;
> > +
> >  if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> >  return -1;
> >  }
> > -- 
> > 1.8.3.1
> 
> 



[RFC PATCH 0/3] vhost-user: Lift Max Ram Slots Limitation

2019-12-16 Thread Raphael Norwitz
In QEMU today, a VM with a vhost-user device can hot add memory a
maximum of 8 times. See these threads, among others:

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html

[2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html

This RFC/patch set introduces a new protocol feature
VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the
restriction on the maximum number of RAM slots imposed by vhost-user.

The patch consists of 3 changes:
1. Fixed Error Handling in vhost_user_set_mem_table_postcopy:
   This is a bug fix in the postcopy migration path
2. vhost-user: Refactor vhost_user_set_mem_table Functions:
   This is a non-functional change refactoring the
   vhost_user_set_mem_table and vhost_user_set_mem_table_postcopy
   functions such that the feature can be more cleanly added.
3. Introduce Configurable Number of Memory Slots Exposed by vhost-user:
   This change introduces the new protocol feature
   VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS.

The implementation details are explained in more detail in the commit
messages, but at a high level the new protocol feature works as follows:
- If the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature is enabled, QEMU will
  send multiple VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
  messages to map and unmap individual memory regions instead of one large
  VHOST_USER_SET_MEM_TABLE message containing all memory regions.
- The vhost-user struct maintains a ’shadow state’ of memory regions
  already sent to the guest. Each time vhost_user_set_mem_table is called,
  the shadow state is compared with the new device state. A
  VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
  not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
  for each region in the device state but not the shadow state. After
  these messages have been sent, the shadow state will be updated to
  reflect the new device state.

The VHOST_USER_SET_MEM_TABLE message was not reused because as the number of
regions grows, the message becomes very large. In practice, such large
messages caused problems (truncated messages) and in the past it seems the
community has opted for smaller fixed size messages where possible. VRINGs,
for example, are sent to the backend individually instead of in one massive
message.
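
For illustration, the per-update diff amounts to something like the sketch
below; region_equal(), send_rem_mem_reg() and send_add_mem_reg() are
placeholders for the sketch, not functions in this series:

/*
 * Sketch of the shadow-state diff described above. Assumes shadow[] has
 * room for ncurr entries.
 */
static void sync_mem_regions(struct vhost_dev *dev,
                             struct vhost_memory_region *shadow,
                             unsigned *nshadow,
                             struct vhost_memory_region *curr,
                             unsigned ncurr)
{
    unsigned i, j;

    /* Unmap shadow regions which are gone from the new device state. */
    for (i = 0; i < *nshadow; i++) {
        bool still_present = false;

        for (j = 0; j < ncurr; j++) {
            if (region_equal(&shadow[i], &curr[j])) {
                still_present = true;
                break;
            }
        }
        if (!still_present) {
            send_rem_mem_reg(dev, &shadow[i]);  /* VHOST_USER_REM_MEM_REG */
        }
    }

    /* Map in regions present in the device state but not yet sent. */
    for (j = 0; j < ncurr; j++) {
        bool already_sent = false;

        for (i = 0; i < *nshadow; i++) {
            if (region_equal(&shadow[i], &curr[j])) {
                already_sent = true;
                break;
            }
        }
        if (!already_sent) {
            send_add_mem_reg(dev, &curr[j]);    /* VHOST_USER_ADD_MEM_REG */
        }
    }

    /* The shadow state now mirrors the current device state. */
    memcpy(shadow, curr, ncurr * sizeof(*curr));
    *nshadow = ncurr;
}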

Current Limitations:
- postcopy migration is not supported when the
  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS has been negotiated. 
- VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS cannot be negotiated when
  VHOST_USER_PROTOCOL_F_REPLY_ACK has also been negotiated.

Both of these limitations are due to resource constraints. They are not
imposed for technical reasons.

Questions:
- In the event transmitting a VHOST_USER_ADD_MEM_REG or
  VHOST_USER_REM_REG message fails, is there any reason the error handling
  should differ from when transmitting VHOST_USER_SET_MEM_TABLE message fails?
- Is there a cleaner way to ensure a postcopy migration cannot be
  started with this protocol feature enabled?

Best,
Raphael

Raphael Norwitz (3):
  Fixed Error Handling in vhost_user_set_mem_table_postcopy
  vhost-user: Refactor vhost_user_set_mem_table Functions
  Introduce Configurable Number of Memory Slots Exposed by vhost-user:

 docs/interop/vhost-user.rst |  43 +
 hw/virtio/vhost-user.c  | 384 +---
 2 files changed, 335 insertions(+), 92 deletions(-)

-- 
1.8.3.1




[RFC PATCH 1/3] Fixed Error Handling in vhost_user_set_mem_table_postcopy

2019-12-16 Thread Raphael Norwitz
The current vhost_user_set_mem_table_postcopy() implementation
populates each region of the VHOST_USER_SET_MEM_TABLE
message without first checking if there are more than
VHOST_MEMORY_MAX_NREGIONS already populated. This can
cause memory corruption and potentially a crash if too many
regions are added to the message during the postcopy step.

Additionally, after populating each region, the current
implementation asserts that the current region index is less than
VHOST_MEMORY_MAX_NREGIONS. Thus, even if the aforementioned
bug is fixed by moving the existing assert up, too many hot-adds
during the postcopy step will bring down qemu instead of
gracefully propagating up the error as in
vhost_user_set_mem_table().

This change cleans up error handling in
vhost_user_set_mem_table_postcopy() such that it handles
an unsupported number of memory hot-adds like
vhost_user_set_mem_table(), gracefully propagating an error
up instead of corrupting memory and crashing qemu.

Signed-off-by: Raphael Norwitz 
---
 hw/virtio/vhost-user.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 02a9b25..f74ff3b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -441,6 +441,10 @@ static int vhost_user_set_mem_table_postcopy(struct 
vhost_dev *dev,
  &offset);
 fd = memory_region_get_fd(mr);
 if (fd > 0) {
+if (fd_num == VHOST_MEMORY_MAX_NREGIONS) {
+error_report("Failed preparing vhost-user memory table msg");
+return -1;
+}
 trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
   reg->memory_size,
   reg->guest_phys_addr,
@@ -453,7 +457,6 @@ static int vhost_user_set_mem_table_postcopy(struct 
vhost_dev *dev,
 msg.payload.memory.regions[fd_num].guest_phys_addr =
 reg->guest_phys_addr;
 msg.payload.memory.regions[fd_num].mmap_offset = offset;
-assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
 fds[fd_num++] = fd;
 } else {
 u->region_rb_offset[i] = 0;
-- 
1.8.3.1




[RFC PATCH 3/3] Introduce Configurable Number of Memory Slots Exposed by vhost-user:

2019-12-16 Thread Raphael Norwitz
The current vhost-user implementation in Qemu imposes a limit on the
maximum number of memory slots exposed to a VM using a vhost-user
device. This change provides a new protocol feature
VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit
and allows a VM with a vhost-user device to expose a configurable
number of memory slots, up to the maximum supported by the platform.
Existing backends are unaffected.

This feature works by using three new messages,
VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
number of memory slots the backend is willing to accept. Then,
when the memory tables are set or updated, a series of
VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages are sent
to transmit the regions to map and/or unmap instead of trying to
send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE message.

The vhost_user struct maintains a shadow state of the VM’s memory
regions. When the memory tables are modified, the
vhost_user_set_mem_table() function compares the new device memory state
to the shadow state and only sends regions which need to be unmapped or
mapped in. The regions which must be unmapped are sent first, followed
by the new regions to be mapped in. After all the messages have been sent,
the shadow state is set to the current virtual device state.

The current feature implementation does not work with postcopy migration
and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
also been negotiated.

Signed-off-by: Raphael Norwitz 
---
 docs/interop/vhost-user.rst |  43 
 hw/virtio/vhost-user.c  | 251 
 2 files changed, 273 insertions(+), 21 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 7827b71..855a072 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -785,6 +785,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+  #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS   13
 
 Master message types
 
@@ -1190,6 +1191,48 @@ Master message types
   ancillary data. The GPU protocol is used to inform the master of
   rendering state and updates. See vhost-user-gpu.rst for details.
 
+``VHOST_USER_GET_MAX_MEM_SLOTS``
+  :id: 34
+  :equivalent ioctl: N/A
+  :slave payload: u64
+
+  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
+  successfully negotiated, this message is submitted by master to the
+  slave. The slave should return the message with a u64 payload
+  containing the maximum number of memory slots for QEMU to expose to
+  the guest. This message is not supported with postcopy migration or if
+  the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
+
+``VHOST_USER_ADD_MEM_REG``
+  :id: 35
+  :equivalent ioctl: N/A
+  :slave payload: memory region
+
+  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
+  successfully negotiated, this message is submitted by master to the slave.
+  The message payload contains a memory region descriptor struct, describing
+  a region of guest memory which the slave device must map in. When the
+  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
+  negotiated, along with the VHOST_USER_REM_MEM_REG message, this message is
+  used to set and update the memory tables of the slave device. This message
+  is not supported with postcopy migration or if the
+  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
+
+``VHOST_USER_REM_MEM_REG``
+  :id: 36
+  :equivalent ioctl: N/A
+  :slave payload: memory region
+
+  When the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been
+  successfully negotiated, this message is submitted by master to the slave.
+  The message payload contains a memory region descriptor struct, describing
+  a region of guest memory which the slave device must unmap. When the
+  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS protocol feature has been successfully
+  negotiated, along with the VHOST_USER_ADD_MEM_REG message, this message is
+  used to set and update the memory tables of the slave device. This message
+  is not supported with postcopy migration or if the
+  VHOST_USER_PROTOCOL_F_REPLY_ACK feature has also been negotiated.
+
 Slave message types
 ---
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 2134e81..3432462 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -35,11 +35,29 @@
 #include 
 #endif
 
-#define VHOST_MEMORY_MAX_NREGIONS8
+#define VHOST_MEMORY_LEGACY_NREGIONS8
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 #define VHOST_USER_SLAVE_MAX_FDS 8
 
 /*
+ * Set maximum number of RAM slots supported to
+ * the maximum number supported by the target
+ * hardware platform

[RFC PATCH 2/3] vhost-user: Refactor vhost_user_set_mem_table Functions

2019-12-16 Thread Raphael Norwitz
vhost_user_set_mem_table() and vhost_user_set_mem_table_postcopy()
have gotten convoluted, and have some identical code.

This change moves the logic populating the VhostUserMemory struct
and fds array from vhost_user_set_mem_table() and
vhost_user_set_mem_table_postcopy() to a new function,
vhost_user_fill_set_mem_table_msg().

No functionality is impacted.

Signed-off-by: Raphael Norwitz 
---
 hw/virtio/vhost-user.c | 144 +++--
 1 file changed, 66 insertions(+), 78 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index f74ff3b..2134e81 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -405,76 +405,97 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, 
uint64_t base,
 return 0;
 }
 
-static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
- struct vhost_memory *mem)
+static int vhost_user_fill_set_mem_table_msg(struct vhost_user *u,
+ struct vhost_dev *dev,
+ VhostUserMsg *msg,
+ int *fds, size_t *fd_num,
+ bool postcopy)
 {
-struct vhost_user *u = dev->opaque;
-int fds[VHOST_MEMORY_MAX_NREGIONS];
 int i, fd;
-size_t fd_num = 0;
-VhostUserMsg msg_reply;
-int region_i, msg_i;
+ram_addr_t offset;
+MemoryRegion *mr;
+struct vhost_memory_region *reg;
 
-VhostUserMsg msg = {
-.hdr.request = VHOST_USER_SET_MEM_TABLE,
-.hdr.flags = VHOST_USER_VERSION,
-};
-
-if (u->region_rb_len < dev->mem->nregions) {
-u->region_rb = g_renew(RAMBlock*, u->region_rb, dev->mem->nregions);
-u->region_rb_offset = g_renew(ram_addr_t, u->region_rb_offset,
-  dev->mem->nregions);
-memset(&(u->region_rb[u->region_rb_len]), '\0',
-   sizeof(RAMBlock *) * (dev->mem->nregions - u->region_rb_len));
-memset(&(u->region_rb_offset[u->region_rb_len]), '\0',
-   sizeof(ram_addr_t) * (dev->mem->nregions - u->region_rb_len));
-u->region_rb_len = dev->mem->nregions;
-}
+msg->hdr.request = VHOST_USER_SET_MEM_TABLE;
 
 for (i = 0; i < dev->mem->nregions; ++i) {
-struct vhost_memory_region *reg = dev->mem->regions + i;
-ram_addr_t offset;
-MemoryRegion *mr;
+reg = dev->mem->regions + i;
 
 assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
 mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
  &offset);
 fd = memory_region_get_fd(mr);
 if (fd > 0) {
-if (fd_num == VHOST_MEMORY_MAX_NREGIONS) {
+if (*fd_num == VHOST_MEMORY_MAX_NREGIONS) {
 error_report("Failed preparing vhost-user memory table msg");
 return -1;
 }
-trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
-  reg->memory_size,
-  reg->guest_phys_addr,
-  reg->userspace_addr, offset);
-u->region_rb_offset[i] = offset;
-u->region_rb[i] = mr->ram_block;
-msg.payload.memory.regions[fd_num].userspace_addr =
+if (postcopy) {
+trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
+  reg->memory_size,
+  reg->guest_phys_addr,
+  reg->userspace_addr,
+  offset);
+u->region_rb_offset[i] = offset;
+u->region_rb[i] = mr->ram_block;
+}
+msg->payload.memory.regions[*fd_num].userspace_addr =
 reg->userspace_addr;
-msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
-msg.payload.memory.regions[fd_num].guest_phys_addr =
+msg->payload.memory.regions[*fd_num].memory_size =
+reg->memory_size;
+msg->payload.memory.regions[*fd_num].guest_phys_addr =
 reg->guest_phys_addr;
-msg.payload.memory.regions[fd_num].mmap_offset = offset;
-fds[fd_num++] = fd;
-} else {
+msg->payload.memory.regions[*fd_num].mmap_offset = offset;
+fds[(*fd_num)++] = fd;
+} else if (postcopy) {
 u->region_rb_offset[i] = 0;
 u->region_r

Re: [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation

2020-02-05 Thread Raphael Norwitz
Ping

On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> 
> In QEMU today, a VM with a vhost-user device can hot add memory a
> maximum of 8 times. See these threads, among others:
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html  
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html 
> 
> [2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html 
> 
> This series introduces a new protocol feature
> VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the
> restriction on the maximum number of RAM slots imposed by vhost-user.
> 
> The patch consists of 3 changes:
> 1. Fixed assert in vhost_user_set_mem_table_postcopy:
>This is a bug fix in the postcopy migration path
> 2. Refactor vhost_user_set_mem_table functions:
>This is a non-functional change refactoring the
>vhost_user_set_mem_table and vhost_user_set_mem_table_postcopy
>functions such that the feature can be more cleanly added.
> 3. Lift max memory slots limit imposed by vhost-user:
>This change introduces the new protocol feature
>VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS.
> 
> The implementation details are explained in more detail in the commit
> messages, but at a high level the new protocol feature works as follows:
> - If the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature is enabled, QEMU will
>   send multiple VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
>   messages to map and unmap individual memory regions instead of one large
>   VHOST_USER_SET_MEM_TABLE message containing all memory regions.
> - The vhost-user struct maintains a ’shadow state’ of memory regions
>   already sent to the guest. Each time vhost_user_set_mem_table is called,
>   the shadow state is compared with the new device state. A
>   VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
>   not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
>   for each region in the device state but not the shadow state. After
>   these messages have been sent, the shadow state will be updated to
>   reflect the new device state.
> 
> The VHOST_USER_SET_MEM_TABLE message was not reused because as the number of
> regions grows, the message becomes very large. In practice, such large
> messages caused problems (truncated messages) and in the past it seems the
> community has opted for smaller fixed size messages where possible. VRINGs,
> for example, are sent to the backend individually instead of in one massive
> message.
> 
> Current Limitations:
> - postcopy migration is not supported when the
>   VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS has been negotiated. 
> - VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS cannot be negotiated when
>   VHOST_USER_PROTOCOL_F_REPLY_ACK has also been negotiated.
> 
> Both of these limitations are due to resource constraints. They are not
> imposed for technical reasons.
> 
> Changes since V1:
> * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
>   to prevent corruption
> * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
>   startup and cache the returned value so that QEMU does not need to
>   query the backend every time vhost_backend_memslots_limit is called.
> 
> Best,
> Raphael
> 
> Raphael Norwitz (3):
>   Fixed assert in vhost_user_set_mem_table_postcopy
>   Refactor vhost_user_set_mem_table functions
>   Lift max memory slots limit imposed by vhost-user
> 
>  docs/interop/vhost-user.rst |  43 +
>  hw/virtio/vhost-user.c  | 385 
> +---
>  2 files changed, 336 insertions(+), 92 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> 



Re: [PATCH v2 0/3] vhost-user: Lift Max Ram Slots Limitation

2020-02-10 Thread Raphael Norwitz
On Thu, Feb 06, 2020 at 03:33:13AM -0500, Michael S. Tsirkin wrote:
> 
> On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> > 
> > Changes since V1:
> > * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
> >   to prevent corruption
> > * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
> >   startup and cache the returned value so that QEMU does not need to
> >   query the backend every time vhost_backend_memslots_limit is called.
> 
> I'm a bit confused about what happens on reconnect.
> Can you clarify pls?
> 
From what I can see, backends which support reconnect call vhost_dev_init,
which then calls vhost_user_backend_init(), as vhost-user-blk does here:
https://github.com/qemu/qemu/blob/master/hw/block/vhost-user-blk.c#L315. The
ram slots limit is fetched in vhost_user_backend_init() so every time the
device reconnects the limit should be refetched. 



Re: [PATCH v2 1/3] Fixed assert in vhost_user_set_mem_table_postcopy

2020-02-10 Thread Raphael Norwitz
Yes - it's just a cleanup.

On Thu, Feb 06, 2020 at 03:20:01AM -0500, Michael S. Tsirkin wrote:
> 
> On Thu, Feb 06, 2020 at 03:17:04AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 15, 2020 at 09:57:04PM -0500, Raphael Norwitz wrote:
> > > The current vhost_user_set_mem_table_postcopy() implementation
> > > populates each region of the VHOST_USER_SET_MEM_TABLE message without
> > > first checking if there are more than VHOST_MEMORY_MAX_NREGIONS already
> > > populated. This can cause memory corruption if too many regions are
> > > added to the message during the postcopy step.
> > > 
> > > This change moves an existing assert up such that attempting to
> > > construct a VHOST_USER_SET_MEM_TABLE message with too many memory
> > > regions will gracefully bring down qemu instead of corrupting memory.
> > > 
> > > Signed-off-by: Raphael Norwitz 
> > > Signed-off-by: Peter Turschmid 
> > 
> > 
> > Could you pls add Fixes: and stable tags?
> 
> oh wait no, this is just a theoretical thing, right?
> it doesn't actually trigger, it's just a cleanup.
> 
> no fixes/stable needed then, sorry
> 



Re: [PATCH v2 2/3] Refactor vhost_user_set_mem_table functions

2020-02-10 Thread Raphael Norwitz
Sounds good

On Thu, Feb 06, 2020 at 03:21:42AM -0500, Michael S. Tsirkin wrote:
> 
> On Wed, Jan 15, 2020 at 09:57:05PM -0500, Raphael Norwitz wrote:
> > vhost_user_set_mem_table() and vhost_user_set_mem_table_postcopy() have
> > gotten convoluted, and have some identical code.
> > 
> > This change moves the logic populating the VhostUserMemory struct and
> > fds array from vhost_user_set_mem_table() and
> > vhost_user_set_mem_table_postcopy() to a new function,
> > vhost_user_fill_set_mem_table_msg().
> > 
> > No functionality is impacted.
> > 
> > Signed-off-by: Raphael Norwitz 
> > Signed-off-by: Peter Turschmid 
> 
> 
> Looks ok, but just cosmetics: let's have the flag say what
> it does, not who uses it.
> 
> So s/postcopy/track_ramblocks/ ?
> 




Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user

2020-02-10 Thread Raphael Norwitz
On Thu, Feb 06, 2020 at 03:32:38AM -0500, Michael S. Tsirkin wrote:
> 
> On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> > The current vhost-user implementation in Qemu imposes a limit on the
> > maximum number of memory slots exposed to a VM using a vhost-user
> > device. This change provides a new protocol feature
> > VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> > allows a VM with a vhost-user device to expose a configurable number of
> > memory slots, up to the ACPI defined maximum. Existing backends which
> > do not support this protocol feature are unaffected.
> 
> Hmm ACPI maximum seems to be up to 512 - is this too much to fit in a
> single message?  So can't we just increase the number (after negotiating
> with remote) and be done with it, instead of add/remove?  Or is there
> another reason to prefer add/remove?
>

As mentioned in my cover letter, we experimented with simply increasing the
message size and it didn’t work on our setup. We debugged down to the socket
layer and found that on the receiving end the messages were truncated at
around 512 bytes, or around 16 memory regions. To support 512 memory regions
we would need a message size of around 32 * 512 + 8 ~= 16k packet size. That
would be 64
times larger than the next largest message size. We thought it would be cleaner
and more in line with the rest of the protocol to keep the message sizes
smaller. In particular, we thought memory regions should be treated like the
rings, which are sent over one message at a time instead of in one large 
message.
Whether or not such a large message size can be made to work in our case,
separate messages will always work on Linux, and most likely all other UNIX
platforms QEMU is used on.
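
For reference, the arithmetic behind the ~16k figure: each region descriptor
carries four u64 fields (32 bytes) and the memory table has an 8-byte
nregions/padding header, per the vhost-user spec. A stand-alone check:

#include <stdint.h>
#include <stdio.h>

struct mem_region {              /* mirrors the spec's region descriptor */
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
};

int main(void)
{
    size_t payload = 8 + 512 * sizeof(struct mem_region);

    printf("512-region SET_MEM_TABLE payload: %zu bytes\n", payload); /* 16392 */
    return 0;
}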

> > 
> > This feature works by using three new messages,
> > VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> > VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> > number of memory slots the backend is willing to accept when the
> > backend is initialized. Then, when the memory tables are set or updated,
> > a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> > are sent to transmit the regions to map and/or unmap instead of trying
> > to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> > message.
> > 
> > The vhost_user struct maintains a shadow state of the VM’s memory
> > regions. When the memory tables are modified, the
> > vhost_user_set_mem_table() function compares the new device memory state
> > to the shadow state and only sends regions which need to be unmapped or
> > mapped in. The regions which must be unmapped are sent first, followed
> > by the new regions to be mapped in. After all the messages have been
> > sent, the shadow state is set to the current virtual device state.
> > 
> > The current feature implementation does not work with postcopy migration
> > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> > also been negotiated.
> 
> Hmm what would it take to lift the restrictions?
> conflicting features like this makes is very hard for users to make
> an informed choice what to support.
>

We would need a setup with a backend which supports these features (REPLY_ACK
and postcopy migration). At first glance it looks like DPDK could work but
I'm not sure how easy it will be to test postcopy migration with the resources
we have.
 
> > Signed-off-by: Raphael Norwitz 
> > Signed-off-by: Peter Turschmid 
> > Suggested-by: Mike Cui 
> > ---
> >  docs/interop/vhost-user.rst |  43 
> >  hw/virtio/vhost-user.c  | 254 
> > 
> >  2 files changed, 275 insertions(+), 22 deletions(-)
> > 
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index 5f8b3a4..ae9acf2 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -786,6 +786,7 @@ Protocol features
> >#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
> >#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
> >#define VHOST_USER_PROTOCOL_F_RESET_DEVICE   13
> > +  #define VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS   14
> >  
> >  Master message types
> >  
> > @@ -1205,6 +1206,48 @@ Master message types
> >Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
> >feature is set by the backend.
> >  
> > +``VHOST_USER_GET_MAX_MEM_SLOTS``
> > +  :id: 35
> > +  :equivalent ioctl: N/A
> > +  :slave payload: u64
> > +
> > +  When the VHOST_USER_PROTOCOL_F_CONFI

Re: [PATCH 1/1] vhost-user: Skip unnecessary duplicated VHOST_USER_ADD/REM_MEM_REG requests

2023-01-17 Thread Raphael Norwitz
I’m confused by this “one time request” path.

MST - why do we classify SET_MEM_TABLE as a one-time request if we send it on
every hot-add/hot-remove?

In particular I’m tripping over the following in vhost_user_write:

 /*
 * For non-vring specific requests, like VHOST_USER_SET_MEM_TABLE,
 * we just need send it once in the first time. For later such
 * request, we just ignore it.
 */
if (vhost_user_one_time_request(msg->hdr.request) && dev->vq_index != 0) {
msg->hdr.flags &= ~VHOST_USER_NEED_REPLY_MASK;
return 0;
}

With the hot-add case in mind, this comment sounds off. IIUC hot-add works for 
vhost-user-blk and vhost-user-scsi because dev->vq_index is set to 0 and never 
changed.

Ref: 
https://git.qemu.org/?p=qemu.git;a=blob;f=hw/scsi/vhost-user-scsi.c;h=b7a71a802cdbf7430704f83fc8c6c04c135644b7;hb=HEAD#l121

Breakpoint 1, vhost_user_set_mem_table (dev=0x.., mem=0x..) at 
../hw/virtio/vhost-user.c
(gdb) where
#0  vhost_user_set_mem_table (dev=0x..., mem=0x...) at ...hw/virtio/vhost-user.c
#1  0x… in vhost_commit (listener=0x..) at .../hw/virtio/vhost.c
#2  0x… in memory_region_transaction_commit () at ...memory.c
...
(gdb) p dev->nvqs 
$4 = 10
(gdb) p dev->vq_index
$3 = 0
(gdb)

Looks like this functionality came in here:

commit b931bfbf042983f311b3b09894d8030b2755a638
Author: Changchun Ouyang 
Date:   Wed Sep 23 12:20:00 2015 +0800

vhost-user: add multiple queue support

This patch is initially based a patch from Nikolay Nikolaev.

This patch adds vhost-user multiple queue support, by creating a nc
and vhost_net pair for each queue.

...

In older version, it was reported that some messages are sent more times
than necessary. Here we came an agreement with Michael that we could
categorize vhost user messages to 2 types: non-vring specific messages,
which should be sent only once, and vring specific messages, which should
be sent per queue.

Here I introduced a helper function vhost_user_one_time_request(), which
lists following messages as non-vring specific messages:

VHOST_USER_SET_OWNER
VHOST_USER_RESET_DEVICE
VHOST_USER_SET_MEM_TABLE
VHOST_USER_GET_QUEUE_NUM

For above messages, we simply ignore them when they are not sent the first
time.

With hot-add in mind, should we revisit the non-vring specific messages and 
possibly clean the code up?


> On Jan 1, 2023, at 11:45 PM, Minghao Yuan  wrote:
> 
> The VHOST_USER_ADD/REM_MEM_REG requests should be categorized into
> non-vring specific messages, and should be sent only once.
> 
> Signed-off-by: Minghao Yuan 
> ---
> configure  | 2 +-
> hw/virtio/vhost-user.c | 2 ++
> 2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 9e407ce2e3..8b4deca342 100755

This configure change looks irrelevant. Did you mean to send it?

> --- a/configure
> +++ b/configure
> @@ -1147,7 +1147,7 @@ cat > $TMPC << EOF
> #  endif
> # endif
> #elif defined(__GNUC__) && defined(__GNUC_MINOR__)
> -# if __GNUC__ < 7 || (__GNUC__ == 7 && __GNUC_MINOR__ < 4)
> +# if __GNUC__ < 7 || (__GNUC__ == 7 && __GNUC_MINOR__ < 3)
> #  error You need at least GCC v7.4.0 to compile QEMU
> # endif
> #else
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index d9ce0501b2..3f2a8c3bdd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -459,6 +459,8 @@ static bool vhost_user_one_time_request(VhostUserRequest 
> request)
> case VHOST_USER_SET_MEM_TABLE:
> case VHOST_USER_GET_QUEUE_NUM:
> case VHOST_USER_NET_SET_MTU:
> +case VHOST_USER_ADD_MEM_REG:
> +case VHOST_USER_REM_MEM_REG:
> return true;
> default:
> return false;
> -- 
> 2.27.0
> 
> 



Re: [PATCH] contrib/vhost-user-blk: Clean up deallocation of VuVirtqElement

2022-06-30 Thread Raphael Norwitz
On Thu, Jun 30, 2022 at 10:52:19AM +0200, Markus Armbruster wrote:
> We allocate VuVirtqElement with g_malloc() in
> virtqueue_alloc_element(), but free it with free() in
> vhost-user-blk.c.  Harmless, but use g_free() anyway.
> 
> One of the calls is guarded by a "not null" condition.  Useless,
> because it cannot be null (it's dereferenced right before), and even

NIT: if it

> it it could be, free() and g_free() do the right thing.  Drop the
> conditional.
> 

Reviewed-by: Raphael Norwitz 

> Fixes: Coverity CID 1490290
> Signed-off-by: Markus Armbruster 
> ---
> Not even compile-tested, because I can't figure out how this thing is
> supposed to be built.  Its initial commit message says "make
> vhost-user-blk", but that doesn't work anymore.
> 

make contrib/vhost-user-blk/vhost-user-blk works for me.

>  contrib/vhost-user-blk/vhost-user-blk.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
> b/contrib/vhost-user-blk/vhost-user-blk.c
> index 9cb78ca1d0..d6932a2645 100644
> --- a/contrib/vhost-user-blk/vhost-user-blk.c
> +++ b/contrib/vhost-user-blk/vhost-user-blk.c
> @@ -106,10 +106,7 @@ static void vub_req_complete(VubReq *req)
>req->size + 1);
>  vu_queue_notify(vu_dev, req->vq);
>  
> -if (req->elem) {
> -free(req->elem);
> -}
> -
> +g_free(req->elem);
>  g_free(req);
>  }
>  
> @@ -243,7 +240,7 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
>  /* refer to hw/block/virtio_blk.c */
>  if (elem->out_num < 1 || elem->in_num < 1) {
>  fprintf(stderr, "virtio-blk request missing headers\n");
> -free(elem);
> +g_free(elem);
>  return -1;
>  }
>  
> @@ -325,7 +322,7 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
>  return 0;
>  
>  err:
> -free(elem);
> +g_free(elem);
>  g_free(req);
>  return -1;
>  }
> -- 
> 2.35.3
> 


Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests

2023-04-17 Thread Raphael Norwitz
Hey Andrey - apologies for the late reply here.

It sounds like you are dealing with a buggy guest, rather than a QEMU issue.

> On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin  wrote:
> 
> 
> 
> On 4/10/23 10:35, Andrey Ryabinin wrote:
>> Some guests hang on boot when using the vhost-user-blk-pci device,
>> but boot normally when using the virtio-blk device. The problem occurs
>> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
>> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to

Virtio 1.1 Section 3.1.1 says that during setup “[t]he driver MUST NOT notify
the device before setting DRIVER_OK.”

Therefore what you are describing is buggy guest behavior. Sounds like the 
driver should be made to either
- not advertise VIRTIO_F_VERSION_1
- not kick before setting VIRTIO_CONFIG_S_DRIVER_OK

If anything, the virtio-blk virtio_blk_handle_output() function should probably 
check start_on_kick?

>> be false in vhost_user_blk_handle_output() and preventing the device from
>> starting.
>> 
>> Fix this by removing the check for vdev->start_on_kick to ensure
>> that the device starts after the kick. This aligns the behavior of
>> 'vhost-user-blk-pci' device with 'virtio-blk' as it does the similar
>> thing in its virtio_blk_handle_output() function.
>> 
>> Fixes: 110b9463d5c8 ("vhost-user-blk: start vhost when guest kicks")
>> Signed-off-by: Andrey Ryabinin 
>> ---
>> hw/block/vhost-user-blk.c | 4 
>> 1 file changed, 4 deletions(-)
>> 
>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
>> index aff4d2b8cbd..448ead448f3 100644
>> --- a/hw/block/vhost-user-blk.c
>> +++ b/hw/block/vhost-user-blk.c
>> @@ -279,10 +279,6 @@ static void vhost_user_blk_handle_output(VirtIODevice 
>> *vdev, VirtQueue *vq)
>> Error *local_err = NULL;
>> int i, ret;
>> 
>> -if (!vdev->start_on_kick) {
>> -return;
>> -}
>> -
>> if (!s->connected) {
>> return;
>> }
> 
> 
> After looking a bit closer to this ->start_on_kick thing ( commit 
> badaf79cfdbd ("virtio: Introduce started flag to VirtioDevice")
> and follow ups) I'm starting to think that removing it entirely would be the 
> right thing to do here.
> The whole reason for it was to add special case for !VIRTIO_F_VERSION_1 
> guests.

The virtio 1.0 spec section 2.1.2 explicitly says: "The device MUST NOT consume 
buffers or notify the driver before DRIVER_OK.”

Your change here would make QEMU violate this condition. I don’t know what the 
danger is but I assume that wording is there for a reason.

Unless MST or Cornelia (CCed) say otherwise I don’t think this is the correct 
approach.

> If we making start on kick thing for misbehaving VIRTIO_F_VERSION_1 guests 
> too, than the flag is no longer required,
> so we can do following:
> 
> ---
> hw/block/vhost-user-blk.c  |  4 
> hw/virtio/virtio-qmp.c |  2 +-
> hw/virtio/virtio.c | 21 ++---
> include/hw/virtio/virtio.h |  5 -
> 4 files changed, 3 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index aff4d2b8cbd..448ead448f3 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -279,10 +279,6 @@ static void vhost_user_blk_handle_output(VirtIODevice 
> *vdev, VirtQueue *vq)
> Error *local_err = NULL;
> int i, ret;
> 
> -if (!vdev->start_on_kick) {
> -return;
> -}
> -
> if (!s->connected) {
> return;
> }
> diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
> index e4d4bece2d7..4865819cd2f 100644
> --- a/hw/virtio/virtio-qmp.c
> +++ b/hw/virtio/virtio-qmp.c
> @@ -773,7 +773,7 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, 
> Error **errp)
> status->disabled = vdev->disabled;
> status->use_started = vdev->use_started;
> status->started = vdev->started;
> -status->start_on_kick = vdev->start_on_kick;
> +status->start_on_kick = true;
> status->disable_legacy_check = vdev->disable_legacy_check;
> status->bus_name = g_strdup(vdev->bus_name);
> status->use_guest_notifier_mask = vdev->use_guest_notifier_mask;
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index f35178f5fcd..218584eae85 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2126,7 +2126,6 @@ void virtio_reset(void *opaque)
> k->reset(vdev);
> }
> 
> -vdev->start_on_kick = false;
> vdev->started = false;
> vdev->broken = false;
> vdev->guest_features = 0;
> @@ -2248,9 +2247,7 @@ static void virtio_queue_notify_vq(VirtQueue *vq)
> trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
> vq->handle_output(vdev, vq);
> 
> -if (unlikely(vdev->start_on_kick)) {
> -virtio_set_started(vdev, true);
> -}
> +virtio_set_started(vdev, true);
> }
> }
> 
> @@ -2268,9 +2265,7 @@ void virtio_queue_notify(VirtIODevice *vdev, int n)
> } else if (vq->handle_output) {
> vq->hand

[PATCH v5] Prevent vhost-user-blk-test hang

2021-09-27 Thread Raphael Norwitz
In the vhost-user-blk-test, as of now there is nothing stopping
vhost-user-blk in QEMU from writing to the socket right after forking off the
storage daemon before it has a chance to come up properly, leaving the
test hanging forever. This intermittently hanging test has caused QEMU
automation failures reported multiple times on the mailing list [1].

This change makes the storage-daemon notify the vhost-user-blk-test
that it is fully initialized and ready to handle client connections by
creating a pidfile on initialization. This ensures that the storage-daemon
backend won't miss vhost-user messages and thereby resolves the hang.

[1] 
https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bedyfynabemeohqb...@mail.gmail.com/

Signed-off-by: Raphael Norwitz 
Reviewed-by: Eric Blake 
---
 tests/qtest/vhost-user-blk-test.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/vhost-user-blk-test.c 
b/tests/qtest/vhost-user-blk-test.c
index 6f108a1b62..5fed262da1 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -24,6 +24,7 @@
 #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
 #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
 #define PCI_SLOT_HP 0x06
+#define PIDFILE_RETRIES 5
 
 typedef struct {
 pid_t pid;
@@ -885,7 +886,8 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
  int num_queues)
 {
 const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
-int i;
+int i, retries;
+char *daemon_pidfile_path;
 gchar *img_path;
 GString *storage_daemon_command = g_string_new(NULL);
 QemuStorageDaemonState *qsd;
@@ -898,6 +900,8 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
 " -object memory-backend-memfd,id=mem,size=256M,share=on "
 " -M memory-backend=mem -m 256M ");
 
+daemon_pidfile_path = g_strdup_printf("/tmp/daemon-%d", getpid());
+
 for (i = 0; i < vus_instances; i++) {
 int fd;
 char *sock_path = create_listen_socket(&fd);
@@ -914,6 +918,9 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
i + 1, sock_path);
 }
 
+g_string_append_printf(storage_daemon_command, "--pidfile %s ",
+   daemon_pidfile_path);
+
 g_test_message("starting vhost-user backend: %s",
storage_daemon_command->str);
 pid_t pid = fork();
@@ -930,7 +937,24 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
 execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
 exit(1);
 }
+
+/*
+ * Ensure the storage-daemon has come up properly before allowing the
+ * test to proceed.
+ */
+retries = 0;
+while (access(daemon_pidfile_path, F_OK) != 0) {
+g_assert_cmpint(retries, <, PIDFILE_RETRIES);
+
+retries++;
+g_usleep(1000);
+}
+
 g_string_free(storage_daemon_command, true);
+if (access(daemon_pidfile_path, F_OK) == 0) {
+unlink(daemon_pidfile_path);
+}
+g_free(daemon_pidfile_path);
 
 qsd = g_new(QemuStorageDaemonState, 1);
 qsd->pid = pid;
-- 
2.20.1



Re: [PATCH v5] Prevent vhost-user-blk-test hang

2021-09-29 Thread Raphael Norwitz
On Tue, Sep 28, 2021 at 10:55:00AM +0200, Stefan Hajnoczi wrote:
> On Mon, Sep 27, 2021 at 05:17:01PM +0000, Raphael Norwitz wrote:
> > In the vhost-user-blk-test, as of now there is nothing stopping
> > vhost-user-blk in QEMU writing to the socket right after forking off the
> > storage daemon before it has a chance to come up properly, leaving the
> > test hanging forever. This intermittently hanging test has caused QEMU
> > automation failures reported multiple times on the mailing list [1].
> > 
> > This change makes the storage-daemon notify the vhost-user-blk-test
> > that it is fully initialized and ready to handle client connections by
> > creating a pidfile on initialization. This ensures that the storage-daemon
> > backend won't miss vhost-user messages and thereby resolves the hang.
> > 
> > [1] 
> > https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bedyfynabemeohqb...@mail.gmail.com/
> 

Hey Stefan,

> Hi Raphael,
> I would like to understand the issue that is being worked around in the
> patch.
> 
> QEMU should be okay with listen fd passing. The qemu-storage-daemon
> documentation even contains example code for this
> (docs/tools/qemu-storage-daemon.rst) and that may need to be updated if
> listen fd passing is fundamentally broken.
> 

The issue is that the "client" (in this case vhost-user-blk in QEMU) can
proceed to use the socket before the storage-daemon has a chance to
properly start up and monitor it. This is nothing unique to the
storage-daemon - I've seen races like this happen with different
vhost-user backends before.

Yes - I do think the docs can be improved to explicitly state that the
storage-daemon must be allowed to properly initialize before any data is
sent over the socket. Maybe we should even prescribe the use of the pidfile
option?

> Can you share more details about the problem?
> 

Did you see my analysis [1]?

[1] 
https://lore.kernel.org/qemu-devel/20210827165253.GA14291@raphael-debian-dev/

Basically QEMU sends VHOST_USER_GET_PROTOCOL_FEATURES across the vhost
socket and the storage daemon never receives it. Looking at the
QEMU state we see it is stuck waiting for a vhost-user response. Meanwhile
the storage-daemon never receives any message to begin with. AFAICT
there is nothing stopping QEMU from running first and sending a message
before vhost-user-blk comes up, and from testing we can see that waiting
for the storage-daemon to come up resolves the problem completely.

> Does "writing to the socket" mean writing vhost-user protocol messages
> or does it mean connect(2)?
> 

Yes - it means writing vhost-user messages. We see a message sent from
QEMU to the backend.

Note that in qtest_socket_server() (called from create_listen_socket())
we have already called listen() on the socket, so I would expect QEMU
calling connect(2) to succeed and proceed to successfully send messages
whether or not there is another listener. I even tried commenting out the
execlp for the storage-daemon and I saw the same behavior from QEMU - it
sends the message and hangs indefinitely.
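
As a stand-alone illustration of that point (not QEMU test code; the socket
path below is made up): connect(2) and write(2) on a UNIX stream socket
succeed once the peer has called listen(), even if it never accept()s.

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    const char *path = "/tmp/lfd-demo.sock";            /* made-up path */
    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    int cfd = socket(AF_UNIX, SOCK_STREAM, 0);

    unlink(path);
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));  /* checks trimmed */
    listen(lfd, 1);

    /* No accept() anywhere, yet connect() and write() both succeed. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        write(cfd, "msg", 3) == 3) {
        printf("message queued without any accept()\n");
    }

    close(cfd);
    close(lfd);
    unlink(path);
    return 0;
}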

> Could the problem be that vhost-user-blk-test.c creates the listen fds
> and does not close them? This means the host network stack doesn't
> consider the socket closed after QEMU terminates and therefore the test
> process hangs after QEMU is gone? In that case vhost-user-blk-test needs
> to close the fds after spawning qemu-storage-daemon.
> 

When the test hangs both QEMU and storage-daemon are still up and
connected to the socket and waiting for messages from each other. I don't
see how we would close the FD in this state or how it would help.

We may want to think about implementing some kind of timeout for initial
vhost-user messages so that we fail instead of hang in cases like these,
as I proposed in [1]. What do you think?
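
Something along these lines is what I have in mind, purely as a sketch of the
idea rather than a proposed patch:

#include <errno.h>
#include <poll.h>

/* Wait up to timeout_ms for the backend's reply on the vhost-user socket. */
static int wait_for_backend_reply(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret == 0) {
        return -ETIMEDOUT;      /* no reply in time: fail instead of hanging */
    }
    return ret < 0 ? -errno : 0;
}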

> Stefan
> 
> > 
> > Signed-off-by: Raphael Norwitz 
> > Reviewed-by: Eric Blake 
> > ---
> >  tests/qtest/vhost-user-blk-test.c | 26 +-
> >  1 file changed, 25 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/qtest/vhost-user-blk-test.c 
> > b/tests/qtest/vhost-user-blk-test.c
> > index 6f108a1b62..5fed262da1 100644
> > --- a/tests/qtest/vhost-user-blk-test.c
> > +++ b/tests/qtest/vhost-user-blk-test.c
> > @@ -24,6 +24,7 @@
> >  #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
> >  #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
> >  #define PCI_SLOT_HP 0x06
> > +#define PIDFILE_RETRIES 5
> >  
> >  typedef struct {
> >  pid_t pid;
> > @@ -885,7 +886,8 @@ static void start_vhost_user_blk(GString *cmd_line, int 
> > vus_instances,
> >

Re: [PATCH v5] Prevent vhost-user-blk-test hang

2021-10-01 Thread Raphael Norwitz
On Thu, Sep 30, 2021 at 10:48:09AM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 30, 2021 at 05:29:06AM +0000, Raphael Norwitz wrote:
> > On Tue, Sep 28, 2021 at 10:55:00AM +0200, Stefan Hajnoczi wrote:
> > > On Mon, Sep 27, 2021 at 05:17:01PM +0000, Raphael Norwitz wrote:
> > > > In the vhost-user-blk-test, as of now there is nothing stopping
> > > > vhost-user-blk in QEMU writing to the socket right after forking off the
> > > > storage daemon before it has a chance to come up properly, leaving the
> > > > test hanging forever. This intermittently hanging test has caused QEMU
> > > > automation failures reported multiple times on the mailing list [1].
> > > > 
> > > > This change makes the storage-daemon notify the vhost-user-blk-test
> > > > that it is fully initialized and ready to handle client connections by
> > > > creating a pidfile on initialization. This ensures that the 
> > > > storage-daemon
> > > > backend won't miss vhost-user messages and thereby resolves the hang.
> > > > 
> > > > [1] 
> > > > https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bedyfynabemeohqb...@mail.gmail.com/
> > > 
> > 
> > Hey Stefan,
> > 
> > > Hi Raphael,
> > > I would like to understand the issue that is being worked around in the
> > > patch.
> > > 
> > > QEMU should be okay with listen fd passing. The qemu-storage-daemon
> > > documentation even contains example code for this
> > > (docs/tools/qemu-storage-daemon.rst) and that may need to be updated if
> > > listen fd passing is fundamentally broken.
> > > 
> > 
> > The issue is that the "client" (in this case vhost-user-blk in QEMU) can
> > proceed to use the socket before the storage-daemon has a chance to
> > properly start up and monitor it. This is nothing unique to the
> > storage-daemon - I've seen races like this happen with different
> > vhost-user backends before.
> > 
> > Yes - I do think the docs can be improved to explicitly state that the
> > storage-daemon must be allowed to properly initialize before any data is
> > sent over the socket. Maybe we should even perscribe the use of the pidfile
> > option?
> > 
> > > Can you share more details about the problem?
> > > 
> > 
> > Did you see my analysis [1]?
> > 
> > [1] 
> > https://lore.kernel.org/qemu-devel/20210827165253.GA14291@raphael-debian-dev/
> > 
> > Basically QEMU sends VHOST_USER_GET_PROTOCOL_FEATURES across the vhost
> > socket and the storage daemon never receives it. Looking at the
> > QEMU state we see it is stuck waiting for a vhost-user response. Meanwhile
> > the storage-daemon never receives any message to begin with. AFAICT
> > there is nothing stopping QEMU from running first and sending a message
> > before vhost-user-blk comes up, and from testing we can see that waiting
> > for the storage-daemon to come up resolves the problem completely.
> 
> The root cause has not been determined yet. QEMU should accept the
> incoming connection and then read the previously-sent
> VHOST_USER_GET_PROTOCOL_FEATURES message. There is no reason at the
> Sockets API level why the message should get lost, so there is probably
> a QEMU bug here.
> 
> > > Does "writing to the socket" mean writing vhost-user protocol messages
> > > or does it mean connect(2)?
> > > 
> > 
> > Yes - it means writing vhost-user messages. We see a message sent from
> > QEMU to the backend.
> > 
> > Note that in qtest_socket_server() (called from create_listen_socket())
> > we have already called listen() on the socket, so I would expect QEMU
> > calling connect(2) to succeed and proceed to successfully send messages
> > whether or not there is another listener. I even tried commenting out the
> > execlp for the storage-daemon and I saw the same behavior from QEMU - it
> > sends the message and hangs indefinitely.
> 
> QEMU is correct in waiting for a vhost-user reply. The question is why
> qemu-storage-daemon's vhost-user-block export isn't processing the
> request and replying to it?
> 
> > > Could the problem be that vhost-user-blk-test.c creates the listen fds
> > > and does not close them? This means the host network stack doesn't
> > > consider the socket closed after QEMU terminates and therefore the test
> > > process hangs after QEMU is gone? In that case vhost-user-blk-test needs
> > > to close the fds after spawning qemu-stor

[PATCH v3 3/6] libvhost-user: Simplify VHOST_USER_REM_MEM_REG

2022-01-16 Thread Raphael Norwitz
From: David Hildenbrand 

Let's avoid having to manually copy all elements. Copy only the ones
necessary to close the hole and perform the operation in-place without
a second array.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: David Hildenbrand 
Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 30 +++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 1a8fc9d600..7dd8e918b4 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -811,10 +811,8 @@ static inline bool reg_equal(VuDevRegion *vudev_reg,
 
 static bool
 vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
-int i, j;
-bool found = false;
-VuDevRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS] = {};
 VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
+int i;
 
 if (vmsg->fd_num != 1) {
 vmsg_close_fds(vmsg);
@@ -841,28 +839,28 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 DPRINT("mmap_offset  0x%016"PRIx64"\n",
msg_region->mmap_offset);
 
-for (i = 0, j = 0; i < dev->nregions; i++) {
-if (!reg_equal(&dev->regions[i], msg_region)) {
-shadow_regions[j].gpa = dev->regions[i].gpa;
-shadow_regions[j].size = dev->regions[i].size;
-shadow_regions[j].qva = dev->regions[i].qva;
-shadow_regions[j].mmap_addr = dev->regions[i].mmap_addr;
-shadow_regions[j].mmap_offset = dev->regions[i].mmap_offset;
-j++;
-} else {
-found = true;
+for (i = 0; i < dev->nregions; i++) {
+if (reg_equal(&dev->regions[i], msg_region)) {
 VuDevRegion *r = &dev->regions[i];
 void *m = (void *) (uintptr_t) r->mmap_addr;
 
 if (m) {
 munmap(m, r->size + r->mmap_offset);
 }
+
+break;
 }
 }
 
-if (found) {
-memcpy(dev->regions, shadow_regions,
-   sizeof(VuDevRegion) * VHOST_USER_MAX_RAM_SLOTS);
+if (i < dev->nregions) {
+/*
+ * Shift all affected entries by 1 to close the hole at index i and
+ * zero out the last entry.
+ */
+memmove(dev->regions + i, dev->regions + i + 1,
+   sizeof(VuDevRegion) * (dev->nregions - i - 1));
+memset(dev->regions + dev->nregions - 1, 0,
+   sizeof(VuDevRegion));
 DPRINT("Successfully removed a region\n");
 dev->nregions--;
 vmsg_set_reply_u64(vmsg, 0);
-- 
2.20.1



[PATCH v3 5/6] libvhost-user: prevent over-running max RAM slots

2022-01-16 Thread Raphael Norwitz
When VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS support was added to
libvhost-user, no guardrails were added to protect against QEMU
attempting to hot-add too many RAM slots to a VM with a libvhost-user
based backend attached.

This change adds the missing error handling by introducing a check on
the number of RAM slots the device has available before proceeding to
process the VHOST_USER_ADD_MEM_REG message.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 3f4d7221ca..2a1fa00a44 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -705,6 +705,14 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 return false;
 }
 
+if (dev->nregions == VHOST_USER_MAX_RAM_SLOTS) {
+close(vmsg->fds[0]);
+vu_panic(dev, "failing attempt to hot add memory via "
+  "VHOST_USER_ADD_MEM_REG message because the backend has "
+  "no free ram slots available");
+return false;
+}
+
 /*
  * If we are in postcopy mode and we receive a u64 payload with a 0 value
  * we know all the postcopy client bases have been received, and we
-- 
2.20.1



[PATCH v3 0/6] Clean up error handling in libvhost-user memory mapping

2022-01-16 Thread Raphael Norwitz
Hey Stefan, Marc-Andre, MST, David -

As promised here is a series cleaning up error handling in the
libvhost-user memory mapping path. Most of these cleanups are
straightforward and have been discussed on the mailing list in threads
[1] and [2].

[1] 
https://lore.kernel.org/qemu-devel/20211018143319.GA11006@raphael-debian-dev/
[2] 
https://lore.kernel.org/qemu-devel/9391f500-70be-26cf-bcfc-591d3ee84...@redhat.com/

Changes since V1:
 * Checks for a single fd in vu_add_mem_reg and vu_rem_mem_reg return false
   instead of true.
 * Check for over-running max ram slots in vu_add_mem_reg returns false
   instead of true.
 * vu_rem_mem_reg unmaps all matching regions.
 * Decrement iterator variable when looping through regions in
   vu_rem_mem_reg to ensure matching regions aren’t missed.

Changes since V2:
 * Fixed FD leaks on all input validation failures
 * Added comment David suggested to explain removing duplicate regions
 * Added David’s patch to close message FDs on VHOST_USER_REM_MEM_REG
 * Expanded commit message for patches checking FD numbers
 * Fixed vmsg->size <= sizeof(vmsg->payload.memreg) validation check
 * Improved error message when a backend has no free slots
 * Improved error messages when the backend receives invalid vmsg->fd_num
   and/or vmsg->size

Dropped R-b tags due to non-trivial changes.

Thanks,
Raphael

David Hildenbrand (2):
  libvhost-user: Simplify VHOST_USER_REM_MEM_REG
  libvhost-user: fix VHOST_USER_REM_MEM_REG not closing the fd

Raphael Norwitz (4):
  libvhost-user: Add vu_rem_mem_reg input validation
  libvhost-user: Add vu_add_mem_reg input validation
  libvhost-user: prevent over-running max RAM slots
  libvhost-user: handle removal of identical regions

 subprojects/libvhost-user/libvhost-user.c | 76 ++-
 subprojects/libvhost-user/libvhost-user.h |  2 +
 2 files changed, 61 insertions(+), 17 deletions(-)

-- 
2.20.1


[PATCH v3 2/6] libvhost-user: Add vu_add_mem_reg input validation

2022-01-16 Thread Raphael Norwitz
Today if multiple FDs are sent from the VMM to the backend in a
VHOST_USER_ADD_MEM_REG message, one FD will be mapped and the remaining
FDs will be leaked. Therefore if multiple FDs are sent we report an
error and fail the operation, closing all FDs in the message.

Likewise in case the VMM sends a message with a size less than that
of a memory region descriptor, we add a check to gracefully report an
error and fail the operation rather than crashing.

Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index b09b1c269e..1a8fc9d600 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -690,6 +690,21 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 VuDevRegion *dev_region = &dev->regions[dev->nregions];
 void *mmap_addr;
 
+if (vmsg->fd_num != 1) {
+vmsg_close_fds(vmsg);
+vu_panic(dev, "VHOST_USER_ADD_MEM_REG received %d fds - only 1 fd "
+  "should be sent for this message type", vmsg->fd_num);
+return false;
+}
+
+if (vmsg->size < VHOST_USER_MEM_REG_SIZE) {
+close(vmsg->fds[0]);
+vu_panic(dev, "VHOST_USER_ADD_MEM_REG requires a message size of at "
+  "least %d bytes and only %d bytes were received",
+  VHOST_USER_MEM_REG_SIZE, vmsg->size);
+return false;
+}
+
 /*
  * If we are in postcopy mode and we receive a u64 payload with a 0 value
  * we know all the postcopy client bases have been received, and we
-- 
2.20.1



[PATCH v3 4/6] libvhost-user: fix VHOST_USER_REM_MEM_REG not closing the fd

2022-01-16 Thread Raphael Norwitz
From: David Hildenbrand 

We end up not closing the file descriptor, resulting in leaking one
file descriptor for each VHOST_USER_REM_MEM_REG message.

Fixes: 875b9fd97b34 ("Support individual region unmap in libvhost-user")
Cc: Michael S. Tsirkin 
Cc: Raphael Norwitz 
Cc: "Marc-André Lureau" 
Cc: Stefan Hajnoczi 
Cc: Paolo Bonzini 
Cc: Coiby Xu 
Signed-off-by: David Hildenbrand 
Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 7dd8e918b4..3f4d7221ca 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -868,6 +868,8 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 vu_panic(dev, "Specified region not found\n");
 }
 
+close(vmsg->fds[0]);
+
 return true;
 }
 
-- 
2.20.1


[PATCH v3 6/6] libvhost-user: handle removal of identical regions

2022-01-16 Thread Raphael Norwitz
Today if QEMU (or any other VMM) has sent multiple copies of the same
region to a libvhost-user based backend and then attempts to remove the
region, only one instance of the region will be removed, leaving stale
copies of the region in dev->regions[].

This change resolves this by having vu_rem_mem_reg() iterate through all
regions in dev->regions[] and delete all matching regions.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 28 +--
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 2a1fa00a44..0ee43b8e93 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -821,6 +821,7 @@ static bool
 vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
 int i;
+bool found = false;
 
 if (vmsg->fd_num != 1) {
 vmsg_close_fds(vmsg);
@@ -856,21 +857,24 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 munmap(m, r->size + r->mmap_offset);
 }
 
-break;
+/*
+ * Shift all affected entries by 1 to close the hole at index i and
+ * zero out the last entry.
+ */
+memmove(dev->regions + i, dev->regions + i + 1,
+sizeof(VuDevRegion) * (dev->nregions - i - 1));
+memset(dev->regions + dev->nregions - 1, 0, sizeof(VuDevRegion));
+DPRINT("Successfully removed a region\n");
+dev->nregions--;
+i--;
+
+found = true;
+
+/* Continue the search for eventual duplicates. */
 }
 }
 
-if (i < dev->nregions) {
-/*
- * Shift all affected entries by 1 to close the hole at index i and
- * zero out the last entry.
- */
-memmove(dev->regions + i, dev->regions + i + 1,
-   sizeof(VuDevRegion) * (dev->nregions - i - 1));
-memset(dev->regions + dev->nregions - 1, 0,
-   sizeof(VuDevRegion));
-DPRINT("Successfully removed a region\n");
-dev->nregions--;
+if (found) {
 vmsg_set_reply_u64(vmsg, 0);
 } else {
 vu_panic(dev, "Specified region not found\n");
-- 
2.20.1



[PATCH v3 1/6] libvhost-user: Add vu_rem_mem_reg input validation

2022-01-16 Thread Raphael Norwitz
Today if multiple FDs are sent from the VMM to the backend in a
VHOST_USER_REM_MEM_REG message, one FD will be unmapped and the remaining
FDs will be leaked. Therefore if multiple FDs are sent we report an
error and fail the operation, closing all FDs in the message.

Likewise, if the VMM sends a message with a size less than that of a
memory region descriptor, we add a check to gracefully report an error
and fail the operation rather than crashing.

Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 15 +++
 subprojects/libvhost-user/libvhost-user.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 787f4d2d4f..b09b1c269e 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -801,6 +801,21 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 VuDevRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS] = {};
 VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
 
+if (vmsg->fd_num != 1) {
+vmsg_close_fds(vmsg);
+vu_panic(dev, "VHOST_USER_REM_MEM_REG received %d fds - only 1 fd "
+  "should be sent for this message type", vmsg->fd_num);
+return false;
+}
+
+if (vmsg->size < VHOST_USER_MEM_REG_SIZE) {
+close(vmsg->fds[0]);
+vu_panic(dev, "VHOST_USER_REM_MEM_REG requires a message size of at "
+  "least %d bytes and only %d bytes were received",
+  VHOST_USER_MEM_REG_SIZE, vmsg->size);
+return false;
+}
+
 DPRINT("Removing region:\n");
 DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
msg_region->guest_phys_addr);
diff --git a/subprojects/libvhost-user/libvhost-user.h 
b/subprojects/libvhost-user/libvhost-user.h
index 3d13dfadde..cde9f07bb3 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -129,6 +129,8 @@ typedef struct VhostUserMemoryRegion {
 uint64_t mmap_offset;
 } VhostUserMemoryRegion;
 
+#define VHOST_USER_MEM_REG_SIZE (sizeof(VhostUserMemoryRegion))
+
 typedef struct VhostUserMemory {
 uint32_t nregions;
 uint32_t padding;
-- 
2.20.1



Re: [PATCH v1] libvhost-user: Map shared RAM with MAP_NORESERVE to support virtio-mem with hugetlb

2022-01-16 Thread Raphael Norwitz
On Tue, Jan 11, 2022 at 01:39:39PM +0100, David Hildenbrand wrote:
> For fd-based shared memory, MAP_NORESERVE is only effective for hugetlb,
> otherwise it's ignored. Older Linux versions that didn't support
> reservation of huge pages ignored MAP_NORESERVE completely.
> 
> The first client to mmap a hugetlb fd without MAP_NORESERVE will
> trigger reservation of huge pages for the whole mmapped range. There are
> two cases to consider:
> 
> 1) QEMU mapped RAM without MAP_NORESERVE
> 
> We're not dealing with a sparse mapping, huge pages for the whole range
> have already been reserved by QEMU. An additional mmap() without
> MAP_NORESERVE won't have any effect on the reservation.
> 
> 2) QEMU mapped RAM with MAP_NORESERVE
> 
> We're dealing with a sparse mapping, no huge pages should be reserved.
> Further mappings without MAP_NORESERVE should be avoided.
> 
> For 1), it doesn't matter if we set MAP_NORESERVE or not, so we can
> simply set it. For 2), we'd be overriding QEMUs decision and trigger
> reservation of huge pages, which might just fail if there are not
> sufficient huge pages around. We must map with MAP_NORESERVE.
> 
> This change is required to support virtio-mem with hugetlb: a
> virtio-mem device mapped into the guest physical memory corresponds to
> a sparse memory mapping and QEMU maps this memory with MAP_NORESERVE.
> Whenever memory in that sparse region will be accessed by the VM, QEMU
> populates huge pages for the affected range by preallocating memory
> and handling any preallocation errors gracefully.
> 
> So let's map shared RAM with MAP_NORESERVE. As libvhost-user only
> supports Linux, there shouldn't be anything to take care of in regard of
> other OS support.
> 
> Without this change, libvhost-user will fail mapping the region if there
> are currently not enough huge pages to perform the reservation:
>  fv_panic: libvhost-user: region mmap error: Cannot allocate memory
> 
> Cc: "Marc-André Lureau" 
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: Raphael Norwitz 
> Cc: Stefan Hajnoczi 
> Cc: Dr. David Alan Gilbert 
> Signed-off-by: David Hildenbrand 
> ---
>  subprojects/libvhost-user/libvhost-user.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/subprojects/libvhost-user/libvhost-user.c 
> b/subprojects/libvhost-user/libvhost-user.c
> index 787f4d2d4f..3b538930be 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -728,12 +728,12 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>   * accessing it before we userfault.
>   */
>  mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_NONE, MAP_SHARED,
> + PROT_NONE, MAP_SHARED | MAP_NORESERVE,
>   vmsg->fds[0], 0);
>  } else {
>  mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_READ | PROT_WRITE, MAP_SHARED, vmsg->fds[0],
> - 0);
> + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
> + vmsg->fds[0], 0);
>  }
>  
>  if (mmap_addr == MAP_FAILED) {
> @@ -878,7 +878,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg 
> *vmsg)
>   * accessing it before we userfault
>   */
>  mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_NONE, MAP_SHARED,
> + PROT_NONE, MAP_SHARED | MAP_NORESERVE,
>   vmsg->fds[i], 0);
>  
>  if (mmap_addr == MAP_FAILED) {
> @@ -965,7 +965,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>   * mapped address has to be page aligned, and we use huge
>   * pages.  */
>  mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_READ | PROT_WRITE, MAP_SHARED,
> + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
>   vmsg->fds[i], 0);
>  
>  if (mmap_addr == MAP_FAILED) {

Acked-by: Raphael Norwitz 


Re: [PATCH v2] vhost-user-scsi: avoid unlink(NULL) with fd passing

2022-05-16 Thread Raphael Norwitz
On Mon, May 16, 2022 at 04:57:01PM +0100, Stefan Hajnoczi wrote:
> Commit 747421e949fc1eb3ba66b5fcccdb7ba051918241 ("Implements Backend
> Program conventions for vhost-user-scsi") introduced fd-passing support
> as part of implementing the vhost-user backend program conventions.
> 
> When fd passing is used the UNIX domain socket path is NULL and we must
> not call unlink(2).
> 
> The unlink(2) call is necessary when the listen socket, lsock, was
> created successfully since that means the UNIX domain socket is visible
> in the file system.
> 
> Fixes: Coverity CID 1488353
> Fixes: 747421e949fc1eb3ba66b5fcccdb7ba051918241 ("Implements Backend Program 
> conventions for vhost-user-scsi")
> Signed-off-by: Stefan Hajnoczi 
> ---

Reviewed-by: Raphael Norwitz 

>  contrib/vhost-user-scsi/vhost-user-scsi.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/contrib/vhost-user-scsi/vhost-user-scsi.c 
> b/contrib/vhost-user-scsi/vhost-user-scsi.c
> index b2c0f98253..9ef61cf5a7 100644
> --- a/contrib/vhost-user-scsi/vhost-user-scsi.c
> +++ b/contrib/vhost-user-scsi/vhost-user-scsi.c
> @@ -433,13 +433,16 @@ out:
>  if (vdev_scsi) {
>  g_main_loop_unref(vdev_scsi->loop);
>  g_free(vdev_scsi);
> -unlink(opt_socket_path);
>  }
>  if (csock >= 0) {
>  close(csock);
>  }
>  if (lsock >= 0) {
>  close(lsock);
> +
> +if (opt_socket_path) {
> +unlink(opt_socket_path);
> +}
>  }
>  g_free(opt_socket_path);
>  g_free(iscsi_uri);
> -- 
> 2.36.1
> 


Accelerating non-standard disk types

2022-05-16 Thread Raphael Norwitz
Hey Stefan,

We've been thinking about ways to accelerate other disk types such as
SATA and IDE rather than translating to SCSI and using QEMU's iSCSI
driver, with existing and more performant backends such as SPDK. We
think there are some options worth exploring:

[1] Keep using the SCSI translation in QEMU but back vDisks with a
vhost-user-scsi or vhost-user-blk backend device.
[2] Implement SATA and IDE emulation with vfio-user (likely with an SPDK
client?).
[3] We've also been looking at your libblkio library. From your
description in
https://lists.gnu.org/archive/html/qemu-devel/2021-04/msg06146.html it
sounds like it may definitely play a role here, and possibly provide the
nessesary abstractions to back I/O from these emulated disks to any
backends we may want?

We are planning to start a review of these options internally to survey
tradeoffs, potential timelines and practicality for these approaches. We
were also considering putting a submission together for KVM forum
describing our findings. Would you see any value in that?

Thanks,
Raphael


Re: Accelerating non-standard disk types

2022-05-19 Thread Raphael Norwitz
On Tue, May 17, 2022 at 04:29:17PM +0100, Stefan Hajnoczi wrote:
> On Mon, May 16, 2022 at 05:38:31PM +0000, Raphael Norwitz wrote:
> > Hey Stefan,
> > 
> > We've been thinking about ways to accelerate other disk types such as
> > SATA and IDE rather than translating to SCSI and using QEMU's iSCSI
> > driver, with existing and more performant backends such as SPDK. We
> > think there are some options worth exploring:
> > 
> > [1] Keep using the SCSI translation in QEMU but back vDisks with a
> > vhost-user-scsi or vhost-user-blk backend device.
> 
> If I understand correctly the idea is to have a QEMU Block Driver that
> connects to SPDK using vhost-user-scsi/blk?
>

Yes - the idea would be to introduce logic to translate SATA/IDE to SCSI
or block requests and send them via vhost-user-{scsi/blk} to SPDK or any
other vhost-user backend. Our thought is that this is doable today
whereas we may have to wait for QEMU to formally adopt libblkio before
proceeding with [3], and depending on timelines it may make sense to
implement [1] and then switch over to [3] later. Thoughts?

> > [2] Implement SATA and IDE emulation with vfio-user (likely with an SPDK
> > client?).
> 
> This is definitely the option with the lowest overhead. I'm not sure if
> implementing SATA and IDE emulation in SPDK is worth the effort for
> saving the last few cycles.
>

Agreed - it’s probably not worth exploring because of the amount of work
involved. One good argument would be that it may be better for security
in the multiprocess QEMU world, but to me that does not seem strong
enough to justify the effort, so I suggest we drop option [2].

> > [3] We've also been looking at your libblkio library. From your
> > description in
> > https://lists.gnu.org/archive/html/qemu-devel/2021-04/msg06146.html it
> > sounds like it may definitely play a role here, and possibly provide the
> > nessesary abstractions to back I/O from these emulated disks to any
> > backends we may want?
> 
> Kevin Wolf has contributed a vhost-user-blk driver for libblkio. This
> lets you achieve #1 using QEMU's libblkio Block Driver. The guest still
> sees IDE or SATA but instead of translating to iSCSI the I/O requests
> are sent over vhost-user-blk.
> 
> I suggest joining the libblkio chat and we can discuss how to set this
> up (the QEMU libblkio BlockDriver is not yet in qemu.git):
> https://matrix.to/#/#libblkio:matrix.org

Great - I have joined and will follow up there.

> 
> > We are planning to start a review of these options internally to survey
> > tradeoffs, potential timelines and practicality for these approaches. We
> > were also considering putting a submission together for KVM forum
> > describing our findings. Would you see any value in that?
> 
> I think it's always interesting to see performance results. I wonder if
> you have more cutting-edge optimizations or performance results you want
> to share at KVM Forum because IDE and SATA are more legacy/niche
> nowadays?
>

I realize I over-emphasized performance in my question - our larger goal
here is to align the data path for all disk types. We have some hope
that SATA can be sped up a bit, but it’s entirely possible that the MMIO
overhead will far outweigh any disk I/O improvements. Our thought was to
present a “Roadmap for supporting offload of alternate disk types”, but
with your and Paolo’s response it seems like there isn’t enough material
to warrant a KVM talk and we should rather invest time in prototyping
and evaluating solutions.

> Stefan



Re: Accelerating non-standard disk types

2022-05-19 Thread Raphael Norwitz
On Tue, May 17, 2022 at 03:53:52PM +0200, Paolo Bonzini wrote:
> On 5/16/22 19:38, Raphael Norwitz wrote:
> > [1] Keep using the SCSI translation in QEMU but back vDisks with a
> > vhost-user-scsi or vhost-user-blk backend device.
> > [2] Implement SATA and IDE emulation with vfio-user (likely with an SPDK
> > client?).
> > [3] We've also been looking at your libblkio library. From your
> > description in
> > https://lists.gnu.org/archive/html/qemu-devel/2021-04/msg06146.html it
> > sounds like it may definitely play a role here, and possibly provide the
> > necessary abstractions to back I/O from these emulated disks to any
> > backends we may want?
> 
> First of all: have you benchmarked it?  How much time is spent on MMIO vs.
> disk I/O?
>

Good point - we haven’t benchmarked the emulation, exit and translation
overheads - it is very possible speeding up disk I/O may not have a huge
impact. We would definitely benchmark this before exploring any of the
options seriously, but as you rightly note, performance is not the only
motivation here.

> Of the options above, the most interesting to me is to implement a
> vhost-user-blk/vhost-user-scsi backend in QEMU, similar to the NVMe one,
> that would translate I/O submissions to virtqueue (including polling and the
> like) and could be used with SATA.
>

We were certainly eyeing [1] as the most viable in the immediate future.
That said, since a vhost-user-blk driver has been added to libblkio, [3]
also sounds like a strong option. Do you see any long term benefit to
translating SATA/IDE submissions to virtqueues in a world where libblkio
is to be adopted?

> For IDE specifically, I'm not sure how much it can be sped up since it has
> only 1 in-flight operation.  I think using KVM coalesced I/O could provide
> an interesting boost (assuming instant or near-instant reply from the
> backend).  If all you're interested in however is not really performance,
> but rather having a single "connection" to your back end, vhost-user is
> certainly an option.
> 

Interesting - I will take a look at KVM coalesced I/O.

You’re totally right though, performance is not our main interest for
these disk types. I should have emphasized offload rather than
acceleration and performance. We would prefer to QA and support as few
data paths as possible, and a vhost-user offload mechanism would allow
us to use the same path for all I/O. I imagine other QEMU users who
offload to backends like SPDK and use SATA/IDE disk types may feel
similarly?

> Paolo

Re: [RFC 0/5] Clean up error handling in libvhost-user memory mapping

2022-01-04 Thread Raphael Norwitz
Ping

On Wed, Dec 15, 2021 at 10:29:46PM +, Raphael Norwitz wrote:
> Hey Stefan, Marc-Andre, MST, David -
> 
> As promised here is a series cleaning up error handling in the
> libvhost-user memory mapping path. Most of these cleanups are
> straightforward and have been discussed on the mailing list in threads
> [1] and [2]. Hopefully there is nothing super controversial in the first
> 4 patches.
> 
> I am concerned about patch 5 “libvhost-user: handle removal of
> identical regions”. From my reading of Stefan's comments in [1], the
> proposal seemed to be to remove any duplicate regions. I’d prefer to
> prevent duplicate regions from being added in the first place. Thoughts? 
> 
> [1] 
> https://lore.kernel.org/qemu-devel/20211018143319.GA11006@raphael-debian-dev/
> [2] 
> https://lore.kernel.org/qemu-devel/9391f500-70be-26cf-bcfc-591d3ee84...@redhat.com/
> 
> Sorry for the delay,
> Raphael
> 
> David Hildenbrand (1):
>   libvhost-user: Simplify VHOST_USER_REM_MEM_REG
> 
> Raphael Norwitz (4):
>   libvhost-user: Add vu_rem_mem_reg input validation
>   libvhost-user: Add vu_add_mem_reg input validation
>   libvhost-user: prevent over-running max RAM slots
>   libvhost-user: handle removal of identical regions
> 
>  subprojects/libvhost-user/libvhost-user.c | 52 +++
>  1 file changed, 34 insertions(+), 18 deletions(-)
> 
> -- 
> 2.20.1

Re: [RFC 1/5] libvhost-user: Add vu_rem_mem_reg input validation

2022-01-05 Thread Raphael Norwitz
On Wed, Jan 05, 2022 at 11:00:35AM +, Stefan Hajnoczi wrote:
> On Wed, Dec 15, 2021 at 10:29:48PM +0000, Raphael Norwitz wrote:
> > Signed-off-by: Raphael Norwitz 
> > ---
> >  subprojects/libvhost-user/libvhost-user.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/subprojects/libvhost-user/libvhost-user.c 
> > b/subprojects/libvhost-user/libvhost-user.c
> > index 787f4d2d4f..573212a83b 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -801,6 +801,12 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> >  VuDevRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS] = {};
> >  VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = 
> > &m;
> >  
> > +if (vmsg->fd_num != 1 ||
> > +vmsg->size != sizeof(vmsg->payload.memreg)) {
> > +vu_panic(dev, "VHOST_USER_REM_MEM_REG received multiple regions");
> > +return true;
> 
> Most vu_panic() callers return false to indicate that a reply does not
> need to be sent. When the return value is true vu_dispatch() sends a
> response, which we don't want.
> 
> Note that vu_dispatch() returns true (success) when the message handler
> function returns false. The success/failure behavior should probably be
> separated from the reply_requested behavior :(.
> 
> Anyway, returning false is probably appropriate here.
>

Ack - I'll fix it in all the patches.
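
So every failed validation check in these handlers will end up with the
same shape, something like this sketch of the vu_add_mem_reg() fd check
(exact message text still subject to change):

if (vmsg->fd_num != 1) {
    vmsg_close_fds(vmsg);
    vu_panic(dev, "VHOST_USER_ADD_MEM_REG received %d fds - only 1 fd "
             "should be sent for this message type", vmsg->fd_num);
    /* Returning false: the device is panicked and no reply is sent. */
    return false;
}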

> Stefan




Re: [RFC 5/5] libvhost-user: handle removal of identical regions

2022-01-05 Thread Raphael Norwitz
On Wed, Jan 05, 2022 at 11:18:52AM +, Stefan Hajnoczi wrote:
> On Wed, Dec 15, 2021 at 10:29:55PM +0000, Raphael Norwitz wrote:
> > diff --git a/subprojects/libvhost-user/libvhost-user.c 
> > b/subprojects/libvhost-user/libvhost-user.c
> > index 74a9980194..2f465a4f0e 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -809,6 +809,7 @@ static bool
> >  vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> >  VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = 
> > &m;
> >  int i;
> > +bool found = false;
> >  
> >  if (vmsg->fd_num != 1 ||
> >  vmsg->size != sizeof(vmsg->payload.memreg)) {
> > @@ -831,25 +832,25 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> >  VuDevRegion *r = &dev->regions[i];
> >  void *m = (void *) (uintptr_t) r->mmap_addr;
> >  
> > -if (m) {
> > +if (m && !found) {
> >  munmap(m, r->size + r->mmap_offset);
> >  }
> 
> Why is only the first region unmapped? My interpretation of
> vu_add_mem_reg() is that it mmaps duplicate regions to unique mmap_addr
> addresses, so we need to munmap each of them.

I agree - I will remove the found check here.

>
> >  
> > -break;
> > +/*
> > + * Shift all affected entries by 1 to close the hole at index 
> > i and
> > + * zero out the last entry.
> > + */
> > +memmove(dev->regions + i, dev->regions + i + 1,
> > +sizeof(VuDevRegion) * (dev->nregions - i - 1));
> > +memset(dev->regions + dev->nregions - 1, 0, 
> > sizeof(VuDevRegion));
> > +DPRINT("Successfully removed a region\n");
> > +dev->nregions--;
> > +
> > +found = true;
> >  }
> 
> i-- is missing. dev->regions[] has been shortened so we need to check
> the same element again.

Ack
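
For clarity, the loop I'm planning for v2 would look roughly like this
(a sketch only; I renamed the inner pointer to ma here just to avoid
shadowing the outer m):

for (i = 0; i < dev->nregions; i++) {
    if (reg_equal(&dev->regions[i], msg_region)) {
        VuDevRegion *r = &dev->regions[i];
        void *ma = (void *) (uintptr_t) r->mmap_addr;

        /* Unmap every matching region, not just the first one found. */
        if (ma) {
            munmap(ma, r->size + r->mmap_offset);
        }

        /* Close the hole at index i and zero out the last entry. */
        memmove(dev->regions + i, dev->regions + i + 1,
                sizeof(VuDevRegion) * (dev->nregions - i - 1));
        memset(dev->regions + dev->nregions - 1, 0, sizeof(VuDevRegion));
        dev->nregions--;
        /* Re-check the element that just shifted into slot i. */
        i--;

        found = true;
    }
}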




[PATCH v2 1/5] libvhost-user: Add vu_rem_mem_reg input validation

2022-01-05 Thread Raphael Norwitz
Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 787f4d2d4f..a6dadeb637 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -801,6 +801,12 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 VuDevRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS] = {};
 VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
 
+if (vmsg->fd_num != 1 ||
+vmsg->size != sizeof(vmsg->payload.memreg)) {
+vu_panic(dev, "VHOST_USER_REM_MEM_REG received multiple regions");
+return false;
+}
+
 DPRINT("Removing region:\n");
 DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
msg_region->guest_phys_addr);
-- 
2.20.1



[PATCH v2 2/5] libvhost-user: Add vu_add_mem_reg input validation

2022-01-05 Thread Raphael Norwitz
Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index a6dadeb637..d61285e991 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -690,6 +690,12 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 VuDevRegion *dev_region = &dev->regions[dev->nregions];
 void *mmap_addr;
 
+if (vmsg->fd_num != 1 ||
+vmsg->size != sizeof(vmsg->payload.memreg)) {
+vu_panic(dev, "VHOST_USER_REM_MEM_REG received multiple regions");
+return false;
+}
+
 /*
  * If we are in postcopy mode and we receive a u64 payload with a 0 value
  * we know all the postcopy client bases have been received, and we
-- 
2.20.1



[PATCH v2 4/5] libvhost-user: prevent over-running max RAM slots

2022-01-05 Thread Raphael Norwitz
When VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS support was added to
libvhost-user, no guardrails were added to protect against QEMU
attempting to hot-add too many RAM slots to a VM with a libvhost-user
based backend attached.

This change adds the missing error handling by introducing a check on
the number of RAM slots the device has available before proceeding to
process the VHOST_USER_ADD_MEM_REG message.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 77ddc96ddf..0fe3aa155b 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -690,6 +690,11 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 VuDevRegion *dev_region = &dev->regions[dev->nregions];
 void *mmap_addr;
 
+if (dev->nregions == VHOST_USER_MAX_RAM_SLOTS) {
+vu_panic(dev, "No free ram slots available");
+return false;
+}
+
 if (vmsg->fd_num != 1 ||
 vmsg->size != sizeof(vmsg->payload.memreg)) {
 vu_panic(dev, "VHOST_USER_REM_MEM_REG received multiple regions");
-- 
2.20.1



[PATCH v2 0/5] Clean up error handling in libvhost-user memory mapping

2022-01-05 Thread Raphael Norwitz
Hey Stefan, Marc-Andre, MST, David -

As promised here is a series cleaning up error handling in the
libvhost-user memory mapping path. Most of these cleanups are
straightforward and have been discussed on the mailing list in threads
[1] and [2].

[1] 
https://lore.kernel.org/qemu-devel/20211018143319.GA11006@raphael-debian-dev/
[2] 
https://lore.kernel.org/qemu-devel/9391f500-70be-26cf-bcfc-591d3ee84...@redhat.com/

Changes since V1:
 * Checks for a single fd in vu_add_mem_reg and vu_rem_mem_reg return false
   instead of true.
 * Check for over-running max ram slots in vu_add_mem_reg returns false
   instead of true.
 * vu_rem_mem_reg unmaps all matching regions.
 * Decrement the iterator variable when looping through regions in
   vu_rem_mem_reg to ensure matching regions aren’t missed.

Thanks,
Raphael

David Hildenbrand (1):
  libvhost-user: Simplify VHOST_USER_REM_MEM_REG

Raphael Norwitz (4):
  libvhost-user: Add vu_rem_mem_reg input validation
  libvhost-user: Add vu_add_mem_reg input validation
  libvhost-user: prevent over-running max RAM slots
  libvhost-user: handle removal of identical regions

 subprojects/libvhost-user/libvhost-user.c | 51 +++
 1 file changed, 34 insertions(+), 17 deletions(-)

-- 
2.20.1


[PATCH v2 5/5] libvhost-user: handle removal of identical regions

2022-01-05 Thread Raphael Norwitz
Today if QEMU (or any other VMM) has sent multiple copies of the same
region to a libvhost-user based backend and then attempts to remove the
region, only one instance of the region will be removed, leaving stale
copies of the region in dev->regions[].

This change resolves this by having vu_rem_mem_reg() iterate through all
regions in dev->regions[] and delete all matching regions.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Raphael Norwitz 
---
 subprojects/libvhost-user/libvhost-user.c | 26 ---
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 0fe3aa155b..14482484d3 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -809,6 +809,7 @@ static bool
 vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
 int i;
+bool found = false;
 
 if (vmsg->fd_num != 1 ||
 vmsg->size != sizeof(vmsg->payload.memreg)) {
@@ -835,21 +836,22 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 munmap(m, r->size + r->mmap_offset);
 }
 
-break;
+/*
+ * Shift all affected entries by 1 to close the hole at index i and
+ * zero out the last entry.
+ */
+memmove(dev->regions + i, dev->regions + i + 1,
+sizeof(VuDevRegion) * (dev->nregions - i - 1));
+memset(dev->regions + dev->nregions - 1, 0, sizeof(VuDevRegion));
+DPRINT("Successfully removed a region\n");
+dev->nregions--;
+i--;
+
+found = true;
 }
 }
 
-if (i < dev->nregions) {
-/*
- * Shift all affected entries by 1 to close the hole at index i and
- * zero out the last entry.
- */
-memmove(dev->regions + i, dev->regions + i + 1,
-   sizeof(VuDevRegion) * (dev->nregions - i - 1));
-memset(dev->regions + dev->nregions - 1, 0,
-   sizeof(VuDevRegion));
-DPRINT("Successfully removed a region\n");
-dev->nregions--;
+if (found) {
 vmsg_set_reply_u64(vmsg, 0);
 } else {
 vu_panic(dev, "Specified region not found\n");
-- 
2.20.1



[PATCH v2 3/5] libvhost-user: Simplify VHOST_USER_REM_MEM_REG

2022-01-05 Thread Raphael Norwitz
From: David Hildenbrand 

Let's avoid having to manually copy all elements. Copy only the ones
necessary to close the hole and perform the operation in-place without
a second array.

Signed-off-by: David Hildenbrand 
Signed-off-by: Raphael Norwitz 
Reviewed-by: Stefan Hajnoczi 
---
 subprojects/libvhost-user/libvhost-user.c | 30 +++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index d61285e991..77ddc96ddf 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -802,10 +802,8 @@ static inline bool reg_equal(VuDevRegion *vudev_reg,
 
 static bool
 vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
-int i, j;
-bool found = false;
-VuDevRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS] = {};
 VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
+int i;
 
 if (vmsg->fd_num != 1 ||
 vmsg->size != sizeof(vmsg->payload.memreg)) {
@@ -823,28 +821,28 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
 DPRINT("mmap_offset  0x%016"PRIx64"\n",
msg_region->mmap_offset);
 
-for (i = 0, j = 0; i < dev->nregions; i++) {
-if (!reg_equal(&dev->regions[i], msg_region)) {
-shadow_regions[j].gpa = dev->regions[i].gpa;
-shadow_regions[j].size = dev->regions[i].size;
-shadow_regions[j].qva = dev->regions[i].qva;
-shadow_regions[j].mmap_addr = dev->regions[i].mmap_addr;
-shadow_regions[j].mmap_offset = dev->regions[i].mmap_offset;
-j++;
-} else {
-found = true;
+for (i = 0; i < dev->nregions; i++) {
+if (reg_equal(&dev->regions[i], msg_region)) {
 VuDevRegion *r = &dev->regions[i];
 void *m = (void *) (uintptr_t) r->mmap_addr;
 
 if (m) {
 munmap(m, r->size + r->mmap_offset);
 }
+
+break;
 }
 }
 
-if (found) {
-memcpy(dev->regions, shadow_regions,
-   sizeof(VuDevRegion) * VHOST_USER_MAX_RAM_SLOTS);
+if (i < dev->nregions) {
+/*
+ * Shift all affected entries by 1 to close the hole at index i and
+ * zero out the last entry.
+ */
+memmove(dev->regions + i, dev->regions + i + 1,
+   sizeof(VuDevRegion) * (dev->nregions - i - 1));
+memset(dev->regions + dev->nregions - 1, 0,
+   sizeof(VuDevRegion));
 DPRINT("Successfully removed a region\n");
 dev->nregions--;
 vmsg_set_reply_u64(vmsg, 0);
-- 
2.20.1



Re: [PATCH v2 1/5] libvhost-user: Add vu_rem_mem_reg input validation

2022-01-10 Thread Raphael Norwitz
On Mon, Jan 10, 2022 at 04:36:34AM -0500, Michael S. Tsirkin wrote:
> On Thu, Jan 06, 2022 at 06:47:26AM +0000, Raphael Norwitz wrote:
> > Signed-off-by: Raphael Norwitz 
> 
> 
> Raphael any chance you can add a bit more to commit logs?
> E.g. what happens right now if you pass more?
>

Sure - I'll add those details.

> > ---
> >  subprojects/libvhost-user/libvhost-user.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/subprojects/libvhost-user/libvhost-user.c 
> > b/subprojects/libvhost-user/libvhost-user.c
> > index 787f4d2d4f..a6dadeb637 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -801,6 +801,12 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> >  VuDevRegion shadow_regions[VHOST_USER_MAX_RAM_SLOTS] = {};
> >  VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = 
> > &m;
> >  
> > +if (vmsg->fd_num != 1 ||
> > +vmsg->size != sizeof(vmsg->payload.memreg)) {
> 
> Is there a chance someone is sending larger messages and relying
> on libvhost-user to ignore padding?
> 

Great point -  I didn't consider padding. I'll drop the vmsg->size check
here.
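
Or, rather than dropping it completely, I could keep a lower-bound check
so that messages with trailing padding are still accepted. A rough sketch
(VHOST_USER_MEM_REG_SIZE would be a new define for
sizeof(VhostUserMemoryRegion)):

if (vmsg->size < VHOST_USER_MEM_REG_SIZE) {
    close(vmsg->fds[0]);
    vu_panic(dev, "VHOST_USER_REM_MEM_REG requires a message size of at "
             "least %d bytes and only %d bytes were received",
             VHOST_USER_MEM_REG_SIZE, vmsg->size);
    return false;
}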

It looks like we are inconsistent with size checking. For example, in
vu_set_log_base_exec() we also check the size.

Should we make it consistent across the library or am I missing some
details about why the padding is not an issue in that case?

> > +vu_panic(dev, "VHOST_USER_REM_MEM_REG received multiple regions");
> 
> Maybe print the parameters that caused the issue?
>

Ack.

> > +return false;
> > +}
> > +
> >  DPRINT("Removing region:\n");
> >  DPRINT("guest_phys_addr: 0x%016"PRIx64"\n",
> > msg_region->guest_phys_addr);
> > -- 
> > 2.20.1
> 


Re: [PATCH v2 0/5] Clean up error handling in libvhost-user memory mapping

2022-01-10 Thread Raphael Norwitz
On Mon, Jan 10, 2022 at 10:01:36AM +0100, David Hildenbrand wrote:
> On 06.01.22 07:47, Raphael Norwitz wrote:
> > Hey Stefan, Marc-Andre, MST, David -
> > 
> > As promised here is a series cleaning up error handling in the
> > libvhost-user memory mapping path. Most of these cleanups are
> > straightforward and have been discussed on the mailing list in threads
> > [1] and [2].
> > 
> 
> A note that we still want the fix in [3] upstream:
>
> [3] https://lore.kernel.org/all/20211012183832.62603-1-da...@redhat.com/T/#u 

Ah I thought it was merged.

I'll add it to the series to ensure it doesn't get lost.

> 
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 


Re: [PATCH v2 4/5] libvhost-user: prevent over-running max RAM slots

2022-01-10 Thread Raphael Norwitz
On Mon, Jan 10, 2022 at 04:40:08AM -0500, Michael S. Tsirkin wrote:
> On Thu, Jan 06, 2022 at 06:47:35AM +0000, Raphael Norwitz wrote:
> > When VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS support was added to
> > libvhost-user, no guardrails were added to protect against QEMU
> > attempting to hot-add too many RAM slots to a VM with a libvhost-user
> > based backend attached.
> > 
> > This change adds the missing error handling by introducing a check on
> > the number of RAM slots the device has available before proceeding to
> > process the VHOST_USER_ADD_MEM_REG message.
> > 
> > Suggested-by: Stefan Hajnoczi 
> > Signed-off-by: Raphael Norwitz 
> > ---
> >  subprojects/libvhost-user/libvhost-user.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/subprojects/libvhost-user/libvhost-user.c 
> > b/subprojects/libvhost-user/libvhost-user.c
> > index 77ddc96ddf..0fe3aa155b 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -690,6 +690,11 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> >  VuDevRegion *dev_region = &dev->regions[dev->nregions];
> >  void *mmap_addr;
> >  
> > +if (dev->nregions == VHOST_USER_MAX_RAM_SLOTS) {
> > +vu_panic(dev, "No free ram slots available");
> 
> A bit more verbose maybe? Describe what happened to trigger this?
> e.g. adding a region with no free ram slots?
>

Ack
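
Something along these lines, perhaps (a sketch; it assumes the single-fd
check runs before this one, so closing fds[0] is safe):

if (dev->nregions == VHOST_USER_MAX_RAM_SLOTS) {
    close(vmsg->fds[0]);
    vu_panic(dev, "failing attempt to hot add memory via "
             "VHOST_USER_ADD_MEM_REG message because the backend has "
             "no free ram slots available");
    return false;
}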

> > +return false;
> > +}
> > +
> >  if (vmsg->fd_num != 1 ||
> >  vmsg->size != sizeof(vmsg->payload.memreg)) {
> >  vu_panic(dev, "VHOST_USER_REM_MEM_REG received multiple regions");
> > -- 
> > 2.20.1
> 


Re: [PATCH v2 5/5] libvhost-user: handle removal of identical regions

2022-01-10 Thread Raphael Norwitz
On Mon, Jan 10, 2022 at 09:58:01AM +0100, David Hildenbrand wrote:
> On 06.01.22 07:47, Raphael Norwitz wrote:
> > Today if QEMU (or any other VMM) has sent multiple copies of the same
> > region to a libvhost-user based backend and then attempts to remove the
> > region, only one instance of the region will be removed, leaving stale
> > copies of the region in dev->regions[].
> > 
> > This change resolves this by having vu_rem_mem_reg() iterate through all
> > regions in dev->regions[] and delete all matching regions.
> > 
> > Suggested-by: Stefan Hajnoczi 
> > Signed-off-by: Raphael Norwitz 
> > ---
> >  subprojects/libvhost-user/libvhost-user.c | 26 ---
> >  1 file changed, 14 insertions(+), 12 deletions(-)
> > 
> > diff --git a/subprojects/libvhost-user/libvhost-user.c 
> > b/subprojects/libvhost-user/libvhost-user.c
> > index 0fe3aa155b..14482484d3 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -809,6 +809,7 @@ static bool
> >  vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> >  VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = 
> > &m;
> >  int i;
> > +bool found = false;
> >  
> >  if (vmsg->fd_num != 1 ||
> >  vmsg->size != sizeof(vmsg->payload.memreg)) {
> > @@ -835,21 +836,22 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> >  munmap(m, r->size + r->mmap_offset);
> >  }
> >  
> > -break;
> > +/*
> > + * Shift all affected entries by 1 to close the hole at index 
> > i and
> > + * zero out the last entry.
> > + */
> > +memmove(dev->regions + i, dev->regions + i + 1,
> > +sizeof(VuDevRegion) * (dev->nregions - i - 1));
> > +memset(dev->regions + dev->nregions - 1, 0, 
> > sizeof(VuDevRegion));
> > +DPRINT("Successfully removed a region\n");
> > +dev->nregions--;
> > +i--;
> > +
> > +found = true;
> 
> Maybe add a comment like
> 
> /* Continue the search for eventual duplicates. */

Ack - I'll add it.

> 
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 


Re: [PATCH] vhost-user-scsi: avoid unlink(NULL) with fd passing

2022-04-29 Thread Raphael Norwitz
On Wed, Apr 27, 2022 at 11:01:16AM +0100, Stefan Hajnoczi wrote:
> Commit 747421e949fc1eb3ba66b5fcccdb7ba051918241 ("Implements Backend
> Program conventions for vhost-user-scsi") introduced fd-passing support
> as part of implementing the vhost-user backend program conventions.
> 
> When fd passing is used the UNIX domain socket path is NULL and we must
> not call unlink(2).
> 
> Fixes: Coverity CID 1488353
> Fixes: 747421e949fc1eb3ba66b5fcccdb7ba051918241 ("Implements Backend Program 
> conventions for vhost-user-scsi")
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Raphael Norwitz 

> ---
>  contrib/vhost-user-scsi/vhost-user-scsi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/contrib/vhost-user-scsi/vhost-user-scsi.c 
> b/contrib/vhost-user-scsi/vhost-user-scsi.c
> index b2c0f98253..08335d4b2b 100644
> --- a/contrib/vhost-user-scsi/vhost-user-scsi.c
> +++ b/contrib/vhost-user-scsi/vhost-user-scsi.c
> @@ -433,7 +433,9 @@ out:
>  if (vdev_scsi) {
>  g_main_loop_unref(vdev_scsi->loop);
>  g_free(vdev_scsi);
> -unlink(opt_socket_path);
> +if (opt_socket_path) {
> +unlink(opt_socket_path);
> +}
>  }
>  if (csock >= 0) {
>  close(csock);
> -- 
> 2.35.1
> 


Re: [PATCH] vhost-user: Use correct macro name TARGET_PPC64g

2022-05-04 Thread Raphael Norwitz
On Tue, May 03, 2022 at 03:01:08PM -0300, Murilo Opsfelder Araujo wrote:
> The correct name of the macro is TARGET_PPC64.
> 
> Fixes: 27598393a232 ("Lift max memory slots limit imposed by vhost-user")
> Reported-by: Fabiano Rosas 
> Signed-off-by: Murilo Opsfelder Araujo 
> Cc: Raphael Norwitz 
> Cc: Peter Turschmid 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/virtio/vhost-user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 9c4f84f35f..e356c72c81 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -51,7 +51,7 @@
>  #include "hw/acpi/acpi.h"
>  #define VHOST_USER_MAX_RAM_SLOTS ACPI_MAX_RAM_SLOTS
>  
> -#elif defined(TARGET_PPC) || defined(TARGET_PPC_64)
> +#elif defined(TARGET_PPC) || defined(TARGET_PPC64)
>  #include "hw/ppc/spapr.h"
>  #define VHOST_USER_MAX_RAM_SLOTS SPAPR_MAX_RAM_SLOTS
>  
> -- 
> 2.35.1
> 


Re: [PATCH v2 0/8] vhost-user-blk: dynamically resize config space based on features

2022-09-01 Thread Raphael Norwitz
> ping

Apologies for the late review - busy week. First pass looks good but will review
comprehensively tomorrow or over the weekend.



Re: [PATCH v2 1/8] virtio: introduce VirtIOConfigSizeParams & virtio_get_config_size

2022-09-02 Thread Raphael Norwitz
I feel like it would be easier to review if the first 4 patches were
squashed together, but that’s not a big deal.

For this one:

Reviewed-by: Raphael Norwitz 

On Fri, Aug 26, 2022 at 05:32:41PM +0300, Daniil Tatianin wrote:
> This is the first step towards moving all device config size calculation
> logic into the virtio core code. In particular, this adds a struct that
> contains all the necessary information for common virtio code to be able
> to calculate the final config size for a device. This is expected to be
> used with the new virtio_get_config_size helper, which calculates the
> final length based on the provided host features.
> 
> This builds on top of already existing code like VirtIOFeature and
> virtio_feature_get_config_size(), but adds additional fields, as well as
> sanity checking so that device-specific code doesn't have to duplicate it.
> 
> An example usage would be:
> 
> static const VirtIOFeature dev_features[] = {
> {.flags = 1ULL << FEATURE_1_BIT,
>  .end = endof(struct virtio_dev_config, feature_1)},
> {.flags = 1ULL << FEATURE_2_BIT,
>  .end = endof(struct virtio_dev_config, feature_2)},
> {}
> };
> 
> static const VirtIOConfigSizeParams dev_cfg_size_params = {
> .min_size = DEV_BASE_CONFIG_SIZE,
> .max_size = sizeof(struct virtio_dev_config),
> .feature_sizes = dev_features
> };
> 
> // code inside my_dev_device_realize()
> size_t config_size = virtio_get_config_size(&dev_cfg_size_params,
> host_features);
> virtio_init(vdev, VIRTIO_ID_MYDEV, config_size);
> 
> Currently every device is expected to write its own boilerplate from the
> example above in device_realize(), however, the next step of this
> transition is moving VirtIOConfigSizeParams into VirtioDeviceClass,
> so that it can be done automatically by the virtio initialization code.
> 
> Signed-off-by: Daniil Tatianin 
> ---
>  hw/virtio/virtio.c | 17 +
>  include/hw/virtio/virtio.h |  9 +
>  2 files changed, 26 insertions(+)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 5d607aeaa0..8518382025 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -3014,6 +3014,23 @@ size_t virtio_feature_get_config_size(const 
> VirtIOFeature *feature_sizes,
>  return config_size;
>  }
>  
> +size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
> +  uint64_t host_features)
> +{
> +size_t config_size = params->min_size;
> +const VirtIOFeature *feature_sizes = params->feature_sizes;
> +size_t i;
> +
> +for (i = 0; feature_sizes[i].flags != 0; i++) {
> +if (host_features & feature_sizes[i].flags) {
> +config_size = MAX(feature_sizes[i].end, config_size);
> +}
> +}
> +
> +assert(config_size <= params->max_size);
> +return config_size;
> +}
> +
>  int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
>  {
>  int i, ret;
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index db1c0ddf6b..1991c58d9b 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -44,6 +44,15 @@ typedef struct VirtIOFeature {
>  size_t end;
>  } VirtIOFeature;
>  
> +typedef struct VirtIOConfigSizeParams {
> +size_t min_size;
> +size_t max_size;
> +const VirtIOFeature *feature_sizes;
> +} VirtIOConfigSizeParams;
> +
> +size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
> +  uint64_t host_features);
> +
>  size_t virtio_feature_get_config_size(const VirtIOFeature *features,
>uint64_t host_features);
>  
> -- 
> 2.25.1
> 

Re: [PATCH v2 6/8] vhost-user-blk: make it possible to disable write-zeroes/discard

2022-09-02 Thread Raphael Norwitz
On Fri, Aug 26, 2022 at 05:32:46PM +0300, Daniil Tatianin wrote:
> It is useful to have the ability to disable these features for
> compatibility with older VMs that don't have these implemented.
> 
> Signed-off-by: Daniil Tatianin 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/block/vhost-user-blk.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 9117222456..4c9727e08c 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -259,8 +259,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
> *vdev,
>  virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
>  virtio_add_feature(&features, VIRTIO_BLK_F_FLUSH);
>  virtio_add_feature(&features, VIRTIO_BLK_F_RO);
> -virtio_add_feature(&features, VIRTIO_BLK_F_DISCARD);
> -virtio_add_feature(&features, VIRTIO_BLK_F_WRITE_ZEROES);
>  
>  if (s->config_wce) {
>  virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
> @@ -592,6 +590,10 @@ static Property vhost_user_blk_properties[] = {
> VHOST_USER_BLK_AUTO_NUM_QUEUES),
>  DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
>  DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
> +DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features,
> +  VIRTIO_BLK_F_DISCARD, true),
> +DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features,
> +  VIRTIO_BLK_F_WRITE_ZEROES, true),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> -- 
> 2.25.1
> 


Re: [PATCH v2 4/8] virtio: remove the virtio_feature_get_config_size helper

2022-09-02 Thread Raphael Norwitz
On Fri, Aug 26, 2022 at 05:32:44PM +0300, Daniil Tatianin wrote:
> This has no more users and is superseded by virtio_get_config_size.
> 
> Signed-off-by: Daniil Tatianin 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/virtio/virtio.c | 15 ---
>  include/hw/virtio/virtio.h |  3 ---
>  2 files changed, 18 deletions(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 8518382025..c0bf8dd6fd 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2999,21 +2999,6 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t 
> val)
>  return ret;
>  }
>  
> -size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes,
> -  uint64_t host_features)
> -{
> -size_t config_size = 0;
> -int i;
> -
> -for (i = 0; feature_sizes[i].flags != 0; i++) {
> -if (host_features & feature_sizes[i].flags) {
> -config_size = MAX(feature_sizes[i].end, config_size);
> -}
> -}
> -
> -return config_size;
> -}
> -
>  size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
>uint64_t host_features)
>  {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 1991c58d9b..f3ff6710d5 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -53,9 +53,6 @@ typedef struct VirtIOConfigSizeParams {
>  size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
>uint64_t host_features);
>  
> -size_t virtio_feature_get_config_size(const VirtIOFeature *features,
> -  uint64_t host_features);
> -
>  typedef struct VirtQueue VirtQueue;
>  
>  #define VIRTQUEUE_MAX_SIZE 1024
> -- 
> 2.25.1
> 


Re: [PATCH v2 7/8] vhost-user-blk: make 'config_wce' part of 'host_features'

2022-09-02 Thread Raphael Norwitz
On Fri, Aug 26, 2022 at 05:32:47PM +0300, Daniil Tatianin wrote:
> No reason to have this be a separate field. This also makes it more akin
> to what the virtio-blk device does.
> 
> Signed-off-by: Daniil Tatianin 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/block/vhost-user-blk.c  | 6 ++
>  include/hw/virtio/vhost-user-blk.h | 1 -
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 4c9727e08c..0d916edefa 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -260,9 +260,6 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice 
> *vdev,
>  virtio_add_feature(&features, VIRTIO_BLK_F_FLUSH);
>  virtio_add_feature(&features, VIRTIO_BLK_F_RO);
>  
> -if (s->config_wce) {
> -virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
> -}
>  if (s->num_queues > 1) {
>  virtio_add_feature(&features, VIRTIO_BLK_F_MQ);
>  }
> @@ -589,7 +586,8 @@ static Property vhost_user_blk_properties[] = {
>  DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues,
> VHOST_USER_BLK_AUTO_NUM_QUEUES),
>  DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
> -DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
> +DEFINE_PROP_BIT64("config-wce", VHostUserBlk, parent_obj.host_features,
> +  VIRTIO_BLK_F_CONFIG_WCE, true),
>  DEFINE_PROP_BIT64("discard", VHostUserBlk, parent_obj.host_features,
>VIRTIO_BLK_F_DISCARD, true),
>  DEFINE_PROP_BIT64("write-zeroes", VHostUserBlk, parent_obj.host_features,
> diff --git a/include/hw/virtio/vhost-user-blk.h 
> b/include/hw/virtio/vhost-user-blk.h
> index 7c91f15040..ea085ee1ed 100644
> --- a/include/hw/virtio/vhost-user-blk.h
> +++ b/include/hw/virtio/vhost-user-blk.h
> @@ -34,7 +34,6 @@ struct VHostUserBlk {
>  struct virtio_blk_config blkcfg;
>  uint16_t num_queues;
>  uint32_t queue_size;
> -uint32_t config_wce;
>  struct vhost_dev dev;
>  struct vhost_inflight *inflight;
>  VhostUserState vhost_user;
> -- 
> 2.25.1
> 


Re: [PATCH v2 2/8] virtio-blk: utilize VirtIOConfigSizeParams & virtio_get_config_size

2022-09-02 Thread Raphael Norwitz
On Fri, Aug 26, 2022 at 05:32:42PM +0300, Daniil Tatianin wrote:
> Use the common helper instead of duplicating the same logic.
> 
> Signed-off-by: Daniil Tatianin 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/block/virtio-blk.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index e9ba752f6b..10c47c2934 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -49,13 +49,11 @@ static const VirtIOFeature feature_sizes[] = {
>  {}
>  };
>  
> -static void virtio_blk_set_config_size(VirtIOBlock *s, uint64_t 
> host_features)
> -{
> -s->config_size = MAX(VIRTIO_BLK_CFG_SIZE,
> -virtio_feature_get_config_size(feature_sizes, host_features));
> -
> -assert(s->config_size <= sizeof(struct virtio_blk_config));
> -}
> +static const VirtIOConfigSizeParams cfg_size_params = {
> +.min_size = VIRTIO_BLK_CFG_SIZE,
> +.max_size = sizeof(struct virtio_blk_config),
> +.feature_sizes = feature_sizes
> +};
>  
>  static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
>  VirtIOBlockReq *req)
> @@ -1204,8 +1202,8 @@ static void virtio_blk_device_realize(DeviceState *dev, 
> Error **errp)
>  return;
>  }
>  
> -virtio_blk_set_config_size(s, s->host_features);
> -
> +s->config_size = virtio_get_config_size(&cfg_size_params,
> +s->host_features);
>  virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
>  
>  s->blk = conf->conf.blk;
> -- 
> 2.25.1
> 


Re: [PATCH v2 8/8] vhost-user-blk: dynamically resize config space based on features

2022-09-02 Thread Raphael Norwitz
On Fri, Aug 26, 2022 at 05:32:48PM +0300, Daniil Tatianin wrote:
> Make vhost-user-blk backwards compatible when migrating from older VMs
> running with modern features turned off, the same way it was done for
> virtio-blk in 20764be0421c ("virtio-blk: set config size depending on the 
> features enabled")
> 
> It's currently impossible to migrate from an older VM with
> vhost-user-blk (with disable-legacy=off) because of errors like this:
> 
> qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: 41 
> device: 1 cmask: ff wmask: 80 w1cmask:0
> qemu-system-x86_64: Failed to load PCIDevice:config
> qemu-system-x86_64: Failed to load virtio-blk:virtio
> qemu-system-x86_64: error while loading state for instance 0x0 of device 
> ':00:05.0:00.0:02.0/virtio-blk'
> qemu-system-x86_64: load of migration failed: Invalid argument
> 
> This is caused by the newer (destination) VM requiring a bigger BAR0
> alignment because it has to cover a bigger configuration space, which
> isn't actually needed since those additional config fields are not
> active (write-zeroes/discard).
>

With the relevant meson.build and MAINTAINERS bits rebased here:

Reviewed-by: Raphael Norwitz 


> Signed-off-by: Daniil Tatianin 
> ---
>  hw/block/vhost-user-blk.c | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 0d916edefa..8dd063eb96 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -23,6 +23,7 @@
>  #include "hw/qdev-core.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/qdev-properties-system.h"
> +#include "hw/virtio/virtio-blk-common.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user-blk.h"
>  #include "hw/virtio/virtio.h"
> @@ -63,7 +64,7 @@ static void vhost_user_blk_update_config(VirtIODevice 
> *vdev, uint8_t *config)
>  /* Our num_queues overrides the device backend */
>  virtio_stw_p(vdev, &s->blkcfg.num_queues, s->num_queues);
>  
> -memcpy(config, &s->blkcfg, sizeof(struct virtio_blk_config));
> +memcpy(config, &s->blkcfg, vdev->config_len);
>  }
>  
>  static void vhost_user_blk_set_config(VirtIODevice *vdev, const uint8_t 
> *config)
> @@ -92,12 +93,12 @@ static int vhost_user_blk_handle_config_change(struct 
> vhost_dev *dev)
>  {
>  int ret;
>  struct virtio_blk_config blkcfg;
> +VirtIODevice *vdev = dev->vdev;
>  VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
>  Error *local_err = NULL;
>  
>  ret = vhost_dev_get_config(dev, (uint8_t *)&blkcfg,
> -   sizeof(struct virtio_blk_config),
> -   &local_err);
> +   vdev->config_len, &local_err);
>  if (ret < 0) {
>  error_report_err(local_err);
>  return ret;
> @@ -106,7 +107,7 @@ static int vhost_user_blk_handle_config_change(struct 
> vhost_dev *dev)
>  /* valid for resize only */
>  if (blkcfg.capacity != s->blkcfg.capacity) {
>  s->blkcfg.capacity = blkcfg.capacity;
> -memcpy(dev->vdev->config, &s->blkcfg, sizeof(struct 
> virtio_blk_config));
> +memcpy(dev->vdev->config, &s->blkcfg, vdev->config_len);
>  virtio_notify_config(dev->vdev);
>  }
>  
> @@ -442,7 +443,7 @@ static int vhost_user_blk_realize_connect(VHostUserBlk 
> *s, Error **errp)
>  assert(s->connected);
>  
>  ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
> -   sizeof(struct virtio_blk_config), errp);
> +   s->parent_obj.config_len, errp);
>  if (ret < 0) {
>  qemu_chr_fe_disconnect(&s->chardev);
>  vhost_dev_cleanup(&s->dev);
> @@ -457,6 +458,7 @@ static void vhost_user_blk_device_realize(DeviceState 
> *dev, Error **errp)
>  ERRP_GUARD();
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +size_t config_size;
>  int retries;
>  int i, ret;
>  
> @@ -487,8 +489,9 @@ static void vhost_user_blk_device_realize(DeviceState 
> *dev, Error **errp)
>  return;
>  }
>  
> -virtio_init(vdev, VIRTIO_ID_BLOCK,
> -sizeof(struct virtio_blk_config));
> +config_size = virtio_get_config_size(&virtio_blk_cfg_size_params,
> + vdev->host_features);
> +virtio_init(vdev, VIRTIO_ID_BLOCK, config_size);
>  
>  s->virtqs = g_new(VirtQueue *, s->num_queues);
>  for (i = 0; i < s->num_queues; i++) {
> -- 
> 2.25.1
> 


Re: [PATCH v2 3/8] virtio-net: utilize VirtIOConfigSizeParams & virtio_get_config_size

2022-09-02 Thread Raphael Norwitz
On Fri, Aug 26, 2022 at 05:32:43PM +0300, Daniil Tatianin wrote:
> Use the new common helper. As an added bonus this also makes use of
> config size sanity checking via the 'max_size' field.
> 
> Signed-off-by: Daniil Tatianin 
> ---
>  hw/net/virtio-net.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index dd0d056fde..dfc8dd8562 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -106,6 +106,11 @@ static const VirtIOFeature feature_sizes[] = {
>  {}
>  };
>  
> +static const VirtIOConfigSizeParams cfg_size_params = {

Can we have a zero-length virtio-net config size? Both the v1.0 and
v1.1 specs (section 5.1.4) say “The mac address field always exists (though is
only valid if VIRTIO_NET_F_MAC is set)”, so we should probably set
min_size to offsetof(struct virtio_net_config, status)?

Otherwise LGTM.
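
Roughly what I have in mind, untested, and assuming the params struct has
(or grows) a min_size field alongside max_size:

static const VirtIOConfigSizeParams cfg_size_params = {
    /* mac[] always exists per spec 5.1.4, even without VIRTIO_NET_F_MAC */
    .min_size = offsetof(struct virtio_net_config, status),
    .max_size = sizeof(struct virtio_net_config),
    .feature_sizes = feature_sizes
};

(offsetof needs <stddef.h>; endof(struct virtio_net_config, status) would
also work if covering the status field itself is preferred.)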

> +.max_size = sizeof(struct virtio_net_config),
> +.feature_sizes = feature_sizes
> +};
> +
>  static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc)
>  {
>  VirtIONet *n = qemu_get_nic_opaque(nc);
> @@ -3246,8 +3251,7 @@ static void virtio_net_set_config_size(VirtIONet *n, 
> uint64_t host_features)
>  {
>  virtio_add_feature(&host_features, VIRTIO_NET_F_MAC);
>  
> -n->config_size = virtio_feature_get_config_size(feature_sizes,
> -host_features);
> +n->config_size = virtio_get_config_size(&cfg_size_params, host_features);
>  }
>  
>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> -- 
> 2.25.1
> 

Re: [PATCH v2 5/8] virtio-blk: move config size params to virtio-blk-common

2022-09-02 Thread Raphael Norwitz
The vhost-user-blk bits in meson.build and MAINTAINERS should probably
be in patch 8?

Otherwise LGTM.

On Fri, Aug 26, 2022 at 05:32:45PM +0300, Daniil Tatianin wrote:
> This way we can reuse it for other virtio-blk devices, e.g
> vhost-user-blk, which currently does not control its config space size
> dynamically.
> 
> Signed-off-by: Daniil Tatianin 
> ---
>  MAINTAINERS   |  4 +++
>  hw/block/meson.build  |  4 +--
>  hw/block/virtio-blk-common.c  | 39 +++
>  hw/block/virtio-blk.c | 24 ++---
>  include/hw/virtio/virtio-blk-common.h | 20 ++
>  5 files changed, 67 insertions(+), 24 deletions(-)
>  create mode 100644 hw/block/virtio-blk-common.c
>  create mode 100644 include/hw/virtio/virtio-blk-common.h
> 



i.e. move this.

> @@ -2271,11 +2273,13 @@ S: Maintained
>  F: contrib/vhost-user-blk/
>  F: contrib/vhost-user-scsi/
>  F: hw/block/vhost-user-blk.c
> +F: hw/block/virtio-blk-common.c
>  F: hw/scsi/vhost-user-scsi.c
>  F: hw/virtio/vhost-user-blk-pci.c
>  F: hw/virtio/vhost-user-scsi-pci.c
>  F: include/hw/virtio/vhost-user-blk.h
>  F: include/hw/virtio/vhost-user-scsi.h
> +F: include/hw/virtio/virtio-blk-common.h
>  
>  vhost-user-gpu
>  M: Marc-André Lureau 
> diff --git a/hw/block/meson.build b/hw/block/meson.build
> index 2389326112..1908abd45c 100644
> --- a/hw/block/meson.build
> +++ b/hw/block/meson.build
> @@ -16,7 +16,7 @@ softmmu_ss.add(when: 'CONFIG_SWIM', if_true: 
> files('swim.c'))
>  softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
>  softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c'))
>  
> -specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
> -specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
> files('vhost-user-blk.c'))
> +specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 
> 'virtio-blk-common.c'))

And this

> +specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
> files('vhost-user-blk.c', 'virtio-blk-common.c'))


Re: [PATCH 1/2] contrib/vhost-user-blk: Replace lseek64 with lseek

2022-12-18 Thread Raphael Norwitz
> On Dec 17, 2022, at 8:08 PM, Khem Raj  wrote:
> 
> 64bit off_t is already in use since build uses _FILE_OFFSET_BITS=64
> already. Using lseek/off_t also makes it work with latest must without

s/must/musl/ ?

> using _LARGEFILE64_SOURCE macro. This macro is implied with _GNU_SOURCE
> when using glibc but not with musl.
> 

Other than the nit LGTM

Reviewed-by: Raphael Norwitz 
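
As a purely hypothetical aside (not something this patch needs), a
compile-time guard documenting the assumption that makes plain lseek()
sufficient could look like:

#include <sys/types.h>
#include <assert.h>

/* The build passes _FILE_OFFSET_BITS=64, so off_t is already 64-bit. */
static_assert(sizeof(off_t) == 8,
              "vhost-user-blk expects a 64-bit off_t");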

> Signed-off-by: Khem Raj 
> Cc: Michael S. Tsirkin 
> CC: Raphael Norwitz 
> ---
> contrib/vhost-user-blk/vhost-user-blk.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
> b/contrib/vhost-user-blk/vhost-user-blk.c
> index aa99877fcd..7941694e53 100644
> --- a/contrib/vhost-user-blk/vhost-user-blk.c
> +++ b/contrib/vhost-user-blk/vhost-user-blk.c
> @@ -532,9 +532,9 @@ vub_get_blocksize(int fd)
> static void
> vub_initialize_config(int fd, struct virtio_blk_config *config)
> {
> -off64_t capacity;
> +off_t capacity;
> 
> -capacity = lseek64(fd, 0, SEEK_END);
> +capacity = lseek(fd, 0, SEEK_END);
> config->capacity = capacity >> 9;
> config->blk_size = vub_get_blocksize(fd);
> config->size_max = 65536;
> -- 
> 2.39.0
> 




Re: [PATCH v2] contrib/vhost-user-blk: Replace lseek64 with lseek

2022-12-18 Thread Raphael Norwitz



> On Dec 19, 2022, at 12:07 AM, Khem Raj  wrote:
> 
> 64bit off_t is already in use since build uses _FILE_OFFSET_BITS=64
> already. Using lseek/off_t also makes it work with latest musl without
> using _LARGEFILE64_SOURCE macro. This macro is implied with _GNU_SOURCE
> when using glibc but not with musl.
> 
> Signed-off-by: Khem Raj 
> Cc: Michael S. Tsirkin 
> CC: Raphael Norwitz 
> ---
> v2: Fix typo must->musl
> 
> contrib/vhost-user-blk/vhost-user-blk.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
> b/contrib/vhost-user-blk/vhost-user-blk.c
> index aa99877fcd..7941694e53 100644
> --- a/contrib/vhost-user-blk/vhost-user-blk.c
> +++ b/contrib/vhost-user-blk/vhost-user-blk.c
> @@ -532,9 +532,9 @@ vub_get_blocksize(int fd)
> static void
> vub_initialize_config(int fd, struct virtio_blk_config *config)
> {
> -off64_t capacity;
> +off_t capacity;
> 
> -capacity = lseek64(fd, 0, SEEK_END);
> +capacity = lseek(fd, 0, SEEK_END);
> config->capacity = capacity >> 9;
> config->blk_size = vub_get_blocksize(fd);
> config->size_max = 65536;
> -- 
> 2.39.0
> 


Reviewed-by: Raphael Norwitz 


Re: [PATCH v5 0/3] vhost-user-blk: live resize additional APIs

2024-07-01 Thread Raphael Norwitz
I have no issues with these APIs, but I'm not a QMP expert so others
should review those bits.

For the vhost-user-blk code:

Acked-by: Raphael Norwitz 

On Tue, Jun 25, 2024 at 8:19 AM Vladimir Sementsov-Ogievskiy
 wrote:
>
> v5:
> 03: drop extra check on is is runstate running
>
>
> Vladimir Sementsov-Ogievskiy (3):
>   qdev-monitor: add option to report GenericError from find_device_state
>   vhost-user-blk: split vhost_user_blk_sync_config()
>   qapi: introduce device-sync-config
>
>  hw/block/vhost-user-blk.c | 27 ++--
>  hw/virtio/virtio-pci.c|  9 +++
>  include/hw/qdev-core.h|  3 +++
>  qapi/qdev.json| 24 ++
>  system/qdev-monitor.c | 53 ---
>  5 files changed, 105 insertions(+), 11 deletions(-)
>
> --
> 2.34.1
>



Re: [PATCH v3 2/2] vhost-user: fix lost reconnect again

2024-05-14 Thread Raphael Norwitz
Code looks good. Just a question on the error case you're trying to fix.

On Tue, May 14, 2024 at 2:12 AM Li Feng  wrote:
>
> When the vhost-user is reconnecting to the backend, and if the vhost-user 
> fails
> at the get_features in vhost_dev_init(), then the reconnect will fail
> and it will not be retriggered forever.
>
> The reason is:
> When the vhost-user fail at get_features, the vhost_dev_cleanup will be called
> immediately.
>
> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
>
> The reconnect path is:
> vhost_user_blk_event
>vhost_user_async_close(.. vhost_user_blk_disconnect ..)
>  qemu_chr_fe_set_handlers <- clear the notifier callback
>schedule vhost_user_async_close_bh
>
> The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
> called, then the event fd callback will not be reinstalled.
>
> With this patch, the vhost_user_blk_disconnect will call the
> vhost_dev_cleanup() again, it's safe.
>
> In addition, the CLOSE event may occur in a scenario where connected is false.
> At this time, the event handler will be cleared. We need to ensure that the
> event handler can remain installed.

Following on from the prior patch, why would "connected" be false when
a CLOSE event happens?

>
> All vhost-user devices have this issue, including vhost-user-blk/scsi.
>
> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
>
> Signed-off-by: Li Feng 
> ---
>  hw/block/vhost-user-blk.c   |  3 ++-
>  hw/scsi/vhost-user-scsi.c   |  3 ++-
>  hw/virtio/vhost-user-base.c |  3 ++-
>  hw/virtio/vhost-user.c  | 10 +-
>  4 files changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 41d1ac3a5a..c6842ced48 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -353,7 +353,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
>
>  if (!s->connected) {
> -return;
> +goto done;
>  }
>  s->connected = false;
>
> @@ -361,6 +361,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>
>  vhost_dev_cleanup(&s->dev);
>
> +done:
>  /* Re-instate the event handler for new connections */
>  qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
>   NULL, dev, NULL, true);
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index 48a59e020e..b49a11d23b 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -181,7 +181,7 @@ static void vhost_user_scsi_disconnect(DeviceState *dev)
>  VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
>
>  if (!s->connected) {
> -return;
> +goto done;
>  }
>  s->connected = false;
>
> @@ -189,6 +189,7 @@ static void vhost_user_scsi_disconnect(DeviceState *dev)
>
>  vhost_dev_cleanup(&vsc->dev);
>
> +done:
>  /* Re-instate the event handler for new connections */
>  qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL,
>   vhost_user_scsi_event, NULL, dev, NULL, true);
> diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
> index 4b54255682..11e72b1e3b 100644
> --- a/hw/virtio/vhost-user-base.c
> +++ b/hw/virtio/vhost-user-base.c
> @@ -225,13 +225,14 @@ static void vub_disconnect(DeviceState *dev)
>  VHostUserBase *vub = VHOST_USER_BASE(vdev);
>
>  if (!vub->connected) {
> -return;
> +goto done;
>  }
>  vub->connected = false;
>
>  vub_stop(vdev);
>  vhost_dev_cleanup(&vub->vhost_dev);
>
> +done:
>  /* Re-instate the event handler for new connections */
>  qemu_chr_fe_set_handlers(&vub->chardev,
>   NULL, NULL, vub_event,
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index c929097e87..c407ea8939 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -2781,16 +2781,8 @@ typedef struct {
>  static void vhost_user_async_close_bh(void *opaque)
>  {
>  VhostAsyncCallback *data = opaque;
> -struct vhost_dev *vhost = data->vhost;
>
> -/*
> - * If the vhost_dev has been cleared in the meantime there is
> - * nothing left to do as some other path has completed the
> - * cleanup.
> - */
> -if (vhost->vdev) {
> -data->cb(data->dev);
> -}
> +data->cb(data->dev);
>
>  g_free(data);
>  }
> --
> 2.45.0
>



Re: [PATCH v3 1/2] Revert "vhost-user: fix lost reconnect"

2024-05-14 Thread Raphael Norwitz
The code for these two patches looks fine. Just some questions on the
failure case you're trying to fix.


On Tue, May 14, 2024 at 2:12 AM Li Feng  wrote:
>
> This reverts commit f02a4b8e6431598612466f76aac64ab492849abf.
>
> Since the current patch cannot completely fix the lost reconnect
> problem, there is a scenario that is not considered:
> - When the virtio-blk driver is removed from the guest os,
>   s->connected has no chance to be set to false, resulting in

Why would the virtio-blk driver being removed (unloaded?) in the guest
affect s->connected? Isn't this variable just tracking whether QEMU is
connected to the backend process? What does it have to do with the
guest driver state?

>   subsequent reconnection not being executed.
>
> The next patch will completely fix this issue with a better approach.
>
> Signed-off-by: Li Feng 
> ---
>  hw/block/vhost-user-blk.c  |  2 +-
>  hw/scsi/vhost-user-scsi.c  |  3 +--
>  hw/virtio/vhost-user-base.c|  2 +-
>  hw/virtio/vhost-user.c | 10 ++
>  include/hw/virtio/vhost-user.h |  3 +--
>  5 files changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 9e6bbc6950..41d1ac3a5a 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -384,7 +384,7 @@ static void vhost_user_blk_event(void *opaque, 
> QEMUChrEvent event)
>  case CHR_EVENT_CLOSED:
>  /* defer close until later to avoid circular close */
>  vhost_user_async_close(dev, &s->chardev, &s->dev,
> -   vhost_user_blk_disconnect, 
> vhost_user_blk_event);
> +   vhost_user_blk_disconnect);
>  break;
>  case CHR_EVENT_BREAK:
>  case CHR_EVENT_MUX_IN:
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index a63b1f4948..48a59e020e 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -214,8 +214,7 @@ static void vhost_user_scsi_event(void *opaque, 
> QEMUChrEvent event)
>  case CHR_EVENT_CLOSED:
>  /* defer close until later to avoid circular close */
>  vhost_user_async_close(dev, &vs->conf.chardev, &vsc->dev,
> -   vhost_user_scsi_disconnect,
> -   vhost_user_scsi_event);
> +   vhost_user_scsi_disconnect);
>  break;
>  case CHR_EVENT_BREAK:
>  case CHR_EVENT_MUX_IN:
> diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
> index a83167191e..4b54255682 100644
> --- a/hw/virtio/vhost-user-base.c
> +++ b/hw/virtio/vhost-user-base.c
> @@ -254,7 +254,7 @@ static void vub_event(void *opaque, QEMUChrEvent event)
>  case CHR_EVENT_CLOSED:
>  /* defer close until later to avoid circular close */
>  vhost_user_async_close(dev, &vub->chardev, &vub->vhost_dev,
> -   vub_disconnect, vub_event);
> +   vub_disconnect);
>  break;
>  case CHR_EVENT_BREAK:
>  case CHR_EVENT_MUX_IN:
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..c929097e87 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -2776,7 +2776,6 @@ typedef struct {
>  DeviceState *dev;
>  CharBackend *cd;
>  struct vhost_dev *vhost;
> -IOEventHandler *event_cb;
>  } VhostAsyncCallback;
>
>  static void vhost_user_async_close_bh(void *opaque)
> @@ -2791,10 +2790,7 @@ static void vhost_user_async_close_bh(void *opaque)
>   */
>  if (vhost->vdev) {
>  data->cb(data->dev);
> -} else if (data->event_cb) {
> -qemu_chr_fe_set_handlers(data->cd, NULL, NULL, data->event_cb,
> - NULL, data->dev, NULL, true);
> -   }
> +}
>
>  g_free(data);
>  }
> @@ -2806,8 +2802,7 @@ static void vhost_user_async_close_bh(void *opaque)
>   */
>  void vhost_user_async_close(DeviceState *d,
>  CharBackend *chardev, struct vhost_dev *vhost,
> -vu_async_close_fn cb,
> -IOEventHandler *event_cb)
> +vu_async_close_fn cb)
>  {
>  if (!runstate_check(RUN_STATE_SHUTDOWN)) {
>  /*
> @@ -2823,7 +2818,6 @@ void vhost_user_async_close(DeviceState *d,
>  data->dev = d;
>  data->cd = chardev;
>  data->vhost = vhost;
> -data->event_cb = event_cb;
>
>  /* Disable any further notifications on the chardev */
>  qemu_chr_fe_set_handlers(chardev,
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index d7c09ffd34..324cd8663a 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -108,7 +108,6 @@ typedef void (*vu_async_close_fn)(DeviceState *cb);
>
>  void vhost_user_async_close(DeviceState *d,
>  CharBackend *chardev, struct vhost_dev *vhost

Re: [PATCH v3 1/2] Revert "vhost-user: fix lost reconnect"

2024-05-15 Thread Raphael Norwitz
On Wed, May 15, 2024 at 1:47 AM Li Feng  wrote:
>
>
>
> > On May 14, 2024, at 21:58, Raphael Norwitz  wrote:
> >
> > The code for these two patches looks fine. Just some questions on the
> > failure case you're trying to fix.
> >
> >
> > On Tue, May 14, 2024 at 2:12 AM Li Feng  wrote:
> >>
> >> This reverts commit f02a4b8e6431598612466f76aac64ab492849abf.
> >>
> >> Since the current patch cannot completely fix the lost reconnect
> >> problem, there is a scenario that is not considered:
> >> - When the virtio-blk driver is removed from the guest os,
> >>  s->connected has no chance to be set to false, resulting in
> >
> > Why would the virtio-blk driver being removed (unloaded?) in the guest
> > effect s->connected? Isn't this variable just tracking whether Qemu is
> > connected to the backend process? What does it have to do with the
> > guest driver state?
>
> Unloading the virtio-blk driver triggers ‘vhost_user_blk_stop’, and
> `vhost_dev_stop` then sets `hdev->vdev = NULL;`.
>
> Next, if the backend is killed, the CLOSE event is triggered, but since
> `vhost->vdev` was already set to NULL, `vhost_user_blk_disconnect` never
> gets a chance to execute, so s->connected is still true.

Makes sense - basically if the driver is unloaded and then the device
is brought down, s->connected will remain true when it should be false,
which will mess up a subsequent reconnect.

See my comments on the following patch though.
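
To make sure I'm reading it right, here is a minimal standalone model of
that sequence (hypothetical types; only the control flow mirrors the real
vhost_user_async_close_bh()/disconnect path):

#include <stdbool.h>
#include <stdio.h>

struct model_dev {
    void *vdev;       /* cleared by vhost_dev_stop() when the driver unloads */
    bool connected;   /* should flip to false on disconnect */
};

static void model_disconnect(struct model_dev *s)
{
    s->connected = false;   /* the real handler also re-installs the chardev callback */
}

static void model_async_close_bh(struct model_dev *s)
{
    /* Pre-revert logic: skip the callback once vdev is already NULL. */
    if (s->vdev) {
        model_disconnect(s);
    }
}

int main(void)
{
    struct model_dev s = { .vdev = NULL, .connected = true };

    model_async_close_bh(&s);                /* CLOSE event after driver unload */
    printf("connected=%d\n", s.connected);   /* prints 1: reconnect is now stuck */
    return 0;
}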

>
> static void vhost_user_async_close_bh(void *opaque)
> {
> VhostAsyncCallback *data = opaque;
> struct vhost_dev *vhost = data->vhost;
>
> /*
>  * If the vhost_dev has been cleared in the meantime there is
>  * nothing left to do as some other path has completed the
>  * cleanup.
>  */
> if (vhost->vdev) {  < HERE vdev is null.
> data->cb(data->dev);
> } else if (data->event_cb) {
> qemu_chr_fe_set_handlers(data->cd, NULL, NULL, data->event_cb,
>  NULL, data->dev, NULL, true);
>}
>
> g_free(data);
> }
>
> Thanks,
> Li
>
> >
> >>  subsequent reconnection not being executed.
> >>
> >> The next patch will completely fix this issue with a better approach.
> >>
> >> Signed-off-by: Li Feng 
> >> ---
> >> hw/block/vhost-user-blk.c  |  2 +-
> >> hw/scsi/vhost-user-scsi.c  |  3 +--
> >> hw/virtio/vhost-user-base.c|  2 +-
> >> hw/virtio/vhost-user.c | 10 ++
> >> include/hw/virtio/vhost-user.h |  3 +--
> >> 5 files changed, 6 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> >> index 9e6bbc6950..41d1ac3a5a 100644
> >> --- a/hw/block/vhost-user-blk.c
> >> +++ b/hw/block/vhost-user-blk.c
> >> @@ -384,7 +384,7 @@ static void vhost_user_blk_event(void *opaque, 
> >> QEMUChrEvent event)
> >> case CHR_EVENT_CLOSED:
> >> /* defer close until later to avoid circular close */
> >> vhost_user_async_close(dev, &s->chardev, &s->dev,
> >> -   vhost_user_blk_disconnect, 
> >> vhost_user_blk_event);
> >> +   vhost_user_blk_disconnect);
> >> break;
> >> case CHR_EVENT_BREAK:
> >> case CHR_EVENT_MUX_IN:
> >> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> >> index a63b1f4948..48a59e020e 100644
> >> --- a/hw/scsi/vhost-user-scsi.c
> >> +++ b/hw/scsi/vhost-user-scsi.c
> >> @@ -214,8 +214,7 @@ static void vhost_user_scsi_event(void *opaque, 
> >> QEMUChrEvent event)
> >> case CHR_EVENT_CLOSED:
> >> /* defer close until later to avoid circular close */
> >> vhost_user_async_close(dev, &vs->conf.chardev, &vsc->dev,
> >> -   vhost_user_scsi_disconnect,
> >> -   vhost_user_scsi_event);
> >> +   vhost_user_scsi_disconnect);
> >> break;
> >> case CHR_EVENT_BREAK:
> >> case CHR_EVENT_MUX_IN:
> >> diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
> >> index a83167191e..4b54255682 100644
> >> --- a/hw/virtio/vhost-user-base.c
> >> +++ b/hw/virtio/vhost-user-base.c
> >> @@ -254,7 +254,7 @@ sta

Re: [PATCH v3 2/2] vhost-user: fix lost reconnect again

2024-05-15 Thread Raphael Norwitz
The case you're describing makes sense, but now I have some concerns about
the vhost_dev_cleanup bit.

On Wed, May 15, 2024 at 1:47 AM Li Feng  wrote:
>
>
>
> > On May 14, 2024, at 21:58, Raphael Norwitz  wrote:
> >
> > Code looks good. Just a question on the error case you're trying to fix.
> >
> > On Tue, May 14, 2024 at 2:12 AM Li Feng  wrote:
> >>
> >> When the vhost-user is reconnecting to the backend, and if the vhost-user 
> >> fails
> >> at the get_features in vhost_dev_init(), then the reconnect will fail
> >> and it will not be retriggered forever.
> >>
> >> The reason is:
> >> When the vhost-user fail at get_features, the vhost_dev_cleanup will be 
> >> called
> >> immediately.
> >>
> >> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
> >>
> >> The reconnect path is:
> >> vhost_user_blk_event
> >>   vhost_user_async_close(.. vhost_user_blk_disconnect ..)
> >> qemu_chr_fe_set_handlers <- clear the notifier callback
> >>   schedule vhost_user_async_close_bh
> >>
> >> The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
> >> called, then the event fd callback will not be reinstalled.
> >>
> >> With this patch, the vhost_user_blk_disconnect will call the
> >> vhost_dev_cleanup() again, it's safe.
> >>
> >> In addition, the CLOSE event may occur in a scenario where connected is 
> >> false.
> >> At this time, the event handler will be cleared. We need to ensure that the
> >> event handler can remain installed.
> >
> > Following on from the prior patch, why would "connected" be false when
> > a CLOSE event happens?
>
> In OPEN event handling, vhost_user_blk_connect calls vhost_dev_init and 
> encounters
> an error such that s->connected remains false.
> Next, after the CLOSE event arrives, it is found that s->connected is false, 
> so nothing
> is done, but the event handler will be cleaned up in `vhost_user_async_close`
> before the CLOSE event is executed.
>

Got it - I see why the event handler is never re-installed in the code
as it was before if we fail at get_features. That said, how do you
explain your comment:

> >> With this patch, the vhost_user_blk_disconnect will call the
> >> vhost_dev_cleanup() again, it's safe.

I see vhost_dev_cleanup() accessing hdev without even a NULL check. In
the case we're talking about here I don't think it's a problem, because
if vhost_dev_init() fails, connected will be false and we hit the goto,
but I am concerned that there could be double-frees or use-after-frees
in other cases.
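
For what it's worth, the shape that would make repeated cleanup harmless is
roughly this (purely illustrative toy code, not the actual vhost_dev_cleanup()):

#include <stdlib.h>
#include <string.h>

struct toy_dev {
    void *vqs;
    int nvqs;
};

void toy_dev_cleanup(struct toy_dev *hdev)
{
    free(hdev->vqs);                 /* free(NULL) is a no-op */
    memset(hdev, 0, sizeof(*hdev));  /* a second call then finds only zeroes */
}

int main(void)
{
    struct toy_dev d = { malloc(16), 1 };

    toy_dev_cleanup(&d);
    toy_dev_cleanup(&d);             /* harmless because of the memset above */
    return 0;
}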

> Thanks,
> Li
>
> >
> >>
> >> All vhost-user devices have this issue, including vhost-user-blk/scsi.
> >>
> >> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
> >>
> >> Signed-off-by: Li Feng 
> >> ---
> >> hw/block/vhost-user-blk.c   |  3 ++-
> >> hw/scsi/vhost-user-scsi.c   |  3 ++-
> >> hw/virtio/vhost-user-base.c |  3 ++-
> >> hw/virtio/vhost-user.c  | 10 +-
> >> 4 files changed, 7 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> >> index 41d1ac3a5a..c6842ced48 100644
> >> --- a/hw/block/vhost-user-blk.c
> >> +++ b/hw/block/vhost-user-blk.c
> >> @@ -353,7 +353,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
> >> VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >>
> >> if (!s->connected) {
> >> -return;
> >> +goto done;
> >> }
> >> s->connected = false;
> >>
> >> @@ -361,6 +361,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
> >>
> >> vhost_dev_cleanup(&s->dev);
> >>
> >> +done:
> >> /* Re-instate the event handler for new connections */
> >> qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> >>  NULL, dev, NULL, true);
> >> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> >> index 48a59e020e..b49a11d23b 100644
> >> --- a/hw/scsi/vhost-user-scsi.c
> >> +++ b/hw/scsi/vhost-user-scsi.c
> >> @@ -181,7 +181,7 @@ static void vhost_user_scsi_disconnect(DeviceState 
> >> *dev)
> >> VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> >>
> >> if (!s->connected) {
> >> - 

Re: [PATCH v3 2/2] vhost-user: fix lost reconnect again

2024-05-15 Thread Raphael Norwitz
OK - I'm happy with this approach then.

On Wed, May 15, 2024 at 10:48 PM Li Feng  wrote:
>
>
>
> On May 15, 2024, at 23:47, Raphael Norwitz  wrote:
>
> The case your describing makes sense but now I have some concerns on
> the vhost_dev_cleanup bit.
>
> On Wed, May 15, 2024 at 1:47 AM Li Feng  wrote:
>
>
>
>
> On May 14, 2024, at 21:58, Raphael Norwitz  wrote:
>
> Code looks good. Just a question on the error case you're trying to fix.
>
> On Tue, May 14, 2024 at 2:12 AM Li Feng  wrote:
>
>
> When the vhost-user is reconnecting to the backend, and if the vhost-user 
> fails
> at the get_features in vhost_dev_init(), then the reconnect will fail
> and it will not be retriggered forever.
>
> The reason is:
> When the vhost-user fail at get_features, the vhost_dev_cleanup will be called
> immediately.
>
> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
>
> The reconnect path is:
> vhost_user_blk_event
>  vhost_user_async_close(.. vhost_user_blk_disconnect ..)
>qemu_chr_fe_set_handlers <- clear the notifier callback
>  schedule vhost_user_async_close_bh
>
> The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
> called, then the event fd callback will not be reinstalled.
>
> With this patch, the vhost_user_blk_disconnect will call the
> vhost_dev_cleanup() again, it's safe.
>
> In addition, the CLOSE event may occur in a scenario where connected is false.
> At this time, the event handler will be cleared. We need to ensure that the
> event handler can remain installed.
>
>
> Following on from the prior patch, why would "connected" be false when
> a CLOSE event happens?
>
>
> In OPEN event handling, vhost_user_blk_connect calls vhost_dev_init and 
> encounters
> an error such that s->connected remains false.
> Next, after the CLOSE event arrives, it is found that s->connected is false, 
> so nothing
> is done, but the event handler will be cleaned up in `vhost_user_async_close`
> before the CLOSE event is executed.
>
>
> Got it - I see why the event handler is never re-installed in the code
> as it was before if we fail at get_features. That said, how do you
> explain your comment:
>
>
> OK, I will update the commit message because this code has changed some 
> months ago.
>
>
> With this patch, the vhost_user_blk_disconnect will call the
> vhost_dev_cleanup() again, it's safe.
>
>
> I see vhost_dev_cleanup() accessing hdev without even a NULL check. In
> the case we're talking about here I don't think it's a problem because
> if vhost_dev_init() fails, connected will be false and hit the goto
> but I am concerned that there could be double-frees or use-after-frees
> in other cases.
>
>
> OK, you are right, with this patch, the vhost_dev_cleanup will not be
> called multiple times now.
>
> I think there is no need to worry about calling vhost_dev_cleanup multiple 
> times,
> because historically vhost_dev_cleanup has been allowed to be called multiple
> times, and looking at the code, it can be found that calling vhost_dev_cleanup
> multiple times is indeed safe.
>
> Look this patch:
>
> commit e0547b59dc0ead4c605d3f02d1c8829630a1311b
> Author: Marc-André Lureau 
> Date:   Wed Jul 27 01:15:02 2016 +0400
>
> vhost: make vhost_dev_cleanup() idempotent
>
> It is called on multiple code path, so make it safe to call several
> times (note: I don't remember a reproducer here, but a function called
> 'cleanup' should probably be idempotent in my book)
>
> Signed-off-by: Marc-André Lureau 
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
>
> Thanks,
> Li
>
>
> Thanks,
> Li
>
>
>
> All vhost-user devices have this issue, including vhost-user-blk/scsi.
>
> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
>
> Signed-off-by: Li Feng 
> ---
> hw/block/vhost-user-blk.c   |  3 ++-
> hw/scsi/vhost-user-scsi.c   |  3 ++-
> hw/virtio/vhost-user-base.c |  3 ++-
> hw/virtio/vhost-user.c  | 10 +-
> 4 files changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 41d1ac3a5a..c6842ced48 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -353,7 +353,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>VHostUserBlk *s = VHOST_USER_BLK(vdev);
>
>if (!s->connected) {
> -return;
> +goto done;
>}
>s->connected = false;
>
> @@ -361,6 +361,7 @@ static void vhost_user_blk_disconnect(DeviceSta

Re: [PATCH v4 1/2] Revert "vhost-user: fix lost reconnect"

2024-05-15 Thread Raphael Norwitz
On Wed, May 15, 2024 at 10:58 PM Li Feng  wrote:
>
> This reverts commit f02a4b8e6431598612466f76aac64ab492849abf.
>
> Since the current patch cannot completely fix the lost reconnect
> problem, there is a scenario that is not considered:
> - When the virtio-blk driver is removed from the guest os,
>   s->connected has no chance to be set to false, resulting in
>   subsequent reconnection not being executed.
>
> The next patch will completely fix this issue with a better approach.
>

Reviewed-by: Raphael Norwitz 

> Signed-off-by: Li Feng 
> ---
>  hw/block/vhost-user-blk.c  |  2 +-
>  hw/scsi/vhost-user-scsi.c  |  3 +--
>  hw/virtio/vhost-user-base.c|  2 +-
>  hw/virtio/vhost-user.c | 10 ++
>  include/hw/virtio/vhost-user.h |  3 +--
>  5 files changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 9e6bbc6950..41d1ac3a5a 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -384,7 +384,7 @@ static void vhost_user_blk_event(void *opaque, 
> QEMUChrEvent event)
>  case CHR_EVENT_CLOSED:
>  /* defer close until later to avoid circular close */
>  vhost_user_async_close(dev, &s->chardev, &s->dev,
> -   vhost_user_blk_disconnect, 
> vhost_user_blk_event);
> +   vhost_user_blk_disconnect);
>  break;
>  case CHR_EVENT_BREAK:
>  case CHR_EVENT_MUX_IN:
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index a63b1f4948..48a59e020e 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -214,8 +214,7 @@ static void vhost_user_scsi_event(void *opaque, 
> QEMUChrEvent event)
>  case CHR_EVENT_CLOSED:
>  /* defer close until later to avoid circular close */
>  vhost_user_async_close(dev, &vs->conf.chardev, &vsc->dev,
> -   vhost_user_scsi_disconnect,
> -   vhost_user_scsi_event);
> +   vhost_user_scsi_disconnect);
>  break;
>  case CHR_EVENT_BREAK:
>  case CHR_EVENT_MUX_IN:
> diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
> index a83167191e..4b54255682 100644
> --- a/hw/virtio/vhost-user-base.c
> +++ b/hw/virtio/vhost-user-base.c
> @@ -254,7 +254,7 @@ static void vub_event(void *opaque, QEMUChrEvent event)
>  case CHR_EVENT_CLOSED:
>  /* defer close until later to avoid circular close */
>  vhost_user_async_close(dev, &vub->chardev, &vub->vhost_dev,
> -   vub_disconnect, vub_event);
> +   vub_disconnect);
>  break;
>  case CHR_EVENT_BREAK:
>  case CHR_EVENT_MUX_IN:
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..c929097e87 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -2776,7 +2776,6 @@ typedef struct {
>  DeviceState *dev;
>  CharBackend *cd;
>  struct vhost_dev *vhost;
> -IOEventHandler *event_cb;
>  } VhostAsyncCallback;
>
>  static void vhost_user_async_close_bh(void *opaque)
> @@ -2791,10 +2790,7 @@ static void vhost_user_async_close_bh(void *opaque)
>   */
>  if (vhost->vdev) {
>  data->cb(data->dev);
> -} else if (data->event_cb) {
> -qemu_chr_fe_set_handlers(data->cd, NULL, NULL, data->event_cb,
> - NULL, data->dev, NULL, true);
> -   }
> +}
>
>  g_free(data);
>  }
> @@ -2806,8 +2802,7 @@ static void vhost_user_async_close_bh(void *opaque)
>   */
>  void vhost_user_async_close(DeviceState *d,
>  CharBackend *chardev, struct vhost_dev *vhost,
> -vu_async_close_fn cb,
> -IOEventHandler *event_cb)
> +vu_async_close_fn cb)
>  {
>  if (!runstate_check(RUN_STATE_SHUTDOWN)) {
>  /*
> @@ -2823,7 +2818,6 @@ void vhost_user_async_close(DeviceState *d,
>  data->dev = d;
>  data->cd = chardev;
>  data->vhost = vhost;
> -data->event_cb = event_cb;
>
>  /* Disable any further notifications on the chardev */
>  qemu_chr_fe_set_handlers(chardev,
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index d7c09ffd34..324cd8663a 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -108,7 +108,6 @@ typedef void (*vu_async_close_fn)(DeviceState *cb);
>
>  void vhost_user_async_close(DeviceState *d,
>  CharBackend *chardev, struct vhost_dev *vhost,
> -vu_async_close_fn cb,
> -IOEventHandler *event_cb);
> +vu_async_close_fn cb);
>
>  #endif
> --
> 2.45.0
>


