Re: [PATCH] tools/virtio: Test virtual address range detection

2022-02-21 Thread David Woodhouse
On Tue, 2022-02-22 at 01:31 -0500, Michael S. Tsirkin wrote:
> On Mon, Feb 21, 2022 at 05:18:48PM +, David Woodhouse wrote:
> > 
> > [dwoodhou@i7 virtio]$ sudo ~/virtio_test
> > Detected virtual address range 0x1000-0x7000
> > spurious wakeups: 0x0 started=0x10 completed=0x10
> > 
> > Although in some circumstances I also see a different build failure:
> > 
> > cc -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I 
> > ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow 
> > -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include 
> > ../../include/linux/kconfig.h   -c -o vringh_test.o vringh_test.c
> > In file included from ./linux/uio.h:3,
> >  from ./linux/../../../include/linux/vringh.h:15,
> >  from ./linux/vringh.h:1,
> >  from vringh_test.c:9:
> > ./linux/../../../include/linux/uio.h:10:10: fatal error: linux/mm_types.h: 
> > No such file or directory
> >    10 | #include <linux/mm_types.h>
> >       |          ^~~~~~~~~~~~~~~~~~
> > compilation terminated.
> > make: *** [: vringh_test.o] Error 1
> 
> Which tree has this build failure? In mine linux/uio.h does not
> include linux/mm_types.h.

Strictly it's
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xen-evtchn-kernel
but I'm sure my part isn't relevant; it's just v5.17-rc5.

 $ git blame include/linux/uio.h | grep mm_types.h
d9c19d32d86fa (Matthew Wilcox (Oracle) 2021-10-18 10:39:06 -0400  10) #include <linux/mm_types.h>

 $ git describe --tags d9c19d32d86fa
v5.16-rc4-37-gd9c19d32d86f
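For reference, tools/virtio usually copes with new kernel-header dependencies by carrying a small stub under tools/virtio/linux/; a sketch of such a stub, assuming an empty header is enough for the userspace harness (not necessarily what was merged):

/* tools/virtio/linux/mm_types.h - empty stub so kernel headers that now
 * pull in <linux/mm_types.h> keep building in the userspace test harness.
 */
#ifndef _TOOLS_VIRTIO_LINUX_MM_TYPES_H
#define _TOOLS_VIRTIO_LINUX_MM_TYPES_H
#endif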



Re: [PATCH v3 2/4] vdpa: Allow for printing negotiated features of a device

2022-02-21 Thread Jason Wang
On Mon, Feb 21, 2022 at 9:18 PM Eli Cohen  wrote:
>
> When reading the configuration of a vdpa device, check if the
> VDPA_ATTR_DEV_NEGOTIATED_FEATURES is available. If it is, parse the
> feature bits and print a string representation of each of the feature
> bits.
>
> We keep the strings in two different arrays. One for net device related
> devices and one for generic feature bits.
>
> In this patch we parse only net device specific features. Support for
> other devices can be added later. If the device queried is not a net
> device, we print its bit number only.
>
> Examples:
> 1. Standard presentation
> $ vdpa dev config show vdpa-a
> vdpa-a: mac 00:00:00:00:88:88 link up link_announce false max_vq_pairs 2 mtu 
> 9000
>   negotiated_features CSUM GUEST_CSUM MTU MAC HOST_TSO4 HOST_TSO6 STATUS \
> CTRL_VQ MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM
>
> 2. json output
> $ vdpa -j dev config show vdpa-a
> {"config":{"vdpa-a":{"mac":"00:00:00:00:88:88","link":"up","link_announce":false,\
> "max_vq_pairs":2,"mtu":9000,"negotiated_features":["CSUM","GUEST_CSUM",\
> "MTU","MAC","HOST_TSO4","HOST_TSO6","STATUS","CTRL_VQ","MQ","CTRL_MAC_ADDR",\
> "VERSION_1","ACCESS_PLATFORM"]}}}
>
> 3. Pretty json
> $ vdpa -jp dev config show vdpa-a
> {
> "config": {
> "vdpa-a": {
> "mac": "00:00:00:00:88:88",
> "link ": "up",
> "link_announce ": false,
> "max_vq_pairs": 2,
> "mtu": 9000,
> "negotiated_features": [
> "CSUM","GUEST_CSUM","MTU","MAC","HOST_TSO4","HOST_TSO6","STATUS","CTRL_VQ",\
> "MQ","CTRL_MAC_ADDR","VERSION_1","ACCESS_PLATFORM" ]
> }
> }
> }
>
> Reviewed-by: Si-Wei Liu
> Signed-off-by: Eli Cohen 

Acked-by: Jason Wang 

> ---
>  vdpa/include/uapi/linux/vdpa.h |   2 +
>  vdpa/vdpa.c| 107 -
>  2 files changed, 107 insertions(+), 2 deletions(-)
>
> diff --git a/vdpa/include/uapi/linux/vdpa.h b/vdpa/include/uapi/linux/vdpa.h
> index b7eab069988a..748c350450b2 100644
> --- a/vdpa/include/uapi/linux/vdpa.h
> +++ b/vdpa/include/uapi/linux/vdpa.h
> @@ -40,6 +40,8 @@ enum vdpa_attr {
> VDPA_ATTR_DEV_NET_CFG_MAX_VQP,  /* u16 */
> VDPA_ATTR_DEV_NET_CFG_MTU,  /* u16 */
>
> +   VDPA_ATTR_DEV_NEGOTIATED_FEATURES,  /* u64 */
> +
> /* new attributes must be added above here */
> VDPA_ATTR_MAX,
>  };
> diff --git a/vdpa/vdpa.c b/vdpa/vdpa.c
> index 4ccb564872a0..5f1aa91a4b96 100644
> --- a/vdpa/vdpa.c
> +++ b/vdpa/vdpa.c
> @@ -10,6 +10,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include "mnl_utils.h"
>  #include 
>
> @@ -78,6 +80,7 @@ static const enum mnl_attr_data_type 
> vdpa_policy[VDPA_ATTR_MAX + 1] = {
> [VDPA_ATTR_DEV_VENDOR_ID] = MNL_TYPE_U32,
> [VDPA_ATTR_DEV_MAX_VQS] = MNL_TYPE_U32,
> [VDPA_ATTR_DEV_MAX_VQ_SIZE] = MNL_TYPE_U16,
> +   [VDPA_ATTR_DEV_NEGOTIATED_FEATURES] = MNL_TYPE_U64,
>  };
>
>  static int attr_cb(const struct nlattr *attr, void *data)
> @@ -385,6 +388,96 @@ static const char *parse_class(int num)
> return class ? class : "< unknown class >";
>  }
>
> +static const char * const net_feature_strs[64] = {
> +   [VIRTIO_NET_F_CSUM] = "CSUM",
> +   [VIRTIO_NET_F_GUEST_CSUM] = "GUEST_CSUM",
> +   [VIRTIO_NET_F_CTRL_GUEST_OFFLOADS] = "CTRL_GUEST_OFFLOADS",
> +   [VIRTIO_NET_F_MTU] = "MTU",
> +   [VIRTIO_NET_F_MAC] = "MAC",
> +   [VIRTIO_NET_F_GUEST_TSO4] = "GUEST_TSO4",
> +   [VIRTIO_NET_F_GUEST_TSO6] = "GUEST_TSO6",
> +   [VIRTIO_NET_F_GUEST_ECN] = "GUEST_ECN",
> +   [VIRTIO_NET_F_GUEST_UFO] = "GUEST_UFO",
> +   [VIRTIO_NET_F_HOST_TSO4] = "HOST_TSO4",
> +   [VIRTIO_NET_F_HOST_TSO6] = "HOST_TSO6",
> +   [VIRTIO_NET_F_HOST_ECN] = "HOST_ECN",
> +   [VIRTIO_NET_F_HOST_UFO] = "HOST_UFO",
> +   [VIRTIO_NET_F_MRG_RXBUF] = "MRG_RXBUF",
> +   [VIRTIO_NET_F_STATUS] = "STATUS",
> +   [VIRTIO_NET_F_CTRL_VQ] = "CTRL_VQ",
> +   [VIRTIO_NET_F_CTRL_RX] = "CTRL_RX",
> +   [VIRTIO_NET_F_CTRL_VLAN] = "CTRL_VLAN",
> +   [VIRTIO_NET_F_CTRL_RX_EXTRA] = "CTRL_RX_EXTRA",
> +   [VIRTIO_NET_F_GUEST_ANNOUNCE] = "GUEST_ANNOUNCE",
> +   [VIRTIO_NET_F_MQ] = "MQ",
> +   [VIRTIO_F_NOTIFY_ON_EMPTY] = "NOTIFY_ON_EMPTY",
> +   [VIRTIO_NET_F_CTRL_MAC_ADDR] = "CTRL_MAC_ADDR",
> +   [VIRTIO_F_ANY_LAYOUT] = "ANY_LAYOUT",
> +   [VIRTIO_NET_F_RSC_EXT] = "RSC_EXT",
> +   [VIRTIO_NET_F_HASH_REPORT] = "HASH_REPORT",
> +   [VIRTIO_NET_F_RSS] = "RSS",
> +   [VIRTIO_NET_F_STANDBY] = "STANDBY",
> +   [VIRTIO_NET_F_SPEED_DUPLEX] = "SPEED_DUPLEX",
> +};
> +
> +#define VIRTIO_F_IN_ORDER 35
> +#define VIRTIO_F_NOTIFICATION_DATA 38
> +#define VDPA_EXT_FEATURES_SZ (VIRTIO_TRANSPORT_F_END - \
> + VIRTIO_TRANSPORT_F_START + 1)
> +
> +static const char * const ext_feature_strs[VDPA_EXT_FEATURES_SZ] = {
> +   [VIRTIO_RING_F_INDIRECT_DESC - 

Re: [PATCH v3 4/4] vdpa: Support reading device features

2022-02-21 Thread Jason Wang
On Mon, Feb 21, 2022 at 9:18 PM Eli Cohen  wrote:
>
> When showing the available management devices, check if
> VDPA_ATTR_DEV_SUPPORTED_FEATURES feature is available and print the
> supported features for a management device.
>
> Examples:
> $ vdpa mgmtdev show
> auxiliary/mlx5_core.sf.1:
>   supported_classes net
>   max_supported_vqs 257
>   dev_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ MQ \
>CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM
>
> $ vdpa -jp mgmtdev show
> {
> "mgmtdev": {
> "auxiliary/mlx5_core.sf.1": {
> "supported_classes": [ "net" ],
> "max_supported_vqs": 257,
> "dev_features": [
> "CSUM","GUEST_CSUM","MTU","HOST_TSO4","HOST_TSO6","STATUS","CTRL_VQ","MQ",\
> "CTRL_MAC_ADDR","VERSION_1","ACCESS_PLATFORM" ]
> }
> }
> }
>
> Reviewed-by: Si-Wei Liu 
> Signed-off-by: Eli Cohen 

Acked-by: Jason Wang 

> ---
>  vdpa/include/uapi/linux/vdpa.h |  1 +
>  vdpa/vdpa.c| 15 +--
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/vdpa/include/uapi/linux/vdpa.h b/vdpa/include/uapi/linux/vdpa.h
> index a3ebf4d4d9b8..96ccbf305d14 100644
> --- a/vdpa/include/uapi/linux/vdpa.h
> +++ b/vdpa/include/uapi/linux/vdpa.h
> @@ -42,6 +42,7 @@ enum vdpa_attr {
>
> VDPA_ATTR_DEV_NEGOTIATED_FEATURES,  /* u64 */
> VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,  /* u32 */
> +   VDPA_ATTR_DEV_SUPPORTED_FEATURES,   /* u64 */
>
> /* new attributes must be added above here */
> VDPA_ATTR_MAX,
> diff --git a/vdpa/vdpa.c b/vdpa/vdpa.c
> index 22064c755baa..bdc366880ab9 100644
> --- a/vdpa/vdpa.c
> +++ b/vdpa/vdpa.c
> @@ -84,6 +84,7 @@ static const enum mnl_attr_data_type 
> vdpa_policy[VDPA_ATTR_MAX + 1] = {
> [VDPA_ATTR_DEV_MAX_VQ_SIZE] = MNL_TYPE_U16,
> [VDPA_ATTR_DEV_NEGOTIATED_FEATURES] = MNL_TYPE_U64,
> [VDPA_ATTR_DEV_MGMTDEV_MAX_VQS] = MNL_TYPE_U32,
> +   [VDPA_ATTR_DEV_SUPPORTED_FEATURES] = MNL_TYPE_U64,
>  };
>
>  static int attr_cb(const struct nlattr *attr, void *data)
> @@ -494,14 +495,14 @@ static void print_features(struct vdpa *vdpa, uint64_t 
> features, bool mgmtdevf,
>  static void pr_out_mgmtdev_show(struct vdpa *vdpa, const struct nlmsghdr 
> *nlh,
> struct nlattr **tb)
>  {
> +   uint64_t classes = 0;
> const char *class;
> unsigned int i;
>
> pr_out_handle_start(vdpa, tb);
>
> if (tb[VDPA_ATTR_MGMTDEV_SUPPORTED_CLASSES]) {
> -   uint64_t classes = 
> mnl_attr_get_u64(tb[VDPA_ATTR_MGMTDEV_SUPPORTED_CLASSES]);
> -
> +   classes = 
> mnl_attr_get_u64(tb[VDPA_ATTR_MGMTDEV_SUPPORTED_CLASSES]);
> pr_out_array_start(vdpa, "supported_classes");
>
> for (i = 1; i < 64; i++) {
> @@ -523,6 +524,16 @@ static void pr_out_mgmtdev_show(struct vdpa *vdpa, const 
> struct nlmsghdr *nlh,
> print_uint(PRINT_ANY, "max_supported_vqs", "  
> max_supported_vqs %d", num_vqs);
> }
>
> +   if (tb[VDPA_ATTR_DEV_SUPPORTED_FEATURES]) {
> +   uint64_t features;
> +
> +   features  = 
> mnl_attr_get_u64(tb[VDPA_ATTR_DEV_SUPPORTED_FEATURES]);
> +   if (classes & BIT(VIRTIO_ID_NET))
> +   print_features(vdpa, features, true, VIRTIO_ID_NET);
> +   else
> +   print_features(vdpa, features, true, 0);
> +   }
> +
> pr_out_handle_end(vdpa);
>  }
>
> --
> 2.35.1
>



Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ

2022-02-21 Thread Jason Wang


On 2022/2/17 4:22 PM, Eugenio Perez Martin wrote:

On Thu, Feb 17, 2022 at 7:02 AM Jason Wang  wrote:

On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
 wrote:

On Tue, Feb 8, 2022 at 9:25 AM Jason Wang  wrote:


On 2022/2/1 7:45 PM, Eugenio Perez Martin wrote:

On Sun, Jan 30, 2022 at 7:50 AM Jason Wang  wrote:

On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

SVQ is able to log the dirty bits by itself, so let's use it to not
block migration.

Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
enabled. Even if the device supports it, the reports would be nonsense
because SVQ memory is in the qemu region.

The log region is still allocated. Future changes might skip that, but
this series is already long enough.

Signed-off-by: Eugenio Pérez 
---
hw/virtio/vhost-vdpa.c | 20 
1 file changed, 20 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index fb0a338baa..75090d65e8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, 
uint64_t *features)
if (ret == 0 && v->shadow_vqs_enabled) {
/* Filter only features that SVQ can offer to guest */
vhost_svq_valid_guest_features(features);
+
+/* Add SVQ logging capabilities */
+*features |= BIT_ULL(VHOST_F_LOG_ALL);
}

return ret;
@@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,

if (v->shadow_vqs_enabled) {
uint64_t dev_features, svq_features, acked_features;
+uint8_t status = 0;
bool ok;

+ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
+if (unlikely(ret)) {
+return ret;
+}
+
+if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+/*
+ * vhost is trying to enable or disable _F_LOG, and the device
+ * would report wrong dirty pages. SVQ handles it.
+ */

I fail to understand this comment, I'd think there's no way to disable
dirty page tracking for SVQ.


vhost_log_global_{start,stop} are called at the beginning and end of
migration. To inform the device that it should start logging, they set
or clean VHOST_F_LOG_ALL at vhost_dev_set_log.


Yes, but for SVQ, we can't disable dirty page tracking, isn't it? The
only thing is to ignore or filter out the F_LOG_ALL and pretend to be
enabled and disabled.


Yes, that's what this patch does.


While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
vhost does not block migration. Maybe we need to look for another way
to do this?


I'm fine with filtering since it's much simpler, but I fail to
understand why we need to check DRIVER_OK.


Ok maybe I can make that part more clear,

Since both operations use vhost_vdpa_set_features we must just filter
the one that actually sets or removes VHOST_F_LOG_ALL, without
affecting other features.

In practice, that means to not forward the set features after
DRIVER_OK. The device is not expecting them anymore.

I wonder what happens if we don't do this.


If we simply delete the check vhost_dev_set_features will return an
error, failing the start of the migration. More on this below.



Ok.





So kernel had this check:

 /*
  * It's not allowed to change the features after they have
  * been negotiated.
  */
if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
 return -EBUSY;

So is it FEATURES_OK actually?


Yes, FEATURES_OK seems more appropriate actually so I will switch to
it for the next version.

But it should be functionally equivalent, since
vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
be concurrent with it.



Right.





For this patch, I wonder if the thing we need to do is to see whether
it is an enable/disable of F_LOG_ALL and simply return.


Yes, that's the intention of the patch.

We have 4 cases here:
a) We're being called from vhost_dev_start, with enable_log = false
b) We're being called from vhost_dev_start, with enable_log = true



And this case means we can't simply return without calling vhost-vdpa.



c) We're being called from vhost_dev_set_log, with enable_log = false
d) We're being called from vhost_dev_set_log, with enable_log = true

The way to tell the difference between a/b and c/d is to check if
{FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
memory through the memory unmapping, so we clear the bit
unconditionally if we detect that VHOST_SET_FEATURES will be called
(cases a and b).

Another possibility is to track if features have been set with a bool
in vhost_vdpa or something like that. But it seems cleaner to me to
only store that in the actual device.
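A minimal sketch of that filtering, using the names from the patch above (illustrative only; the actual QEMU code may differ):

static int vhost_vdpa_set_features_filtered(struct vhost_dev *dev,
                                            uint64_t features)
{
    struct vhost_vdpa *v = dev->opaque;
    uint8_t status = 0;
    int ret;

    if (!v->shadow_vqs_enabled) {
        return vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
    }

    /* SVQ tracks dirty pages itself; never expose _F_LOG_ALL to the device */
    features &= ~BIT_ULL(VHOST_F_LOG_ALL);

    ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
    if (unlikely(ret)) {
        return ret;
    }

    if (status & VIRTIO_CONFIG_S_FEATURES_OK) {
        /* Cases c)/d): only the log bit is being toggled; ack it locally */
        return 0;
    }

    /* Cases a)/b): initial negotiation, still forward to the device */
    return vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
}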



So I suggest making sure the code matches the comment:

    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
    /*
 * vhost is trying to enable 

Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding

2022-02-21 Thread Jason Wang


On 2022/2/21 4:15 PM, Eugenio Perez Martin wrote:

On Mon, Feb 21, 2022 at 8:44 AM Jason Wang  wrote:


On 2022/2/17 8:48 PM, Eugenio Perez Martin wrote:

On Tue, Feb 8, 2022 at 9:16 AM Jason Wang  wrote:

On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:

On Sun, Jan 30, 2022 at 7:47 AM Jason Wang  wrote:

On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

@@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, 
int svq_kick_fd)
 void vhost_svq_stop(VhostShadowVirtqueue *svq)
 {
 event_notifier_set_handler(&svq->svq_kick, NULL);
+g_autofree VirtQueueElement *next_avail_elem = NULL;
+
+if (!svq->vq) {
+return;
+}
+
+/* Send all pending used descriptors to guest */
+vhost_svq_flush(svq, false);

Do we need to wait for all the pending descriptors to be completed here?


No, this function does not wait, it only completes the forwarding of
the *used* descriptors.

The best example is the net rx queue in my opinion. This call will
check SVQ's vring used_idx and will forward the last used descriptors
if any, but all available descriptors will remain as available for
qemu's VQ code.

To skip it would miss those last rx descriptors in migration.

Thanks!

So it's probably not the best place to ask. It's more about the
inflight descriptors so it should be TX instead of RX.

I can imagine the migration last phase, we should stop the vhost-vDPA
before calling vhost_svq_stop(). Then we should be fine regardless of
inflight descriptors.


I think I'm still missing something here.

To be on the same page. Regarding tx this could cause repeated tx
frames (one at source and other at destination), but never a missed
buffer not transmitted. The "stop before" could be interpreted as "SVQ
is not forwarding available buffers anymore". Would that work?


Right, but this only works if

1) a flush to make sure TX DMA for inflight descriptors are all completed

2) just mark all inflight descriptor used


It currently trusts on the reverse: Buffers not marked as used (by the
device) will be available in the destination, so expect
retransmissions.



I may miss something but I think we do migrate last_avail_idx. So there 
won't be a re-transmission, since we depend on qemu virtqueue code to 
deal with vring base?


Thanks




Thanks!


Otherwise there could be buffers that is inflight forever.

Thanks



Thanks!


Thanks



Thanks



+
+for (unsigned i = 0; i < svq->vring.num; ++i) {
+g_autofree VirtQueueElement *elem = NULL;
+elem = g_steal_pointer(&svq->ring_id_maps[i]);
+if (elem) {
+virtqueue_detach_element(svq->vq, elem, elem->len);
+}
+}
+
+next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
+if (next_avail_elem) {
+virtqueue_detach_element(svq->vq, next_avail_elem,
+ next_avail_elem->len);
+}
 }



Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call

2022-02-21 Thread Jason Wang


On 2022/2/21 4:01 PM, Eugenio Perez Martin wrote:

On Mon, Feb 21, 2022 at 8:39 AM Jason Wang  wrote:


On 2022/2/18 8:35 PM, Eugenio Perez Martin wrote:

On Tue, Feb 8, 2022 at 4:23 AM Jason Wang  wrote:

On 2022/1/31 11:34 PM, Eugenio Perez Martin wrote:

On Sat, Jan 29, 2022 at 9:06 AM Jason Wang  wrote:

On 2022/1/22 4:27 AM, Eugenio Pérez wrote:

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-vdpa.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 18de14f0fb..029f98feee 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev 
*dev,
 }
 }

-static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
-   struct vhost_vring_file *file)
+static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
+ struct vhost_vring_file *file)
 {
 trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
 return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
 }

+static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
+ struct vhost_vring_file *file)
+{
+struct vhost_vdpa *v = dev->opaque;
+
+if (v->shadow_vqs_enabled) {
+int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
+VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
+
+vhost_svq_set_guest_call_notifier(svq, file->fd);

Two questions here (had similar questions for vring kick):

1) Any reason that we setup the eventfd for vhost-vdpa in
vhost_vdpa_svq_setup() not here?


I'm not sure what you mean.

The guest->SVQ call and kick fds are set here and at
vhost_vdpa_set_vring_kick. The event notifier handler of the guest ->
SVQ kick_fd is set at vhost_vdpa_set_vring_kick /
vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
notifier handler since we don't poll it.

On the other hand, the connection SVQ <-> device uses the same fds
from the beginning to the end, and they will not change with, for
example, call fd masking. That's why it's setup from
vhost_vdpa_svq_setup. Delaying to vhost_vdpa_set_vring_call would make
us add way more logic there.

More logic in the generic shadow vq code but less code in the vhost-vdpa
specific code, I think.

E.g. we can move the kick set logic from vhost_vdpa_svq_set_fds() to
here.


But they are different fds. vhost_vdpa_svq_set_fds sets the
SVQ<->device. This function sets the SVQ->guest call file descriptor.

To move the logic of vhost_vdpa_svq_set_fds here would imply either:
a) Logic to know if we are receiving the first call fd or not.


Any reason for this? I guess you meant multiqueue. If yes, it should not
be much difference since we have idx as the parameter.


With "first call fd" I meant "first time we receive the call fd", so
we only set them once.

I think this is going to be easier if I prepare a patch doing your way
and we comment on it.



That would be helpful but if there's no issue with current code (see 
below), we can leave it as is and do optimization on top.






   That
code is not in the series at the moment, because setting at
vhost_vdpa_dev_start tells the difference for free. Is just adding
code, not moving.
b) Logic to set again *the same* file descriptor to device, with logic
to tell if we have missed calls. That logic is not implemented for
device->SVQ call file descriptor, because we are assuming it never
changes from vhost_vdpa_svq_set_fds. So this is again adding code.

At this moment, we have:
vhost_vdpa_svq_set_fds:
set SVQ<->device fds

vhost_vdpa_set_vring_call:
set guest<-SVQ call

vhost_vdpa_set_vring_kick:
set guest->SVQ kick.

If I understood correctly, the alternative would be something like:
vhost_vdpa_set_vring_call:
set guest<-SVQ call
if(!vq->call_set) {
  - set SVQ<-device call.
  - vq->call_set = true
}

vhost_vdpa_set_vring_kick:
set guest<-SVQ call
if(!vq->dev_kick_set) {
  - set guest->device kick.
  - vq->dev_kick_set = true
}

dev_reset / dev_stop:
for vq in vqs:
vq->dev_kick_set = vq->dev_call_set = false
...

Or have I misunderstood something?


I wonder what happens if MSI-X is masked in the guest. So if I understand
correctly, we don't disable the eventfd from the device? If yes, this seems
suboptimal.


We cannot disable the device's call fd unless SVQ actively poll it. As
I see it, if the guest masks the call fd, it could be because:
a) it doesn't want to receive more calls because is processing buffers
b) Is going to burn a cpu to poll it.

The masking only affects SVQ->guest call. If we also mask device->SVQ,
we're adding latency in the case a), and we're effectively disabling
forwarding in case b).



Right, so we need to leave a comment to explain this; then I'm totally fine
with this approach.
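Something along these lines next to the vhost_svq_set_guest_call_notifier() call could capture it (wording is only a sketch):

/*
 * The guest may mask the virtqueue's MSI-X vector; that only affects the
 * SVQ -> guest call notifier.  The device -> SVQ call fd is deliberately
 * left enabled: SVQ needs the used-buffer notifications to keep forwarding,
 * and masking it would either add latency (guest busy processing buffers)
 * or stop forwarding entirely (guest polling instead of using interrupts).
 */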





It only works if guest is 

Re: [PATCH] vdpa/mlx5: Enlarge queue size to 512 entries

2022-02-21 Thread Jason Wang
On Tue, Feb 22, 2022 at 2:33 PM Eli Cohen  wrote:
>
> On Tue, Feb 22, 2022 at 11:18:14AM +0800, Jason Wang wrote:
> > On Tue, Feb 22, 2022 at 3:00 AM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Mon, Feb 21, 2022 at 2:20 PM Eli Cohen  wrote:
> > > >
> > > > Currently we can work with 512 entries with any MTU size. Change
> > > > MLX5_VDPA_MAX_VQ_ENTRIES accordingly.
> > > >
> > > > Signed-off-by: Eli Cohen 
> > > > ---
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index be095dc1134e..de115e3fe761 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -2062,7 +2062,7 @@ static void mlx5_vdpa_set_config_cb(struct 
> > > > vdpa_device *vdev, struct vdpa_callba
> > > > ndev->config_cb = *cb;
> > > >  }
> > > >
> > > > -#define MLX5_VDPA_MAX_VQ_ENTRIES 256
> > > > +#define MLX5_VDPA_MAX_VQ_ENTRIES 512
> >
> > I'd expect this should be something that can be read from the fw or
> > hw? Otherwise we can't increase it in the future.
> >
>
> We do read all the required parameters from firmware in order to fulfill
> any valid value up to the spec defined limit of 32K entries. But, there
> are FW related issues that with 9K MTU we will start seeing firmware
> errors if we go beyond 512. So I thought it would be better to increase
> number of entries to a safe value till we have other issues resolved.

I see.

So

Acked-by: Jason Wang 

>
> > Thanks
> >
> > > >  static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev)
> > > >  {
> > > > return MLX5_VDPA_MAX_VQ_ENTRIES;
> > > > --
> > > > 2.35.1
> > > >
> > >
> > > Acked-by: Eugenio Pérez 
> > >
> >
>


Re: [PATCH] vhost: validate range size before adding to iotlb

2022-02-21 Thread Jason Wang
On Tue, Feb 22, 2022 at 12:57 PM Anirudh Rayabharam  wrote:
>
> On Tue, Feb 22, 2022 at 10:50:20AM +0800, Jason Wang wrote:
> > On Tue, Feb 22, 2022 at 3:53 AM Anirudh Rayabharam  
> > wrote:
> > >
> > > In vhost_iotlb_add_range_ctx(), validate the range size is non-zero
> > > before proceeding with adding it to the iotlb.
> > >
> > > Range size can overflow to 0 when start is 0 and last is (2^64 - 1).
> > > One instance where it can happen is when userspace sends an IOTLB
> > > message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an
> > > entry with size = 0, start = 0, last = (2^64 - 1) ends up in the
> > > iotlb. Next time a packet is sent, iotlb_access_ok() loops
> > > indefinitely due to that erroneous entry:
> > >
> > > Call Trace:
> > >  
> > >  iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
> > >  vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
> > >  vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104
> > >  vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
> > >  kthread+0x2e9/0x3a0 kernel/kthread.c:377
> > >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > >  
> > >
> > > Reported by syzbot at:
> > > https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
> > >
> > > Reported-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > Tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > Signed-off-by: Anirudh Rayabharam 
> > > ---
> > >  drivers/vhost/iotlb.c | 6 --
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> > > index 670d56c879e5..b9de74bd2f9c 100644
> > > --- a/drivers/vhost/iotlb.c
> > > +++ b/drivers/vhost/iotlb.c
> > > @@ -53,8 +53,10 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb 
> > > *iotlb,
> > >   void *opaque)
> > >  {
> > > struct vhost_iotlb_map *map;
> > > +   u64 size = last - start + 1;
> > >
> > > -   if (last < start)
> > > +   // size can overflow to 0 when start is 0 and last is (2^64 - 1).
> > > +   if (last < start || size == 0)
> > > return -EFAULT;
> >
> > I'd move this check to vhost_chr_iter_write(), then for the device who
> > has its own msg handler (e.g vDPA) can benefit from it as well.
>
> Thanks for reviewing!
>
> I kept the check here thinking that all devices would benefit from it
> because they would need to call vhost_iotlb_add_range() to add an entry
> to the iotlb. Isn't that correct?

Correct for now but not for the future, it's not guaranteed that the
per device iotlb message handler will use vhost iotlb.

But I agree that we probably don't need to care about it too much now.

> Do you see any other benefit in moving
> it to vhost_chr_iter_write()?
>
> One concern I have is that if we move it out some future caller to
> vhost_iotlb_add_range() might forget to handle this case.

Yes.

Rethinking the whole fix: we're basically rejecting the [0, ULONG_MAX]
range, which seems a little bit odd. I wonder if it's better to just remove
map->size. Having a quick glance at the users, I don't see any
blocker for this.
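For illustration, dropping map->size would mean users of the iotlb work with the inclusive [start, last] pair directly; a sketch of what a range check could then look like (the helper name is made up, not from any posted patch):

static bool map_covers(const struct vhost_iotlb_map *map, u64 addr, u64 len)
{
        /* len > 0 assumed by the caller; no 2^64-sized field needed anywhere */
        u64 access_last = addr + len - 1;

        return addr >= map->start && access_last <= map->last;
}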

Thanks

>
> Thanks!
>
> - Anirudh.
>
> >
> > Thanks
> >
> > >
> > > if (iotlb->limit &&
> > > @@ -69,7 +71,7 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb,
> > > return -ENOMEM;
> > >
> > > map->start = start;
> > > -   map->size = last - start + 1;
> > > +   map->size = size;
> > > map->last = last;
> > > map->addr = addr;
> > > map->perm = perm;
> > > --
> > > 2.35.1
> > >
> >
>



Re: [PATCH] tools/virtio: Test virtual address range detection

2022-02-21 Thread Michael S. Tsirkin
On Mon, Feb 21, 2022 at 05:18:48PM +, David Woodhouse wrote:
> On Mon, 2022-02-21 at 18:02 +0100, Stefano Garzarella wrote:
> > On Mon, Feb 21, 2022 at 04:15:22PM +, David Woodhouse wrote:
> > > As things stand, an application which wants to use vhost with a trivial
> > > 1:1 mapping of its virtual address space is forced to jump through hoops
> > > to detect what the address range might be. The VHOST_SET_MEM_TABLE ioctl
> > > helpfully doesn't fail immediately; you only get a failure *later* when
> > > you attempt to set the backend, if the table *could* map to an address
> > > which is out of range, even if no out-of-range address is actually
> > > being referenced.
> > > 
> > > Since userspace is growing workarounds for this lovely kernel API, let's
> > > ensure that we have a regression test that does things basically the same
> > > way as 
> > > https://gitlab.com/openconnect/openconnect/-/commit/443edd9d8826
> > > 
> > > does.
> > > 
> > > This is untested as I can't actually get virtio_test to work at all; it
> > > just seems to deadlock on a spinlock. But it's getting the right answer
> > > for the virtio range on x86_64 at least.
> > 
> > I had a similar issue with virtio_test and this simple patch [1] should 
> > fix the deadlock.
> > 
> > [1] 
> > https://lore.kernel.org/lkml/20220118150631.167015-1-sgarz...@redhat.com/=
> 
> Thanks.
> 
> [dwoodhou@i7 virtio]$ sudo ~/virtio_test
> Detected virtual address range 0x1000-0x7000
> spurious wakeups: 0x0 started=0x10 completed=0x10
> 
> Although in some circumstances I also see a different build failure:
> 
> cc -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I 
> ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow 
> -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include 
> ../../include/linux/kconfig.h   -c -o vringh_test.o vringh_test.c
> In file included from ./linux/uio.h:3,
>  from ./linux/../../../include/linux/vringh.h:15,
>  from ./linux/vringh.h:1,
>  from vringh_test.c:9:
> ./linux/../../../include/linux/uio.h:10:10: fatal error: linux/mm_types.h: No 
> such file or directory
>    10 | #include <linux/mm_types.h>
>       |          ^~~~~~~~~~~~~~~~~~
> compilation terminated.
> make: *** [: vringh_test.o] Error 1

Which tree has this build failure? In mine linux/uio.h does not
include linux/mm_types.h.

-- 
MST



Re: [PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-21 Thread Dan Carpenter
Hi Stefano,

url:
https://github.com/0day-ci/linux/commits/Stefano-Garzarella/vhost-vsock-don-t-check-owner-in-vhost_vsock_stop-while-releasing/20220221-195038
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-randconfig-m031-20220221 
(https://download.01.org/0day-ci/archive/20220222/202202220707.am3rkucp-...@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 

smatch warnings:
drivers/vhost/vsock.c:655 vhost_vsock_stop() error: uninitialized symbol 'ret'.

vim +/ret +655 drivers/vhost/vsock.c

3ace84c91bfcde Stefano Garzarella 2022-02-21  632  static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
433fc58e6bf2c8 Asias He           2016-07-28  633  {
433fc58e6bf2c8 Asias He           2016-07-28  634      size_t i;
433fc58e6bf2c8 Asias He           2016-07-28  635      int ret;
433fc58e6bf2c8 Asias He           2016-07-28  636  
433fc58e6bf2c8 Asias He           2016-07-28  637      mutex_lock(&vsock->dev.mutex);
433fc58e6bf2c8 Asias He           2016-07-28  638  
3ace84c91bfcde Stefano Garzarella 2022-02-21  639      if (check_owner) {
433fc58e6bf2c8 Asias He           2016-07-28  640          ret = vhost_dev_check_owner(&vsock->dev);
433fc58e6bf2c8 Asias He           2016-07-28  641          if (ret)
433fc58e6bf2c8 Asias He           2016-07-28  642              goto err;
3ace84c91bfcde Stefano Garzarella 2022-02-21  643      }

"ret" not initialized on else path.

433fc58e6bf2c8 Asias He           2016-07-28  644  
433fc58e6bf2c8 Asias He           2016-07-28  645      for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
433fc58e6bf2c8 Asias He           2016-07-28  646          struct vhost_virtqueue *vq = &vsock->vqs[i];
433fc58e6bf2c8 Asias He           2016-07-28  647  
433fc58e6bf2c8 Asias He           2016-07-28  648          mutex_lock(&vq->mutex);
247643f85782fc Eugenio Pérez      2020-03-31  649          vhost_vq_set_backend(vq, NULL);
433fc58e6bf2c8 Asias He           2016-07-28  650          mutex_unlock(&vq->mutex);
433fc58e6bf2c8 Asias He           2016-07-28  651      }
433fc58e6bf2c8 Asias He           2016-07-28  652  
433fc58e6bf2c8 Asias He           2016-07-28  653  err:
433fc58e6bf2c8 Asias He           2016-07-28  654      mutex_unlock(&vsock->dev.mutex);
433fc58e6bf2c8 Asias He           2016-07-28 @655      return ret;
433fc58e6bf2c8 Asias He           2016-07-28  656  }
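A minimal way to silence the warning is to give ret a defined value on the !check_owner path, e.g. something like this (untested sketch, not necessarily the actual follow-up patch):

-	int ret;
+	int ret = 0;	/* nothing can fail before the loop when check_owner is false */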

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [PATCH v1] net/mlx5: Add support for configuring max device MTU

2022-02-21 Thread Jason Wang
On Mon, Feb 21, 2022 at 8:19 PM Eli Cohen  wrote:
>
> Allow an admin creating a vdpa device to specify the max MTU for the
> net device.
>
> For example, to create a device with max MTU of 1000, the following
> command can be used:
>
> $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 mtu 1000
>
> This configuration mechanism assumes that vdpa is the sole real user of
> the function. mlx5_core could theoretically change the mtu of the
> function using the ip command on the mlx5_core net device but this
> should not be done.
>
> Reviewed-by: Si-Wei Liu
> Signed-off-by: Eli Cohen 

Acked-by: Jason Wang 

> ---
> v0 -> v1:
> Update changelog
>
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 32 ++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 6156cf6e9377..be095dc1134e 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2689,6 +2689,28 @@ static int event_handler(struct notifier_block *nb, 
> unsigned long event, void *p
> return ret;
>  }
>
> +static int config_func_mtu(struct mlx5_core_dev *mdev, u16 mtu)
> +{
> +   int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
> +   void *in;
> +   int err;
> +
> +   in = kvzalloc(inlen, GFP_KERNEL);
> +   if (!in)
> +   return -ENOMEM;
> +
> +   MLX5_SET(modify_nic_vport_context_in, in, field_select.mtu, 1);
> +   MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.mtu,
> +mtu + MLX5V_ETH_HARD_MTU);
> +   MLX5_SET(modify_nic_vport_context_in, in, opcode,
> +MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
> +
> +   err = mlx5_cmd_exec_in(mdev, modify_nic_vport_context, in);
> +
> +   kvfree(in);
> +   return err;
> +}
> +
>  static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
>  const struct vdpa_dev_set_config *add_config)
>  {
> @@ -2749,6 +2771,13 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
> *v_mdev, const char *name,
> mutex_init(&ndev->reslock);
> mutex_init(&ndev->numq_lock);
> config = &ndev->config;
> +
> +   if (add_config->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) {
> +   err = config_func_mtu(mdev, add_config->net.mtu);
> +   if (err)
> +   goto err_mtu;
> +   }
> +
> err = query_mtu(mdev, &mtu);
> if (err)
> goto err_mtu;
> @@ -2867,7 +2896,8 @@ static int mlx5v_probe(struct auxiliary_device *adev,
> mgtdev->mgtdev.device = mdev->device;
> mgtdev->mgtdev.id_table = id_table;
> mgtdev->mgtdev.config_attr_mask = 
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR) |
> - 
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
> + 
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP) |
> + BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU);
> mgtdev->mgtdev.max_supported_vqs =
> MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues) + 1;
> mgtdev->mgtdev.supported_features = get_supported_features(mdev);
> --
> 2.35.1
>



Re: [PATCH v1 0/6] virtio: support advance DMA

2022-02-21 Thread Jason Wang


On 2022/2/21 7:23 PM, Xuan Zhuo wrote:

On Mon, 21 Feb 2022 11:32:52 +0800, Jason Wang  wrote:

On Fri, Feb 18, 2022 at 5:00 PM Xuan Zhuo  wrote:

On Thu, 17 Feb 2022 15:19:44 +0800, Jason Wang  wrote:

On Thu, Feb 10, 2022 at 4:51 PM Xuan Zhuo  wrote:

virtqueue_add() only supports virtual addresses, dma is completed in
virtqueue_add().

In some scenarios (such as the AF_XDP scenario), DMA is completed in advance, so
it is necessary for us to support passing the DMA address to virtqueue_add().

I'd suggest rename this feature as "unmanaged DMA".

OK


Record this predma information in extra->flags, which can be skipped when
executing dma unmap.

Question still, can we use per-virtqueue flag instead of per
descriptor flag? If my memory is correct, the answer is yes in the
discussion for the previous version.


Yes.

per-virtqueue? I guess it should be per-submit.

This patch set only adds a flag to desc_extra[head].flags, so that we can know
if we need to unmap dma when we detach.

I meant if we can manage to make it per virtqueue, there's no need to
maintain per buffer flag.


Rethinking this question, I feel there is no essential difference between per
virtqueue and per sgs.

per virtqueue:
1. add buf:
a. check vq->premapped for map every sg
2. detach:
a. check vq->premapped for unmap

per sgs:
1. add buf:
a. check function parameter "premapped" for map every sg
b. add flag to extra[head].flag

2. detach:
a: check extra[head].flag for unmap


Thanks.
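For comparison, the per-virtqueue variant could look roughly like the helper below; the premapped field is hypothetical, the rest mirrors the existing mapping path in drivers/virtio/virtio_ring.c:

/* Sketch: one per-virtqueue flag instead of a per-descriptor flag in
 * desc_extra[head].flags.  'premapped' would be set once by the driver
 * before any virtqueue_add(), and checked in both add and detach. */
static dma_addr_t vring_sg_address(const struct vring_virtqueue *vq,
                                   struct scatterlist *sg,
                                   enum dma_data_direction direction)
{
        if (vq->premapped)              /* caller already performed the DMA map */
                return sg_dma_address(sg);

        return dma_map_page(vring_dma_dev(vq), sg_page(sg),
                            sg->offset, sg->length, direction);
}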



Per-virtqueue is still a little bit easier at the first glance.

Actually, per-sg have one advantage: it can be used without virtqueue 
reset (to allow switching between the two modes). But I'm not sure 
whether we had such requirements.


I think to answer this question, we probably need a real use case (if we 
can come up with a case that is more lightweight than AF_XDP, that would 
be even better).


Thanks






So we know something that needs to be mapped by virtio core itself,
e.g the indirect page. Other than this, all the rest could be
pre-mapped.

For vnet header, it could be mapped by virtio-net which could be still
treated as pre mapped DMA since it's not the virtio ring code.

Anything I miss here?

Thanks



Thanks.


Thanks


v1:
1. All sgs requested at one time are required to be unified PREDMA, and 
several
   of them are not supported to be PREDMA
2. virtio_dma_map() is removed from this patch set and will be submitted
   together with the next time AF_XDP supports virtio dma
3. Added patch #2 #3 to remove the check for flags when performing unmap
   indirect desc

Xuan Zhuo (6):
   virtio: rename vring_unmap_state_packed() to
 vring_unmap_extra_packed()
   virtio: remove flags check for unmap split indirect desc
   virtio: remove flags check for unmap packed indirect desc
   virtio: virtqueue_add() support predma
   virtio: split: virtqueue_add_split() support dma address
   virtio: packed: virtqueue_add_packed() support dma address

  drivers/virtio/virtio_ring.c | 199 ++-
  1 file changed, 126 insertions(+), 73 deletions(-)

--
2.31.0




Re: [PATCH v1] virtio: Add definition for VIRTIO_F_NOTIFICATION_DATA feature flag

2022-02-21 Thread Jason Wang
On Mon, Feb 21, 2022 at 8:08 PM Eli Cohen  wrote:
>
> This is required by iproute2 to display the capabilities of a vdpa based
> virtio device.
>
> Signed-off-by: Eli Cohen 
> v0 -> v1:
> Avoid modifying mcdi_pcol.h

Acked-by: Jason Wang 

> ---
>  include/uapi/linux/virtio_config.h | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/uapi/linux/virtio_config.h 
> b/include/uapi/linux/virtio_config.h
> index b5eda06f0d57..30eb76dcdcad 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -92,4 +92,10 @@
>   * Does the device support Single Root I/O Virtualization?
>   */
>  #define VIRTIO_F_SR_IOV37
> +
> +/* When negotiated, indicates that the driver can pass extra data beyond
> + * virtqueue identification when sending notifications
> + */
> +#define VIRTIO_F_NOTIFICATION_DATA 38
> +
>  #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> --
> 2.35.1
>



Re: [PATCH] tools/virtio: Test virtual address range detection

2022-02-21 Thread Jason Wang
On Tue, Feb 22, 2022 at 12:17 AM David Woodhouse  wrote:
>
> As things stand, an application which wants to use vhost with a trivial
> 1:1 mapping of its virtual address space is forced to jump through hoops
> to detect what the address range might be. The VHOST_SET_MEM_TABLE ioctl
> helpfully doesn't fail immediately; you only get a failure *later* when
> you attempt to set the backend, if the table *could* map to an address
> which is out of range, even if no out-of-range address is actually
> being referenced.
>
> Since userspace is growing workarounds for this lovely kernel API, let's
> ensure that we have a regression test that does things basically the same
> way as https://gitlab.com/openconnect/openconnect/-/commit/443edd9d8826
> does.

I wonder if it's useful to have a small library that wraps vhost
kernel uAPI somewhere.

(In the future, we may want to let the kernel accept 1:1 mapping by
figuring out the illegal range by itself?)

Thanks

>
> This is untested as I can't actually get virtio_test to work at all; it
> just seems to deadlock on a spinlock. But it's getting the right answer
> for the virtio range on x86_64 at least.
>
> Signed-off-by: David Woodhouse 
> ---
>
> Please, tell me I don't need to do this. But if I *do*, it needs a
> regression test in-kernel.
>
>  tools/virtio/virtio_test.c | 109 -
>  1 file changed, 106 insertions(+), 3 deletions(-)
>
> diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
> index cb3f29c09aff..e40eeeb05b71 100644
> --- a/tools/virtio/virtio_test.c
> +++ b/tools/virtio/virtio_test.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -124,6 +125,109 @@ static void vq_info_add(struct vdev_info *dev, int num)
> dev->nvqs++;
>  }
>
> +/*
> + * This is awful. The kernel doesn't let us just ask for a 1:1 mapping of
> + * our virtual address space; we have to *know* the minimum and maximum
> + * addresses. We can't test it directly with VHOST_SET_MEM_TABLE because
> + * that actually succeeds, and the failure only occurs later when we try
> + * to use a buffer at an address that *is* valid, but our memory table
> + * *could* point to addresses that aren't. Ewww.
> + *
> + * So... attempt to work out what TASK_SIZE is for the kernel we happen
> + * to be running on right now...
> + */
> +
> +static int testaddr(unsigned long addr)
> +{
> +   void *res = mmap((void *)addr, getpagesize(), PROT_NONE,
> +MAP_FIXED|MAP_ANONYMOUS, -1, 0);
> +   if (res == MAP_FAILED) {
> +   if (errno == EEXIST || errno == EINVAL)
> +   return 1;
> +
> +   /* We get ENOMEM for a bad virtual address */
> +   return 0;
> +   }
> +   /* It shouldn't actually succeed without either MAP_SHARED or
> +* MAP_PRIVATE in the flags, but just in case... */
> +   munmap((void *)addr, getpagesize());
> +   return 1;
> +}
> +
> +static int find_vmem_range(struct vhost_memory *vmem)
> +{
> +   const unsigned long page_size = getpagesize();
> +   unsigned long top;
> +   unsigned long bottom;
> +
> +   top = -page_size;
> +
> +   if (testaddr(top)) {
> +   vmem->regions[0].memory_size = top;
> +   goto out;
> +   }
> +
> +   /* 'top' is the lowest address known *not* to work */
> +   bottom = top;
> +   while (1) {
> +   bottom >>= 1;
> +   bottom &= ~(page_size - 1);
> +   assert(bottom);
> +
> +   if (testaddr(bottom))
> +   break;
> +   top = bottom;
> +   }
> +
> +   /* It's often a page or two below the boundary */
> +   top -= page_size;
> +   if (testaddr(top)) {
> +   vmem->regions[0].memory_size = top;
> +   goto out;
> +   }
> +   top -= page_size;
> +   if (testaddr(top)) {
> +   vmem->regions[0].memory_size = top;
> +   goto out;
> +   }
> +
> +   /* Now, bottom is the highest address known to work,
> +  and we must search between it and 'top' which is
> +  the lowest address known not to. */
> +   while (bottom + page_size != top) {
> +   unsigned long test = bottom + (top - bottom) / 2;
> +   test &= ~(page_size - 1);
> +
> +   if (testaddr(test)) {
> +   bottom = test;
> +   continue;
> +   }
> +   test -= page_size;
> +   if (testaddr(test)) {
> +   vmem->regions[0].memory_size = test;
> +   goto out;
> +   }
> +
> +   test -= page_size;
> +   if (testaddr(test)) {
> +   vmem->regions[0].memory_size = test;
> +   goto out;
> +   }
> +   top = test;
> +   }
> +   

Re: [PATCH] vdpa/mlx5: Enlarge queue size to 512 entries

2022-02-21 Thread Jason Wang
On Tue, Feb 22, 2022 at 3:00 AM Eugenio Perez Martin
 wrote:
>
> On Mon, Feb 21, 2022 at 2:20 PM Eli Cohen  wrote:
> >
> > Currently we can work with 512 entries with any MTU size. Change
> > MLX5_VDPA_MAX_VQ_ENTRIES accordingly.
> >
> > Signed-off-by: Eli Cohen 
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index be095dc1134e..de115e3fe761 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -2062,7 +2062,7 @@ static void mlx5_vdpa_set_config_cb(struct 
> > vdpa_device *vdev, struct vdpa_callba
> > ndev->config_cb = *cb;
> >  }
> >
> > -#define MLX5_VDPA_MAX_VQ_ENTRIES 256
> > +#define MLX5_VDPA_MAX_VQ_ENTRIES 512

I'd expect this should be something that can be read from the fw or
hw? Otherwise we can't increase it in the future.

Thanks

> >  static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev)
> >  {
> > return MLX5_VDPA_MAX_VQ_ENTRIES;
> > --
> > 2.35.1
> >
>
> Acked-by: Eugenio Pérez 
>


Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq

2022-02-21 Thread Jason Wang
On Tue, Feb 22, 2022 at 1:23 AM Eugenio Perez Martin
 wrote:
>
> On Mon, Feb 21, 2022 at 8:15 AM Jason Wang  wrote:
> >
> >
> > On 2022/2/18 1:13 AM, Eugenio Perez Martin wrote:
> > > On Tue, Feb 8, 2022 at 4:58 AM Jason Wang  wrote:
> > >>
> > >> On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
> > >>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang  wrote:
> >  On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > > First half of the buffers forwarding part, preparing vhost-vdpa
> > > callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> > > this is effectively dead code at the moment, but it helps to reduce
> > > patch size.
> > >
> > > Signed-off-by: Eugenio Pérez 
> > > ---
> > > hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> > > hw/virtio/vhost-shadow-virtqueue.c |  21 -
> > > hw/virtio/vhost-vdpa.c | 133 
> > > ++---
> > > 3 files changed, 143 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> > > b/hw/virtio/vhost-shadow-virtqueue.h
> > > index 035207a469..39aef5ffdf 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const 
> > > VhostShadowVirtqueue *svq);
> > >
> > > void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > >
> > > -VhostShadowVirtqueue *vhost_svq_new(void);
> > > +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > >
> > > void vhost_svq_free(VhostShadowVirtqueue *vq);
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > > b/hw/virtio/vhost-shadow-virtqueue.c
> > > index f129ec8395..7c168075d7 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > > /**
> > >  * Creates vhost shadow virtqueue, and instruct vhost device to 
> > > use the shadow
> > >  * methods and file descriptors.
> > > + *
> > > + * @qsize Shadow VirtQueue size
> > > + *
> > > + * Returns the new virtqueue or NULL.
> > > + *
> > > + * In case of error, reason is reported through error_report.
> > >  */
> > > -VhostShadowVirtqueue *vhost_svq_new(void)
> > > +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > > {
> > > +size_t desc_size = sizeof(vring_desc_t) * qsize;
> > > +size_t device_size, driver_size;
> > > g_autofree VhostShadowVirtqueue *svq = 
> > > g_new0(VhostShadowVirtqueue, 1);
> > > int r;
> > >
> > > @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> > > /* Placeholder descriptor, it should be deleted at 
> > > set_kick_fd */
> > > > event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> > >
> > > +svq->vring.num = qsize;
> >  I wonder if this is the best. E.g some hardware can support up to 32K
> >  queue size. So this will probably end up with:
> > 
> >  1) SVQ use 32K queue size
> >  2) hardware queue uses 256
> > 
> > >>> In that case SVQ vring queue size will be 32K and guest's vring can
> > >>> negotiate any number with SVQ equal or less than 32K,
> > >>
> > >> Sorry for being unclear what I meant is actually
> > >>
> > >> 1) SVQ uses 32K queue size
> > >>
> > >> 2) guest vq uses 256
> > >>
> > >> This looks like a burden that needs extra logic and may damage the
> > >> performance.
> > >>
> > > Still not getting this point.
> > >
> > > An available guest buffer, although contiguous in GPA/GVA, can expand
> > > in multiple buffers if it's not contiguous in qemu's VA (by the while
> > > loop in virtqueue_map_desc [1]). In that scenario it is better to have
> > > "plenty" of SVQ buffers.
> >
> >
> > Yes, but this case should be rare. So in this case we should deal with
> > overrun on SVQ, that is
> >
> > 1) SVQ is full
> > 2) guest VQ isn't
> >
> > We need to
> >
> > 1) check the available buffer slots
> > 2) disable guest kick and wait for the used buffers
> >
> > But it looks to me the current code is not ready for dealing with this case?
> >
>
> Yes it deals, that's the meaning of svq->next_guest_avail_elem.

Oh right, I missed that.

>
> >
> > >
> > > I'm ok if we decide to put an upper limit though, or if we decide not
> > > to handle this situation. But we would leave out valid virtio drivers.
> > > Maybe to set a fixed upper limit (1024?)? To add another parameter
> > > (x-svq-size-n=N)?
> > >
> > > If you mean we lose performance because memory gets more sparse I
> > > think the only possibility is to limit that way.
> >
> >
> > If guest is not using 32K, having a 32K for svq may give extra stress
> > on the cache since we will end up with a pretty large working set.
> >
>
> That might be true. My guess 

Re: [PATCH] vhost: validate range size before adding to iotlb

2022-02-21 Thread Jason Wang
On Tue, Feb 22, 2022 at 3:53 AM Anirudh Rayabharam  wrote:
>
> In vhost_iotlb_add_range_ctx(), validate the range size is non-zero
> before proceeding with adding it to the iotlb.
>
> Range size can overflow to 0 when start is 0 and last is (2^64 - 1).
> One instance where it can happen is when userspace sends an IOTLB
> message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an
> entry with size = 0, start = 0, last = (2^64 - 1) ends up in the
> iotlb. Next time a packet is sent, iotlb_access_ok() loops
> indefinitely due to that erroneous entry:
>
> Call Trace:
>  
>  iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
>  vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
>  vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104
>  vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
>  kthread+0x2e9/0x3a0 kernel/kthread.c:377
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>  
>
> Reported by syzbot at:
> https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
>
> Reported-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> Tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> Signed-off-by: Anirudh Rayabharam 
> ---
>  drivers/vhost/iotlb.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> index 670d56c879e5..b9de74bd2f9c 100644
> --- a/drivers/vhost/iotlb.c
> +++ b/drivers/vhost/iotlb.c
> @@ -53,8 +53,10 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb,
>   void *opaque)
>  {
> struct vhost_iotlb_map *map;
> +   u64 size = last - start + 1;
>
> -   if (last < start)
> +   // size can overflow to 0 when start is 0 and last is (2^64 - 1).
> +   if (last < start || size == 0)
> return -EFAULT;

I'd move this check to vhost_chr_iter_write(), then for the device who
has its own msg handler (e.g vDPA) can benefit from it as well.

Thanks

>
> if (iotlb->limit &&
> @@ -69,7 +71,7 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb,
> return -ENOMEM;
>
> map->start = start;
> -   map->size = last - start + 1;
> +   map->size = size;
> map->last = last;
> map->addr = addr;
> map->perm = perm;
> --
> 2.35.1
>



Re: [PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-21 Thread Stefano Garzarella

On Mon, Feb 21, 2022 at 11:33:11PM +0530, Anirudh Rayabharam wrote:

On Mon, Feb 21, 2022 at 05:44:20PM +0100, Stefano Garzarella wrote:

On Mon, Feb 21, 2022 at 09:44:39PM +0530, Anirudh Rayabharam wrote:
> On Mon, Feb 21, 2022 at 02:59:30PM +0100, Stefano Garzarella wrote:
> > On Mon, Feb 21, 2022 at 12:49 PM Stefano Garzarella  
wrote:
> > >
> > > vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
> > > ownership. It expects current->mm to be valid.
> > >
> > > vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
> > > the user has not done close(), so when we are in do_exit(). In this
> > > case current->mm is invalid and we're releasing the device, so we
> > > should clean it anyway.
> > >
> > > Let's check the owner only when vhost_vsock_stop() is called
> > > by an ioctl.
> > >
> > > Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> > > Cc: sta...@vger.kernel.org
> > > Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com
> > > Signed-off-by: Stefano Garzarella 
> > > ---
> > >  drivers/vhost/vsock.c | 14 --
> > >  1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > Reported-and-tested-by: 
syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
>
> I don't think this patch fixes "INFO: task hung in vhost_work_dev_flush"
> even though syzbot says so. I am able to reproduce the issue locally
> even with this patch applied.

Are you using the syzbot reproducer or another test?
In that case, can you share it?


I am using the syzbot reproducer.



From the stack trace it seemed to me that the worker accesses a zone that
has been cleaned (iotlb), so it is invalid and fails.


Would the thread hang in that case? How?


Looking at this log [1] it seems that the process is blocked on the 
wait_for_completion() in vhost_work_dev_flush().


Since we're not setting the backend to NULL to stop the worker, it's 
likely that the worker will keep running, preventing the flush work from 
completing.


[1] https://syzkaller.appspot.com/text?tag=CrashLog=153f085270
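For context, the TX worker bails out as soon as the backend is cleared, which is what lets the flush complete; roughly (a sketch of the logic in drivers/vhost/vsock.c, with the packet-processing body elided):

static void vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
                                        struct vhost_virtqueue *vq)
{
        mutex_lock(&vq->mutex);

        /* vhost_vsock_stop() sets the backend to NULL under vq->mutex */
        if (!vhost_vq_get_backend(vq))
                goto out;

        /* ... dequeue pending packets and add used buffers ... */
out:
        mutex_unlock(&vq->mutex);
}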



Re: [PATCH] tools/virtio: Test virtual address range detection

2022-02-21 Thread David Woodhouse
On Mon, 2022-02-21 at 18:02 +0100, Stefano Garzarella wrote:
> On Mon, Feb 21, 2022 at 04:15:22PM +, David Woodhouse wrote:
> > As things stand, an application which wants to use vhost with a trivial
> > 1:1 mapping of its virtual address space is forced to jump through hoops
> > to detect what the address range might be. The VHOST_SET_MEM_TABLE ioctl
> > helpfully doesn't fail immediately; you only get a failure *later* when
> > you attempt to set the backend, if the table *could* map to an address
> > which is out of range, even if no out-of-range address is actually
> > being referenced.
> > 
> > Since userspace is growing workarounds for this lovely kernel API, let's
> > ensure that we have a regression test that does things basically the same
> > way as 
> > https://gitlab.com/openconnect/openconnect/-/commit/443edd9d8826
> > 
> > does.
> > 
> > This is untested as I can't actually get virtio_test to work at all; it
> > just seems to deadlock on a spinlock. But it's getting the right answer
> > for the virtio range on x86_64 at least.
> 
> I had a similar issue with virtio_test and this simple patch [1] should 
> fix the deadlock.
> 
> [1] 
> https://lore.kernel.org/lkml/20220118150631.167015-1-sgarz...@redhat.com/

Thanks.

[dwoodhou@i7 virtio]$ sudo ~/virtio_test
Detected virtual address range 0x1000-0x7000
spurious wakeups: 0x0 started=0x10 completed=0x10

Although in some circumstances I also see a different build failure:

cc -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I 
../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing 
-fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h   -c 
-o vringh_test.o vringh_test.c
In file included from ./linux/uio.h:3,
 from ./linux/../../../include/linux/vringh.h:15,
 from ./linux/vringh.h:1,
 from vringh_test.c:9:
./linux/../../../include/linux/uio.h:10:10: fatal error: linux/mm_types.h: No 
such file or directory
   10 | #include <linux/mm_types.h>
  |  ^~
compilation terminated.
make: *** [: vringh_test.o] Error 1



Re: [PATCH] tools/virtio: Test virtual address range detection

2022-02-21 Thread Stefano Garzarella

On Mon, Feb 21, 2022 at 04:15:22PM +, David Woodhouse wrote:

As things stand, an application which wants to use vhost with a trivial
1:1 mapping of its virtual address space is forced to jump through hoops
to detect what the address range might be. The VHOST_SET_MEM_TABLE ioctl
helpfully doesn't fail immediately; you only get a failure *later* when
you attempt to set the backend, if the table *could* map to an address
which is out of range, even if no out-of-range address is actually
being referenced.

Since userspace is growing workarounds for this lovely kernel API, let's
ensure that we have a regression test that does things basically the same
way as https://gitlab.com/openconnect/openconnect/-/commit/443edd9d8826
does.

This is untested as I can't actually get virtio_test to work at all; it
just seems to deadlock on a spinlock. But it's getting the right answer
for the virtio range on x86_64 at least.


I had a similar issue with virtio_test and this simple patch [1] should 
fix the deadlock.


[1] 
https://lore.kernel.org/lkml/20220118150631.167015-1-sgarz...@redhat.com/


Stefano



Re: [PATCH] vhost: handle zero regions in vhost_set_memory

2022-02-21 Thread Stefano Garzarella

On Mon, Feb 21, 2022 at 12:58:51PM +0530, Anirudh Rayabharam wrote:

Return early when userspace sends zero regions in the VHOST_SET_MEM_TABLE
ioctl.

Otherwise, this causes an erroneous entry to be added to the iotlb. This
entry has a range size of 0 (due to u64 overflow). This then causes
iotlb_access_ok() to loop indefinitely resulting in a hung thread.
Syzbot has reported this here:

https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87


IIUC vhost_iotlb_add_range() in the for loop is never called if 
mem.nregions is 0, so I'm not sure the problem reported by syzbot is 
related.


In any case maybe this patch is fine, but currently I think we're just 
registering an iotlb without any regions, which in theory shouldn't 
cause any problems.
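
For context, the region loop in vhost_set_memory() looks roughly like this
(paraphrased from drivers/vhost/vhost.c, so names and details may differ
slightly):

	for (region = newmem->regions;
	     region < newmem->regions + mem.nregions;
	     region++) {
		if (vhost_iotlb_add_range(newumem,
					  region->guest_phys_addr,
					  region->guest_phys_addr +
					  region->memory_size - 1,
					  region->userspace_addr,
					  VHOST_MAP_RW))
			goto err;
	}

With mem.nregions == 0 the loop body never runs, so vhost_iotlb_add_range()
is never called and the resulting iotlb simply has no map entries.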


Thanks,
Stefano



Reported-and-tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
Signed-off-by: Anirudh Rayabharam 
---
drivers/vhost/vhost.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe2..821aba60eac2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1428,6 +1428,8 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
return -EFAULT;
if (mem.padding)
return -EOPNOTSUPP;
+   if (mem.nregions == 0)
+   return 0;
if (mem.nregions > max_mem_regions)
return -E2BIG;
newmem = kvzalloc(struct_size(newmem, regions, mem.nregions),
--
2.35.1





Re: [PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-21 Thread Stefano Garzarella

On Mon, Feb 21, 2022 at 09:44:39PM +0530, Anirudh Rayabharam wrote:

On Mon, Feb 21, 2022 at 02:59:30PM +0100, Stefano Garzarella wrote:

On Mon, Feb 21, 2022 at 12:49 PM Stefano Garzarella  wrote:
>
> vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
> ownership. It expects current->mm to be valid.
>
> vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
> the user has not done close(), so when we are in do_exit(). In this
> case current->mm is invalid and we're releasing the device, so we
> should clean it anyway.
>
> Let's check the owner only when vhost_vsock_stop() is called
> by an ioctl.
>
> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> Cc: sta...@vger.kernel.org
> Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com
> Signed-off-by: Stefano Garzarella 
> ---
>  drivers/vhost/vsock.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)

Reported-and-tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com


I don't think this patch fixes "INFO: task hung in vhost_work_dev_flush"
even though syzbot says so. I am able to reproduce the issue locally
even with this patch applied.


Are you using the syzbot reproducer or another test?
In that case, can you share it?

From the stack trace it seemed to me that the worker accesses a zone 
that has been cleaned (iotlb), so it is invalid and fails.
That's why I had this patch tested which should stop the worker before 
cleaning.


Thanks,
Stefano



Re: [syzbot] INFO: task hung in vhost_work_dev_flush

2022-02-21 Thread Stefano Garzarella

On Mon, Feb 21, 2022 at 09:23:04PM +0530, Anirudh Rayabharam wrote:

On Mon, Feb 21, 2022 at 03:12:33PM +0100, Stefano Garzarella wrote:

#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
f71077a4d84b

Patch sent upstream:
https://lore.kernel.org/virtualization/20220221114916.107045-1-sgarz...@redhat.com/T/#u


I don't see how your patch fixes this issue. It looks unrelated. It is
surprising that syzbot is happy with it.

I have sent a patch for this issue here:
https://lore.kernel.org/lkml/20220221072852.31820-1-m...@anirudhrb.com/


It is related because the worker thread is accessing the iotlb that is 
going to be freed, so it could be corrupted/invalid.


Your patch seems right, but it simply prevents the iotlb from being set for
the specific test case, so it remains NULL and iotlb_access_ok() exits 
immediately.


Anyway, currently if nregions is 0 vhost_set_memory() sets an iotlb with 
no regions (the for loop is not executed), so I'm not sure 
iotlb_access_ok() cycles infinitely.
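
To make the suspected failure mode concrete, the translation check walks the
iotlb in chunks, roughly like this (heavily simplified paraphrase of
iotlb_access_ok() in drivers/vhost/vhost.c):

	u64 s = 0, last = addr + len - 1;

	while (len > s) {
		map = vhost_iotlb_itree_first(umem, addr, last);
		if (!map || map->start > addr)
			return false;	/* translation failure */

		s += map->size;
		addr += map->size;	/* a map->size of 0 would never advance */
	}
	return true;

If an entry with map->size == 0 did exist, s and addr would never advance and
the loop would spin forever; the open question above is whether the
nregions == 0 path can create such an entry at all, given that the add loop
is skipped entirely.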


Stefano



[PATCH] tools/virtio: Test virtual address range detection

2022-02-21 Thread David Woodhouse
As things stand, an application which wants to use vhost with a trivial
1:1 mapping of its virtual address space is forced to jump through hoops
to detect what the address range might be. The VHOST_SET_MEM_TABLE ioctl
helpfully doesn't fail immediately; you only get a failure *later* when
you attempt to set the backend, if the table *could* map to an address
which is out of range, even if no out-of-range address is actually
being referenced.

Since userspace is growing workarounds for this lovely kernel API, let's
ensure that we have a regression test that does things basically the same
way as https://gitlab.com/openconnect/openconnect/-/commit/443edd9d8826
does.

This is untested as I can't actually get virtio_test to work at all; it
just seems to deadlock on a spinlock. But it's getting the right answer
for the virtio range on x86_64 at least.

Signed-off-by: David Woodhouse 
---

Please, tell me I don't need to do this. But if I *do*, it needs a
regression test in-kernel.

 tools/virtio/virtio_test.c | 109 -
 1 file changed, 106 insertions(+), 3 deletions(-)

diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index cb3f29c09aff..e40eeeb05b71 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include <sys/mman.h>
 #include 
 #include 
 #include 
@@ -124,6 +125,109 @@ static void vq_info_add(struct vdev_info *dev, int num)
dev->nvqs++;
 }
 
+/*
+ * This is awful. The kernel doesn't let us just ask for a 1:1 mapping of
+ * our virtual address space; we have to *know* the minimum and maximum
+ * addresses. We can't test it directly with VHOST_SET_MEM_TABLE because
+ * that actually succeeds, and the failure only occurs later when we try
+ * to use a buffer at an address that *is* valid, but our memory table
+ * *could* point to addresses that aren't. Ewww.
+ *
+ * So... attempt to work out what TASK_SIZE is for the kernel we happen
+ * to be running on right now...
+ */
+
+static int testaddr(unsigned long addr)
+{
+   void *res = mmap((void *)addr, getpagesize(), PROT_NONE,
+MAP_FIXED|MAP_ANONYMOUS, -1, 0);
+   if (res == MAP_FAILED) {
+   if (errno == EEXIST || errno == EINVAL)
+   return 1;
+
+   /* We get ENOMEM for a bad virtual address */
+   return 0;
+   }
+   /* It shouldn't actually succeed without either MAP_SHARED or
+* MAP_PRIVATE in the flags, but just in case... */
+   munmap((void *)addr, getpagesize());
+   return 1;
+}
+
+static int find_vmem_range(struct vhost_memory *vmem)
+{
+   const unsigned long page_size = getpagesize();
+   unsigned long top;
+   unsigned long bottom;
+
+   top = -page_size;
+
+   if (testaddr(top)) {
+   vmem->regions[0].memory_size = top;
+   goto out;
+   }
+
+   /* 'top' is the lowest address known *not* to work */
+   bottom = top;
+   while (1) {
+   bottom >>= 1;
+   bottom &= ~(page_size - 1);
+   assert(bottom);
+
+   if (testaddr(bottom))
+   break;
+   top = bottom;
+   }
+
+   /* It's often a page or two below the boundary */
+   top -= page_size;
+   if (testaddr(top)) {
+   vmem->regions[0].memory_size = top;
+   goto out;
+   }
+   top -= page_size;
+   if (testaddr(top)) {
+   vmem->regions[0].memory_size = top;
+   goto out;
+   }
+
+   /* Now, bottom is the highest address known to work,
+  and we must search between it and 'top' which is
+  the lowest address known not to. */
+   while (bottom + page_size != top) {
+   unsigned long test = bottom + (top - bottom) / 2;
+   test &= ~(page_size - 1);
+
+   if (testaddr(test)) {
+   bottom = test;
+   continue;
+   }
+   test -= page_size;
+   if (testaddr(test)) {
+   vmem->regions[0].memory_size = test;
+   goto out;
+   }
+
+   test -= page_size;
+   if (testaddr(test)) {
+   vmem->regions[0].memory_size = test;
+   goto out;
+   }
+   top = test;
+   }
+   vmem->regions[0].memory_size = bottom;
+
+ out:
+   vmem->regions[0].guest_phys_addr = page_size;
+   vmem->regions[0].userspace_addr = page_size;
+   printf("Detected virtual address range 0x%lx-0x%lx\n",
+  page_size,
+  (unsigned long)(page_size + vmem->regions[0].memory_size));
+
+   return 0;
+}
+
+
 static void vdev_info_init(struct vdev_info* dev, unsigned long long features)
 {
int r;
@@ -143,9 +247,8 @@ static void vdev_info_init(struct vdev_info* dev, unsigned 

Re: [PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-21 Thread Stefano Garzarella

On Mon, Feb 21, 2022 at 10:03:39AM -0500, Michael S. Tsirkin wrote:

On Mon, Feb 21, 2022 at 12:49:16PM +0100, Stefano Garzarella wrote:

vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
ownership. It expects current->mm to be valid.

vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
the user has not done close(), so when we are in do_exit(). In this
case current->mm is invalid and we're releasing the device, so we
should clean it anyway.

Let's check the owner only when vhost_vsock_stop() is called
by an ioctl.






Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Cc: sta...@vger.kernel.org
Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella 
---
 drivers/vhost/vsock.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index d6ca1c7ad513..f00d2dfd72b7 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -629,16 +629,18 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
return ret;
 }

-static int vhost_vsock_stop(struct vhost_vsock *vsock)
+static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)



 {
size_t i;
int ret;

mutex_lock(&vsock->dev.mutex);

-   ret = vhost_dev_check_owner(&vsock->dev);
-   if (ret)
-   goto err;
+   if (check_owner) {
+   ret = vhost_dev_check_owner(&vsock->dev);
+   if (ret)
+   goto err;
+   }

for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
struct vhost_virtqueue *vq = &vsock->vqs[i];
@@ -753,7 +755,7 @@ static int vhost_vsock_dev_release(struct inode *inode, 
struct file *file)
 * inefficient.  Room for improvement here. */
vsock_for_each_connected_socket(vhost_vsock_reset_orphans);

-   vhost_vsock_stop(vsock);


Let's add an explanation:

When invoked from release we can not fail so we don't
check return code of vhost_vsock_stop.
We need to stop vsock even if it's not the owner.


Do you want me to send a v2 by adding this as a comment in the code?

Thanks,
Stefano



Re: [PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-21 Thread Michael S. Tsirkin
On Mon, Feb 21, 2022 at 12:49:16PM +0100, Stefano Garzarella wrote:
> vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
> ownership. It expects current->mm to be valid.
> 
> vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
> the user has not done close(), so when we are in do_exit(). In this
> case current->mm is invalid and we're releasing the device, so we
> should clean it anyway.
> 
> Let's check the owner only when vhost_vsock_stop() is called
> by an ioctl.




> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> Cc: sta...@vger.kernel.org
> Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com
> Signed-off-by: Stefano Garzarella 
> ---
>  drivers/vhost/vsock.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index d6ca1c7ad513..f00d2dfd72b7 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -629,16 +629,18 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>   return ret;
>  }
>  
> -static int vhost_vsock_stop(struct vhost_vsock *vsock)
> +static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)

>  {
>   size_t i;
>   int ret;
>  
>   mutex_lock(&vsock->dev.mutex);
>  
> - ret = vhost_dev_check_owner(&vsock->dev);
> - if (ret)
> - goto err;
> + if (check_owner) {
> + ret = vhost_dev_check_owner(&vsock->dev);
> + if (ret)
> + goto err;
> + }
>  
>   for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>   struct vhost_virtqueue *vq = &vsock->vqs[i];
> @@ -753,7 +755,7 @@ static int vhost_vsock_dev_release(struct inode *inode, 
> struct file *file)
>* inefficient.  Room for improvement here. */
>   vsock_for_each_connected_socket(vhost_vsock_reset_orphans);
>  
> - vhost_vsock_stop(vsock);

Let's add an explanation:

When invoked from release we can not fail so we don't
check return code of vhost_vsock_stop.
We need to stop vsock even if it's not the owner.

> + vhost_vsock_stop(vsock, false);
>   vhost_vsock_flush(vsock);
>   vhost_dev_stop(&vsock->dev);
>  
> @@ -868,7 +870,7 @@ static long vhost_vsock_dev_ioctl(struct file *f, 
> unsigned int ioctl,
>   if (start)
>   return vhost_vsock_start(vsock);
>   else
> - return vhost_vsock_stop(vsock);
> + return vhost_vsock_stop(vsock, true);
>   case VHOST_GET_FEATURES:
>   features = VHOST_VSOCK_FEATURES;
>   if (copy_to_user(argp, &features, sizeof(features)))
> -- 
> 2.35.1



Re: [syzbot] INFO: task hung in vhost_work_dev_flush

2022-02-21 Thread Stefano Garzarella
#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
f71077a4d84b

Patch sent upstream:
https://lore.kernel.org/virtualization/20220221114916.107045-1-sgarz...@redhat.com/T/#u

On Sat, Feb 19, 2022 at 12:23 AM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:e6251ab4551f Merge tag 'nfs-for-5.17-2' of git://git.linux..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=163caa3c70
> kernel config:  https://syzkaller.appspot.com/x/.config?x=266de9da75c71a45
> dashboard link: https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
> compiler:   gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils 
> for Debian) 2.35.2
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=108514a470
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16ca671c70
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
>
> INFO: task syz-executor117:3632 blocked for more than 143 seconds.
>   Not tainted 5.17.0-rc3-syzkaller-00029-ge6251ab4551f #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz-executor117 state:D stack:27512 pid: 3632 ppid:  3631 
> flags:0x4002
> Call Trace:
>  
>  context_switch kernel/sched/core.c:4986 [inline]
>  __schedule+0xab2/0x4db0 kernel/sched/core.c:6295
>  schedule+0xd2/0x260 kernel/sched/core.c:6368
>  schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
>  do_wait_for_common kernel/sched/completion.c:85 [inline]
>  __wait_for_common kernel/sched/completion.c:106 [inline]
>  wait_for_common kernel/sched/completion.c:117 [inline]
>  wait_for_completion+0x174/0x270 kernel/sched/completion.c:138
>  vhost_work_dev_flush.part.0+0xbb/0xf0 drivers/vhost/vhost.c:243
>  vhost_work_dev_flush drivers/vhost/vhost.c:238 [inline]
>  vhost_poll_flush+0x5e/0x80 drivers/vhost/vhost.c:252
>  vhost_vsock_flush drivers/vhost/vsock.c:710 [inline]
>  vhost_vsock_dev_release+0x1be/0x4b0 drivers/vhost/vsock.c:757
>  __fput+0x286/0x9f0 fs/file_table.c:311
>  task_work_run+0xdd/0x1a0 kernel/task_work.c:164
>  exit_task_work include/linux/task_work.h:32 [inline]
>  do_exit+0xb29/0x2a30 kernel/exit.c:806
>  do_group_exit+0xd2/0x2f0 kernel/exit.c:935
>  __do_sys_exit_group kernel/exit.c:946 [inline]
>  __se_sys_exit_group kernel/exit.c:944 [inline]
>  __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:944
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fbf04b83b89
> RSP: 002b:7fff5bc9ca18 EFLAGS: 0246 ORIG_RAX: 00e7
> RAX: ffda RBX: 7fbf04bf8330 RCX: 7fbf04b83b89
> RDX: 003c RSI: 00e7 RDI: 
> RBP:  R08: ffc0 R09: 7fff5bc9cc08
> R10: 7fff5bc9cc08 R11: 0246 R12: 7fbf04bf8330
> R13: 0001 R14:  R15: 0001
>  
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/26:
>  #0: 8bb83c20 (rcu_read_lock){}-{1:2}, at: 
> debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6460
> 2 locks held by getty/3275:
>  #0: 88807f0db098 (>ldisc_sem){}-{0:0}, at: 
> tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:244
>  #1: c90002b662e8 (>atomic_read_lock){+.+.}-{3:3}, at: 
> n_tty_read+0xcf0/0x1230 drivers/tty/n_tty.c:2077
> 1 lock held by vhost-3632/3633:
>
> =
>
> NMI backtrace for cpu 0
> CPU: 0 PID: 26 Comm: khungtaskd Not tainted 
> 5.17.0-rc3-syzkaller-00029-ge6251ab4551f #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>  nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
>  nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:212 [inline]
>  watchdog+0xc1d/0xf50 kernel/hung_task.c:369
>  kthread+0x2e9/0x3a0 kernel/kthread.c:377
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>  
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 PID: 3633 Comm: vhost-3632 Not tainted 
> 5.17.0-rc3-syzkaller-00029-ge6251ab4551f #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> RIP: 0010:check_kcov_mode kernel/kcov.c:166 [inline]
> RIP: 0010:__sanitizer_cov_trace_pc+0xd/0x60 kernel/kcov.c:200
> Code: 00 00 e9 c6 41 66 02 66 0f 1f 44 00 00 48 8b be b0 01 00 00 e8 b4 ff ff 
> ff 31 c0 c3 90 65 8b 05 29 f7 89 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 
> 48 8b 14 25 00 70 02 00 a9 00 01 ff 00 74 0e
> RSP: 0018:c9cd7c78 EFLAGS: 

Re: [PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-21 Thread Stefano Garzarella
On Mon, Feb 21, 2022 at 12:49 PM Stefano Garzarella  wrote:
>
> vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
> ownership. It expects current->mm to be valid.
>
> vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
> the user has not done close(), so when we are in do_exit(). In this
> case current->mm is invalid and we're releasing the device, so we
> should clean it anyway.
>
> Let's check the owner only when vhost_vsock_stop() is called
> by an ioctl.
>
> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
> Cc: sta...@vger.kernel.org
> Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com
> Signed-off-by: Stefano Garzarella 
> ---
>  drivers/vhost/vsock.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)

Reported-and-tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+3140b17cb44a7b174...@syzkaller.appspotmail.com



Re: [syzbot] general protection fault in vhost_iotlb_itree_first

2022-02-21 Thread Stefano Garzarella

#syz test: https://github.com/stefano-garzarella/linux.git vsock-fix-stop

On Sat, Feb 19, 2022 at 01:18:24AM -0800, syzbot wrote:

Hello,

syzbot found the following issue on:

HEAD commit:359303076163 tty: n_tty: do not look ahead for EOL charact..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16b34b5470
kernel config:  https://syzkaller.appspot.com/x/.config?x=da674567f7b6043d
dashboard link: https://syzkaller.appspot.com/bug?extid=bbb030fc51d6f3c5d067
compiler:   gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for 
Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+bbb030fc51d6f3c5d...@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 
0xdc00:  [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x-0x0007]
CPU: 1 PID: 17981 Comm: vhost-17980 Not tainted 
5.17.0-rc4-syzkaller-00052-g359303076163 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:vhost_iotlb_itree_iter_first drivers/vhost/iotlb.c:19 [inline]
RIP: 0010:vhost_iotlb_itree_first+0x29/0x280 drivers/vhost/iotlb.c:169
Code: 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 89 f3 e8 e8 eb a0 fa 48 
89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 e8 01 00 00 
4c 8b 65 00 4d 85 e4 0f 84 b3 01 00
RSP: 0018:c90004f57ac8 EFLAGS: 00010246
RAX: dc00 RBX: 30303030320a0028 RCX: c900103dc000
RDX:  RSI: 86d72738 RDI: 
RBP:  R08:  R09: 0002
R10: 86d62d88 R11:  R12: 8880260e4d68
R13: 303030305f3a3057 R14: dc00 R15: 
FS:  () GS:8880b9d0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f2d46121901 CR3: 1d652000 CR4: 003506e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:

translate_desc+0x11e/0x3e0 drivers/vhost/vhost.c:2054
vhost_get_vq_desc+0x662/0x22c0 drivers/vhost/vhost.c:2300
vhost_vsock_handle_tx_kick+0x277/0xa20 drivers/vhost/vsock.c:522
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Modules linked in:
---[ end trace  ]---
RIP: 0010:vhost_iotlb_itree_iter_first drivers/vhost/iotlb.c:19 [inline]
RIP: 0010:vhost_iotlb_itree_first+0x29/0x280 drivers/vhost/iotlb.c:169
Code: 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 89 f3 e8 e8 eb a0 fa 48 
89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 e8 01 00 00 
4c 8b 65 00 4d 85 e4 0f 84 b3 01 00
RSP: 0018:c90004f57ac8 EFLAGS: 00010246
RAX: dc00 RBX: 30303030320a0028 RCX: c900103dc000
RDX:  RSI: 86d72738 RDI: 
RBP:  R08:  R09: 0002
R10: 86d62d88 R11:  R12: 8880260e4d68
R13: 303030305f3a3057 R14: dc00 R15: 
FS:  () GS:8880b9d0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f2d449f6718 CR3: 1d652000 CR4: 003506e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400

Code disassembly (best guess):
  0:00 41 57add%al,0x57(%rcx)
  3:41 56   push   %r14
  5:41 55   push   %r13
  7:49 89 d5mov%rdx,%r13
  a:41 54   push   %r12
  c:55  push   %rbp
  d:48 89 fdmov%rdi,%rbp
 10:53  push   %rbx
 11:48 89 f3mov%rsi,%rbx
 14:e8 e8 eb a0 fa  callq  0xfaa0ec01
 19:48 89 eamov%rbp,%rdx
 1c:48 b8 00 00 00 00 00movabs $0xdc00,%rax
 23:fc ff df
 26:48 c1 ea 03 shr$0x3,%rdx
* 2a:   80 3c 02 00 cmpb   $0x0,(%rdx,%rax,1) <-- trapping 
instruction
 2e:0f 85 e8 01 00 00   jne0x21c
 34:4c 8b 65 00 mov0x0(%rbp),%r12
 38:4d 85 e4test   %r12,%r12
 3b:0f  .byte 0xf
 3c:84  .byte 0x84
 3d:b3 01   mov$0x1,%bl


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to 

[PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

2022-02-21 Thread Stefano Garzarella
vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
ownership. It expects current->mm to be valid.

vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
the user has not done close(), so when we are in do_exit(). In this
case current->mm is invalid and we're releasing the device, so we
should clean it anyway.

Let's check the owner only when vhost_vsock_stop() is called
by an ioctl.

Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Cc: sta...@vger.kernel.org
Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella 
---
 drivers/vhost/vsock.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index d6ca1c7ad513..f00d2dfd72b7 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -629,16 +629,18 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
return ret;
 }
 
-static int vhost_vsock_stop(struct vhost_vsock *vsock)
+static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
 {
size_t i;
int ret;
 
mutex_lock(&vsock->dev.mutex);
 
-   ret = vhost_dev_check_owner(&vsock->dev);
-   if (ret)
-   goto err;
+   if (check_owner) {
+   ret = vhost_dev_check_owner(&vsock->dev);
+   if (ret)
+   goto err;
+   }
 
for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
struct vhost_virtqueue *vq = &vsock->vqs[i];
@@ -753,7 +755,7 @@ static int vhost_vsock_dev_release(struct inode *inode, 
struct file *file)
 * inefficient.  Room for improvement here. */
vsock_for_each_connected_socket(vhost_vsock_reset_orphans);
 
-   vhost_vsock_stop(vsock);
+   vhost_vsock_stop(vsock, false);
vhost_vsock_flush(vsock);
vhost_dev_stop(&vsock->dev);
 
@@ -868,7 +870,7 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned 
int ioctl,
if (start)
return vhost_vsock_start(vsock);
else
-   return vhost_vsock_stop(vsock);
+   return vhost_vsock_stop(vsock, true);
case VHOST_GET_FEATURES:
features = VHOST_VSOCK_FEATURES;
if (copy_to_user(argp, &features, sizeof(features)))
-- 
2.35.1



Re: [PATCH v1 0/6] virtio: support advance DMA

2022-02-21 Thread Xuan Zhuo
On Mon, 21 Feb 2022 11:32:52 +0800, Jason Wang  wrote:
> On Fri, Feb 18, 2022 at 5:00 PM Xuan Zhuo  wrote:
> >
> > On Thu, 17 Feb 2022 15:19:44 +0800, Jason Wang  wrote:
> > > On Thu, Feb 10, 2022 at 4:51 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > virtqueue_add() only supports virtual addresses, dma is completed in
> > > > virtqueue_add().
> > > >
> > > > In some scenarios (such as the AF_XDP scenario), DMA is completed in 
> > > > advance, so
> > > > it is necessary for us to support passing the DMA address to 
> > > > virtqueue_add().
> > >
> > > I'd suggest rename this feature as "unmanaged DMA".
> >
> > OK
> >
> > >
> > > >
> > > > Record this predma information in extra->flags, which can be skipped 
> > > > when
> > > > executing dma unmap.
> > >
> > > Question still, can we use per-virtqueue flag instead of per
> > > descriptor flag? If my memory is correct, the answer is yes in the
> > > discussion for the previous version.
> > >
> >
> > Yes.
> >
> > per-virtqueue? I guess it should be per-submit.
> >
> > This patch set only adds a flag to desc_extra[head].flags, so that we can 
> > know
> > if we need to unmap dma when we detach.
>
> I meant if we can manage to make it per virtqueue, there's no need to
> maintain per buffer flag.
>

Rethinking this question, I feel there is no essential difference between per
virtqueue and per sgs.

per virtqueue:
1. add buf:
a. check vq->premapped for map every sg
2. detach:
a. check vq->premapped for unmap

per sgs:
1. add buf:
a. check function parameter "premapped" for map every sg
b. add flag to extra[head].flag

2. detach:
a: check extra[head].flag for unmap
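
A tiny self-contained illustration of the two bookkeeping options (plain C
with made-up names, not the actual patch):

#include <stdbool.h>
#include <stdint.h>

#define F_PREDMA 0x1u

struct desc_extra {
	uint16_t flags;
};

struct vq {
	bool premapped;			/* option 1: one flag for the whole vq */
	struct desc_extra extra[64];	/* option 2: flag recorded per head    */
};

/* option 1: every buffer on this vq follows the same rule */
static bool need_unmap_per_vq(const struct vq *vq)
{
	return !vq->premapped;
}

/* option 2: add_buf records how this submission was mapped... */
static void record_mapping_per_sgs(struct vq *vq, unsigned int head, bool premapped)
{
	if (premapped)
		vq->extra[head].flags |= F_PREDMA;
}

/* ...and detach checks the per-head flag */
static bool need_unmap_per_sgs(const struct vq *vq, unsigned int head)
{
	return !(vq->extra[head].flags & F_PREDMA);
}

Either way the unmap decision is a single flag test; the difference is only
whether the flag lives on the virtqueue or travels with each submission.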


Thanks.


> So we know something that needs to be mapped by virtio core itself,
> e.g the indirect page. Other than this, all the rest could be
> pre-mapped.
>
> For vnet header, it could be mapped by virtio-net which could be still
> treated as pre mapped DMA since it's not the virtio ring code.
>
> Anything I miss here?
>
> Thanks
>
>
> >
> > Thanks.
> >
> > > Thanks
> > >
> > > >
> > > > v1:
> > > >1. All sgs requested at one time are required to be unified PREDMA, 
> > > > and several
> > > >   of them are not supported to be PREDMA
> > > >2. virtio_dma_map() is removed from this patch set and will be 
> > > > submitted
> > > >   together with the next time AF_XDP supports virtio dma
> > > >3. Added patch #2 #3 to remove the check for flags when performing 
> > > > unmap
> > > >   indirect desc
> > > >
> > > > Xuan Zhuo (6):
> > > >   virtio: rename vring_unmap_state_packed() to
> > > > vring_unmap_extra_packed()
> > > >   virtio: remove flags check for unmap split indirect desc
> > > >   virtio: remove flags check for unmap packed indirect desc
> > > >   virtio: virtqueue_add() support predma
> > > >   virtio: split: virtqueue_add_split() support dma address
> > > >   virtio: packed: virtqueue_add_packed() support dma address
> > > >
> > > >  drivers/virtio/virtio_ring.c | 199 ++-
> > > >  1 file changed, 126 insertions(+), 73 deletions(-)
> > > >
> > > > --
> > > > 2.31.0
> > > >
> > >
> >
>


Re: [syzbot] WARNING in vhost_dev_cleanup (2)

2022-02-21 Thread Stefano Garzarella

On Fri, Feb 18, 2022 at 12:23:10PM -0600, Mike Christie wrote:

On 2/18/22 11:53 AM, Mike Christie wrote:

On 2/17/22 3:48 AM, Stefano Garzarella wrote:


On Thu, Feb 17, 2022 at 8:50 AM Michael S. Tsirkin  wrote:


On Thu, Feb 17, 2022 at 03:39:48PM +0800, Jason Wang wrote:

On Thu, Feb 17, 2022 at 3:36 PM Michael S. Tsirkin  wrote:


On Thu, Feb 17, 2022 at 03:34:13PM +0800, Jason Wang wrote:

On Thu, Feb 17, 2022 at 10:01 AM syzbot
 wrote:


Hello,

syzbot found the following issue on:

HEAD commit:c5d9ae265b10 Merge tag 'for-linus' of git://git.kernel.org..
git tree:   upstream
console output: 
https://urldefense.com/v3/__https://syzkaller.appspot.com/x/log.txt?x=132e687c70__;!!ACWV5N9M2RV99hQ!fLqQTyosTBm7FK50IVmo0ozZhsvUEPFCivEHFDGU3GjlAHDWl07UdOa-t9uf9YisMihn$
kernel config:  
https://urldefense.com/v3/__https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912__;!!ACWV5N9M2RV99hQ!fLqQTyosTBm7FK50IVmo0ozZhsvUEPFCivEHFDGU3GjlAHDWl07UdOa-t9uf9RjOhplp$
dashboard link: 
https://urldefense.com/v3/__https://syzkaller.appspot.com/bug?extid=1e3ea63db39f2b4440e0__;!!ACWV5N9M2RV99hQ!fLqQTyosTBm7FK50IVmo0ozZhsvUEPFCivEHFDGU3GjlAHDWl07UdOa-t9uf9bBf5tv0$
compiler:   gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for 
Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+1e3ea63db39f2b444...@syzkaller.appspotmail.com

WARNING: CPU: 1 PID: 10828 at drivers/vhost/vhost.c:715 
vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715
Modules linked in:
CPU: 0 PID: 10828 Comm: syz-executor.0 Not tainted 
5.17.0-rc4-syzkaller-00051-gc5d9ae265b10 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715


Probably a hint that we are missing a flush.

Looking at vhost_vsock_stop() that is called by vhost_vsock_dev_release():

static int vhost_vsock_stop(struct vhost_vsock *vsock)
{
size_t i;
int ret;

mutex_lock(&vsock->dev.mutex);

ret = vhost_dev_check_owner(&vsock->dev);
if (ret)
goto err;

Where it could fail so the device is not actually stopped.

I wonder if this is something related.

Thanks



But then if that is not the owner then no work should be running, right?


Could it be a buggy user space that passes the fd to another process
and changes the owner just before the mutex_lock() above?

Thanks


Maybe, but can you be a bit more explicit? what is the set of
conditions you see that can lead to this?


I think the issue could be in vhost_vsock_stop() as Jason mentioned,
though it is not related to fd passing but rather to the do_exit() path.

Looking at the stack trace, we are in exit_task_work(), which is called
after exit_mm(), so vhost_dev_check_owner() can fail because
current->mm is NULL at that point.
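
(For reference, the ownership check is essentially a comparison against
current->mm, roughly as in drivers/vhost/vhost.c:

long vhost_dev_check_owner(struct vhost_dev *dev)
{
	/* Are you the owner? If not, I don't think you mean to do that */
	return dev->mm == current->mm ? 0 : -EPERM;
}

so once exit_mm() has cleared current->mm, the check can no longer match the
mm recorded at VHOST_SET_OWNER time.)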

It seems the fput work is queued by fput_many() in a worker queue, and
in some cases (maybe a lot of files opened?) the work is still queued
when we enter do_exit().

It normally happens if userspace doesn't do a close() when the VM


Just one clarification. I meant to say it "always" happens when userspace
doesn't do a close.

It doesn't have anything to do with lots of files or something like that.
We are actually running the vhost device's release function from
do_exit->task_work_run and so all those __fputs are done from something
like qemu's context (current == that process).

We are *not* hitting the case:

do_exit->exit_files->put_files_struct->filp_close->fput->fput_many

and then in there hitting the schedule_delayed_work path. For that
the last __fput would be done from a workqueue thread and so the current
pointer would point to a completely different thread.




is shut down and instead lets the kernel's reaper code clean up. The qemu
vhost-scsi code doesn't do a close() during shutdown and so this is our
normal code path. It also happens when something like qemu is not
gracefully shutdown like during a crash.

So fire up qemu, start IO, then crash it or kill -9 it while IO is still
running and you can hit it.


Thank you very much for this detailed explanation!





That said, I don't know if we can simply remove that check in
vhost_vsock_stop(), or check if current->mm is NULL, to understand if
the process is exiting.



Should the caller do the vhost_dev_check_owner or tell vhost_vsock_stop
when to check?

- vhost_vsock_dev_ioctl always wants to check for ownership right?

- For vhost_vsock_dev_release ownership doesn't matter because we
always want to clean up or it doesn't hurt too much.

For the case where we just do open then close and no ioctls then
running vhost_vq_set_backend in vhost_vsock_stop is just a minor
hit of extra work. If we've done ioctls, but are now in
vhost_vsock_dev_release then we know for the graceful and ungraceful
case that nothing is going to be accessing this device in 

Re: [PATCH v1 0/6] virtio: support advance DMA

2022-02-21 Thread Xuan Zhuo
On Mon, 21 Feb 2022 14:55:43 +0800, Jason Wang  wrote:
> On Mon, Feb 21, 2022 at 2:45 PM Xuan Zhuo  wrote:
> >
> > On Mon, 21 Feb 2022 14:37:49 +0800, Jason Wang  wrote:
> > > On Mon, Feb 21, 2022 at 2:20 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Mon, 21 Feb 2022 13:59:06 +0800, Xuan Zhuo 
> > > >  wrote:
> > > > > On Mon, 21 Feb 2022 11:53:33 +0800, Jason Wang  
> > > > > wrote:
> > > > > > On Mon, Feb 21, 2022 at 11:46 AM Xuan Zhuo 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, 21 Feb 2022 11:32:52 +0800, Jason Wang 
> > > > > > >  wrote:
> > > > > > > > On Fri, Feb 18, 2022 at 5:00 PM Xuan Zhuo 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Thu, 17 Feb 2022 15:19:44 +0800, Jason Wang 
> > > > > > > > >  wrote:
> > > > > > > > > > On Thu, Feb 10, 2022 at 4:51 PM Xuan Zhuo 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > virtqueue_add() only supports virtual addresses, dma is 
> > > > > > > > > > > completed in
> > > > > > > > > > > virtqueue_add().
> > > > > > > > > > >
> > > > > > > > > > > In some scenarios (such as the AF_XDP scenario), DMA is 
> > > > > > > > > > > completed in advance, so
> > > > > > > > > > > it is necessary for us to support passing the DMA address 
> > > > > > > > > > > to virtqueue_add().
> > > > > > > > > >
> > > > > > > > > > I'd suggest rename this feature as "unmanaged DMA".
> > > > > > > > >
> > > > > > > > > OK
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Record this predma information in extra->flags, which can 
> > > > > > > > > > > be skipped when
> > > > > > > > > > > executing dma unmap.
> > > > > > > > > >
> > > > > > > > > > Question still, can we use per-virtqueue flag instead of per
> > > > > > > > > > descriptor flag? If my memory is correct, the answer is yes 
> > > > > > > > > > in the
> > > > > > > > > > discussion for the previous version.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > per-virtqueue? I guess it should be per-submit.
> > > > > > > > >
> > > > > > > > > This patch set only adds a flag to desc_extra[head].flags, so 
> > > > > > > > > that we can know
> > > > > > > > > if we need to unmap dma when we detach.
> > > > > > > >
> > > > > > > > I meant if we can manage to make it per virtqueue, there's no 
> > > > > > > > need to
> > > > > > > > maintain per buffer flag.
> > > > > > > >
> > > > > > > > So we know something that needs to be mapped by virtio core 
> > > > > > > > itself,
> > > > > > > > e.g the indirect page. Other than this, all the rest could be
> > > > > > > > pre-mapped.
> > > > > > > >
> > > > > > > > For vnet header, it could be mapped by virtio-net which could 
> > > > > > > > be still
> > > > > > > > treated as pre mapped DMA since it's not the virtio ring code.
> > > > > > > >
> > > > > > > > Anything I miss here?
> > > > > > >
> > > > > > > I guess, your understanding is that after the queue is reset, the 
> > > > > > > queue is used
> > > > > > > by xsk(AF_XDP), then all commits to this vq are premapped dma 
> > > > > > > addresses.
> > > > > > >
> > > > > > > This is ok for rx.
> > > > > > >
> > > > > > > But for tx, just like XDP TX, although vq is used by xsk, the 
> > > > > > > kernel also passes
> > > > > > > skb to it at the same time. It is shared.
> > > > > >
> > > > > > Right.
> > > > > >
> > > > > > >
> > > > > > > We can guarantee that the sg of the sgs submitted at one time 
> > > > > > > uses the premapped
> > > > > > > dma address or virtual address uniformly. It is not guaranteed 
> > > > > > > that all the sgs
> > > > > > > to the vq are uniform
> > > > > >
> > > > > > Sorry, I don't understand here. We can let virtio-net do the mapping
> > > > > > even for TX, then from the virtio_ring point of view, it's still
> > > > > > pre-mapped?
> > > > > >
> > > > >
> > > > > Yes, we can do this. My previous thought was to keep the skb path 
> > > > > unchanged.
> > >
> > > We can listen from Michael and others but I think it would be simpler.
> > >
> > > And we can even make the pre-mapped per driver. E.g for virtio-net we
> > > just let the virtio-net driver do the DMA mapping. This allows us to
> > > do a lot of optimizations (e.g page flip) as what other networking
> > > drivers did.
> > >
> > > > >
> > > > > Then we can make it clear that in the case of xsk, after completing 
> > > > > the queue
> > > > > reset, all the addresses submitted to virtio are the addresses of the 
> > > > > completed
> > > > > dma, including the skb case, the dma map operation must be completed 
> > > > > first.
> > > > >
> > > > > In this case, I feel like we can do without this patch set.
> > > >
> > > > I originally thought that use_dma_api could be reused, but I found that 
> > > > this is
> > > > not the case. The logic of sg_phys() does not meet our ideas. We still 
> > > > have a
> > > > separate flag.
> > >
> > > Just to make sure I understand here, for this flag you mean
> > >
> > > 1)