On Sat, Apr 30, 2022 at 10:07 AM Si-Wei Liu <si-wei....@oracle.com> wrote: > > > > On 4/28/2022 7:30 PM, Jason Wang wrote: > > On Wed, Apr 27, 2022 at 5:09 PM Si-Wei Liu <si-wei....@oracle.com> wrote: > >> > >> > >> On 4/27/2022 1:38 AM, Jason Wang wrote: > >>> On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei....@oracle.com> wrote: > >>>> > >>>> On 4/26/2022 9:28 PM, Jason Wang wrote: > >>>>> 在 2022/3/30 14:33, Si-Wei Liu 写道: > >>>>>> Hi, > >>>>>> > >>>>>> This patch series attempt to fix a few issues in vhost-vdpa > >>>>>> multiqueue functionality. > >>>>>> > >>>>>> Patch #1 is the formal submission for RFC patch in: > >>>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c23...@oracle.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqg3ysNsk$ > >>>>>> > >>>>>> Patch #2 and #3 were taken from a previous patchset posted on > >>>>>> qemu-devel: > >>>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20211117192851.65529-1-epere...@redhat.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqc3mXqDs$ > >>>>>> > >>>>>> albeit abandoned, two patches in that set turn out to be useful for > >>>>>> patch #4, which is to fix a QEMU crash due to race condition. > >>>>>> > >>>>>> Patch #5 through #7 are obviously small bug fixes. Please find the > >>>>>> description of each in the commit log. > >>>>>> > >>>>>> Thanks, > >>>>>> -Siwei > >>>>> Hi Si Wei: > >>>>> > >>>>> I think we need another version of this series? > >>>> Hi Jason, > >>>> > >>>> Apologies for the long delay. I was in the middle of reworking the patch > >>>> "virtio: don't read pending event on host notifier if disabled", but > >>>> found out that it would need quite some code change for the userspace > >>>> fallback handler to work properly (for the queue no. change case > >>>> specifically). > >>> We probably need this fix for -stable, so I wonder if we can have a > >>> workaround first and do refactoring on top? > >> Hmmm, a nasty fix but may well address the segfault is something like this: > >> > >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > >> index 8ca0b80..3ac93a4 100644 > >> --- a/hw/net/virtio-net.c > >> +++ b/hw/net/virtio-net.c > >> @@ -368,6 +368,10 @@ static void virtio_net_set_status(struct > >> VirtIODevice *vdev, uint8_t status) > >> int i; > >> uint8_t queue_status; > >> > >> + if (n->status_pending) > >> + return; > >> + > >> + n->status_pending = true; > >> virtio_net_vnet_endian_status(n, status); > >> virtio_net_vhost_status(n, status); > >> > >> @@ -416,6 +420,7 @@ static void virtio_net_set_status(struct > >> VirtIODevice *vdev, uint8_t status) > >> } > >> } > >> } > >> + n->status_pending = false; > >> } > >> > >> static void virtio_net_set_link_status(NetClientState *nc) > >> diff --git a/include/hw/virtio/virtio-net.h > >> b/include/hw/virtio/virtio-net.h > >> index eb87032..95efea8 100644 > >> --- a/include/hw/virtio/virtio-net.h > >> +++ b/include/hw/virtio/virtio-net.h > >> @@ -216,6 +216,7 @@ struct VirtIONet { > >> VirtioNetRssData rss_data; > >> struct NetRxPkt *rx_pkt; > >> struct EBPFRSSContext ebpf_rss; > >> + bool status_pending; > >> }; > >> > >> void virtio_net_set_netclient_name(VirtIONet *n, const char *name, > >> > >> To be honest, I am not sure if this is worth a full blown fix to make it > >> completely work. Without applying vq suspend patch (the one I posted in > >> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/df7c9a87-b2bd-7758-a6b6-bd834a733...@oracle.com/__;!!ACWV5N9M2RV99hQ!L4qque3YpPr-CGp12NrNdMMT1HROfEY_Juw2vnfZXHjOhtT0XJCR9GB8cvWEbJL9Aeh-WhDogBVArJn91P0$ > >> ), > >> it's very hard for me to effectively verify my code change - it's very > >> easy for the guest vq index to be out of sync if not stopping the vq > >> once the vhost is up and running (I tested it with repeatedly set_link > >> off and on). > > Can we test via vmstop? > Yes, of coz, that's where the segfault happened. The tight loop of > set_link on/off doesn't even work for the single queue case, hence > that's why I doubted it ever worked for vhost-vdpa.
Right, this is something we need to check. Set_link should stop the vhost device anyhow, otherwise it should be a bug. > > > > >> I am not sure if there's real chance we can run into issue > >> in practice due to the incomplete fix, if we don't fix the vq > >> stop/suspend issue first. Anyway I will try, as other use case e.g, live > >> migration is likely to get stumbled on it, too. > > Ok, so I think we probably don't need the "nasty" fix above. Let's fix > > it with the issue of stop/resume. > Ok, then does below tentative code change suffice the need? i.e. it > would fail the request of changing queue pairs when the vhost-vdpa > backend falls back to the userspace handler, but it's probably the > easiest way to fix the vmstop segfault. Probably, let's see. Thanks > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index ed231f9..8ba9f09 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -1177,6 +1177,7 @@ static int virtio_net_handle_mq(VirtIONet *n, > uint8_t cmd, > struct virtio_net_ctrl_mq mq; > size_t s; > uint16_t queue_pairs; > + NetClientState *nc = qemu_get_queue(n->nic); > > s = iov_to_buf(iov, iov_cnt, 0, &mq, sizeof(mq)); > if (s != sizeof(mq)) { > @@ -1196,6 +1197,13 @@ static int virtio_net_handle_mq(VirtIONet *n, > uint8_t cmd, > return VIRTIO_NET_ERR; > } > > + /* avoid changing the number of queue_pairs for vdpa device in > + * userspace handler. > + * TODO: get userspace fallback to work with future patch */ > + if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { > + return VIRTIO_NET_ERR; > + } > + > n->curr_queue_pairs = queue_pairs; > /* stop the backend before changing the number of queue_pairs to > avoid handling a > * disabled queue */ > > Thanks, > -Siwei > > > > Thanks > > > >> -Siwei > >> > >> > >>>> I have to drop it from the series and posted it later on > >>>> when ready. Will post a v2 with relevant patches removed. > >>> Thanks > >>> > >>>> Regards, > >>>> -Siwei > >>>> > >>>>> Thanks > >>>>> > >>>>> > >>>>>> --- > >>>>>> > >>>>>> Eugenio Pérez (2): > >>>>>> virtio-net: Fix indentation > >>>>>> virtio-net: Only enable userland vq if using tap backend > >>>>>> > >>>>>> Si-Wei Liu (5): > >>>>>> virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa > >>>>>> virtio: don't read pending event on host notifier if disabled > >>>>>> vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa > >>>>>> vhost-net: fix improper cleanup in vhost_net_start > >>>>>> vhost-vdpa: backend feature should set only once > >>>>>> > >>>>>> hw/net/vhost_net.c | 4 +++- > >>>>>> hw/net/virtio-net.c | 25 +++++++++++++++++++++---- > >>>>>> hw/virtio/vhost-vdpa.c | 2 +- > >>>>>> hw/virtio/virtio-bus.c | 3 ++- > >>>>>> hw/virtio/virtio.c | 21 +++++++++++++-------- > >>>>>> include/hw/virtio/virtio.h | 2 ++ > >>>>>> net/vhost-vdpa.c | 4 +++- > >>>>>> 7 files changed, 45 insertions(+), 16 deletions(-) > >>>>>> >