On Mon, Jan 16, 2023 at 7:37 AM Jason Wang <jasow...@redhat.com> wrote:
>
>
> On 2023/1/13 16:19, Eugenio Perez Martin wrote:
> > On Fri, Jan 13, 2023 at 5:36 AM Jason Wang <jasow...@redhat.com> wrote:
> >> On Fri, Jan 13, 2023 at 1:25 AM Eugenio Pérez <epere...@redhat.com> wrote:
> >>> To restore the device at the destination of a live migration we send
> >>> the commands through the control virtqueue. For a device to read CVQ
> >>> it must have received the DRIVER_OK status bit.
> >>
> >> This probably requires support from the parent driver, plus some
> >> changes or fixes there.
> >>
> >> Some drivers did:
> >>
> >> parent_set_status():
> >>     if (DRIVER_OK)
> >>         if (queue_enable)
> >>             write queue_enable to the device
> >>
> >> Examples are IFCVF or even vp_vdpa at least. MLX5 seems to be fine.
> >>
> > I don't get your point here. No device should start reading CVQ (or
> > any other VQ) without having received DRIVER_OK.
>
> If I understand the code correctly:
>
> For CVQ, we do SET_VRING_ENABLE before DRIVER_OK; that's fine.
>
> For the datapath VQs, we do SET_VRING_ENABLE after DRIVER_OK, which
> requires parent driver support (explained above).
>
> > Some parent drivers do not support sending the queue enable command
> > after DRIVER_OK, usually because they clean part of the state, like the
> > one set by set_vring_base. Even vdpa_net_sim needs fixes here.
>
> Yes, so the question is:
>
> Do we need another backend feature for this? (Otherwise things may
> break silently.)
>
> > But my understanding is that it should be supported, so I consider it
> > a bug.
>
> Probably. We need to find some proof in the spec, e.g. in 3.1.1:
>
> """
> 7. Perform device-specific setup, including discovery of virtqueues for
>    the device, optional per-bus setup, reading and possibly writing the
>    device's virtio configuration space, and population of virtqueues.
> 8. Set the DRIVER_OK status bit. At this point the device is "live".
> """
>
> So if my understanding is correct, "discovery of virtqueues for the
> device" implies queue_enable here, which is expected to be done before
> DRIVER_OK. But it doesn't say anything regarding the behaviour of
> setting queue ready after DRIVER_OK.
>
> I'm not sure whether it's a real bug or not; maybe Michael can comment
> on this.
>

Right, input on this topic would be really appreciated.
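To make the failure mode concrete: the parent-driver pattern described
above would behave roughly like the sketch below. This is a minimal
illustration with invented names (struct parent, hw_commit_vq_enable,
parent_set_vq_ready are all made up), not the actual IFCVF/vp_vdpa code;
the point is only that a queue_enable arriving after DRIVER_OK never
reaches the hardware.

/* Minimal sketch with invented names -- not the real IFCVF/vp_vdpa code. */
#include <stdbool.h>
#include <stdint.h>

#define DRIVER_OK 0x04              /* VIRTIO_CONFIG_S_DRIVER_OK */
#define MAX_VQS   8

struct parent {
    uint8_t status;
    bool vq_ready[MAX_VQS];         /* shadow of the driver's queue_enable */
};

static void hw_commit_vq_enable(struct parent *p, int qid)
{
    (void)p; (void)qid;             /* a real driver writes a device register */
}

static void parent_set_vq_ready(struct parent *p, int qid, bool ready)
{
    p->vq_ready[qid] = ready;       /* only the shadow is updated here */
}

static void parent_set_status(struct parent *p, uint8_t status)
{
    if ((status & DRIVER_OK) && !(p->status & DRIVER_OK)) {
        /* queue_enable is committed only on the DRIVER_OK edge, so an
         * enable arriving after this point is silently ignored. */
        for (int qid = 0; qid < MAX_VQS; qid++) {
            if (p->vq_ready[qid]) {
                hw_commit_vq_enable(p, qid);
            }
        }
    }
    p->status = status;
}

With this pattern, a SET_VRING_ENABLE issued for the datapath VQs after
DRIVER_OK only updates the shadow and is never committed, which is exactly
the silent breakage discussed above.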
> > Especially after queue_reset patches. Is that what you mean?
>
> We haven't supported queue_reset yet in QEMU. But it allows writing 1 to
> queue_enable after DRIVER_OK for sure.
>

I was not clear; I meant in the emulated device. I'm testing this series
with the proposal of _F_STATE.

>
> >
> >>> However this opens a window where the device could start receiving
> >>> packets in rx queue 0 before it receives the RSS configuration. To
> >>> avoid that, we will not send vring_enable until all configuration is
> >>> used by the device.
> >>>
> >>> As a first step, run vhost_set_vring_ready for all vhost_net backends
> >>> after all of them are started (with DRIVER_OK). This code should not
> >>> affect vdpa.
> >>>
> >>> Signed-off-by: Eugenio Pérez <epere...@redhat.com>
> >>> ---
> >>>  hw/net/vhost_net.c | 17 ++++++++++++-----
> >>>  1 file changed, 12 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>> index c4eecc6f36..3900599465 100644
> >>> --- a/hw/net/vhost_net.c
> >>> +++ b/hw/net/vhost_net.c
> >>> @@ -399,6 +399,18 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>>          } else {
> >>>              peer = qemu_get_peer(ncs, n->max_queue_pairs);
> >>>          }
> >>> +        r = vhost_net_start_one(get_vhost_net(peer), dev);
> >>> +        if (r < 0) {
> >>> +            goto err_start;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    for (int j = 0; j < nvhosts; j++) {
> >>> +        if (j < data_queue_pairs) {
> >>> +            peer = qemu_get_peer(ncs, j);
> >>> +        } else {
> >>> +            peer = qemu_get_peer(ncs, n->max_queue_pairs);
> >>> +        }
> >>
> >> I fail to understand why we need to change the vhost_net layer. This
> >> is vhost-vDPA specific, so I wonder if we can limit the changes to
> >> e.g. vhost_vdpa_dev_start()?
> >>
> > The vhost-net layer explicitly calls vhost_set_vring_enable before
> > vhost_dev_start, and this is exactly the behavior we want to avoid.
> > Even if we make changes to vhost_dev, this change is still needed.
>
> Note that the only user of vhost_set_vring_enable() is vhost-user, where
> the semantics are different:
>
> It uses that to change the number of active queues:
>
> static int peer_attach(VirtIONet *n, int index)
>
>     if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
> =>      vhost_set_vring_enable(nc->peer, 1);
>     }
>
> This is not the semantics of vhost-vDPA, which tries to be compliant
> with the virtio spec. So I'm not sure how it can help here.
>

Right, but previous changes use the enable callback to delay enabling the
datapath virtqueues. I'll try to fit the changes in virtio/vhost-vdpa,
though.

Thanks!
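For readers skimming the thread, the reordering the patch makes in
vhost_net_start() boils down to the condensed sketch below. The types and
helpers are stand-ins (Peer, vhost_start_one, vring_enable_one are
invented), not the actual QEMU code; only the two-pass structure mirrors
the diff above.

/* Condensed sketch of the two-pass start; stand-in types, not QEMU code. */
#include <stddef.h>

typedef struct {
    int vring_enable;                    /* saved enable state to restore */
} Peer;

/* models vhost_net_start_one(): backend ends up started (DRIVER_OK) */
static int vhost_start_one(Peer *p) { (void)p; return 0; }

/* models vhost_set_vring_enable() restoring the saved state */
static int vring_enable_one(Peer *p) { (void)p; return 0; }

static int start_all(Peer *peers, size_t nvhosts)
{
    /* Pass 1: start every backend first, so all of them have DRIVER_OK
     * (and device state can be restored) before any queue runs. */
    for (size_t i = 0; i < nvhosts; i++) {
        if (vhost_start_one(&peers[i]) < 0) {
            return -1;                   /* real code unwinds via err_start */
        }
    }

    /* Pass 2: only now restore the vring enable state.  Before the patch
     * this happened per-peer in the same loop as the start, so one queue
     * could be enabled while later peers were not yet configured. */
    for (size_t i = 0; i < nvhosts; i++) {
        if (peers[i].vring_enable && vring_enable_one(&peers[i]) < 0) {
            return -1;
        }
    }
    return 0;
}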
>
> > And we want to explicitly enable CVQ first, and "only" vhost_net knows
> > which one that is.
>
> This should be known by net/vhost-vdpa.c.
>
> > To perform that in vhost_vdpa_dev_start would require quirks, involving
> > one or more of:
> > * Ignore vq enable calls if the device is not the CVQ one. How do we
> >   signal which one is the CVQ? Can we trust it will be the last one for
> >   all kinds of devices?
> > * Enable queues that do not belong to the last vhost_dev from the
> >   enable call.
> > * Enable the rest of the queues from the last enable in reverse order.
> > * Intercalate the "net load" callback between enabling the last
> >   vhost_vdpa device and enabling the rest of the devices.
> > * Add an "enable priority" order?
>
> I haven't had time to think it through, but it would be better if we
> could limit the changes to the vhost-vdpa layer. E.g. currently the
> VHOST_VDPA_SET_VRING_ENABLE is done at vhost_dev_start().
>
> Thanks
>
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>>          if (peer->vring_enable) {
> >>>              /* restore vring enable state */
> >>> @@ -408,11 +420,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>>                  goto err_start;
> >>>              }
> >>>          }
> >>> -
> >>> -        r = vhost_net_start_one(get_vhost_net(peer), dev);
> >>> -        if (r < 0) {
> >>> -            goto err_start;
> >>> -        }
> >>>      }
> >>>
> >>>      return 0;
> >>> --
> >>> 2.31.1
> >>>
>
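PS: for anyone joining the thread late, the overall ordering this series
is driving toward at the migration destination can be summarized with the
minimal sketch below. All names are invented for illustration (it assumes
the CVQ is the last vq, which is exactly the open question above); it is
not actual QEMU code.

/* Hypothetical restore ordering at the destination; invented names. */
enum { NUM_DATA_VQS = 4, CVQ_IDX = NUM_DATA_VQS };  /* CVQ assumed last */

static void enable_vq(int idx)   { (void)idx; } /* SET_VRING_ENABLE stand-in */
static void send_cvq_state(void) { }            /* restore MAC/MQ/RSS via CVQ */

static void restore_at_destination(void)
{
    /* All backends are assumed already started (DRIVER_OK) here. */
    enable_vq(CVQ_IDX);                  /* 1: let the device read the CVQ */
    send_cvq_state();                    /* 2: device consumes configuration */
    for (int i = 0; i < NUM_DATA_VQS; i++) {
        enable_vq(i);                    /* 3: only now open the datapath */
    }
}

This is why rx queue 0 cannot see packets before the RSS configuration is
in place: the datapath queues stay disabled until the CVQ commands have
been consumed.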