On Mon, Jan 16, 2023 at 7:37 AM Jason Wang <jasow...@redhat.com> wrote:
>
>
> On 2023/1/13 16:19, Eugenio Perez Martin wrote:
> > On Fri, Jan 13, 2023 at 5:36 AM Jason Wang <jasow...@redhat.com> wrote:
> >> On Fri, Jan 13, 2023 at 1:25 AM Eugenio Pérez <epere...@redhat.com> wrote:
> >>> To restore the device at the destination of a live migration we send
> >>> the commands through the control virtqueue. For a device to read CVQ
> >>> it must have received the DRIVER_OK status bit.
> >>
> >> This probably requires support from the parent driver, plus some
> >> changes or fixes there.
> >>
> >> Some drivers did:
> >>
> >> parent_set_status():
> >>     if (DRIVER_OK)
> >>         if (queue_enable)
> >>             write queue_enable to the device
> >>
> >> Examples are IFCVF or even vp_vdpa at least. MLX5 seems to be fine.
> >>
> > I don't get your point here. No device should start reading CVQ (or
> > any other VQ) without having received DRIVER_OK.
>
> If I understand the code correctly:
>
> For CVQ, we do SET_VRING_ENABLE before DRIVER_OK; that's fine.
>
> For the datapath VQs, we do SET_VRING_ENABLE after DRIVER_OK, which
> requires parent driver support (explained above).
>
> > Some parent drivers do not support sending the queue enable command
> > after DRIVER_OK, usually because they clean part of the state, like the
> > one set by set_vring_base. Even vdpa_net_sim needs fixes here.
>
> Yes, so the question is:
>
> Do we need another backend feature for this? (Otherwise things may
> break silently.)
>
> > But my understanding is that it should be supported, so I consider it
> > a bug.
>
> Probably. We need to find some proof in the spec, e.g. in 3.1.1:
>
> """
> 7. Perform device-specific setup, including discovery of virtqueues for
>    the device, optional per-bus setup, reading and possibly writing the
>    device's virtio configuration space, and population of virtqueues.
> 8. Set the DRIVER_OK status bit. At this point the device is "live".
> """
>
> So if my understanding is correct, "discovery of virtqueues for the
> device" implies queue_enable here, which is expected to be done before
> DRIVER_OK. But it doesn't say anything regarding the behaviour of
> setting queue ready after DRIVER_OK.
>
> I'm not sure whether it's a real bug or not; maybe Michael can comment
> on this.
>

Right, input on this topic would be really appreciated.
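To make the failure mode concrete: the parent-driver pattern described
above would behave roughly like the sketch below. This is a minimal
illustration with invented names (struct parent, hw_commit_vq_enable,
parent_set_vq_ready are all made up), not the actual IFCVF/vp_vdpa code;
the point is only that a queue_enable arriving after DRIVER_OK never
reaches the hardware.

/* Minimal sketch with invented names -- not the real IFCVF/vp_vdpa code. */
#include <stdbool.h>
#include <stdint.h>

#define DRIVER_OK 0x04              /* VIRTIO_CONFIG_S_DRIVER_OK */
#define MAX_VQS   8

struct parent {
    uint8_t status;
    bool vq_ready[MAX_VQS];         /* shadow of the driver's queue_enable */
};

static void hw_commit_vq_enable(struct parent *p, int qid)
{
    (void)p; (void)qid;             /* a real driver writes a device register */
}

static void parent_set_vq_ready(struct parent *p, int qid, bool ready)
{
    p->vq_ready[qid] = ready;       /* only the shadow is updated here */
}

static void parent_set_status(struct parent *p, uint8_t status)
{
    if ((status & DRIVER_OK) && !(p->status & DRIVER_OK)) {
        /* queue_enable is committed only on the DRIVER_OK edge, so an
         * enable arriving after this point is silently ignored. */
        for (int qid = 0; qid < MAX_VQS; qid++) {
            if (p->vq_ready[qid]) {
                hw_commit_vq_enable(p, qid);
            }
        }
    }
    p->status = status;
}

With this pattern, a SET_VRING_ENABLE issued for the datapath VQs after
DRIVER_OK only updates the shadow and is never committed, which is exactly
the silent breakage discussed above.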
> > Especially after queue_reset patches. Is that what you mean?
>
> We haven't supported queue_reset yet in QEMU. But it allows writing 1 to
> queue_enable after DRIVER_OK for sure.
>

I was not clear; I meant in the emulated device. I'm testing this series
with the proposal of _F_STATE.

>
> >
> >>> However this opens a window where the device could start receiving
> >>> packets in rx queue 0 before it receives the RSS configuration. To
> >>> avoid that, we will not send vring_enable until all configuration is
> >>> used by the device.
> >>>
> >>> As a first step, run vhost_set_vring_ready for all vhost_net backends
> >>> after all of them are started (with DRIVER_OK). This code should not
> >>> affect vdpa.
> >>>
> >>> Signed-off-by: Eugenio Pérez <epere...@redhat.com>
> >>> ---
> >>>  hw/net/vhost_net.c | 17 ++++++++++++-----
> >>>  1 file changed, 12 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>> index c4eecc6f36..3900599465 100644
> >>> --- a/hw/net/vhost_net.c
> >>> +++ b/hw/net/vhost_net.c
> >>> @@ -399,6 +399,18 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>>          } else {
> >>>              peer = qemu_get_peer(ncs, n->max_queue_pairs);
> >>>          }
> >>> +        r = vhost_net_start_one(get_vhost_net(peer), dev);
> >>> +        if (r < 0) {
> >>> +            goto err_start;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    for (int j = 0; j < nvhosts; j++) {
> >>> +        if (j < data_queue_pairs) {
> >>> +            peer = qemu_get_peer(ncs, j);
> >>> +        } else {
> >>> +            peer = qemu_get_peer(ncs, n->max_queue_pairs);
> >>> +        }
> >>
> >> I fail to understand why we need to change the vhost_net layer. This
> >> is vhost-vDPA specific, so I wonder if we can limit the changes to
> >> e.g. vhost_vdpa_dev_start()?
> >>
> > The vhost-net layer explicitly calls vhost_set_vring_enable before
> > vhost_dev_start, and this is exactly the behavior we want to avoid.
> > Even if we make changes to vhost_dev, this change is still needed.
>
> Note that the only user of vhost_set_vring_enable() is vhost-user, where
> the semantics are different:
>
> It uses that to change the number of active queues:
>
> static int peer_attach(VirtIONet *n, int index)
>
>     if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
> =>      vhost_set_vring_enable(nc->peer, 1);
>     }
>
> This is not the semantics of vhost-vDPA, which tries to be compliant
> with the virtio spec. So I'm not sure how it can help here.
>

Right, but previous changes use the enable callback to delay enabling the
datapath virtqueues. I'll try to fit the changes in virtio/vhost-vdpa,
though.

Thanks!
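For readers skimming the thread, the reordering the patch makes in
vhost_net_start() boils down to the condensed sketch below. The types and
helpers are stand-ins (Peer, vhost_start_one, vring_enable_one are
invented), not the actual QEMU code; only the two-pass structure mirrors
the diff above.

/* Condensed sketch of the two-pass start; stand-in types, not QEMU code. */
#include <stddef.h>

typedef struct {
    int vring_enable;                    /* saved enable state to restore */
} Peer;

/* models vhost_net_start_one(): backend ends up started (DRIVER_OK) */
static int vhost_start_one(Peer *p) { (void)p; return 0; }

/* models vhost_set_vring_enable() restoring the saved state */
static int vring_enable_one(Peer *p) { (void)p; return 0; }

static int start_all(Peer *peers, size_t nvhosts)
{
    /* Pass 1: start every backend first, so all of them have DRIVER_OK
     * (and device state can be restored) before any queue runs. */
    for (size_t i = 0; i < nvhosts; i++) {
        if (vhost_start_one(&peers[i]) < 0) {
            return -1;                   /* real code unwinds via err_start */
        }
    }

    /* Pass 2: only now restore the vring enable state.  Before the patch
     * this happened per-peer in the same loop as the start, so one queue
     * could be enabled while later peers were not yet configured. */
    for (size_t i = 0; i < nvhosts; i++) {
        if (peers[i].vring_enable && vring_enable_one(&peers[i]) < 0) {
            return -1;
        }
    }
    return 0;
}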
>
> > And we want to explicitly enable CVQ first, and "only" vhost_net knows
> > which one that is.
>
> This should be known by net/vhost-vdpa.c.
>
> > To perform that in vhost_vdpa_dev_start would require quirks, involving
> > one or more of:
> > * Ignore vq enable calls if the device is not the CVQ one. How do we
> >   signal which one is the CVQ? Can we trust it will be the last one for
> >   all kinds of devices?
> > * Enable queues that do not belong to the last vhost_dev from the
> >   enable call.
> > * Enable the rest of the queues from the last enable in reverse order.
> > * Intercalate the "net load" callback between enabling the last
> >   vhost_vdpa device and enabling the rest of the devices.
> > * Add an "enable priority" order?
>
> I haven't had time to think it through, but it would be better if we
> could limit the changes to the vhost-vdpa layer. E.g. currently the
> VHOST_VDPA_SET_VRING_ENABLE is done at vhost_dev_start().
>
> Thanks
>
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>>          if (peer->vring_enable) {
> >>>              /* restore vring enable state */
> >>> @@ -408,11 +420,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>>                  goto err_start;
> >>>              }
> >>>          }
> >>> -
> >>> -        r = vhost_net_start_one(get_vhost_net(peer), dev);
> >>> -        if (r < 0) {
> >>> -            goto err_start;
> >>> -        }
> >>>      }
> >>>
> >>>      return 0;
> >>> --
> >>> 2.31.1
> >>>
>
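PS: for anyone joining the thread late, the overall ordering this series
is driving toward at the migration destination can be summarized with the
minimal sketch below. All names are invented for illustration (it assumes
the CVQ is the last vq, which is exactly the open question above); it is
not actual QEMU code.

/* Hypothetical restore ordering at the destination; invented names. */
enum { NUM_DATA_VQS = 4, CVQ_IDX = NUM_DATA_VQS };  /* CVQ assumed last */

static void enable_vq(int idx)   { (void)idx; } /* SET_VRING_ENABLE stand-in */
static void send_cvq_state(void) { }            /* restore MAC/MQ/RSS via CVQ */

static void restore_at_destination(void)
{
    /* All backends are assumed already started (DRIVER_OK) here. */
    enable_vq(CVQ_IDX);                  /* 1: let the device read the CVQ */
    send_cvq_state();                    /* 2: device consumes configuration */
    for (int i = 0; i < NUM_DATA_VQS; i++) {
        enable_vq(i);                    /* 3: only now open the datapath */
    }
}

This is why rx queue 0 cannot see packets before the RSS configuration is
in place: the datapath queues stay disabled until the CVQ commands have
been consumed.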