On Fri, May 05, 2023 at 11:03:16AM +0200, Hanna Czenczek wrote: > On 04.05.23 23:14, Stefan Hajnoczi wrote: > > On Thu, 4 May 2023 at 13:39, Hanna Czenczek <hre...@redhat.com> wrote: > > > On 11.04.23 17:05, Hanna Czenczek wrote: > > > > > > [...] > > > > > > > Hanna Czenczek (4): > > > > vhost: Re-enable vrings after setting features > > > > vhost-user: Interface for migration state transfer > > > > vhost: Add high-level state save/load functions > > > > vhost-user-fs: Implement internal migration > > > I’m trying to write v2, and my intention was to keep the code > > > conceptually largely the same, but include in the documentation change > > > thoughts and notes on how this interface is to be used in the future, > > > when e.g. vDPA “extensions” come over to vhost-user. My plan was to, > > > based on that documentation, discuss further. > > > > > > But now I’m struggling to even write that documentation because it’s not > > > clear to me what exactly the result of the discussion was, so I need to > > > stop even before that. > > > > > > So as far as I understand, we need/want SUSPEND/RESUME for two reasons: > > > 1. As a signal to the back-end when virt queues are no longer to be > > > processed, so that it is clear that it will not do that when asked for > > > migration state. > > > 2. Stateful devices that support SET_STATUS receive a status of 0 when > > > the VM is stopped, which supposedly resets the internal state. While > > > suspended, device state is frozen, so as far as I understand, SUSPEND > > > before SET_STATUS would have the status change be deferred until RESUME. > > I'm not sure about SUSPEND -> SET_STATUS 0 -> RESUME. I guess the > > device would be reset right away and it would either remain suspended > > or be resumed as part of reset :). > > > > Unfortunately the concepts of SUSPEND/RESUME and the Device Status > > Field are orthogonal and there is no spec that explains how they > > interact. > > Ah, OK. So I guess it’s up to the implementation to decide whether the > virtio device status counts as part of the “configuration” that “[it] must > not change”. > > > > I don’t want to hang myself up on 2 because it doesn’t really seem > > > important to this series, but: Why does a status of 0 reset the internal > > > state? [Note: This is all virtio_reset() seems to do, set the status to > > > 0.] The vhost-user specification only points to the virtio > > > specification, which doesn’t say anything to that effect. Instead, an > > > explicit device reset is mentioned, which would be > > > VHOST_USER_RESET_DEVICE, i.e. something completely different. Because > > > RESET_DEVICE directly contradicts SUSPEND’s description, I would like to > > > think that invoking RESET_DEVICE on a SUSPEND-ed device is just invalid. > > The vhost-user protocol didn't have the concept of the VIRTIO Device > > Status Field until SET_STATUS was added. > > > > In order to participate in the VIRTIO device lifecycle to some extent, > > the pre-SET_STATUS vhost-user protocol relied on vhost-user-specific > > messages like RESET_DEVICE. > > > > At the VIRTIO level, devices are reset by setting the Device Status > > Field to 0. > > (I didn’t find this in the virtio specification until today, turns out it’s > under 4.1.4.3 “Common configuration structure layout”, not under 2.1 “Device > Status Field”, where I was looking.) > > > All state is lost and the Device Initialization process > > must be followed to make the device operational again. > > > > Existing vhost-user backends don't implement SET_STATUS 0 (it's new). > > > > It's messy and not your fault. I think QEMU should solve this by > > treating stateful devices differently from non-stateful devices. That > > way existing vhost-user backends continue to work and new stateful > > devices can also be supported. > > It’s my understanding that SET_STATUS 0/RESET_DEVICE is problematic for > stateful devices. In a previous email, you wrote that these should > implement SUSPEND+RESUME so qemu can use those instead. But those are > separate things, so I assume we just use SET_STATUS 0 when stopping the VM > because this happens to also stop processing vrings as a side effect?
SET_STATUS 0 doesn't do anything in most existing vhost-user backends and QEMU's vhost code doesn't rely on it doing anything. It was added as an optimization hint for DPDK's vhost-user-net implementation without noticing that it breaks stateful devices (see commit 923b8921d210). > > I.e. I understand “treating stateful devices differently” to mean that qemu > should use SUSPEND+RESUME instead of SET_STATUS 0 when the back-end supports > it, and stateful back-ends should support it. > > > > Is it that a status 0 won’t explicitly reset the internal state, but > > > because it does mean that the driver is unbound, the state should > > > implicitly be reset? > > I think the fundamental problem is that transports like virtio-pci put > > registers back in their initialization state upon reset, so internal > > state is lost. > > > > The VIRTIO spec does not go into detail on device state across reset > > though, so I don't think much more can be said about the semantics. > > > > > Anyway. 1 seems to be the relevant point for migration. As far as I > > > understand, currently, a vhost-user back-end has no way of knowing when > > > to stop processing virt queues. Basically, rings can only transition > > > from stopped to started, but not vice versa. The vhost-user > > > specification has this bit: “Once the source has finished migration, > > > rings will be stopped by the source. No further update must be done > > > before rings are restarted.” It just doesn’t say how the front-end lets > > > the back-end know that the rings are (to be) stopped. So this seems > > > like a pre-existing problem for stateless migration. Unless this is > > > communicated precisely by setting the device status to 0? > > No, my understanding is different. The vhost-user spec says the > > backend must "stop [the] ring upon receiving > > ``VHOST_USER_GET_VRING_BASE``". > > Yes, I missed that part! > > > The "Ring states" section goes into > > more detail and adds the concept of enabled/disabled too. > > > > SUSPEND is stronger than GET_VRING_BASE though. GET_VRING_BASE only > > applies to a single virtqueue, whereas SUSPEND acts upon the entire > > device, including non-virtqueue aspects like Configuration Change > > Notifications (VHOST_USER_BACKEND_CONFIG_CHANGE_MSG). > > > > You can approximate SUSPEND today by sending GET_VRING_BASE for all > > virtqueues. I think in practice this does fully stop the device even > > if the spec doesn't require it. > > > > If we want minimal changes to vhost-user, then we could rely on > > GET_VRING_BASE to suspend and SET_VRING_ENABLE to resume. And > > SET_STATUS 0 must not be sent so that the device's state is not lost. > > So you mean that we’d use SUSPEND instead of SET_STATUS 0, but because we > have no SUSPEND, we’d ensure that GET_VRING_BASE is/was called on all > vrings? Yes. I prefer adding SUSPEND+RESUME to vhost-user, but if we were limited to today's vhost-user commands, then relying on GET_VRING_BASE and skipping SET_STATUS calls for vhost_dev_start/stop() would come close to achieving the behavior needed by stateful backends. Stefan
signature.asc
Description: PGP signature
_______________________________________________ Virtio-fs mailing list Virtio-fs@redhat.com https://listman.redhat.com/mailman/listinfo/virtio-fs