On Mon, May 8, 2023 at 7:51 PM Eugenio Perez Martin <epere...@redhat.com> wrote:
>
> On Mon, May 8, 2023 at 7:00 PM Hanna Czenczek <hre...@redhat.com> wrote:
> >
> > On 05.05.23 16:37, Hanna Czenczek wrote:
> > > On 05.05.23 16:26, Eugenio Perez Martin wrote:
> > >> On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek <hre...@redhat.com>
> > >> wrote:
> > >>> (By the way, thanks for the explanations :))
> > >>>
> > >>> On 05.05.23 11:03, Hanna Czenczek wrote:
> > >>>> On 04.05.23 23:14, Stefan Hajnoczi wrote:
> > >>> [...]
> > >>>
> > >>>>> I think it's better to change QEMU's vhost code
> > >>>>> to leave stateful devices suspended (but not reset) across
> > >>>>> vhost_dev_stop() -> vhost_dev_start(), maybe by introducing
> > >>>>> vhost_dev_suspend() and vhost_dev_resume(). Have you thought about
> > >>>>> this aspect?
> > >>>> Yes and no; I mean, I haven’t in detail, but I thought this is what’s
> > >>>> meant by suspending instead of resetting when the VM is stopped.
> > >>> So, now looking at vhost_dev_stop(), one problem I can see is that,
> > >>> depending on the back-end, the different operations it performs will
> > >>> do different things.
> > >>>
> > >>> It tries to stop the whole device via vhost_ops->vhost_dev_start(),
> > >>> which for vDPA will suspend the device, but for vhost-user will
> > >>> reset it (if F_STATUS is there).
> > >>>
> > >>> It disables all vrings, which doesn’t mean stopping, but may be
> > >>> necessary, too. (I haven’t yet really understood the use of disabled
> > >>> vrings; I heard that virtio-net has a need for it.)
> > >>>
> > >>> It then also stops all vrings, though, so that’s OK. And because this
> > >>> will always do GET_VRING_BASE, this is actually always the same
> > >>> regardless of transport.
> > >>>
> > >>> Finally (for this purpose), it resets the device status via
> > >>> vhost_ops->vhost_reset_status(). This is only implemented on vDPA,
> > >>> and this is what resets the device there.
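To make that ordering concrete, here is a rough C model of the vhost_dev_stop() sequence described above. All names (model_vhost_dev_stop, struct dev, op, ...) are invented for illustration; this is a sketch of the described behavior, not QEMU's actual code:

```c
#include <stdbool.h>
#include <string.h>

/* Simplified, hypothetical model of the back-end-dependent behavior of
 * vhost_dev_stop() as described above.  Not QEMU code. */
enum backend { BACKEND_VDPA, BACKEND_VHOST_USER };

struct dev {
    enum backend be;
    bool has_f_status; /* vhost-user: back-end supports the status feature */
    char log[256];     /* records which operations ran, for illustration */
};

static void op(struct dev *d, const char *what)
{
    strcat(d->log, what);
    strcat(d->log, ";");
}

static void model_vhost_dev_stop(struct dev *d)
{
    /* 1. vhost_ops->vhost_dev_start(dev, false):
     *    vDPA suspends the device; vhost-user (with F_STATUS) resets it. */
    if (d->be == BACKEND_VDPA) {
        op(d, "suspend");
    } else if (d->has_f_status) {
        op(d, "reset");
    }

    /* 2. Disable all vrings (not the same as stopping them). */
    op(d, "disable-vrings");

    /* 3. Stop all vrings via GET_VRING_BASE; same on every transport. */
    op(d, "get-vring-base");

    /* 4. vhost_ops->vhost_reset_status(): only implemented for vDPA,
     *    and this is what actually resets the device there. */
    if (d->be == BACKEND_VDPA) {
        op(d, "reset-status");
    }
}
```

So in this model a vDPA back-end sees suspend first and reset last, while a vhost-user back-end with F_STATUS is reset right at the start, which is exactly the asymmetry being discussed.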
> > >>>
> > >>>
> > >>> So vhost-user resets the device in .vhost_dev_start, but vDPA only
> > >>> does so in .vhost_reset_status. It would seem better to me if
> > >>> vhost-user would also reset the device only in .vhost_reset_status,
> > >>> not in .vhost_dev_start. .vhost_dev_start seems precisely like the
> > >>> place to run SUSPEND/RESUME.
> > >>>
> > >> I think the same. I just saw it's been proposed at [1].
> > >>
> > >>> Another question I have (but this is basically what I wrote in my
> > >>> last email) is why we even call .vhost_reset_status here. If the
> > >>> device and/or all of the vrings are already stopped, why do we need
> > >>> to reset it? Naïvely, I had assumed we only really need to reset the
> > >>> device if the guest changes, so that a new guest driver sees a
> > >>> freshly initialized device.
> > >>>
> > >> I don't know why we didn't need to call it :). I'm assuming the
> > >> previous vhost-user net did fine resetting vq indexes, using
> > >> VHOST_USER_SET_VRING_BASE. But I don't know about more complex
> > >> devices.
> > >>
> > >> The guest can reset the device, or write 0 to the PCI config status,
> > >> at any time. How does virtiofs handle it, being stateful?
> > >
> > > Honestly a good question, because virtiofsd implements neither
> > > SET_STATUS nor RESET_DEVICE. I’ll have to investigate that.
> > >
> > > I think when the guest resets the device, SET_VRING_BASE always comes
> > > along some way or another, so that’s how the vrings are reset. Maybe
> > > the internal state is reset only following more high-level FUSE
> > > commands like INIT.
> >
> > So, a meeting and one session of looking into the code later:
> >
> > We reset every virtqueue on GET_VRING_BASE, which is wrong, but happens
> > to serve the purpose. (German is currently on that.)
> >
> > In our meeting, German said the reset would occur when the memory
> > regions are changed, but I can’t see that in the code.
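As a sketch of the "vq indexes are reset via SET_VRING_BASE" point above: rewinding the ring indices alone is enough to hand a new guest driver a fresh ring, even when the back-end never sees SET_STATUS or RESET_DEVICE. A minimal model (invented names, not the real vhost-user message handlers):

```c
/* Hypothetical back-end state for one split virtqueue.  Only the two
 * indices matter for this sketch. */
struct vring_state {
    unsigned last_avail_idx;
    unsigned last_used_idx;
};

/* Model of the GET_VRING_BASE handler: report where the ring stopped. */
static unsigned get_vring_base(const struct vring_state *vq)
{
    return vq->last_avail_idx;
}

/* Model of the SET_VRING_BASE handler: a new driver sends base 0 after a
 * device reset, which implicitly "resets" the ring state even though no
 * device-level reset message was ever delivered. */
static void set_vring_base(struct vring_state *vq, unsigned base)
{
    vq->last_avail_idx = base;
    vq->last_used_idx = base;
}
```

The point of the sketch is that this only resets ring progress; any internal device state (open files, sessions, ...) is untouched, which is the gap discussed below.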
>
> That would imply that the status is reset when the guest's memory is
> added or removed?
>
> > I think it only
> > happens implicitly through the SET_VRING_BASE call, which resets the
> > internal avail/used pointers.
> >
> > [This doesn’t seem different from libvhost-user, though, which
> > implements neither SET_STATUS nor RESET_DEVICE, and which pretends to
> > reset the device on RESET_OWNER, but really doesn’t (its
> > vu_reset_device_exec() function just disables all vrings; it doesn’t
> > reset or even stop them).]
> >
> > Consequently, the internal state is never reset. It would be cleared
> > on a FUSE Destroy message, but if you just force-reset the system, the
> > state remains into the next reboot. Not even FUSE Init clears it,
> > which seems weird. It happens to work because it’s still the same
> > filesystem, so the existing state fits, but it kind of seems dangerous
> > to keep e.g. files open. I don’t think it’s really exploitable,
> > because everything still goes through the guest kernel, but, well. We
> > should clear the state on Init, and probably also implement SET_STATUS
> > and clear the state there.
> >
>
> I see. That's in line with assuming GET_VRING_BASE is the last
> message received from qemu.
>
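The fix suggested above (clear the internal state on FUSE Init, and also implement SET_STATUS and clear it on a status write of 0) could look roughly like this; all names here are hypothetical stand-ins, not virtiofsd's actual code:

```c
/* Hypothetical stand-in for the back-end's per-guest state (open file
 * handles, inode table, ...). */
struct backend_state {
    int n_open_handles;
};

static void clear_state(struct backend_state *s)
{
    /* Drop everything the previous guest driver left behind. */
    s->n_open_handles = 0;
}

/* Sketch of a VHOST_USER_SET_STATUS handler: the driver writing 0 to the
 * status field means a full device reset, so state must not survive it. */
static void handle_set_status(struct backend_state *s, unsigned status)
{
    if (status == 0) {
        clear_state(s);
    }
}

/* Sketch of the FUSE_INIT path: a new session implies the old state is
 * stale (e.g. after a force reboot that never sent FUSE_DESTROY). */
static void handle_fuse_init(struct backend_state *s)
{
    clear_state(s);
}
```

Either path alone covers the force-reset case described above; doing both keeps the back-end correct whether or not the front-end negotiates the status feature.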
Actually, does it prevent device recovery after a migration failure? Is
the same state set on the device?

Thanks!