On Mon, May 8, 2023 at 7:00 PM Hanna Czenczek <hre...@redhat.com> wrote:
>
> On 05.05.23 16:37, Hanna Czenczek wrote:
> > On 05.05.23 16:26, Eugenio Perez Martin wrote:
> >> On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek <hre...@redhat.com> wrote:
> >>> (By the way, thanks for the explanations :))
> >>>
> >>> On 05.05.23 11:03, Hanna Czenczek wrote:
> >>>> On 04.05.23 23:14, Stefan Hajnoczi wrote:
> >>> [...]
> >>>
> >>>>> I think it's better to change QEMU's vhost code to leave stateful devices suspended (but not reset) across vhost_dev_stop() -> vhost_dev_start(), maybe by introducing vhost_dev_suspend() and vhost_dev_resume(). Have you thought about this aspect?
> >>>> Yes and no; I mean, I haven't thought about it in detail, but I thought this is what's meant by suspending instead of resetting when the VM is stopped.
> >>> So, now looking at vhost_dev_stop(), one problem I can see is that, depending on the back-end, the operations it performs will do different things.
> >>>
> >>> It tries to stop the whole device via vhost_ops->vhost_dev_start(), which for vDPA will suspend the device, but for vhost-user will reset it (if F_STATUS is there).
> >>>
> >>> It disables all vrings, which doesn't mean stopping them, but may be necessary, too. (I haven't yet really understood the use of disabled vrings; I heard that virtio-net would have a need for it.)
> >>>
> >>> It then also stops all vrings, though, so that's OK. And because this will always do GET_VRING_BASE, this is actually always the same regardless of transport.
> >>>
> >>> Finally (for this purpose), it resets the device status via vhost_ops->vhost_reset_status(). This is only implemented on vDPA, and this is what resets the device there.
> >>>
> >>> So vhost-user resets the device in .vhost_dev_start, but vDPA only does so in .vhost_reset_status. It would seem better to me if vhost-user would also reset the device only in .vhost_reset_status, not in .vhost_dev_start. .vhost_dev_start seems precisely like the place to run SUSPEND/RESUME.
> >>>
> >> I think the same. I just saw it's been proposed at [1].
> >>
> >>> Another question I have (but this is basically what I wrote in my last email) is why we even call .vhost_reset_status here. If the device and/or all of the vrings are already stopped, why do we need to reset it? Naïvely, I had assumed we only really need to reset the device if the guest changes, so that a new guest driver sees a freshly initialized device.
> >>>
> >> I don't know why we didn't need to call it :). I'm assuming the previous vhost-user net did fine resetting vq indexes, using VHOST_USER_SET_VRING_BASE. But I don't know about more complex devices.
> >>
> >> The guest can reset the device, or write 0 to the PCI config status, at any time. How does virtiofs handle it, being stateful?
> >
> > Honestly a good question, because virtiofsd implements neither SET_STATUS nor RESET_DEVICE. I'll have to investigate that.
> >
> > I think when the guest resets the device, SET_VRING_BASE always comes along some way or another, so that's how the vrings are reset. Maybe the internal state is reset only following more high-level FUSE commands like INIT.
>
> So, a meeting and one session of looking-into-the-code later:
>
> We reset every virtqueue on GET_VRING_BASE, which is wrong, but happens to serve the purpose. (German is currently on that.)
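For illustration, the distinction at play here: per the vhost-user spec, GET_VRING_BASE stops the ring and reports the last avail index (so the front-end can restore it later with SET_VRING_BASE); it is not meant to act as a ring reset. A minimal sketch, with hypothetical structures and handler names rather than virtiofsd's actual code:

/* Hypothetical per-ring state; not virtiofsd's real structures. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct vring_state {
    uint16_t last_avail_idx;   /* next descriptor index to fetch from the driver */
    uint16_t used_idx;         /* next slot to fill in the used ring */
    bool     started;
};

/* What GET_VRING_BASE is specified to do: stop the ring and report the
 * last avail index, so the front-end can restore it with SET_VRING_BASE. */
static uint16_t handle_get_vring_base(struct vring_state *vq)
{
    vq->started = false;
    return vq->last_avail_idx;
}

/* The implicit reset that currently "serves the purpose": clearing the
 * ring state as a side effect.  It only works because SET_VRING_BASE
 * happens to restore the index afterwards. */
static uint16_t handle_get_vring_base_with_implicit_reset(struct vring_state *vq)
{
    uint16_t idx = vq->last_avail_idx;

    vq->started = false;
    memset(vq, 0, sizeof(*vq));    /* the questionable part */
    return idx;
}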
>
> In our meeting, German said the reset would occur when the memory regions are changed, but I can't see that in the code.
That would imply that the status is reset when the guest's memory is added or removed?

> I think it only happens implicitly through the SET_VRING_BASE call, which resets the internal avail/used pointers.
>
> [This doesn't seem different from libvhost-user, though, which implements neither SET_STATUS nor RESET_DEVICE, and which pretends to reset the device on RESET_OWNER, but really doesn't (its vu_reset_device_exec() function just disables all vrings; it doesn't reset or even stop them).]
>
> Consequently, the internal state is never reset. It would be cleared on a FUSE Destroy message, but if you just force-reset the system, the state persists into the next reboot. Not even FUSE Init clears it, which seems weird. It happens to work because it's still the same filesystem, so the existing state fits, but it kind of seems dangerous to keep e.g. files open. I don't think it's really exploitable, because everything still goes through the guest kernel, but, well. We should clear the state on Init, and probably also implement SET_STATUS and clear the state there.

I see. That's in line with assuming GET_VRING_BASE is the last message received from QEMU. Thanks!
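As a rough sketch of the state clearing suggested above (hypothetical back-end structures and handler names, assumed for illustration; not virtiofsd's or libvhost-user's actual code), internal FUSE state would be dropped both when VHOST_USER_SET_STATUS writes 0 and on a FUSE INIT from a new driver:

/* Sketch only: hypothetical back-end state, not virtiofsd's real code. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct backend_state {
    void *open_handles;        /* stand-in for open files, lookup counts, ... */
    bool  fuse_initialized;
};

/* Drop everything a previous driver instance left behind. */
static void backend_clear_state(struct backend_state *s)
{
    s->open_handles = NULL;    /* placeholder for the real cleanup */
    s->fuse_initialized = false;
}

/* VHOST_USER_SET_STATUS: writing 0 means the driver reset the device,
 * so this is the natural place to clear internal state instead of
 * relying on GET_VRING_BASE side effects. */
static void handle_set_status(struct backend_state *s, uint8_t status)
{
    if (status == 0) {
        backend_clear_state(s);
    }
}

/* FUSE INIT from a (possibly new) guest driver: also clear stale state,
 * in case the guest was force-reset without a clean device reset. */
static void handle_fuse_init(struct backend_state *s)
{
    if (s->fuse_initialized) {
        backend_clear_state(s);
    }
    s->fuse_initialized = true;
}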