On 04.10.23 13:15, Stefan Hajnoczi wrote:
On Wed, 4 Oct 2023 at 06:44, Hanna Czenczek <hre...@redhat.com> wrote:
On 04.10.23 03:45, Stefan Hajnoczi wrote:
The VHOST_USER_RESET_OWNER message is deprecated in the spec:

     This is no longer used. Used to be sent to request disabling all
     rings, but some back-ends interpreted it to also discard connection
     state (this interpretation would lead to bugs).  It is recommended
     that back-ends either ignore this message, or use it to disable all
     rings.
According to the spec, it is then indeed better to not call it in
vhost_user_reset_device, because it seems like it would be interpreted
as something completely different.

However, between the three back-end implementations of vhost-user I know
of (libvhost-user, DPDK, the vhost crates; four if you count RSD), none
implement RESET_DEVICE.  libvhost-user and DPDK do implement
RESET_OWNER, though, and they both do it by resetting the device, not by
disabling any vring.  The vhost crate also implements RESET_OWNER, but
it doesn’t do anything but forward it as such to the actual device
implementation (virtiofsd doesn’t implement this function, so ignores
it).  It does document that it would disable all vrings, but does so in
the past and has marked it deprecated (ever since the method was
introduced in the fourth commit to the repository, making it extremely
unlikely that anyone would implement it).
Hi Hanna,
vhost-user-backend still seems to reset all vhost-user protocol state,
making RESET_OWNER unusable:
https://github.com/rust-vmm/vhost/blob/main/crates/vhost-user-backend/src/handler.rs#L230

Have I missed something?

No, I missed that.  I overlooked that when grepping for reset_owner.

This implementation kind of does follow the original pre-2015 description of RESET_OWNER, but I can’t believe this code originated from pre-2015, which makes it really weird.  It’s strange that in a commit from April 2019 the vhost crate marked the function as (“no longer used”), and then it was implemented in September of 2019, notably as something completely different than “Used to be sent to request disabling all rings”.

Another thing I noticed is that while libvhost-user does call the function vu_reset_device_exec(), what it does is indeed disable all vrings instead of resetting anything, i.e. what the spec says (and what I didn’t think anyone did).

DPDK interestingly does do a reset, and this includes clearing protocol_features (in reset_device()).

So two things: First, I’d prefer (slightly) for the commit message to mention that while RESET_OWNER would not be usable for a reset if everything were according to spec, the main problem to me seems that RESET_OWNER was never clearly defined before being deprecated, so different implementations do different things regardless of what the spec says now, which means it’s basically burned and no front-end may ever issue it at all lest it wants to get the back-end into an implementation-defined state.

Second, I wonder what exactly you mean by saying that RESET_OWNER resetting all vhost-user protocol state were to make the command unusable for resetting.  The vhost-user-backend implementation doesn’t clear all state, but resets only three things: The @owned field (which I don’t think is really used outside of vhost/src/vhost_user/dummy_slave.rs (this name is begging for a conscious language overhaul...), which appears not really important?), the virtio features (this I would expect any device reset to do), and the vhost-user protocol features.  Now, yes, clearing the vhost-user protocol features doesn’t seem like something that should be done just because of a device reset. However, note that DPDK’s reset_device() does this, too.  What I’m getting at is that we don’t have a clear definition of what RESET_DEVICE is supposed to do either, i.e. it doesn’t explicitly say that protocol features are not to be reset.  Sure, that may be obvious, but we should clarify this so that if e.g. DPDK is to implement RESET_DEVICE, they will take care not use its reset_device() implementation, which does reset protocol_features.

(Also, now that I look at RESET_DEVICE – why does it say that it disables all vrings?  (Has done so since its introduction.)  Is this something that qemu expects and will handle, i.e. that after a guest-initiated reset, the rings are enabled when the guest wants to use the device again?  Also, it doesn’t say the rings are stopped, so basically, as it is *right now* (where we only have three ring states, stopped, started+disabled, started+enabled), disabling vrings implicitly means they must still be started, because they can’t be disabled when stopped.  I’m going to change that, but as it reads right now, RESET_DEVICE does not seem like the reset we want. We should really get to fix that, too, before back-ends start to actually implement it.)

Hanna


Reply via email to