On 04.10.23 13:15, Stefan Hajnoczi wrote:
On Wed, 4 Oct 2023 at 06:44, Hanna Czenczek <hre...@redhat.com> wrote:
On 04.10.23 03:45, Stefan Hajnoczi wrote:
The VHOST_USER_RESET_OWNER message is deprecated in the spec:
This is no longer used. Used to be sent to request disabling all
rings, but some back-ends interpreted it to also discard connection
state (this interpretation would lead to bugs). It is recommended
that back-ends either ignore this message, or use it to disable all
rings.
According to the spec, it is then indeed better to not call it in
vhost_user_reset_device, because it seems like it would be interpreted
as something completely different.
However, between the three back-end implementations of vhost-user I know
of (libvhost-user, DPDK, the vhost crates; four if you count RSD), none
implement RESET_DEVICE. libvhost-user and DPDK do implement
RESET_OWNER, though, and they both do it by resetting the device, not by
disabling any vring. The vhost crate also implements RESET_OWNER, but
it doesn’t do anything but forward it as such to the actual device
implementation (virtiofsd doesn’t implement this function, so ignores
it). It does document that it would disable all vrings, but does so in
the past and has marked it deprecated (ever since the method was
introduced in the fourth commit to the repository, making it extremely
unlikely that anyone would implement it).
Hi Hanna,
vhost-user-backend still seems to reset all vhost-user protocol state,
making RESET_OWNER unusable:
https://github.com/rust-vmm/vhost/blob/main/crates/vhost-user-backend/src/handler.rs#L230
Have I missed something?
No, I missed that. I overlooked that when grepping for reset_owner.
This implementation kind of does follow the original pre-2015
description of RESET_OWNER, but I can’t believe this code originated
from pre-2015, which makes it really weird. It’s strange that in a
commit from April 2019 the vhost crate marked the function as (“no
longer used”), and then it was implemented in September of 2019, notably
as something completely different than “Used to be sent to request
disabling all rings”.
Another thing I noticed is that while libvhost-user does call the
function vu_reset_device_exec(), what it does is indeed disable all
vrings instead of resetting anything, i.e. what the spec says (and what
I didn’t think anyone did).
DPDK interestingly does do a reset, and this includes clearing
protocol_features (in reset_device()).
So two things: First, I’d prefer (slightly) for the commit message to
mention that while RESET_OWNER would not be usable for a reset if
everything were according to spec, the main problem to me seems that
RESET_OWNER was never clearly defined before being deprecated, so
different implementations do different things regardless of what the
spec says now, which means it’s basically burned and no front-end may
ever issue it at all lest it wants to get the back-end into an
implementation-defined state.
Second, I wonder what exactly you mean by saying that RESET_OWNER
resetting all vhost-user protocol state were to make the command
unusable for resetting. The vhost-user-backend implementation doesn’t
clear all state, but resets only three things: The @owned field (which I
don’t think is really used outside of
vhost/src/vhost_user/dummy_slave.rs (this name is begging for a
conscious language overhaul...), which appears not really important?),
the virtio features (this I would expect any device reset to do), and
the vhost-user protocol features. Now, yes, clearing the vhost-user
protocol features doesn’t seem like something that should be done just
because of a device reset. However, note that DPDK’s reset_device() does
this, too. What I’m getting at is that we don’t have a clear definition
of what RESET_DEVICE is supposed to do either, i.e. it doesn’t
explicitly say that protocol features are not to be reset. Sure, that
may be obvious, but we should clarify this so that if e.g. DPDK is to
implement RESET_DEVICE, they will take care not use its reset_device()
implementation, which does reset protocol_features.
(Also, now that I look at RESET_DEVICE – why does it say that it
disables all vrings? (Has done so since its introduction.) Is this
something that qemu expects and will handle, i.e. that after a
guest-initiated reset, the rings are enabled when the guest wants to use
the device again? Also, it doesn’t say the rings are stopped, so
basically, as it is *right now* (where we only have three ring states,
stopped, started+disabled, started+enabled), disabling vrings implicitly
means they must still be started, because they can’t be disabled when
stopped. I’m going to change that, but as it reads right now,
RESET_DEVICE does not seem like the reset we want. We should really get
to fix that, too, before back-ends start to actually implement it.)
Hanna