On Wed, Feb 07, 2024 at 11:18:54AM +0100, Kevin Wolf wrote:
> Am 06.02.2024 um 17:44 hat Eugenio Perez Martin geschrieben:
> > On Mon, Feb 5, 2024 at 2:49 PM Kevin Wolf <kw...@redhat.com> wrote:
> > >
> > > Am 05.02.2024 um 13:22 hat Eugenio Perez Martin geschrieben:
> > > > On Fri, Feb 2, 2024 at 2:25 PM Kevin Wolf <kw...@redhat.com> wrote:
> > > > >
> > > > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> > > > > status flag is set; with the current API of the kernel module, it is
> > > > > impossible to support the opposite order in our block export code
> > > > > because userspace is not notified when a virtqueue is enabled.
> > > > >
> > > > > This requirement also matches the normal initialisation order as done
> > > > > by the generic vhost code in QEMU. However, commit 6c482547
> > > > > accidentally changed the order for vdpa-dev, breaking access to VDUSE
> > > > > devices.
> > > > >
> > > > > This changes vdpa-dev to use the normal order again and use the
> > > > > standard vhost callback .vhost_set_vring_enable for this. VDUSE
> > > > > devices can be used with vdpa-dev again after this fix.
> > > > >
> > > > > Cc: qemu-sta...@nongnu.org
> > > > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> > > > > Signed-off-by: Kevin Wolf <kw...@redhat.com>
> > > > > ---
> > > > >  hw/virtio/vdpa-dev.c   |  5 +----
> > > > >  hw/virtio/vhost-vdpa.c | 17 +++++++++++++++++
> > > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > > > > index eb9ecea83b..13e87f06f6 100644
> > > > > --- a/hw/virtio/vdpa-dev.c
> > > > > +++ b/hw/virtio/vdpa-dev.c
> > > > > @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp)
> > > > >
> > > > >      s->dev.acked_features = vdev->guest_features;
> > > > >
> > > > > -    ret = vhost_dev_start(&s->dev, vdev, false);
> > > > > +    ret = vhost_dev_start(&s->dev, vdev, true);
> > > > >      if (ret < 0) {
> > > > >          error_setg_errno(errp, -ret, "Error starting vhost");
> > > > >          goto err_guest_notifiers;
> > > > >      }
> > > > > -    for (i = 0; i < s->dev.nvqs; ++i) {
> > > > > -        vhost_vdpa_set_vring_ready(&s->vdpa, i);
> > > > > -    }
> > > > >      s->started = true;
> > > > >
> > > > >      /*
> > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > > index 3a43beb312..c4574d56c5 100644
> > > > > --- a/hw/virtio/vhost-vdpa.c
> > > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > > @@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx)
> > > > >      return r;
> > > > >  }
> > > > >
> > > > > +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
> > > > > +{
> > > > > +    struct vhost_vdpa *v = dev->opaque;
> > > > > +    unsigned int i;
> > > > > +    int ret;
> > > > > +
> > > > > +    for (i = 0; i < dev->nvqs; ++i) {
> > > > > +        ret = vhost_vdpa_set_vring_ready(v, i);
> > > > > +        if (ret < 0) {
> > > > > +            return ret;
> > > > > +        }
> > > > > +    }
> > > > > +
> > > > > +    return 0;
> > > > > +}
> > > > > +
> > > > >  static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> > > > >                                         int fd)
> > > > >  {
> > > > > @@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
> > > > >          .vhost_set_features = vhost_vdpa_set_features,
> > > > >          .vhost_reset_device = vhost_vdpa_reset_device,
> > > > >          .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> > > > > +        .vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
> > > > >          .vhost_get_config  = vhost_vdpa_get_config,
> > > > >          .vhost_set_config = vhost_vdpa_set_config,
> > > > >          .vhost_requires_shm_log = NULL,
> > > >
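
For illustration, the ordering requirement is easiest to see from the
VDUSE userspace side. Userspace is never notified when a queue becomes
ready; it can only query readiness, which it typically does when
DRIVER_OK arrives, so a queue that is enabled only after DRIVER_OK is
never picked up. A minimal sketch, assuming the uapi names from
<linux/vduse.h> (double-check them against your kernel headers):

    /* Sketch of a VDUSE userspace handler for DRIVER_OK. */
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vduse.h>

    static void handle_driver_ok(int dev_fd, uint32_t num_vqs)
    {
        for (uint32_t i = 0; i < num_vqs; i++) {
            struct vduse_vq_info info;

            memset(&info, 0, sizeof(info));
            info.index = i;
            if (ioctl(dev_fd, VDUSE_VQ_GET_INFO, &info) < 0) {
                continue;
            }
            if (info.ready) {
                /* Set up the kick fd and start processing this queue; a
                 * queue enabled only after DRIVER_OK is never seen here. */
            }
        }
    }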
> > > > vhost-vdpa net enables CVQ before the dataplane queues in order to
> > > > configure the whole device at the destination of a live migration.
> > > > Going back to this callback would cause the device to enable the
> > > > dataplane before the virtqueues are configured again.
> > >
> > > Not that it makes a difference, but I don't think it would actually
> > > be going back. Even before your commit 6c482547, we were not making
> > > use of the generic callback, but just called the function in a
> > > slightly different place (though less different than after commit
> > > 6c482547).
> > >
> > > But anyway... Why don't the other vhost backends need the same for
> > > vhost-net then? Do they just not support live migration?

> > They don't support the control virtqueue. More specifically, the
> > control virtqueue is handled directly in QEMU.
>
> So the network device already has to special-case vdpa instead of using
> the same code for all vhost backends? :-/

> > > I don't know the code well enough to say where the problem is, but
> > > if the vhost-vdpa networking code relies on the usual vhost
> > > operations not being implemented and bypasses VhostOps to replace
> > > them, that sounds like a design problem to me.
> >
> > I don't follow this. What vhost operation is expected not to be
> > implemented?

> You were concerned about implementing .vhost_set_vring_enable in
> vdpa_ops like my patch does. So it seems that the networking code
> requires that it is not implemented?
>
> On the other hand, for vhost-vdpa, the callback seems to be called in
> exactly the right place where virtqueues need to be enabled, like for
> other vhost devices.
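
(For reference, from memory and abridged, so the exact code may differ:
in hw/virtio/vhost.c, vhost_dev_start() with vrings=true enables the
rings immediately before the backend's vhost_dev_start callback, which
for vdpa is what sets DRIVER_OK:)

    int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
    {
        ...
        if (vrings) {
            /* ends up in hdev->vhost_ops->vhost_set_vring_enable() */
            r = vhost_dev_set_vring_enable(hdev, true);
            if (r) {
                goto fail_log;
            }
        }
        if (hdev->vhost_ops->vhost_dev_start) {
            /* for vdpa, this is what sets DRIVER_OK */
            r = hdev->vhost_ops->vhost_dev_start(hdev, true);
            ...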

> > > Maybe VhostOps needs a new operation to enable just a single
> > > virtqueue that can be used by the networking code instead?
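
For illustration, such an operation could look like the sketch below;
the names are hypothetical, no such member exists in VhostOps today:

    /* Hypothetical VhostOps member: enable/disable one virtqueue. */
    typedef int (*vhost_set_vring_enable_one_op)(struct vhost_dev *dev,
                                                 unsigned int index,
                                                 int enable);

    /* A vdpa implementation could reuse the existing per-queue helper: */
    static int vhost_vdpa_set_vring_enable_one(struct vhost_dev *dev,
                                               unsigned int index,
                                               int enable)
    {
        struct vhost_vdpa *v = dev->opaque;

        /* vhost_vdpa_set_vring_ready() only knows how to enable today;
         * a real version would pass 'enable' down to the ioctl. */
        return enable ? vhost_vdpa_set_vring_ready(v, index) : -ENOTSUP;
    }

net/vhost-vdpa.c could then enable CVQ first and the data queues later
without bypassing VhostOps.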
> > > >
> > > > How does VDUSE userspace know how many queues it should enable?
> > > > Can't the kernel perform the necessary actions after DRIVER_OK,
> > > > like configuring the kick etc.?
> > >
> > > Not sure if I understand the question. The vdpa kernel interface
> > > always enables individual queues, so the VDUSE userspace will enable
> > > whatever queues it was asked to enable. The only restriction is that
> > > the queues need to be enabled before setting DRIVER_OK.
> > >
> > > The interface that enables all virtqueues at once seems to be just
> > > .vhost_set_vring_enable in QEMU.
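
(The kernel interface is indeed per-queue: QEMU's
vhost_vdpa_set_vring_ready() boils down to one ioctl per virtqueue,
roughly like the following sketch using the uapi from <linux/vhost.h>:)

    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    /* Enable a single virtqueue of a vhost-vdpa device. */
    static int vdpa_enable_one_vring(int device_fd, unsigned int index)
    {
        struct vhost_vring_state state = {
            .index = index,
            .num = 1,    /* 1 = enabled, 0 = disabled */
        };

        return ioctl(device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
    }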

> > It enables all virtqueues of the same vhost device in QEMU, but QEMU
> > creates one vhost_dev for each vhost-net virtqueue pair and another
> > one for CVQ. This goes back to the time when mq vhost-kernel net
> > devices were implemented by merging many tap devices. net/vhost-vdpa.c
> > only enables the last one, which corresponds to CVQ, and then enables
> > the rest once all messages have been received.
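
A sketch of the layout described above, for n data queue pairs plus a
control virtqueue (illustrative only; the exact helpers in
net/vhost-vdpa.c differ):

    /*
     * vhost_dev 0:  vq 0 (rx0), vq 1 (tx0)
     * vhost_dev 1:  vq 2 (rx1), vq 3 (tx1)
     * ...
     * vhost_dev n:  vq 2n (the control virtqueue)
     *
     * net/vhost-vdpa.c enables vhost_dev n (CVQ) first after DRIVER_OK,
     * restores the device state through CVQ commands, and only then
     * enables the data queue pairs.
     */
    static unsigned int vq_to_vhost_dev(unsigned int vq_index)
    {
        /* data vqs come in rx/tx pairs; CVQ gets its own vhost_dev */
        return vq_index / 2;
    }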

> So it's less about the granularity of .vhost_set_vring_enable, which
> would be just right, but about the order in which it is called for the
> different vhost_devs?
>
> Can we influence the order in another way than by just not implementing
> the callback at all? I think net-specific weirdness should be contained
> in the net implementation and not affect generic vdpa code.

> > On the other hand, .vhost_set_vring_enable is also used for queue
> > reset (vhost_net_virtqueue_reset and vhost_net_virtqueue_restart). In
> > other words, it is called after DRIVER_OK has been set. I guess that
> > is fine for VDUSE as long as it does not offer vring reset
> > capabilities, but future-wise, should we start going in that
> > direction?

> I don't actually know VDUSE very well, but that would probably make
> sense.
>
> Though for the moment, I simply tried to attach a VDUSE device and
> failed, so I'm just trying to fix the regression from your commit.

> > But kernels without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK should be
> > supported too, right? Maybe we can add vhost_vdpa_set_vring_enable and
> > call vhost_dev_start with the vrings parameter conditional on the
> > feature flag? That way the caller can enable the rings later or just
> > enable all the rings at once.
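
A minimal sketch of that idea in vhost_vdpa_device_start() (hypothetical;
'backend_features' stands in for however the negotiated backend features
are obtained there):

    /* Rely on enabling rings before DRIVER_OK only when the backend
     * cannot enable them afterwards. */
    bool vrings = !(backend_features &
                    BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK));

    ret = vhost_dev_start(&s->dev, vdev, vrings);
    if (ret < 0) {
        error_setg_errno(errp, -ret, "Error starting vhost");
        goto err_guest_notifiers;
    }
    /* with vrings=false, the caller enables the rings later, e.g. once
     * the device has been configured */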

> Which specific caller do you mean here?
>
> For vdpa-dev, I don't see any problem with just passing vrings=true
> unconditionally. That is clearly the least problematic order, even if
> another order may in theory be allowed by the spec.
>
> For vhost_net, I don't know. As far as I am concerned, there is no
> reason to change its call and it could continue to pass vrings=false.
>
> So vhost_dev_start() seems entirely unproblematic.
>
> The only part that changes for net is that vhost_set_vring_enable() now
> does something where it didn't do anything before. If that is a problem
> for vdpa-based network code, maybe we can just special-case vdpa there
> so that it doesn't call vhost_ops->vhost_set_vring_enable? (Despite its
> name, it's a network-specific function; it should probably be called
> something like vhost_net_set_vring_enable.)
>
> With something like the below, net shouldn't see any difference any
> more, right? (Happy to add a comment before it if you write it for me.)

+1 for this.
I proposed something similar, but I prefer your early return, which is clearer:
https://lore.kernel.org/qemu-devel/xlk2pspyo4gwguxopm6k534nzjei5y3m6zbh2l6dagmuwpamtk@dtkgca6yppce/

Thanks,
Stefano


> Kevin

> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e8e1661646..fb6d13bd69 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -543,6 +543,10 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
>
>     nc->vring_enable = enable;
>
> +    if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> +        return 0;
> +    }
> +
>     if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
>         return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
>     }


