Re: [RFC v2 06/13] vhost: delay set_vring_ready after DRIVER_OK

Stefano Garzarella Fri, 13 Jan 2023 02:38:08 -0800

On Fri, Jan 13, 2023 at 11:03:17AM +0100, Eugenio Perez Martin wrote:

On Fri, Jan 13, 2023 at 10:51 AM Stefano Garzarella <sgarz...@redhat.com> wrote:


On Fri, Jan 13, 2023 at 09:19:00AM +0100, Eugenio Perez Martin wrote:
>On Fri, Jan 13, 2023 at 5:36 AM Jason Wang <jasow...@redhat.com> wrote:
>>
>> On Fri, Jan 13, 2023 at 1:25 AM Eugenio Pérez <epere...@redhat.com> wrote:
>> >
>> > To restore the device at the destination of a live migration we send the
>> > commands through control virtqueue. For a device to read CVQ it must
>> > have received the DRIVER_OK status bit.
>>
>> This probably requires the support from the parent driver and requires
>> some changes or fixes in the parent driver.
>>
>> Some drivers did:
>>
>> parent_set_status():
>> if (DRIVER_OK)
>>     if (queue_enable)
>>         write queue_enable to the device
>>
>> Examples are IFCVF or even vp_vdpa at least. MLX5 seems to be fine.
>>
>
>I don't get your point here. No device should start reading CVQ (or
>any other VQ) without having received DRIVER_OK.
>
>Some parent drivers do not support sending the queue enable command
>after DRIVER_OK, usually because they clean part of the state like the
>set by set_vring_base. Even vdpa_net_sim needs fixes here.
>
>But my understanding is that it should be supported so I consider it a
>bug. Especially after queue_reset patches. Is that what you mean?
>
>> >
>> > However this opens a window where the device could start receiving
>> > packets in rx queue 0 before it receives the RSS configuration. To avoid
>> > that, we will not send vring_enable until all configuration is used by
>> > the device.
>> >
>> > As a first step, run vhost_set_vring_ready for all vhost_net backend after
>> > all of them are started (with DRIVER_OK). This code should not affect
>> > vdpa.
>> >
>> > Signed-off-by: Eugenio Pérez <epere...@redhat.com>
>> > ---
>> >  hw/net/vhost_net.c | 17 ++++++++++++-----
>> >  1 file changed, 12 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> > index c4eecc6f36..3900599465 100644
>> > --- a/hw/net/vhost_net.c
>> > +++ b/hw/net/vhost_net.c
>> > @@ -399,6 +399,18 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
*ncs,
>> >          } else {
>> >              peer = qemu_get_peer(ncs, n->max_queue_pairs);
>> >          }
>> > +        r = vhost_net_start_one(get_vhost_net(peer), dev);
>> > +        if (r < 0) {
>> > +            goto err_start;
>> > +        }
>> > +    }
>> > +
>> > +    for (int j = 0; j < nvhosts; j++) {
>> > +        if (j < data_queue_pairs) {
>> > +            peer = qemu_get_peer(ncs, j);
>> > +        } else {
>> > +            peer = qemu_get_peer(ncs, n->max_queue_pairs);
>> > +        }
>>
>> I fail to understand why we need to change the vhost_net layer? This
>> is vhost-vDPA specific, so I wonder if we can limit the changes to e.g
>> vhost_vdpa_dev_start()?
>>
>
>The vhost-net layer explicitly calls vhost_set_vring_enable before
>vhost_dev_start, and this is exactly the behavior we want to avoid.
>Even if we make changes to vhost_dev, this change is still needed.

I'm working on something similar since I'd like to re-work the following
commit we merged just before 7.2 release:
     4daa5054c5 vhost: enable vrings in vhost_dev_start() for vhost-user
     devices

vhost-net wasn't the only one who enabled vrings independently, but it
was easy enough for others devices to avoid it and enable them in
vhost_dev_start().

Do you think can we avoid in some way this special behaviour of
vhost-net and enable the vrings in vhost_dev_start?


Actually looking forward to it :). If that gets merged before this
series, I think we could drop this patch.


I hope to send a RFC net week :-) let's see...


If I'm not wrong the enable/disable dance is used just by vhost-user
at the moment.


Yep, I got the same.

My doubts are that for vhost-user-net we enable only the firstVirtIONet->curr_queue_pairs queue IIUC. While for other devices (andmaybe also for vDPA devices) we enable all of them.

I need to figure out if it's safe to do this for vhost-user-net as well,otherwise I need to find a way to leave this behavior.


Maxime, could you give us some hints about the tests to use to check
that changes do not introduce regressions in vhost-user?


Yep, any help on how to test is very appreciated.

Thanks,
Stefano

Re: [RFC v2 06/13] vhost: delay set_vring_ready after DRIVER_OK

Reply via email to