On 22.01.24 18:52, Hanna Czenczek wrote:
On 22.01.24 18:41, Hanna Czenczek wrote:
On 05.01.24 15:30, Fiona Ebner wrote:
Am 05.01.24 um 14:43 schrieb Fiona Ebner:
Am 03.01.24 um 14:35 schrieb Paolo Bonzini:
On 1/3/24 12:40, Fiona Ebner wrote:
I'm happy to report that I cannot reproduce the CPU-usage-spike issue
with the patch, but I did run into an assertion failure when trying to
verify that it fixes my original stuck-guest-IO issue. See below for
the backtrace [0]. Hanna wrote in
https://issues.redhat.com/browse/RHEL-3934:
I think it’s sufficient to simply call virtio_queue_notify_vq(vq) after
the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because both
virtio-scsi’s and virtio-blk’s .handle_output() implementations acquire
the device’s context, so this should be directly callable from any
context.
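In code, the suggestion would look something like this (a sketch of the
pattern only, not the actual patch; note that virtio_queue_notify_vq()
is currently static in hw/virtio/virtio.c, so it would have to be
exposed first):

    /* Sketch: re-attach the host notifier, then process any requests
     * that arrived while we were drained. */
    virtio_queue_aio_attach_host_notifier(vq, ctx);
    virtio_queue_notify_vq(vq);  /* invokes vq->handle_output() directly */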
I guess this is not true anymore now that the AioContext locking was
removed?
Good point; in fact, even before the AioContext locking was removed,
it was much safer to use virtio_queue_notify() instead. Not only does
it use the event notifier handler, it also calls it in the right
thread/AioContext just by doing event_notifier_set().
But with virtio_queue_notify() using the event notifier, the
CPU-usage-spike issue is present:
Back to the CPU-usage-spike issue: I experimented around, and it
doesn't seem to matter whether I notify the virtqueue before or after
attaching the notifiers. But there is another functional difference:
my patch called virtio_queue_notify(), which contains this block:
    if (vq->host_notifier_enabled) {
        /* wake up whoever handles the notifier in its AioContext */
        event_notifier_set(&vq->host_notifier);
    } else if (vq->handle_output) {
        /* no notifier installed: call the handler directly */
        vq->handle_output(vdev, vq);
    }
In my testing, the first branch was taken, calling event_notifier_set().
Hanna's patch uses virtio_queue_notify_vq(), and there,
vq->handle_output() will be called. That seems to be the relevant
difference regarding the CPU-usage-spike issue.
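For comparison, virtio_queue_notify_vq() never goes through the
notifier (roughly, paraphrased from hw/virtio/virtio.c):

    static void virtio_queue_notify_vq(VirtQueue *vq)
    {
        if (vq->vring.desc && vq->handle_output) {
            VirtIODevice *vdev = vq->vdev;

            if (unlikely(vdev->broken)) {
                return;
            }

            /* always the direct call, never event_notifier_set() */
            vq->handle_output(vdev, vq);
        }
    }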
I should mention that this is with a VirtIO SCSI disk. I also attempted
to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
haven't managed to yet.
What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of
the queues (but only one) will always show as non-empty. And then,
run_poll_handlers_once() will always detect progress, which explains
the CPU usage.
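For reference, that poll handler looks roughly like this (paraphrased
from hw/virtio/virtio.c):

    static bool virtio_queue_host_notifier_aio_poll(void *opaque)
    {
        EventNotifier *n = opaque;
        VirtQueue *vq = container_of(n, VirtQueue, host_notifier);

        /* reports "progress" whenever the vq has buffers available */
        return vq->vring.desc && !virtio_queue_empty(vq);
    }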
The following shows:
1. the vq address
2. the number of times the vq was passed to
   virtio_queue_host_notifier_aio_poll()
3. the number of times the result of
   virtio_queue_host_notifier_aio_poll() was true for the vq

0x555fd93f9c80 17162000        0
0x555fd93f9e48 17162000        6
0x555fd93f9ee0 17162000        0
0x555fd93f9d18 17162000 17162000
0x555fd93f9db0 17162000        0
0x555fd93f9f78 17162000        0
And for the problematic one, the reason it is seen as non-empty is:

0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0

vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq, and
s->events_dropped is false in my testing, so
virtio_scsi_handle_event_vq() doesn't do anything.
Those values stay like this while the call counts above increase.
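That is consistent with the emptiness check (paraphrased from
hw/virtio/virtio.c; the real function has a few more early-out cases):

    int virtio_queue_empty(VirtQueue *vq)
    {
        if (vq->shadow_avail_idx != vq->last_avail_idx) {
            return 0;  /* 8 != 0 for the event vq, so never empty */
        }

        /* re-read the avail index from guest memory */
        return vring_avail_idx(vq) == vq->last_avail_idx;
    }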
So is something going wrong with the indices when the event notifier is
set from the QEMU side (i.e. in the main thread)?
The guest is Debian 12 with a 6.1 kernel.
So, trying to figure out a new RFC version:
About the stack trace you, Fiona, posted: As far as I understand,
that happens because virtio_blk_drained_end() calling
virtio_queue_notify_vq() wasn’t safe after all, and instead we need
to use virtio_queue_notify(). Right?
However, you say using virtio_queue_notify() instead causes busy
loops of doing nothing in virtio-scsi (what you describe above). I
mean, better than a crash, but, you know. :)
I don’t have any prior knowledge about the event handling,
unfortunately. The fact that 8 buffers are available but we don’t use
any sounds OK to me; as far as I understand, we only use those buffers
if we have any events to push into them, so as long as we don’t, we
won’t. The question is: shouldn’t its poll handler return false when
we don’t have any events (i.e. when events_dropped is false)? Would
that solve it?
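Something like this, I imagine (purely hypothetical; the function name
is made up, and it would need access to VirtQueue’s internals, so it
couldn’t live outside hw/virtio/virtio.c as-is):

    /* Hypothetical poll handler for the event vq: only report progress
     * while there are dropped events waiting to be delivered. */
    static bool virtio_scsi_event_vq_poll(void *opaque)
    {
        EventNotifier *n = opaque;
        VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
        VirtIOSCSI *s = VIRTIO_SCSI(vq->vdev);

        return s->events_dropped;
    }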
Or actually, maybe we could just skip the virtio_queue_notify() call
for the event vq? I.e. have it be `if (vq !=
VIRTIO_SCSI_COMMON(s)->event_vq) { virtio_queue_notify(vdev, i); }`?
I wouldn’t like that very much, (a) because this would make it
slightly cumbersome to put that into
virtio_queue_aio_attach_host_notifier*(), and (b) in case that does
fix it, I do kind of feel like the real problem is that we use
virtio_queue_host_notifier_aio_poll() for the event vq, which tells
the polling code to poll whenever the vq is non-empty, but we (AFAIU)
expect the event vq to be non-empty all the time.
Turns out there was commit 38738f7dbbda90fbc161757b7f4be35b52205552
(“virtio-scsi: don't waste CPU polling the event virtqueue”) by Stefan,
which specifically intended to not use
virtio_queue_host_notifier_aio_poll() for the event vq. I think the
problem is that virtio_scsi_drained_end() should have taken care to use
virtio_queue_aio_attach_host_notifier_no_poll() for the event vq.
If we do that, I think it should be reasonable to run
virtio_queue_notify() on the event vq as well. We still want to check
whether there are new buffers available in case we have events_dropped.
I don’t know whether it’s really necessary, but without
virtio_queue_host_notifier_aio_poll() installed, it shouldn’t hurt.
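I.e., something along these lines (a sketch of the direction only, not
a tested patch):

    static void virtio_scsi_drained_end(SCSIBus *bus)
    {
        VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
        VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
        VirtIODevice *vdev = VIRTIO_DEVICE(s);
        uint32_t total_queues = VIRTIO_SCSI_VQ_NUM_FIXED +
                                vs->conf.num_queues;

        for (uint32_t i = 0; i < total_queues; i++) {
            VirtQueue *vq = virtio_get_queue(vdev, i);

            if (vq == vs->event_vq) {
                /* don’t poll the event vq, it is expected to be
                 * non-empty (cf. commit 38738f7dbbda) */
                virtio_queue_aio_attach_host_notifier_no_poll(vq, s->ctx);
            } else {
                virtio_queue_aio_attach_host_notifier(vq, s->ctx);
            }

            /* pick up anything that arrived while we were drained */
            virtio_queue_notify(vdev, i);
        }
    }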
Hanna