On Thu, Oct 09, 2025 at 05:25:47PM +0200, Kevin Wolf wrote:
> Am 10.09.2025 um 19:56 hat Stefan Hajnoczi geschrieben:
> > AioContext's glib integration only supports ppoll(2) file descriptor
> > monitoring. epoll(7) and io_uring(7) disable themselves and switch back
> > to ppoll(2) when the glib event loop is used. The main loop thread
> > cannot use epoll(7) or io_uring(7) because it always uses the glib event
> > loop.
> > 
> > Future QEMU features may require io_uring(7). One example is uring_cmd
> > support in FUSE exports. Each feature could create its own io_uring(7)
> > context and integrate it into the event loop, but this is inefficient
> > due to extra syscalls. It would be more efficient to reuse the
> > AioContext's existing fdmon-io_uring.c io_uring(7) context because
> > fdmon-io_uring.c will already be active on systems where Linux io_uring
> > is available.
> > 
> > In order to keep fdmon-io_uring.c's AioContext operational even when the
> > glib event loop is used, extend FDMonOps with an API similar to
> > GSourceFuncs so that file descriptor monitoring can integrate into the
> > glib event loop.
> > 
> > A quick summary of the GSourceFuncs API:
> > - prepare() is called each event loop iteration before waiting for file
> >   descriptors and timers.
> > - check() is called to determine whether events are ready to be
> >   dispatched after waiting.
> > - dispatch() is called to process events.
> > 
> > More details here: https://docs.gtk.org/glib/struct.SourceFuncs.html
> > 
> > Move the ppoll(2)-specific code from aio-posix.c into fdmon-poll.c and
> > also implement epoll(7)- and io_uring(7)-specific file descriptor
> > monitoring code for glib event loops.
> > 
> > Note that it's still faster to use aio_poll() rather than the glib event
> > loop since glib waits for file descriptor activity with ppoll(2) and
> > does not support adaptive polling. But at least epoll(7) and io_uring(7)
> > now work in glib event loops.
> > 
> > Splitting this into multiple commits without temporarily breaking
> > AioContext proved difficult so this commit makes all the changes. The
> > next commit will remove the aio_context_use_g_source() API because it is
> > no longer needed.
> > 
> > Signed-off-by: Stefan Hajnoczi <[email protected]>
> > Reviewed-by: Eric Blake <[email protected]>
> 
> It looks to me like we get a lot of duplication between implementations
> of .wait and the new .gsource_* callbacks. We can probably clean this up
> later. In the meantime, we need to make sure that the implementations
> don't diverge.

Yes, glib's GSource and aio_poll() have different interfaces but end up
doing a lot of similar things.

> >  include/block/aio.h   | 36 ++++++++++++++++++
> >  util/aio-posix.h      |  5 +++
> >  tests/unit/test-aio.c |  7 +++-
> >  util/aio-posix.c      | 69 ++++++++-------------------------
> >  util/fdmon-epoll.c    | 52 ++++++++++++++++++++++---
> >  util/fdmon-io_uring.c | 44 +++++++++++++++++++++-
> >  util/fdmon-poll.c     | 88 ++++++++++++++++++++++++++++++++++++++++++-
> >  7 files changed, 239 insertions(+), 62 deletions(-)
> 
> > +static void fdmon_epoll_gsource_dispatch(AioContext *ctx,
> > +                                         AioHandlerList *ready_list)
> > +{
> > +    AioHandler *node;
> > +    int ret;
> > +    struct epoll_event events[128];
> > +
> > +    /* Collect events and process them */
> > +    ret = epoll_wait(ctx->epollfd, events, ARRAY_SIZE(events), 0);
> > +    if (ret <= 0) {
> > +        return;
> > +    }
> > +    for (int i = 0; i < ret; i++) {
> > +        int ev = events[i].events;
> > +        int revents = (ev & EPOLLIN ? G_IO_IN : 0) |
> > +                      (ev & EPOLLOUT ? G_IO_OUT : 0) |
> > +                      (ev & EPOLLHUP ? G_IO_HUP : 0) |
> > +                      (ev & EPOLLERR ? G_IO_ERR : 0);
> > +
> > +        node = events[i].data.ptr;
> > +        aio_add_ready_handler(ready_list, node, revents);
> > +    }
> > +}
> 
> Isn't this just fdmon_epoll_wait(ctx, ready_list, 0)?

Yes, it is. I'll reuse fdmon_epoll_wait() in the next patch series
revision.

> 
> > @@ -97,11 +102,92 @@ static void fdmon_poll_update(AioContext *ctx,
> >                                AioHandler *old_node,
> >                                AioHandler *new_node)
> >  {
> > -    /* Do nothing, AioHandler already contains the state we'll need */
> > +    if (old_node && !new_node) {
> > +        /*
> > +         * If the GSource is in the process of being destroyed then
> > +         * g_source_remove_poll() causes an assertion failure.  Skip removal in
> > +         * that case, because glib cleans up its state during destruction
> > +         * anyway.
> > +         */
> > +        if (!g_source_is_destroyed(&ctx->source)) {
> > +            g_source_remove_poll(&ctx->source, &old_node->pfd);
> > +        }
> > +    }
> > +
> > +    if (!old_node && new_node) {
> > +        g_source_add_poll(&ctx->source, &new_node->pfd);
> > +    }
> > +}
> 
> I think this changes the behaviour for the case where we update a node,
> i.e. old_node and new_node are both non-NULL. Previously we added the
> new node and removed the old one, now you're skipping this.
> 
> Both are referring to the same file descriptor, but we're not passing
> an integer here but a pointer to a heap-allocated GPollFD (which is
> contined in the AioHandler). The next thing that happens in the caller
> is that we free old_node.
> 
> Doesn't this cause use after free?

You are right. It should be just:

  if (old_node) { remove(); }
  if (new_node) { add(); }

> 
> > +static void fdmon_poll_gsource_prepare(AioContext *ctx)
> > +{
> > +    /* Do nothing */
> > +}
> > +
> > +static bool fdmon_poll_gsource_check(AioContext *ctx)
> > +{
> > +    AioHandler *node;
> > +    bool result = false;
> > +
> > +    /*
> > +     * We have to walk very carefully in case aio_set_fd_handler is
> > +     * called while we're walking.
> > +     */
> > +    qemu_lockcnt_inc(&ctx->list_lock);
> > +
> > +    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
> > +        int revents = node->pfd.revents & node->pfd.events;
> > +
> > +        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> > +            result = true;
> > +            break;
> > +        }
> > +        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> > +            result = true;
> > +            break;
> > +        }
> > +    }
> > +
> > +    qemu_lockcnt_dec(&ctx->list_lock);
> > +
> > +    return result;
> > +}
> > +
> > +static void fdmon_poll_gsource_dispatch(AioContext *ctx,
> > +                                        AioHandlerList *ready_list)
> > +{
> > +    AioHandler *node;
> > +
> > +    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
> > +        int revents;
> > +
> > +        revents = node->pfd.revents & node->pfd.events;
> > +        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> > +            aio_add_ready_handler(ready_list, node, revents);
> > +        } else if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> > +            aio_add_ready_handler(ready_list, node, revents);
> > +        }
> 
> Why do we need these checks? Don't we already know that if an event is
> in node->pfd.events, there is a matching io_read/io_write callback, too?
> This is how aio_set_fd_handler() decides which events to monitor.
> 
> If we do need them, why doesn't fdmon_poll_wait()?

fdmon_poll_wait() is fine, I'll take the same approach here.
