On Thu, Oct 09, 2025 at 05:25:47PM +0200, Kevin Wolf wrote:
> Am 10.09.2025 um 19:56 hat Stefan Hajnoczi geschrieben:
> > AioContext's glib integration only supports ppoll(2) file descriptor
> > monitoring. epoll(7) and io_uring(7) disable themselves and switch back
> > to ppoll(2) when the glib event loop is used. The main loop thread
> > cannot use epoll(7) or io_uring(7) because it always uses the glib event
> > loop.
> >
> > Future QEMU features may require io_uring(7). One example is uring_cmd
> > support in FUSE exports. Each feature could create its own io_uring(7)
> > context and integrate it into the event loop, but this is inefficient
> > due to extra syscalls. It would be more efficient to reuse the
> > AioContext's existing fdmon-io_uring.c io_uring(7) context because
> > fdmon-io_uring.c will already be active on systems where Linux io_uring
> > is available.
> >
> > In order to keep fdmon-io_uring.c's AioContext operational even when the
> > glib event loop is used, extend FDMonOps with an API similar to
> > GSourceFuncs so that file descriptor monitoring can integrate into the
> > glib event loop.
> >
> > A quick summary of the GSourceFuncs API:
> > - prepare() is called each event loop iteration before waiting for file
> >   descriptors and timers.
> > - check() is called to determine whether events are ready to be
> >   dispatched after waiting.
> > - dispatch() is called to process events.
> >
> > More details here: https://docs.gtk.org/glib/struct.SourceFuncs.html
> >
> > Move the ppoll(2)-specific code from aio-posix.c into fdmon-poll.c and
> > also implement epoll(7)- and io_uring(7)-specific file descriptor
> > monitoring code for glib event loops.
> >
> > Note that it's still faster to use aio_poll() rather than the glib event
> > loop since glib waits for file descriptor activity with ppoll(2) and
> > does not support adaptive polling. But at least epoll(7) and io_uring(7)
> > now work in glib event loops.
> >
> > Splitting this into multiple commits without temporarily breaking
> > AioContext proved difficult so this commit makes all the changes. The
> > next commit will remove the aio_context_use_g_source() API because it is
> > no longer needed.
> >
> > Signed-off-by: Stefan Hajnoczi <[email protected]>
> > Reviewed-by: Eric Blake <[email protected]>
>
> It looks to me like we get a lot of duplication between implementations
> of .wait and the new .gsource_* callbacks. We can probably clean this up
> later. In the meantime, we need to make sure that the implementations
> don't diverge.
Yes, glib's GSource and aio_poll() have different interfaces but end up
doing a lot of similar things.
> > include/block/aio.h | 36 ++++++++++++++++++
> > util/aio-posix.h | 5 +++
> > tests/unit/test-aio.c | 7 +++-
> > util/aio-posix.c | 69 ++++++++-------------------------
> > util/fdmon-epoll.c | 52 ++++++++++++++++++++++---
> > util/fdmon-io_uring.c | 44 +++++++++++++++++++++-
> > util/fdmon-poll.c | 88 ++++++++++++++++++++++++++++++++++++++++++-
> > 7 files changed, 239 insertions(+), 62 deletions(-)
>
> > +static void fdmon_epoll_gsource_dispatch(AioContext *ctx,
> > + AioHandlerList *ready_list)
> > +{
> > + AioHandler *node;
> > + int ret;
> > + struct epoll_event events[128];
> > +
> > + /* Collect events and process them */
> > + ret = epoll_wait(ctx->epollfd, events, ARRAY_SIZE(events), 0);
> > + if (ret <= 0) {
> > + return;
> > + }
> > + for (int i = 0; i < ret; i++) {
> > + int ev = events[i].events;
> > + int revents = (ev & EPOLLIN ? G_IO_IN : 0) |
> > + (ev & EPOLLOUT ? G_IO_OUT : 0) |
> > + (ev & EPOLLHUP ? G_IO_HUP : 0) |
> > + (ev & EPOLLERR ? G_IO_ERR : 0);
> > +
> > + node = events[i].data.ptr;
> > + aio_add_ready_handler(ready_list, node, revents);
> > + }
> > +}
>
> Isn't this just fdmon_epoll_wait(ctx, ready_list, 0)?
Yes, it is. I'll reuse fdmon_epoll_wait() in the next patch series
revision.
>
> > @@ -97,11 +102,92 @@ static void fdmon_poll_update(AioContext *ctx,
> > AioHandler *old_node,
> > AioHandler *new_node)
> > {
> > - /* Do nothing, AioHandler already contains the state we'll need */
> > + if (old_node && !new_node) {
> > + /*
> > + * If the GSource is in the process of being destroyed then
> > + * g_source_remove_poll() causes an assertion failure. Skip removal in
> > + * that case, because glib cleans up its state during destruction
> > + * anyway.
> > + */
> > + if (!g_source_is_destroyed(&ctx->source)) {
> > + g_source_remove_poll(&ctx->source, &old_node->pfd);
> > + }
> > + }
> > +
> > + if (!old_node && new_node) {
> > + g_source_add_poll(&ctx->source, &new_node->pfd);
> > + }
> > +}
>
> I think this changes the behaviour for the case where we update a node,
> i.e. old_node and new_node are both non-NULL. Previously we added the
> new node and removed the old one, now you're skipping this.
>
> Both are referring to the same file descriptor, but we're not passing
> an integer here but a pointer to a heap-allocated GPollFD (which is
> contained in the AioHandler). The next thing that happens in the caller
> is that we free old_node.
>
> Doesn't this cause use after free?
You are right. It should be just:

    if (old_node) { remove(); }
    if (new_node) { add(); }
>
> > +static void fdmon_poll_gsource_prepare(AioContext *ctx)
> > +{
> > + /* Do nothing */
> > +}
> > +
> > +static bool fdmon_poll_gsource_check(AioContext *ctx)
> > +{
> > + AioHandler *node;
> > + bool result = false;
> > +
> > + /*
> > + * We have to walk very carefully in case aio_set_fd_handler is
> > + * called while we're walking.
> > + */
> > + qemu_lockcnt_inc(&ctx->list_lock);
> > +
> > + QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
> > + int revents = node->pfd.revents & node->pfd.events;
> > +
> > + if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> > + result = true;
> > + break;
> > + }
> > + if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> > + result = true;
> > + break;
> > + }
> > + }
> > +
> > + qemu_lockcnt_dec(&ctx->list_lock);
> > +
> > + return result;
> > +}
> > +
> > +static void fdmon_poll_gsource_dispatch(AioContext *ctx,
> > + AioHandlerList *ready_list)
> > +{
> > + AioHandler *node;
> > +
> > + QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
> > + int revents;
> > +
> > + revents = node->pfd.revents & node->pfd.events;
> > + if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
> > + aio_add_ready_handler(ready_list, node, revents);
> > + } else if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
> > + aio_add_ready_handler(ready_list, node, revents);
> > + }
>
> Why do we need these checks? Don't we already know that if an event is
> in node->pfd.events, there is a matching io_read/io_write callback, too?
> This is how aio_set_fd_handler() decides which events to monitor.
>
> If we do need them, why doesn't fdmon_poll_wait()?
fdmon_poll_wait() is fine, I'll take the same approach here.
