On Wed, Nov 12, 2025 at 03:48:07PM -0600, Eric Blake wrote:
> On Tue, Nov 11, 2025 at 11:01:44PM +0800, Jie Song wrote:
> > From: Jie Song <[email protected]>
> > 
> > When starting a dummy QEMU process with virsh, monitor_init_qmp() enables
> > IOThread monitoring of the QMP fd by default. However, a race condition
> > exists during the initialization phase: the IOThread only removes the
> > main thread's fd watch when it reaches 
> > qio_net_listener_set_client_func_full(),
> > which may be delayed under high system load.
> > 
> > This creates a window between monitor_qmp_setup_handlers_bh() and
> > qio_net_listener_set_client_func_full() where both the main thread and
> > IOThread are simultaneously monitoring the same fd and processing events.
> > This race can cause either the main thread or the IOThread to hang and
> > become unresponsive.
> > 
> > Fix this by proactively cleaning up the listener's IO sources in
> > monitor_init_qmp() before the IOThread initializes QMP monitoring,
> > ensuring exclusive fd ownership and eliminating the race condition.
> > 
> > The fix introduces socket_chr_listener_cleanup() to destroy and unref
> > all existing IO sources on the socket chardev listener, guaranteeing
> > that no concurrent fd monitoring occurs during the transition to
> > IOThread handling.
> > 
> > Signed-off-by: Jie Song <[email protected]>
> > ---
> >  chardev/char-socket.c         | 18 ++++++++++++++++++
> >  include/chardev/char-socket.h |  2 ++
> >  monitor/qmp.c                 |  6 ++++++
> >  3 files changed, 26 insertions(+)
> > 
> > diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> > index 62852e3caf..073a9da855 100644
> > --- a/chardev/char-socket.c
> > +++ b/chardev/char-socket.c
> > @@ -656,6 +656,24 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
> >      }
> >  }
> >  
> > +void socket_chr_listener_cleanup(Chardev *chr)
> > +{
> > +    SocketChardev *s = SOCKET_CHARDEV(chr);
> > +
> > +    if (s->listener) {
> > +        QIONetListener *listener = s->listener;
> > +        size_t i;
> > +
> > +        for (i = 0; i < listener->nsioc; i++) {
> 
> This directly accesses listener->nsioc outside of net-listener.c.
> I've got a pending patch that frowns on this type of usage (here's the
> link to v2; v3 is coming soon):
> 
> https://lore.kernel.org/qemu-devel/[email protected]/T/#m69a13da54c24ad55351b6a004ec1c0cba7a7b49c
> 
> But it might be possible to do what you want without peeking inside
> the listener; have you tested calling
> qio_net_listener_set_client_func_full() to change the callback to NULL
> prior to doing the handover to iothread, and then reregistering
> tcp_chr_accept after that point?

Thinking further, I think we may still have a problem in
QIONetListener even with my v3 patches:

| static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
|                                               GIOCondition condition,
|                                               gpointer opaque)
| {
|     QIONetListener *listener = QIO_NET_LISTENER(opaque);
|     QIOChannelSocket *sioc;
|     QIONetListenerClientFunc io_func;
|     gpointer io_data;
|     GMainContext *context;
|     AioContext *aio_context;
| 
|     sioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
|                                      NULL);

This unconditionally tries to accept() the client's socket...

|     if (!sioc) {
|         return TRUE;
|     }
| 
|     WITH_QEMU_LOCK_GUARD(&listener->lock) {
|         io_func = listener->io_func;
|         io_data = listener->io_data;
|         context = listener->context;
|         aio_context = listener->aio_context;
|     }
| 
|     trace_qio_net_listener_callback(listener, io_func, context, aio_context);
|     if (io_func) {
|         io_func(listener, sioc, io_data);

...and if accepted, only then does it trigger the callback to let the
user know the sioc to work with.  But if there is no io_func currently
registered...

|     }
| 
|     object_unref(OBJECT(sioc));

...the client connection is discarded, with the client being unable to
connect if it managed to land in the window when the netlistener had
no async callback registered.

I have to wonder if we should be changing the order in this function
to not attempt the qemu_channel_socket_accept() unless there is a
callback registered, so that a client that is pending service in the
window where the user code does not have an async callback installed
will still be in the queue to accept the moment an async function is
registered.  (Plus thinking of any ripple effects on whether we also
need to ensure that we aren't burning CPU in a busy loop on polling
but not clearing what the poll is looking for)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


Reply via email to