> Hi
>
> On Sat, Nov 22, 2025 at 7:33 AM Jie Song <[email protected]> wrote:
> >
> > From: Jie Song <[email protected]>
> >
> > When starting a dummy QEMU process with virsh version, monitor_init_qmp()
> > enables IOThread monitoring of the QMP fd by default. However, a race
> > condition exists during the initialization phase: the IOThread only removes
> > the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> > which may be delayed under high system load.
> >
> > This creates a window between monitor_qmp_setup_handlers_bh() and
> > qio_net_listener_set_client_func_full() where both the main thread and
> > IOThread are simultaneously monitoring the same fd and processing events.
> > This race can cause either the main thread or the IOThread to hang and
> > become unresponsive.
>
> Ok, but do you have a backtrace of a hang to share?
>
> >
> > Fix this by proactively cleaning up the listener's IO sources in
> > monitor_init_qmp() before the IOThread initializes QMP monitoring,
> > ensuring exclusive fd ownership and eliminating the race condition.
> >
> > Signed-off-by: Jie Song <[email protected]>
> > ---
> > Changes in v3:
> > - Use a more general method to fix the problem.
> > - Link to v2:
> > https://lore.kernel.org/qemu-devel/20251117150142.131694-1-mail@jiesong.me/
> > - Link to v1:
> > https://lore.kernel.org/qemu-devel/20251111150144.76751-1-mail@jiesong.me/
> > ---
> > chardev/char-io.c | 8 ++++++++
> > chardev/char-socket.c | 9 +++++++++
> > include/chardev/char-io.h | 2 ++
> > include/chardev/char.h | 2 ++
> > monitor/qmp.c | 5 +++++
> > 5 files changed, 26 insertions(+)
> >
> > diff --git a/chardev/char-io.c b/chardev/char-io.c
> > index 3be17b51ca..998282e526 100644
> > --- a/chardev/char-io.c
> > +++ b/chardev/char-io.c
> > @@ -182,3 +182,11 @@ int io_channel_send(QIOChannel *ioc, const void *buf, size_t len)
> > {
> > return io_channel_send_full(ioc, buf, len, NULL, 0);
> > }
> > +
> > +void remove_listaner_fd_in_watch(Chardev *chr)
> > +{
> > + ChardevClass *cc = CHARDEV_GET_CLASS(chr);
> > + if (cc->chr_listener_cleanup) {
> > + cc->chr_listener_cleanup(chr);
> > + }
> > +}
>
> I wonder if this code shouldn't just be added to remove_fd_in_watch()
> instead. It would need careful review of all existing users,
> nevermind.
>
> > diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> > index 26d2f11202..39b3a76638 100644
> > --- a/chardev/char-socket.c
> > +++ b/chardev/char-socket.c
> > @@ -1570,6 +1570,14 @@ char_socket_get_connected(Object *obj, Error **errp)
> > return s->state == TCP_CHARDEV_STATE_CONNECTED;
> > }
> >
> > +static void tcp_chr_listener_cleanup(Chardev *chr)
> > +{
> > + SocketChardev *s = SOCKET_CHARDEV(chr);
> > + if (s->listener)
> > + qio_net_listener_set_client_func_full(s->listener, NULL, NULL,
> > + NULL, chr->gcontext);
>
> Add braces
>
> > +}
> > +
> > static void char_socket_class_init(ObjectClass *oc, const void *data)
> > {
> > ChardevClass *cc = CHARDEV_CLASS(oc);
> > @@ -1587,6 +1595,7 @@ static void char_socket_class_init(ObjectClass *oc, const void *data)
> > cc->chr_add_client = tcp_chr_add_client;
> > cc->chr_add_watch = tcp_chr_add_watch;
> > cc->chr_update_read_handler = tcp_chr_update_read_handler;
> > + cc->chr_listener_cleanup = tcp_chr_listener_cleanup;
> >
> > object_class_property_add(oc, "addr", "SocketAddress",
> > char_socket_get_addr, NULL,
> > diff --git a/include/chardev/char-io.h b/include/chardev/char-io.h
> > index ac379ea70e..087a250c70 100644
> > --- a/include/chardev/char-io.h
> > +++ b/include/chardev/char-io.h
> > @@ -43,4 +43,6 @@ int io_channel_send(QIOChannel *ioc, const void *buf, size_t len);
> > int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len,
> > int *fds, size_t nfds);
> >
> > +void remove_listaner_fd_in_watch(Chardev *chr);
> > +
> > #endif /* CHAR_IO_H */
> > diff --git a/include/chardev/char.h b/include/chardev/char.h
> > index b65e9981c1..192cad67d4 100644
> > --- a/include/chardev/char.h
> > +++ b/include/chardev/char.h
> > @@ -307,6 +307,8 @@ struct ChardevClass {
> >
> > /* handle various events */
> > void (*chr_be_event)(Chardev *s, QEMUChrEvent event);
> > +
> > + void (*chr_listener_cleanup)(Chardev *chr);
> > };
> >
> > Chardev *qemu_chardev_new(const char *id, const char *typename,
> > diff --git a/monitor/qmp.c b/monitor/qmp.c
> > index cb99a12d94..e2b1c49ed6 100644
> > --- a/monitor/qmp.c
> > +++ b/monitor/qmp.c
> > @@ -537,6 +537,11 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
> > * e.g. the chardev is in client mode, with wait=on.
> > */
> > remove_fd_in_watch(chr);
> > + /*
> > + * Clean up listener IO sources early to prevent racy fd
> > + * handling between the main thread and the I/O thread.
> > + */
> > + remove_listaner_fd_in_watch(chr);
> > /*
> > * We can't call qemu_chr_fe_set_handlers() directly here
> > * since chardev might be running in the monitor I/O
> > --
> > 2.43.0
> >
> >
>
> otherwise, looks ok to me
> Reviewed-by: Marc-André Lureau <[email protected]>
>
> --
> Marc-André Lureau
Hi, Marc-André.
Thank you for your review and the valuable feedback on the patch.
I’ll address the points raised and submit a new patch soon.
Here is a backtrace of the hang we encountered, captured with the
following gdb and nc commands:
gdb --args ./qemu-system-x86_64 -S -no-user-config -nodefaults -nographic \
-machine none,accel=kvm:tcg \
-qmp unix:/tmp/qmp-xxx/qmp.monitor,server=on,wait=off
nc -U /tmp/qmp-xxx/qmp.monitor
```
...
(gdb) i threads
Id Target Id Frame
* 1 Thread 0x7ffff7a13c80 (LWP 4713) "qemu-system-x86" accept4 (fd=9,
addr=..., addr_len=0x5555577c6de0, flags=524288)
at ../sysdeps/unix/sysv/linux/accept4.c:29
2 Thread 0x7ffff76fd6c0 (LWP 4716) "qemu-system-x86" syscall () at
../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
3 Thread 0x7ffff6dfb6c0 (LWP 4737) "qemu-system-x86" accept4 (fd=9,
addr=..., addr_len=0x7fffe8000f90, flags=524288)
at ../sysdeps/unix/sysv/linux/accept4.c:29
(gdb) i b
Num Type Disp Enb Address What
1 breakpoint keep y 0x000055555608f58c in monitor_init_qmp at
../monitor/qmp.c:516
breakpoint already hit 1 time
2 breakpoint keep y 0x00007ffff792b8c0 in accept4 at
../sysdeps/unix/sysv/linux/accept4.c:29
breakpoint already hit 2 times
(gdb) bt
#0 accept4 (fd=9, addr=..., addr_len=0x5555577c6de0, flags=524288) at
../sysdeps/unix/sysv/linux/accept4.c:29
#1 0x000055555615a01d in qemu_accept (s=9, addr=0x5555577c6d60,
addrlen=0x5555577c6de0) at ../util/osdep.c:483
#2 0x0000555555f64ddb in qio_channel_socket_accept (ioc=0x5555577c46f0,
errp=0x0) at ../io/channel-socket.c:407
#3 0x0000555555f6e585 in qio_net_listener_channel_func (ioc=0x5555577c46f0,
condition=G_IO_IN, opaque=0x5555577c42b0)
at ../io/net-listener.c:64
#4 0x0000555555f67eee in qio_channel_fd_source_dispatch
(source=0x5555577c4be0, callback=0x555555f6e50e
<qio_net_listener_channel_func>, user_data=0x5555577c42b0)
at ../io/channel-watch.c:84
#5 0x00007ffff7bc049e in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00007ffff7bc0710 in g_main_context_dispatch () at
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#7 0x000055555617f74e in glib_pollfds_poll () at ../util/main-loop.c:290
#8 0x000055555617f7e0 in os_host_main_loop_wait (timeout=-1) at
../util/main-loop.c:313
#9 0x000055555617f91b in main_loop_wait (nonblocking=0) at
../util/main-loop.c:592
#10 0x0000555555c3fe91 in qemu_main_loop () at ../system/runstate.c:903
#11 0x000055555608fff0 in qemu_default_main (opaque=0x0) at ../system/main.c:50
#12 0x00005555560900ae in main (argc=9, argv=0x7fffffffdd98) at
../system/main.c:93
(gdb) t 3
[Switching to thread 3 (Thread 0x7ffff6dfb6c0 (LWP 4737))]
#0 accept4 (fd=9, addr=..., addr_len=0x7fffe8000f90, flags=524288) at
../sysdeps/unix/sysv/linux/accept4.c:29
29 in ../sysdeps/unix/sysv/linux/accept4.c
(gdb) bt
#0 accept4 (fd=9, addr=..., addr_len=0x7fffe8000f90, flags=524288) at
../sysdeps/unix/sysv/linux/accept4.c:29
#1 0x000055555615a01d in qemu_accept (s=9, addr=0x7fffe8000f10,
addrlen=0x7fffe8000f90) at ../util/osdep.c:483
#2 0x0000555555f64ddb in qio_channel_socket_accept (ioc=0x5555577c46f0,
errp=0x0) at ../io/channel-socket.c:407
#3 0x0000555555f6e585 in qio_net_listener_channel_func (ioc=0x5555577c46f0,
condition=G_IO_IN, opaque=0x5555577c42b0)
at ../io/net-listener.c:64
#4 0x0000555555f67eee in qio_channel_fd_source_dispatch
(source=0x7fffe8000c50, callback=0x555555f6e50e
<qio_net_listener_channel_func>, user_data=0x5555577c42b0)
at ../io/channel-watch.c:84
#5 0x00007ffff7bc049e in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00007ffff7c1f737 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7 0x00007ffff7bc0f87 in g_main_loop_run () at
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#8 0x0000555555f971fa in iothread_run (opaque=0x555557573ce0) at
../iothread.c:70
#9 0x000055555616431b in qemu_thread_start (args=0x555557570590) at
../util/qemu-thread-posix.c:393
#10 0x00007ffff789caa4 in start_thread (arg=<optimized out>) at
./nptl/pthread_create.c:447
#11 0x00007ffff7929c6c in clone3 () at
../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
```
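Both the main thread (thread 1) and the IOThread (thread 3) call accept4()
on the same listening fd (fd=9). Whichever thread calls it second gets stuck,
because the first call has already taken the pending connection.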
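
To illustrate the failure mode outside QEMU, here is a minimal standalone
sketch. It is not QEMU code, and demo_double_accept() and
connection_pending() are hypothetical names made up for this example. It
assumes the hang comes down to two watchers both seeing one pending
connection on the same listening socket and both calling accept(). The
sketch runs the two watchers sequentially in a single thread, and switches
the listener to O_NONBLOCK before the second accept() so that the call which
would otherwise hang returns EAGAIN instead:

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns 1 if a connection is pending on lfd, 0 if not, -1 on error. */
static int connection_pending(int lfd)
{
    struct pollfd pfd = { .fd = lfd, .events = POLLIN };
    int ret = poll(&pfd, 1, 200); /* wait at most 200 ms */

    if (ret < 0) {
        return -1;
    }
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Two "watchers" observe the same pending connection on one listening
 * socket, then both call accept(). Returns 1 if the first accept()
 * succeeds and the second finds nothing left to accept, 0 for any other
 * outcome, and -1 if the setup fails.
 */
int demo_double_accept(void)
{
    char dir[] = "/tmp/accept-race-XXXXXX";
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int lfd = -1, cfd = -1, first = -1, second = -1;
    int seen_a, seen_b, second_errno, result = -1;

    if (!mkdtemp(dir)) {
        return -1;
    }
    snprintf(addr.sun_path, sizeof(addr.sun_path), "%s/sock", dir);

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        goto out;
    }

    /* Both watchers poll before either accepts, so both see POLLIN. */
    seen_a = connection_pending(lfd);
    seen_b = connection_pending(lfd);

    /* Watcher A wins the race and takes the only pending connection. */
    first = accept(lfd, NULL, NULL);

    /*
     * Watcher B now calls accept() as well. On a blocking listener this
     * is where it would hang; O_NONBLOCK turns the hang into EAGAIN so
     * that the demo terminates.
     */
    fcntl(lfd, F_SETFL, fcntl(lfd, F_GETFL) | O_NONBLOCK);
    second = accept(lfd, NULL, NULL);
    second_errno = errno;

    result = seen_a == 1 && seen_b == 1 && first >= 0 && second < 0 &&
             (second_errno == EAGAIN || second_errno == EWOULDBLOCK);
out:
    if (second >= 0) {
        close(second);
    }
    if (first >= 0) {
        close(first);
    }
    if (cfd >= 0) {
        close(cfd);
    }
    if (lfd >= 0) {
        close(lfd);
    }
    unlink(addr.sun_path);
    rmdir(dir);
    return result;
}
```

On Linux, demo_double_accept() should return 1: both polls report the
connection, the first accept() succeeds, and the second has nothing left to
accept. With a blocking listener, as in the backtrace above, that second
call is the one that never returns.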