On Wed, Nov 12, 2025 at 04:31:05PM -0600, Eric Blake wrote: > Without a mutex, NetListener can run into this data race between a > thread changing the async callback callback function to use when a > client connects, and the thread servicing polling of the listening > sockets: > > Thread 1: > qio_net_listener_set_client_func(lstnr, f1, ...); > => foreach sock: socket > => object_ref(lstnr) > => sock_src = qio_channel_socket_add_watch_source(sock, ...., > lstnr, object_unref); > > Thread 2: > poll() > => event POLLIN on socket > => ref(GSourceCallback) > => if (lstnr->io_func) // while lstnr->io_func is f1 > ...interrupt.. > > Thread 1: > qio_net_listener_set_client_func(lstnr, f2, ...); > => foreach sock: socket > => g_source_unref(sock_src) > => foreach sock: socket > => object_ref(lstnr) > => sock_src = qio_channel_socket_add_watch_source(sock, ...., > lstnr, object_unref); > > Thread 2: > => call lstnr->io_func(lstnr->io_data) // now sees f2 > => return dispatch(sock) > => unref(GSourceCallback) > => destroy-notify > => object_unref > > Found by inspection. This is a SEGFAULT waiting to happen if f2 can > become NULL because thread 1 deregisters the user's callback while > thread 2 is trying to service the callback. Other messes are also > theoretically possible, such as running callback f1 with an opaque > pointer that should only be passed to f2 (if the client code were to > use more than just a binary choice between a single async function or > NULL). > > Mitigating factor: if the code that modifies the QIONetListener can > only be reached by the same thread that is executing the polling and > async callbacks, then we are not in a two-thread race documented above > (even though poll can see two clients trying to connect in the same > window of time, any changes made to the listener by the first async > callback will be completed before the thread moves on to the second > client). However, QEMU is complex enough that I was unable to state > with certainty whether a QMP command (such as nbd-server-stop, which > does modify the net listener) can ever be serviced in a thread > distinct from the one trying to do the async callbacks. Similarly, I > did not spend the time trying to add sleeps or execute under gdb to > try and actually trigger the race in practice. > > At any rate, it is worth having the API be robust. To ensure that > modifying a NetListener can be safely done from any thread, add a > mutex that guarantees atomicity to all members of a listener object > related to callbacks. This problem has been present since > QIONetListener was introduced. > > Note that this does NOT prevent the case of a second round of the > user's old async callback being invoked with the old opaque data, even > when the user has already tried to change the async callback during > the first async callback; it is only about ensuring that there is no > sharding (the eventual io_func(io_data) call that does get made will > correspond to a particular combination that the user had requested at > some point in time, and not be sharded to a combination that never > existed in practice). In other words, this patch maintains the status > quo that a user's async callback function already needs to be robust > to parallel clients landing in the same window of poll servicing, even > when only one client is desired. > > CC: [email protected] > Fixes: 53047392 ("io: introduce a network socket listener API", v2.12.0) > Signed-off-by: Eric Blake <[email protected]> > > --- > v3: new patch > --- > include/io/net-listener.h | 1 + > io/net-listener.c | 58 +++++++++++++++++++++++++++++---------- > 2 files changed, 44 insertions(+), 15 deletions(-)
Reviewed-by: Daniel P. Berrangé <[email protected]> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
