On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote: > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com> > > All callers to migration_incoming_state_destroy() other than > postcopy_ram_listen_thread() do this call with BQL held. > > Since migration_incoming_state_destroy() ultimately calls "load_cleanup" > SaveVMHandlers and it will soon call BQL-sensitive code it makes sense > to always call that function under BQL rather than to have it deal with > both cases (with BQL and without BQL). > Add the necessary bql_lock() and bql_unlock() to > postcopy_ram_listen_thread(). > > qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call > "load_state" SaveVMHandlers that are expecting BQL to be held. > > In principle, the only devices that should be arriving on migration > channel serviced by postcopy_ram_listen_thread() are those that are > postcopiable and whose load handlers are safe to be called without BQL > being held. > > But nothing currently prevents the source from sending data for "unsafe" > devices which would cause trouble there. > Add a TODO comment there so it's clear that it would be good to improve > handling of such (erroneous) case in the future. > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com> > --- > migration/migration.c | 16 ++++++++++++++++ > migration/savevm.c | 4 ++++ > 2 files changed, 20 insertions(+) > > diff --git a/migration/migration.c b/migration/migration.c > index 9e9db26667f1..6b2a8af4231d 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void) > struct MigrationIncomingState *mis = migration_incoming_get_current(); > > multifd_recv_cleanup(); > + > /* > * RAM state cleanup needs to happen after multifd cleanup, because > * multifd threads can use some of its states (receivedmap). > + * > + * This call also needs BQL held since it calls all registered > + * load_cleanup SaveVMHandlers and at least the VFIO implementation is > + * BQL-sensitive. > + * > + * In addition to the above, it also performs cleanup of load threads > + * thread pool. > + * This cleanup operation is BQL-sensitive as it requires unlocking BQL > + * so a thread possibly waiting for it could get unblocked and finally > + * exit. > + * The reason why a load thread may need to hold BQL in the first place > + * is because address space modification operations require it.
Hold on... This almost says exactly why load_cleanup() should _not_ take BQL... rather than should.. So I had a closer look at the latest code, it's about this: static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd) { /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */ bql_unlock(); WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) { while (multifd->load_bufs_thread_running) { multifd->load_bufs_thread_want_exit = true; qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond); qemu_cond_signal(&multifd->load_bufs_iter_done_cond); qemu_cond_wait(&multifd->load_bufs_thread_finished_cond, &multifd->load_bufs_mutex); } } bql_lock(); } It doesn't make much sense to me to take it only because we want to drop it unconditionally. Can we guarantee the function not taking BQL instead? I had a quick look on pmem's pmem_persist() (from libpmem, qemu_ram_msync <- qemu_ram_block_writeback <- ram_load_cleanup), it looks ok. So the question is, is it safe to unlock BQL in whatever context (in coroutines, or in a bottom half)? If the answer is yes, we could make migration_incoming_state_destroy() always not taking BQL (and assert(!bql_locked()) instead). If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current version may not work either.. > + * > + * Check proper BQL state here rather than risk possible deadlock later. > */ > + assert(bql_locked()); > qemu_loadvm_state_cleanup(); > > if (mis->to_src_file) { > diff --git a/migration/savevm.c b/migration/savevm.c > index 7c1aa8ad7b9d..3e86b572cfa8 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -1986,6 +1986,8 @@ static void *postcopy_ram_listen_thread(void *opaque) > * in qemu_file, and thus we must be blocking now. > */ > qemu_file_set_blocking(f, true); > + > + /* TODO: sanity check that only postcopiable data will be loaded here */ > load_res = qemu_loadvm_state_main(f, mis); > > /* > @@ -2046,7 +2048,9 @@ static void *postcopy_ram_listen_thread(void *opaque) > * (If something broke then qemu will have to exit anyway since it's > * got a bad migration state). > */ > + bql_lock(); > migration_incoming_state_destroy(); > + bql_unlock(); > > rcu_unregister_thread(); > mis->have_listen_thread = false; > -- Peter Xu