On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
> 
> All callers to migration_incoming_state_destroy() other than
> postcopy_ram_listen_thread() do this call with BQL held.
> 
> Since migration_incoming_state_destroy() ultimately calls the "load_cleanup"
> SaveVMHandlers, which will soon contain BQL-sensitive code, it makes sense
> to always call that function with the BQL held rather than have it handle
> both cases (BQL held and not held).
> Add the necessary bql_lock() and bql_unlock() to
> postcopy_ram_listen_thread().
> 
> qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
> "load_state" SaveVMHandlers that are expecting BQL to be held.
> 
> In principle, the only devices that should be arriving on migration
> channel serviced by postcopy_ram_listen_thread() are those that are
> postcopiable and whose load handlers are safe to be called without BQL
> being held.
> 
> But nothing currently prevents the source from sending data for "unsafe"
> devices, which would cause trouble in that case.
> Add a TODO comment so it's clear that handling of such (erroneous) input
> should be improved in the future.
> 
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
> ---
>  migration/migration.c | 16 ++++++++++++++++
>  migration/savevm.c    |  4 ++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9e9db26667f1..6b2a8af4231d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
>      struct MigrationIncomingState *mis = migration_incoming_get_current();
>  
>      multifd_recv_cleanup();
> +
>      /*
>       * RAM state cleanup needs to happen after multifd cleanup, because
>       * multifd threads can use some of its states (receivedmap).
> +     *
> +     * This call also needs BQL held since it calls all registered
> +     * load_cleanup SaveVMHandlers and at least the VFIO implementation is
> +     * BQL-sensitive.
> +     *
> +     * In addition to the above, it also performs cleanup of the load
> +     * threads' thread pool.
> +     * That cleanup operation is BQL-sensitive too, since it requires
> +     * dropping the BQL so that a thread possibly waiting for it can get
> +     * unblocked and finally exit.
> +     * A load thread may need to take the BQL in the first place because
> +     * address space modification operations require it.

Hold on...

This almost says exactly why load_cleanup() should _not_ take the BQL...
rather than why it should.

So I had a closer look at the latest code; it's about this:

static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
{
    /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
    bql_unlock();
    WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
        while (multifd->load_bufs_thread_running) {
            multifd->load_bufs_thread_want_exit = true;

            qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
            qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
            qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
                           &multifd->load_bufs_mutex);
        }
    }
    bql_lock();
}

It doesn't make much sense to me to take the BQL only so that we can drop it
unconditionally right away. Can we instead guarantee that the function is
called without the BQL held?  I had a quick look at pmem's pmem_persist()
(from libpmem, qemu_ram_msync <- qemu_ram_block_writeback <-
ram_load_cleanup), and it looks OK.

So the question is, is it safe to unlock BQL in whatever context (in
coroutines, or in a bottom half)?

If the answer is yes, we could make migration_incoming_state_destroy()
always run without the BQL held (and assert(!bql_locked()) instead).

If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
version may not work either.

> +     *
> +     * Check proper BQL state here rather than risk possible deadlock later.
>       */
> +    assert(bql_locked());
>      qemu_loadvm_state_cleanup();
>  
>      if (mis->to_src_file) {
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7c1aa8ad7b9d..3e86b572cfa8 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1986,6 +1986,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       * in qemu_file, and thus we must be blocking now.
>       */
>      qemu_file_set_blocking(f, true);
> +
> +    /* TODO: sanity check that only postcopiable data will be loaded here */
>      load_res = qemu_loadvm_state_main(f, mis);
>  
>      /*
> @@ -2046,7 +2048,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       * (If something broke then qemu will have to exit anyway since it's
>       * got a bad migration state).
>       */
> +    bql_lock();
>      migration_incoming_state_destroy();
> +    bql_unlock();
>  
>      rcu_unregister_thread();
>      mis->have_listen_thread = false;
> 

-- 
Peter Xu

