On 5.03.2025 17:37, Cédric Le Goater wrote:
On 3/5/25 17:15, Peter Xu wrote:
On Wed, Mar 05, 2025 at 04:11:30PM +0100, Maciej S. Szmigiero wrote:
On 5.03.2025 13:34, Peter Xu wrote:
On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>

All callers to migration_incoming_state_destroy() other than
postcopy_ram_listen_thread() do this call with BQL held.

Since migration_incoming_state_destroy() ultimately calls the "load_cleanup"
SaveVMHandlers, and since it will soon call BQL-sensitive code, it makes
sense to always call that function under BQL rather than have it deal with
both cases (with BQL and without BQL).
Add the necessary bql_lock() and bql_unlock() calls to
postcopy_ram_listen_thread().
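For illustration, a minimal sketch of what that means on the listen thread
side (hypothetical shape based on the description above, not the exact hunk):

    /* postcopy_ram_listen_thread(), around the final cleanup */
    bql_lock();
    migration_incoming_state_destroy();
    bql_unlock();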

qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
"load_state" SaveVMHandlers that expect BQL to be held.

In principle, the only devices that should be arriving on the migration
channel serviced by postcopy_ram_listen_thread() are those that are
postcopiable and whose load handlers are safe to call without BQL held.

But nothing currently prevents the source from sending data for "unsafe"
devices, which would cause trouble there.
Add a TODO comment there so it's clear that handling of such an (erroneous)
case should be improved in the future.
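The TODO could be as simple as a comment next to the qemu_loadvm_state_main()
call in the listen thread (hypothetical wording):

    /*
     * TODO: reject, or at least warn about, non-postcopiable device state
     * arriving here, since its "load_state" handlers may assume BQL is held.
     */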

Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
---
   migration/migration.c | 16 ++++++++++++++++
   migration/savevm.c    |  4 ++++
   2 files changed, 20 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 9e9db26667f1..6b2a8af4231d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
       struct MigrationIncomingState *mis = migration_incoming_get_current();
       multifd_recv_cleanup();
+
       /*
        * RAM state cleanup needs to happen after multifd cleanup, because
        * multifd threads can use some of its states (receivedmap).
+     *
+     * This call also needs BQL held since it calls all registered
+     * load_cleanup SaveVMHandlers and at least the VFIO implementation is
+     * BQL-sensitive.
+     *
+     * In addition to the above, it also performs cleanup of the load
+     * threads' thread pool.
+     * This cleanup operation is BQL-sensitive as it requires unlocking BQL
+     * so a thread possibly waiting for it could get unblocked and finally
+     * exit.
+     * The reason a load thread may need to hold BQL in the first place
+     * is that address space modification operations require it.

Hold on...

This almost says exactly why load_cleanup() should _not_ take BQL... rather
than why it should.

So I had a closer look at the latest code, it's about this:

static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
{
      /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
      bql_unlock();
      WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
          while (multifd->load_bufs_thread_running) {
              multifd->load_bufs_thread_want_exit = true;

              qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
              qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
              qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
                             &multifd->load_bufs_mutex);
          }
      }
      bql_lock();
}

It doesn't make much sense to me to take it only because we want to drop it
unconditionally. Can we guarantee that the function won't take BQL instead?  I
had a quick look at pmem's pmem_persist() (from libpmem, qemu_ram_msync <-
qemu_ram_block_writeback <- ram_load_cleanup), and it looks ok.

So the question is, is it safe to unlock BQL in whatever context (in
coroutines, or in a bottom half)?

If the answer is yes, we could make migration_incoming_state_destroy()
never take BQL (and assert(!bql_locked()) instead).
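In other words, the alternative would look roughly like this (a sketch, not
actual code):

    void migration_incoming_state_destroy(void)
    {
        assert(!bql_locked()); /* callers would need to drop BQL first */

        /* ... existing cleanup, including "load_cleanup" handlers ... */
    }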

All the other callers of migration_incoming_state_destroy() are holding BQL:
process_incoming_migration_bh(), process_incoming_migration_co() (called on
the failure path only), load_snapshot() and qmp_xen_load_devices_state().

So AFAIK the safer way is to standardize on holding BQL when calling
that function.
If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
version may not work either...

I think the reason for BQL is to serialize access to the QEMU internals
which are not thread-safe.

So as long as these internals aren't touched while not holding BQL, we
should be safe - I don't see any particular state that's cached around
these BQL calls and would need explicit reloading after re-acquiring it.

OK, I checked with mysterious force and it looks like it's ok.

Would you please rephrase the comment, though?  I want to make it crystal
clear that what we're looking for is not holding BQL... Maybe something like
this:

     /*
      * The VFIO load_cleanup() implementation is BQL-sensitive. It requires
      * that BQL NOT be taken when recycling load threads, so that it won't
      * block the load threads from making progress on address space
      * modification operations.
      *
      * To make it work, we could try to not take BQL for all load_cleanup(),
      * or conditionally unlock BQL only if bql_locked() in VFIO.
      *
      * Since most existing call sites take BQL for load_cleanup(), make
      * it simple by taking BQL always as the rule, so that VFIO can unlock
      * BQL and retake unconditionally.
      */
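For completeness, the "conditionally unlock" variant mentioned above could
have looked like this in vfio_load_cleanup_load_bufs_thread() (hypothetical,
just to show the shape passed over in favor of the simpler rule):

    bool bql_held = bql_locked();

    if (bql_held) {
        bql_unlock();
    }
    /* ... wait for the load_bufs thread to exit ... */
    if (bql_held) {
        bql_lock();
    }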

We may also want to update the subject.  Currently:

   "migration: postcopy_ram_listen_thread() should take BQL for some calls"

It's not accurate anymore, it could be:

   "migration: Always take BQL for migration_incoming_state_destroy()"

With all of the above, please feel free to take:

Acked-by: Peter Xu <pet...@redhat.com>

I'm OK if it'll be touched up when merge too.

@Peter: Updated the comment and patch subject.
Added your review tag.

Maciej,

Could you please resend just that patch as a reply to the series? No need to
resend the whole series.

@Cedric:
I've sent the updated patch.

I also updated the tree at GitLab with collected review tags and an
updated version of this patch.

Thanks,

C.

Thanks,
Maciej

