On Fri, May 05, 2023 at 03:46:52PM +0200, Fiona Ebner wrote: > To fix it, ensure that the BQL is held during setup. To avoid changing > the behavior for migration too, introduce conditionals for the setup > callbacks that need the BQL and only take the lock if it's not already > held.
The major complexity of this patch is the "conditionally taking" part. Pure question: what is the benefit of not holding BQL always during save_setup(), if after all we have this coroutine issue to be solved? I can understand that it helps us to avoid taking BQL too long, but we'll need to take it anyway during ramblock dirty track initializations, and so far IIUC it's the major time to be consumed during setup(). Commit message of 9b0950375277467 says, "Only the migration_bitmap_sync() call needs the iothread lock". Firstly I think it's also covering enablement of dirty tracking: + qemu_mutex_lock_iothread(); + qemu_mutex_lock_ramlist(); + bytes_transferred = 0; + reset_ram_globals(); + memory_global_dirty_log_start(); migration_bitmap_sync(); + qemu_mutex_unlock_iothread(); And I think enablement itself can be slow too, maybe even slower than migration_bitmap_sync() especially with KVM_DIRTY_LOG_INITIALLY_SET supported in the kernel. Meanwhile I always got confused on why we need to sync dirty bitmap when setup at all. Say, what if we drop migration_bitmap_sync() here? After all, shouldn't all pages be dirty from start (ram_list_init_bitmaps())? This is slightly off-topic, but I'd like to know if someone can help answer. My whole point is still questioning whether we can unconditionally take bql during save_setup(). I could have missed something, though, where we want to do in setup() but avoid holding BQL. Help needed on figuring this out (and if there is, IMHO it'll be worthwhile to put that into comment of save_setup() hook). Thanks, -- Peter Xu