On 2.02.2025 03:06, Dr. David Alan Gilbert wrote:
* Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
postcopy_ram_listen_thread() is a free running thread, so it needs to
take BQL around function calls to migration methods requiring BQL.
qemu_loadvm_state_main() needs BQL held since it ultimately calls
"load_state" SaveVMHandlers.
migration_incoming_state_destroy() needs BQL held since it ultimately calls
"load_cleanup" SaveVMHandlers.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
---
migration/savevm.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/migration/savevm.c b/migration/savevm.c
index b0b74140daea..0ceea9638cc1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
* in qemu_file, and thus we must be blocking now.
*/
qemu_file_set_blocking(f, true);
+ bql_lock();
load_res = qemu_loadvm_state_main(f, mis);
+ bql_unlock();
Doesn't that leave that held for a heck of a long time?
Yes, and it effectively broke "postcopy recover" test but I
think the reason for that is qemu_loadvm_state_main() and
its children don't drop BQL while waiting for I/O.
I've described this case in more detail in my reply to Fabiano here:
https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/
I still think that "load_state" SaveVMHandlers need to be called
with BQL held since implementations apparently expect it that way:
for example, I think PCI device configuration restore calls
address space manipulation methods which abort() if called
without BQL held.
I have previously even submitted a patch to explicitly document
"load_state" SaveVMHandler as requiring BQL (which was also
included in the previous version of this patch set) and it
received a "Reviewed-by:" tag:
https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/
https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/
https://lore.kernel.org/qemu-devel/87o732bti7....@suse.de/
It's also worth noting that COLO equivalent of postcopy
incoming thread (colo_process_incoming_thread()) explicitly
takes BQL around qemu_loadvm_state_main():
bql_lock();
cpu_synchronize_all_states();
ret = qemu_loadvm_state_main(mis->from_src_file, mis);
bql_unlock();
That RAM loading has to happen in parallel with the loading of
devices doesn't it - especially if one of the devices
being loaded touches RAM.
(I wish this series had a description in the cover letter!)
I guess you mean "more detailed description" since there's
a paragraph about this patch in this series cover letter change log:
* postcopy_ram_listen_thread() now takes BQL around function calls that
ultimately call migration methods requiring BQL.
This fixes one of QEMU tests failing when explicitly BQL-sensitive code
is added later to these methods.
Dave
Thanks,
Maciej