Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On 4.02.2025 16:39, Peter Xu wrote: On Tue, Feb 04, 2025 at 03:57:37PM +0100, Maciej S. Szmigiero wrote: The vfio_migration_cleanup() used to just close a migration FD, while RAM might end up calling qemu_ram_msync(), which sounds like something that should be called under BQL. But I am not sure whether that lack of BQL around qemu_ram_msync() actually causes problems. I believe msync() is thread-safe. So that doesn't need BQL, AFAICT. Personally I actually prefer not having the BQL requirement if ever possible in any vmstate hooks. I think the only challenge here is if VFIO will start to need BQL for some specific code path that you added in this series, it means VFIO needs to detect bql_locked() to make sure it won't deadlock.. and only take BQL if it's not taken. From that POV, it might be easier for you to define that hook as "always do cleanup() with BQL" globally, just to avoid one bql_locked() usage in vfio specific hook. We pay that with slow RAM sync in corner cases like pmem that could potentially block VM from making progress (e.g. vcpu concurrently accessing MMIO regions). Not only the VFIO load_cleanup hook is BQL-sensitive in this patch set but also the load threads pool cleanup handler. And migration_incoming_state_destroy() was already called with BQL everywhere but in the postcopy thread so I think that any BQL-related performance issue would already be uncovered by its other callers. Thanks, Thanks, Maciej
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On Tue, Feb 04, 2025 at 03:57:37PM +0100, Maciej S. Szmigiero wrote: > The vfio_migration_cleanup() used to just close a migration FD, while > RAM might end up calling qemu_ram_msync(), which sounds like something > that should be called under BQL. > > But I am not sure whether that lack of BQL around qemu_ram_msync() > actually causes problems. I believe msync() is thread-safe. So that doesn't need BQL, AFAICT. Personally I actually prefer not having the BQL requirement if ever possible in any vmstate hooks. I think the only challenge here is if VFIO will start to need BQL for some specific code path that you added in this series, it means VFIO needs to detect bql_locked() to make sure it won't deadlock.. and only take BQL if it's not taken. >From that POV, it might be easier for you to define that hook as "always do cleanup() with BQL" globally, just to avoid one bql_locked() usage in vfio specific hook. We pay that with slow RAM sync in corner cases like pmem that could potentially block VM from making progress (e.g. vcpu concurrently accessing MMIO regions). Thanks, -- Peter Xu
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On 4.02.2025 00:02, Peter Xu wrote: On Mon, Feb 03, 2025 at 10:41:43PM +0100, Maciej S. Szmigiero wrote: On 3.02.2025 21:36, Peter Xu wrote: On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote: On 3.02.2025 20:58, Peter Xu wrote: On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero ([email protected]) wrote: On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero ([email protected]) wrote: From: "Maciej S. Szmigiero" postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. qemu_loadvm_state_main() needs BQL held since it ultimately calls "load_state" SaveVMHandlers. migration_incoming_state_destroy() needs BQL held since it ultimately calls "load_cleanup" SaveVMHandlers. Signed-off-by: Maciej S. Szmigiero --- migration/savevm.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/savevm.c b/migration/savevm.c index b0b74140daea..0ceea9638cc1 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque) * in qemu_file, and thus we must be blocking now. */ qemu_file_set_blocking(f, true); +bql_lock(); load_res = qemu_loadvm_state_main(f, mis); +bql_unlock(); Doesn't that leave that held for a heck of a long time? Yes, and it effectively broke "postcopy recover" test but I think the reason for that is qemu_loadvm_state_main() and its children don't drop BQL while waiting for I/O. I've described this case in more detail in my reply to Fabiano here: https://lore.kernel.org/qemu-devel/[email protected]/ While it might be the cause in this case, my feeling is it's more fundamental here - it's the whole reason that postcopy has a separate ram listen thread. As the destination is running, after it loads it's devices and as it starts up the destination will be still loading RAM (and other postcopiable devices) potentially for quite a while. Holding the bql around the ram listen thread means that the execution of the destination won't be able to take that lock until the postcopy load has finished; so while that might apparently complete, it'll lead to the destination stalling until that's finished which defeats the whole point of postcopy. That last one probably won't fail a test but it will lead to a long stall if you give it a nice big guest with lots of RAM that it's rapidly changing. Okay, I understand the postcopy case/flow now. Thanks for explaining it clearly. I still think that "load_state" SaveVMHandlers need to be called with BQL held since implementations apparently expect it that way: for example, I think PCI device configuration restore calls address space manipulation methods which abort() if called without BQL held. However, the only devices that *should* be arriving on the channel that the postcopy_ram_listen_thread is reading from are those that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). Those load handlers are safe to be run while the other devices are being changed. Note the *should* - you could add a check to fail if any other device arrives on that channel. I think ultimately there should be either an explicit check, or, as you suggest in the paragraph below, a separate SaveVMHandler that runs without BQL held. To me those are bugs happening during postcopy, so those abort()s in memory.c are indeed for catching these issues too. Since the current state of just running these SaveVMHandlers without BQL in this case and hoping that nothing breaks is clearly sub-optimal. I have previously even submitted a patch to explicitly document "load_state" SaveVMHandler as requiring BQL (which was also included in the previous version of this patch set) and it received a "Reviewed-by:" tag: https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/[email protected]/ It happens! You could make this safer by having a load_state and a load_state_postcopy member, and only mark the load_state as requiring the lock. To not digress too much from the subject of this patch set (multifd VFIO device state transfer) for now I've just updated the TODO comment around that qemu_loadvm_state_main(), so hopefully this discussion won't get forgotten: https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79 The commit message may still need some touch ups, e.g.: postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. This senten
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On Mon, Feb 03, 2025 at 10:41:43PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 21:36, Peter Xu wrote: > > On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote: > > > On 3.02.2025 20:58, Peter Xu wrote: > > > > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: > > > > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: > > > > > > * Maciej S. Szmigiero ([email protected]) wrote: > > > > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: > > > > > > > > * Maciej S. Szmigiero ([email protected]) wrote: > > > > > > > > > From: "Maciej S. Szmigiero" > > > > > > > > > > > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it > > > > > > > > > needs to > > > > > > > > > take BQL around function calls to migration methods requiring > > > > > > > > > BQL. > > > > > > > > > > > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately > > > > > > > > > calls > > > > > > > > > "load_state" SaveVMHandlers. > > > > > > > > > > > > > > > > > > migration_incoming_state_destroy() needs BQL held since it > > > > > > > > > ultimately calls > > > > > > > > > "load_cleanup" SaveVMHandlers. > > > > > > > > > > > > > > > > > > Signed-off-by: Maciej S. Szmigiero > > > > > > > > > > > > > > > > > > --- > > > > > > > > > migration/savevm.c | 4 > > > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > > > > > > index b0b74140daea..0ceea9638cc1 100644 > > > > > > > > > --- a/migration/savevm.c > > > > > > > > > +++ b/migration/savevm.c > > > > > > > > > @@ -2013,7 +2013,9 @@ static void > > > > > > > > > *postcopy_ram_listen_thread(void *opaque) > > > > > > > > > * in qemu_file, and thus we must be blocking now. > > > > > > > > > */ > > > > > > > > > qemu_file_set_blocking(f, true); > > > > > > > > > +bql_lock(); > > > > > > > > > load_res = qemu_loadvm_state_main(f, mis); > > > > > > > > > +bql_unlock(); > > > > > > > > > > > > > > > > Doesn't that leave that held for a heck of a long time? > > > > > > > > > > > > > > Yes, and it effectively broke "postcopy recover" test but I > > > > > > > think the reason for that is qemu_loadvm_state_main() and > > > > > > > its children don't drop BQL while waiting for I/O. > > > > > > > > > > > > > > I've described this case in more detail in my reply to Fabiano > > > > > > > here: > > > > > > > https://lore.kernel.org/qemu-devel/[email protected]/ > > > > > > > > > > > > While it might be the cause in this case, my feeling is it's more > > > > > > fundamental > > > > > > here - it's the whole reason that postcopy has a separate ram listen > > > > > > thread. As the destination is running, after it loads it's devices > > > > > > and as it starts up the destination will be still loading RAM > > > > > > (and other postcopiable devices) potentially for quite a while. > > > > > > Holding the bql around the ram listen thread means that the > > > > > > execution of the destination won't be able to take that lock > > > > > > until the postcopy load has finished; so while that might apparently > > > > > > complete, it'll lead to the destination stalling until that's > > > > > > finished > > > > > > which defeats the whole point of postcopy. > > > > > > That last one probably won't fail a test but it will lead to a long > > > > > > stall > > > > > > if you give it a nice big guest with lots of RAM that it's rapidly > > > > > > changing. > > > > > > > > > > Okay, I understand the postcopy case/flow now. > > > > > Thanks for explaining it clearly. > > > > > > > > > > > > I still think that "load_state" SaveVMHandlers need to be called > > > > > > > with BQL held since implementations apparently expect it that way: > > > > > > > for example, I think PCI device configuration restore calls > > > > > > > address space manipulation methods which abort() if called > > > > > > > without BQL held. > > > > > > > > > > > > However, the only devices that *should* be arriving on the channel > > > > > > that the postcopy_ram_listen_thread is reading from are those > > > > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). > > > > > > Those load handlers are safe to be run while the other devices > > > > > > are being changed. Note the *should* - you could add a check > > > > > > to fail if any other device arrives on that channel. > > > > > > > > > > I think ultimately there should be either an explicit check, or, > > > > > as you suggest in the paragraph below, a separate SaveVMHandler > > > > > that runs without BQL held. > > > > > > > > To me those are bugs happening during postcopy, so those abort()s in > > > > memory.c are indeed for catching these issues too. > > > > > > > > > Since the current state of just running these SaveVMHandlers > > > > > without BQL in this case and hoping
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On 3.02.2025 21:36, Peter Xu wrote: On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote: On 3.02.2025 20:58, Peter Xu wrote: On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero ([email protected]) wrote: On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero ([email protected]) wrote: From: "Maciej S. Szmigiero" postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. qemu_loadvm_state_main() needs BQL held since it ultimately calls "load_state" SaveVMHandlers. migration_incoming_state_destroy() needs BQL held since it ultimately calls "load_cleanup" SaveVMHandlers. Signed-off-by: Maciej S. Szmigiero --- migration/savevm.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/savevm.c b/migration/savevm.c index b0b74140daea..0ceea9638cc1 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque) * in qemu_file, and thus we must be blocking now. */ qemu_file_set_blocking(f, true); +bql_lock(); load_res = qemu_loadvm_state_main(f, mis); +bql_unlock(); Doesn't that leave that held for a heck of a long time? Yes, and it effectively broke "postcopy recover" test but I think the reason for that is qemu_loadvm_state_main() and its children don't drop BQL while waiting for I/O. I've described this case in more detail in my reply to Fabiano here: https://lore.kernel.org/qemu-devel/[email protected]/ While it might be the cause in this case, my feeling is it's more fundamental here - it's the whole reason that postcopy has a separate ram listen thread. As the destination is running, after it loads it's devices and as it starts up the destination will be still loading RAM (and other postcopiable devices) potentially for quite a while. Holding the bql around the ram listen thread means that the execution of the destination won't be able to take that lock until the postcopy load has finished; so while that might apparently complete, it'll lead to the destination stalling until that's finished which defeats the whole point of postcopy. That last one probably won't fail a test but it will lead to a long stall if you give it a nice big guest with lots of RAM that it's rapidly changing. Okay, I understand the postcopy case/flow now. Thanks for explaining it clearly. I still think that "load_state" SaveVMHandlers need to be called with BQL held since implementations apparently expect it that way: for example, I think PCI device configuration restore calls address space manipulation methods which abort() if called without BQL held. However, the only devices that *should* be arriving on the channel that the postcopy_ram_listen_thread is reading from are those that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). Those load handlers are safe to be run while the other devices are being changed. Note the *should* - you could add a check to fail if any other device arrives on that channel. I think ultimately there should be either an explicit check, or, as you suggest in the paragraph below, a separate SaveVMHandler that runs without BQL held. To me those are bugs happening during postcopy, so those abort()s in memory.c are indeed for catching these issues too. Since the current state of just running these SaveVMHandlers without BQL in this case and hoping that nothing breaks is clearly sub-optimal. I have previously even submitted a patch to explicitly document "load_state" SaveVMHandler as requiring BQL (which was also included in the previous version of this patch set) and it received a "Reviewed-by:" tag: https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/[email protected]/ It happens! You could make this safer by having a load_state and a load_state_postcopy member, and only mark the load_state as requiring the lock. To not digress too much from the subject of this patch set (multifd VFIO device state transfer) for now I've just updated the TODO comment around that qemu_loadvm_state_main(), so hopefully this discussion won't get forgotten: https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79 The commit message may still need some touch ups, e.g.: postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. This sentence is still not correct, IMHO. As Dave explained, the ram load thread is designed to run without BQL at least for t
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote: > On 3.02.2025 20:58, Peter Xu wrote: > > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: > > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: > > > > * Maciej S. Szmigiero ([email protected]) wrote: > > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: > > > > > > * Maciej S. Szmigiero ([email protected]) wrote: > > > > > > > From: "Maciej S. Szmigiero" > > > > > > > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it > > > > > > > needs to > > > > > > > take BQL around function calls to migration methods requiring BQL. > > > > > > > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately calls > > > > > > > "load_state" SaveVMHandlers. > > > > > > > > > > > > > > migration_incoming_state_destroy() needs BQL held since it > > > > > > > ultimately calls > > > > > > > "load_cleanup" SaveVMHandlers. > > > > > > > > > > > > > > Signed-off-by: Maciej S. Szmigiero > > > > > > > --- > > > > > > > migration/savevm.c | 4 > > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > > > > index b0b74140daea..0ceea9638cc1 100644 > > > > > > > --- a/migration/savevm.c > > > > > > > +++ b/migration/savevm.c > > > > > > > @@ -2013,7 +2013,9 @@ static void > > > > > > > *postcopy_ram_listen_thread(void *opaque) > > > > > > > * in qemu_file, and thus we must be blocking now. > > > > > > > */ > > > > > > > qemu_file_set_blocking(f, true); > > > > > > > +bql_lock(); > > > > > > > load_res = qemu_loadvm_state_main(f, mis); > > > > > > > +bql_unlock(); > > > > > > > > > > > > Doesn't that leave that held for a heck of a long time? > > > > > > > > > > Yes, and it effectively broke "postcopy recover" test but I > > > > > think the reason for that is qemu_loadvm_state_main() and > > > > > its children don't drop BQL while waiting for I/O. > > > > > > > > > > I've described this case in more detail in my reply to Fabiano here: > > > > > https://lore.kernel.org/qemu-devel/[email protected]/ > > > > > > > > While it might be the cause in this case, my feeling is it's more > > > > fundamental > > > > here - it's the whole reason that postcopy has a separate ram listen > > > > thread. As the destination is running, after it loads it's devices > > > > and as it starts up the destination will be still loading RAM > > > > (and other postcopiable devices) potentially for quite a while. > > > > Holding the bql around the ram listen thread means that the > > > > execution of the destination won't be able to take that lock > > > > until the postcopy load has finished; so while that might apparently > > > > complete, it'll lead to the destination stalling until that's finished > > > > which defeats the whole point of postcopy. > > > > That last one probably won't fail a test but it will lead to a long > > > > stall > > > > if you give it a nice big guest with lots of RAM that it's rapidly > > > > changing. > > > > > > Okay, I understand the postcopy case/flow now. > > > Thanks for explaining it clearly. > > > > > > > > I still think that "load_state" SaveVMHandlers need to be called > > > > > with BQL held since implementations apparently expect it that way: > > > > > for example, I think PCI device configuration restore calls > > > > > address space manipulation methods which abort() if called > > > > > without BQL held. > > > > > > > > However, the only devices that *should* be arriving on the channel > > > > that the postcopy_ram_listen_thread is reading from are those > > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). > > > > Those load handlers are safe to be run while the other devices > > > > are being changed. Note the *should* - you could add a check > > > > to fail if any other device arrives on that channel. > > > > > > I think ultimately there should be either an explicit check, or, > > > as you suggest in the paragraph below, a separate SaveVMHandler > > > that runs without BQL held. > > > > To me those are bugs happening during postcopy, so those abort()s in > > memory.c are indeed for catching these issues too. > > > > > Since the current state of just running these SaveVMHandlers > > > without BQL in this case and hoping that nothing breaks is > > > clearly sub-optimal. > > > > > > > > I have previously even submitted a patch to explicitly document > > > > > "load_state" SaveVMHandler as requiring BQL (which was also > > > > > included in the previous version of this patch set) and it > > > > > received a "Reviewed-by:" tag: > > > > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ > > > > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On 3.02.2025 20:58, Peter Xu wrote: On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero ([email protected]) wrote: On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero ([email protected]) wrote: From: "Maciej S. Szmigiero" postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. qemu_loadvm_state_main() needs BQL held since it ultimately calls "load_state" SaveVMHandlers. migration_incoming_state_destroy() needs BQL held since it ultimately calls "load_cleanup" SaveVMHandlers. Signed-off-by: Maciej S. Szmigiero --- migration/savevm.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/savevm.c b/migration/savevm.c index b0b74140daea..0ceea9638cc1 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque) * in qemu_file, and thus we must be blocking now. */ qemu_file_set_blocking(f, true); +bql_lock(); load_res = qemu_loadvm_state_main(f, mis); +bql_unlock(); Doesn't that leave that held for a heck of a long time? Yes, and it effectively broke "postcopy recover" test but I think the reason for that is qemu_loadvm_state_main() and its children don't drop BQL while waiting for I/O. I've described this case in more detail in my reply to Fabiano here: https://lore.kernel.org/qemu-devel/[email protected]/ While it might be the cause in this case, my feeling is it's more fundamental here - it's the whole reason that postcopy has a separate ram listen thread. As the destination is running, after it loads it's devices and as it starts up the destination will be still loading RAM (and other postcopiable devices) potentially for quite a while. Holding the bql around the ram listen thread means that the execution of the destination won't be able to take that lock until the postcopy load has finished; so while that might apparently complete, it'll lead to the destination stalling until that's finished which defeats the whole point of postcopy. That last one probably won't fail a test but it will lead to a long stall if you give it a nice big guest with lots of RAM that it's rapidly changing. Okay, I understand the postcopy case/flow now. Thanks for explaining it clearly. I still think that "load_state" SaveVMHandlers need to be called with BQL held since implementations apparently expect it that way: for example, I think PCI device configuration restore calls address space manipulation methods which abort() if called without BQL held. However, the only devices that *should* be arriving on the channel that the postcopy_ram_listen_thread is reading from are those that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). Those load handlers are safe to be run while the other devices are being changed. Note the *should* - you could add a check to fail if any other device arrives on that channel. I think ultimately there should be either an explicit check, or, as you suggest in the paragraph below, a separate SaveVMHandler that runs without BQL held. To me those are bugs happening during postcopy, so those abort()s in memory.c are indeed for catching these issues too. Since the current state of just running these SaveVMHandlers without BQL in this case and hoping that nothing breaks is clearly sub-optimal. I have previously even submitted a patch to explicitly document "load_state" SaveVMHandler as requiring BQL (which was also included in the previous version of this patch set) and it received a "Reviewed-by:" tag: https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/[email protected]/ It happens! You could make this safer by having a load_state and a load_state_postcopy member, and only mark the load_state as requiring the lock. To not digress too much from the subject of this patch set (multifd VFIO device state transfer) for now I've just updated the TODO comment around that qemu_loadvm_state_main(), so hopefully this discussion won't get forgotten: https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79 The commit message may still need some touch ups, e.g.: postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. This sentence is still not correct, IMHO. As Dave explained, the ram load thread is designed to run without BQL at least for the major workloads it runs. So what's your proposed wording of this commit then? I don't worry on src sending s
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote: > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: > > * Maciej S. Szmigiero ([email protected]) wrote: > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: > > > > * Maciej S. Szmigiero ([email protected]) wrote: > > > > > From: "Maciej S. Szmigiero" > > > > > > > > > > postcopy_ram_listen_thread() is a free running thread, so it needs to > > > > > take BQL around function calls to migration methods requiring BQL. > > > > > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately calls > > > > > "load_state" SaveVMHandlers. > > > > > > > > > > migration_incoming_state_destroy() needs BQL held since it ultimately > > > > > calls > > > > > "load_cleanup" SaveVMHandlers. > > > > > > > > > > Signed-off-by: Maciej S. Szmigiero > > > > > --- > > > > >migration/savevm.c | 4 > > > > >1 file changed, 4 insertions(+) > > > > > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > > > index b0b74140daea..0ceea9638cc1 100644 > > > > > --- a/migration/savevm.c > > > > > +++ b/migration/savevm.c > > > > > @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void > > > > > *opaque) > > > > > * in qemu_file, and thus we must be blocking now. > > > > > */ > > > > >qemu_file_set_blocking(f, true); > > > > > +bql_lock(); > > > > >load_res = qemu_loadvm_state_main(f, mis); > > > > > +bql_unlock(); > > > > > > > > Doesn't that leave that held for a heck of a long time? > > > > > > Yes, and it effectively broke "postcopy recover" test but I > > > think the reason for that is qemu_loadvm_state_main() and > > > its children don't drop BQL while waiting for I/O. > > > > > > I've described this case in more detail in my reply to Fabiano here: > > > https://lore.kernel.org/qemu-devel/[email protected]/ > > > > While it might be the cause in this case, my feeling is it's more > > fundamental > > here - it's the whole reason that postcopy has a separate ram listen > > thread. As the destination is running, after it loads it's devices > > and as it starts up the destination will be still loading RAM > > (and other postcopiable devices) potentially for quite a while. > > Holding the bql around the ram listen thread means that the > > execution of the destination won't be able to take that lock > > until the postcopy load has finished; so while that might apparently > > complete, it'll lead to the destination stalling until that's finished > > which defeats the whole point of postcopy. > > That last one probably won't fail a test but it will lead to a long stall > > if you give it a nice big guest with lots of RAM that it's rapidly > > changing. > > Okay, I understand the postcopy case/flow now. > Thanks for explaining it clearly. > > > > I still think that "load_state" SaveVMHandlers need to be called > > > with BQL held since implementations apparently expect it that way: > > > for example, I think PCI device configuration restore calls > > > address space manipulation methods which abort() if called > > > without BQL held. > > > > However, the only devices that *should* be arriving on the channel > > that the postcopy_ram_listen_thread is reading from are those > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). > > Those load handlers are safe to be run while the other devices > > are being changed. Note the *should* - you could add a check > > to fail if any other device arrives on that channel. > > I think ultimately there should be either an explicit check, or, > as you suggest in the paragraph below, a separate SaveVMHandler > that runs without BQL held. To me those are bugs happening during postcopy, so those abort()s in memory.c are indeed for catching these issues too. > Since the current state of just running these SaveVMHandlers > without BQL in this case and hoping that nothing breaks is > clearly sub-optimal. > > > > I have previously even submitted a patch to explicitly document > > > "load_state" SaveVMHandler as requiring BQL (which was also > > > included in the previous version of this patch set) and it > > > received a "Reviewed-by:" tag: > > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ > > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ > > > https://lore.kernel.org/qemu-devel/[email protected]/ > > > > It happens! > > You could make this safer by having a load_state and a load_state_postcopy > > member, and only mark the load_state as requiring the lock. > > To not digress too much from the subject of this patch set > (multifd VFIO device state transfer) for now I've just updated the > TODO comment around that qemu_loadvm_state_main(), so hopefully this > discussion won't get forgotten: > https://gitlab
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On 2.02.2025 13:45, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero ([email protected]) wrote: On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero ([email protected]) wrote: From: "Maciej S. Szmigiero" postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. qemu_loadvm_state_main() needs BQL held since it ultimately calls "load_state" SaveVMHandlers. migration_incoming_state_destroy() needs BQL held since it ultimately calls "load_cleanup" SaveVMHandlers. Signed-off-by: Maciej S. Szmigiero --- migration/savevm.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/savevm.c b/migration/savevm.c index b0b74140daea..0ceea9638cc1 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque) * in qemu_file, and thus we must be blocking now. */ qemu_file_set_blocking(f, true); +bql_lock(); load_res = qemu_loadvm_state_main(f, mis); +bql_unlock(); Doesn't that leave that held for a heck of a long time? Yes, and it effectively broke "postcopy recover" test but I think the reason for that is qemu_loadvm_state_main() and its children don't drop BQL while waiting for I/O. I've described this case in more detail in my reply to Fabiano here: https://lore.kernel.org/qemu-devel/[email protected]/ While it might be the cause in this case, my feeling is it's more fundamental here - it's the whole reason that postcopy has a separate ram listen thread. As the destination is running, after it loads it's devices and as it starts up the destination will be still loading RAM (and other postcopiable devices) potentially for quite a while. Holding the bql around the ram listen thread means that the execution of the destination won't be able to take that lock until the postcopy load has finished; so while that might apparently complete, it'll lead to the destination stalling until that's finished which defeats the whole point of postcopy. That last one probably won't fail a test but it will lead to a long stall if you give it a nice big guest with lots of RAM that it's rapidly changing. Okay, I understand the postcopy case/flow now. Thanks for explaining it clearly. I still think that "load_state" SaveVMHandlers need to be called with BQL held since implementations apparently expect it that way: for example, I think PCI device configuration restore calls address space manipulation methods which abort() if called without BQL held. However, the only devices that *should* be arriving on the channel that the postcopy_ram_listen_thread is reading from are those that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). Those load handlers are safe to be run while the other devices are being changed. Note the *should* - you could add a check to fail if any other device arrives on that channel. I think ultimately there should be either an explicit check, or, as you suggest in the paragraph below, a separate SaveVMHandler that runs without BQL held. Since the current state of just running these SaveVMHandlers without BQL in this case and hoping that nothing breaks is clearly sub-optimal. I have previously even submitted a patch to explicitly document "load_state" SaveVMHandler as requiring BQL (which was also included in the previous version of this patch set) and it received a "Reviewed-by:" tag: https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/[email protected]/ It happens! You could make this safer by having a load_state and a load_state_postcopy member, and only mark the load_state as requiring the lock. To not digress too much from the subject of this patch set (multifd VFIO device state transfer) for now I've just updated the TODO comment around that qemu_loadvm_state_main(), so hopefully this discussion won't get forgotten: https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79 (..) That RAM loading has to happen in parallel with the loading of devices doesn't it - especially if one of the devices being loaded touches RAM. (I wish this series had a description in the cover letter!) I guess you mean "more detailed description" since there's a paragraph about this patch in this series cover letter change log: * postcopy_ram_listen_thread() now takes BQL around function calls that ultimately call migration methods requiring BQL. This fixes one of QEMU tests failing when explicitly BQL-sensitive code is added later to these methods. I meant a higher level description of what the series is doing. There was a general overview what this series
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
* Maciej S. Szmigiero ([email protected]) wrote: > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: > > * Maciej S. Szmigiero ([email protected]) wrote: > > > From: "Maciej S. Szmigiero" > > > > > > postcopy_ram_listen_thread() is a free running thread, so it needs to > > > take BQL around function calls to migration methods requiring BQL. > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately calls > > > "load_state" SaveVMHandlers. > > > > > > migration_incoming_state_destroy() needs BQL held since it ultimately > > > calls > > > "load_cleanup" SaveVMHandlers. > > > > > > Signed-off-by: Maciej S. Szmigiero > > > --- > > > migration/savevm.c | 4 > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > index b0b74140daea..0ceea9638cc1 100644 > > > --- a/migration/savevm.c > > > +++ b/migration/savevm.c > > > @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void > > > *opaque) > > >* in qemu_file, and thus we must be blocking now. > > >*/ > > > qemu_file_set_blocking(f, true); > > > +bql_lock(); > > > load_res = qemu_loadvm_state_main(f, mis); > > > +bql_unlock(); > > > > Doesn't that leave that held for a heck of a long time? > > Yes, and it effectively broke "postcopy recover" test but I > think the reason for that is qemu_loadvm_state_main() and > its children don't drop BQL while waiting for I/O. > > I've described this case in more detail in my reply to Fabiano here: > https://lore.kernel.org/qemu-devel/[email protected]/ While it might be the cause in this case, my feeling is it's more fundamental here - it's the whole reason that postcopy has a separate ram listen thread. As the destination is running, after it loads it's devices and as it starts up the destination will be still loading RAM (and other postcopiable devices) potentially for quite a while. Holding the bql around the ram listen thread means that the execution of the destination won't be able to take that lock until the postcopy load has finished; so while that might apparently complete, it'll lead to the destination stalling until that's finished which defeats the whole point of postcopy. That last one probably won't fail a test but it will lead to a long stall if you give it a nice big guest with lots of RAM that it's rapidly changing. > I still think that "load_state" SaveVMHandlers need to be called > with BQL held since implementations apparently expect it that way: > for example, I think PCI device configuration restore calls > address space manipulation methods which abort() if called > without BQL held. However, the only devices that *should* be arriving on the channel that the postcopy_ram_listen_thread is reading from are those that are postcopiable (i.e. RAM and hmm block's dirty_bitmap). Those load handlers are safe to be run while the other devices are being changed. Note the *should* - you could add a check to fail if any other device arrives on that channel. > I have previously even submitted a patch to explicitly document > "load_state" SaveVMHandler as requiring BQL (which was also > included in the previous version of this patch set) and it > received a "Reviewed-by:" tag: > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ > https://lore.kernel.org/qemu-devel/[email protected]/ It happens! You could make this safer by having a load_state and a load_state_postcopy member, and only mark the load_state as requiring the lock. > It's also worth noting that COLO equivalent of postcopy > incoming thread (colo_process_incoming_thread()) explicitly > takes BQL around qemu_loadvm_state_main(): > > bql_lock(); > > cpu_synchronize_all_states(); > > ret = qemu_loadvm_state_main(mis->from_src_file, mis); > > bql_unlock(); > It's not a straight equivalent; it's about a decade since I've thought about COLO, so I can't quite remember when that thread runs. > > That RAM loading has to happen in parallel with the loading of > > devices doesn't it - especially if one of the devices > > being loaded touches RAM. > > > > (I wish this series had a description in the cover letter!) > > I guess you mean "more detailed description" since there's > a paragraph about this patch in this series cover letter change log: > > * postcopy_ram_listen_thread() now takes BQL around function calls that > > ultimately call migration methods requiring BQL. > > This fixes one of QEMU tests failing when explicitly BQL-sensitive code > > is added later to these methods. I meant a higher level description of what the series is doing. Dave > > > Dave > > Thanks, > Maciej > > -- -Open up your eyes, open up your mind, open up your code
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
On 2.02.2025 03:06, Dr. David Alan Gilbert wrote: * Maciej S. Szmigiero ([email protected]) wrote: From: "Maciej S. Szmigiero" postcopy_ram_listen_thread() is a free running thread, so it needs to take BQL around function calls to migration methods requiring BQL. qemu_loadvm_state_main() needs BQL held since it ultimately calls "load_state" SaveVMHandlers. migration_incoming_state_destroy() needs BQL held since it ultimately calls "load_cleanup" SaveVMHandlers. Signed-off-by: Maciej S. Szmigiero --- migration/savevm.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/savevm.c b/migration/savevm.c index b0b74140daea..0ceea9638cc1 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque) * in qemu_file, and thus we must be blocking now. */ qemu_file_set_blocking(f, true); +bql_lock(); load_res = qemu_loadvm_state_main(f, mis); +bql_unlock(); Doesn't that leave that held for a heck of a long time? Yes, and it effectively broke "postcopy recover" test but I think the reason for that is qemu_loadvm_state_main() and its children don't drop BQL while waiting for I/O. I've described this case in more detail in my reply to Fabiano here: https://lore.kernel.org/qemu-devel/[email protected]/ I still think that "load_state" SaveVMHandlers need to be called with BQL held since implementations apparently expect it that way: for example, I think PCI device configuration restore calls address space manipulation methods which abort() if called without BQL held. I have previously even submitted a patch to explicitly document "load_state" SaveVMHandler as requiring BQL (which was also included in the previous version of this patch set) and it received a "Reviewed-by:" tag: https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/ https://lore.kernel.org/qemu-devel/[email protected]/ It's also worth noting that COLO equivalent of postcopy incoming thread (colo_process_incoming_thread()) explicitly takes BQL around qemu_loadvm_state_main(): bql_lock(); cpu_synchronize_all_states(); ret = qemu_loadvm_state_main(mis->from_src_file, mis); bql_unlock(); That RAM loading has to happen in parallel with the loading of devices doesn't it - especially if one of the devices being loaded touches RAM. (I wish this series had a description in the cover letter!) I guess you mean "more detailed description" since there's a paragraph about this patch in this series cover letter change log: * postcopy_ram_listen_thread() now takes BQL around function calls that ultimately call migration methods requiring BQL. This fixes one of QEMU tests failing when explicitly BQL-sensitive code is added later to these methods. Dave Thanks, Maciej
Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
* Maciej S. Szmigiero ([email protected]) wrote: > From: "Maciej S. Szmigiero" > > postcopy_ram_listen_thread() is a free running thread, so it needs to > take BQL around function calls to migration methods requiring BQL. > > qemu_loadvm_state_main() needs BQL held since it ultimately calls > "load_state" SaveVMHandlers. > > migration_incoming_state_destroy() needs BQL held since it ultimately calls > "load_cleanup" SaveVMHandlers. > > Signed-off-by: Maciej S. Szmigiero > --- > migration/savevm.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/migration/savevm.c b/migration/savevm.c > index b0b74140daea..0ceea9638cc1 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque) > * in qemu_file, and thus we must be blocking now. > */ > qemu_file_set_blocking(f, true); > +bql_lock(); > load_res = qemu_loadvm_state_main(f, mis); > +bql_unlock(); Doesn't that leave that held for a heck of a long time? That RAM loading has to happen in parallel with the loading of devices doesn't it - especially if one of the devices being loaded touches RAM. (I wish this series had a description in the cover letter!) Dave > /* > * This is tricky, but, mis->from_src_file can change after it > @@ -2073,7 +2075,9 @@ static void *postcopy_ram_listen_thread(void *opaque) > * (If something broke then qemu will have to exit anyway since it's > * got a bad migration state). > */ > +bql_lock(); > migration_incoming_state_destroy(); > +bql_unlock(); > > rcu_unregister_thread(); > mis->have_listen_thread = false; > -- -Open up your eyes, open up your mind, open up your code --- / Dr. David Alan Gilbert| Running GNU/Linux | Happy \ \dave @ treblig.org | | In Hex / \ _|_ http://www.treblig.org |___/
