* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> On Mon, Oct 03, 2022 at 10:52:33AM +0200, Emanuele Giuseppe Esposito wrote:
> >
> >
> > On 30/09/2022 at 17:45, Kevin Wolf wrote:
> > > On 30.09.2022 at 14:17, Emanuele Giuseppe Esposito wrote:
> > >> On 29/09/2022 at 17:30, Kevin Wolf wrote:
> > >>> On 09.06.2022 at 15:44, Emanuele Giuseppe Esposito wrote:
> > >>>> Remove usage of aio_context_acquire by always submitting work items
> > >>>> to the current thread's ThreadPool.
> > >>>>
> > >>>> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> > >>>> Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com>
> > >>>
> > >>> The thread pool is used by things outside of the file-* block drivers,
> > >>> too. Even outside the block layer. Not all of these seem to submit work
> > >>> in the same thread.
> > >>>
> > >>> For example:
> > >>>
> > >>> postcopy_ram_listen_thread() -> qemu_loadvm_state_main() ->
> > >>> qemu_loadvm_section_start_full() -> vmstate_load() ->
> > >>> vmstate_load_state() -> spapr_nvdimm_flush_post_load(), which has:
> > >>>
> > >>>     ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
>                             ^^^^^^^^^^^^^^^^^^^
> 
> aio_get_thread_pool() isn't thread safe either:
> 
> ThreadPool *aio_get_thread_pool(AioContext *ctx)
> {
>     if (!ctx->thread_pool) {
>         ctx->thread_pool = thread_pool_new(ctx);
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Two threads could race in aio_get_thread_pool().
> 
> I think post-copy is broken here: it's calling code that was only
> designed to be called from the main loop thread.
> 
> I have CCed Juan and David.
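A side note to make that race concrete: it is the classic
check-then-allocate pattern on ctx->thread_pool, where two threads can
both observe NULL and both call thread_pool_new(). The sketch below is
NOT the QEMU code - the types are stand-ins and C11 atomics are only
used to illustrate one way the lazy init could be made safe; whether a
compare-and-swap, a lock, or simply "only call this from the owning
thread" is the right fix for QEMU is exactly what is being discussed in
this thread.

    /* Illustration only: stand-in types, not the real AioContext/ThreadPool. */
    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct ThreadPool { int dummy; } ThreadPool;

    typedef struct AioContext {
        /* With a plain pointer, two threads can both see NULL, both
         * allocate a pool, and one pool is leaked (or both end up in
         * use).  An atomic pointer lets us publish exactly one. */
        _Atomic(ThreadPool *) thread_pool;
    } AioContext;

    static ThreadPool *thread_pool_new(AioContext *ctx)
    {
        (void)ctx;
        return calloc(1, sizeof(ThreadPool));
    }

    /* Race-free lazy init: allocate speculatively, publish with a CAS;
     * the loser of the race frees its copy and uses the winner's pool. */
    ThreadPool *aio_get_thread_pool_safe(AioContext *ctx)
    {
        ThreadPool *pool = atomic_load(&ctx->thread_pool);
        if (!pool) {
            ThreadPool *fresh = thread_pool_new(ctx);
            ThreadPool *expected = NULL;
            if (atomic_compare_exchange_strong(&ctx->thread_pool,
                                               &expected, fresh)) {
                pool = fresh;
            } else {
                free(fresh);        /* another thread won the race */
                pool = expected;    /* ...so use its pool instead   */
            }
        }
        return pool;
    }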
In theory the path that you describe there shouldn't happen - although
there is perhaps not enough protection on the load side to stop it
happening if presented with a bad stream.

This is documented in docs/devel/migration.rst under 'Destination
behaviour'; but to recap: during postcopy load we have the problem that
we need to be able to load incoming iterative (i.e. RAM) pages while
normal devices are being loaded, because loading a device may access
RAM that hasn't been transferred yet.

To do that, the device state of all the non-iterative devices (which I
think includes your spapr_nvdimm) is serialised into a separate
migration stream and sent as a 'package'.

We read the package off the stream on the main thread, but don't
process it until we fire off the 'listen' thread - whose creation you
spotted above. The listen thread then takes over reading the migration
stream to process RAM pages, and since it's in the same format, it
calls qemu_loadvm_state_main() - but it doesn't expect any devices in
that stream other than the RAM ones; it's just expecting RAM.

In parallel with that, the main thread carries on loading the contents
of the 'package' - and that contains your spapr_nvdimm device (and any
other 'normal' devices); but that's OK because it is the main thread.

Now, if something was very broken and sent a header for the
spapr-nvdimm down the main thread rather than into the package, then
yes, we'd trigger your case - but that shouldn't happen.

Dave

> > >>> ...
> > >>>     thread_pool_submit_aio(pool, flush_worker_cb, state,
> > >>>                            spapr_nvdimm_flush_completion_cb, state);
> > >>>
> > >>> So it seems to me that we may be submitting work for the main thread
> > >>> from a postcopy migration thread.
> > >>>
> > >>> I believe the other direct callers of thread_pool_submit_aio() all
> > >>> submit work for the main thread and also run in the main thread.
> > >>>
> > >>> For thread_pool_submit_co(), pr_manager_execute() calls it with the pool
> > >>> it gets passed as a parameter. This is still bdrv_get_aio_context(bs) in
> > >>> hdev_co_ioctl() and should probably be changed the same way as for the
> > >>> AIO call in file-posix, i.e. use qemu_get_current_aio_context().
> > >>>
> > >>> We could consider either asserting in thread_pool_submit_aio() that we
> > >>> are really in the expected thread, or like I suggested for LinuxAio drop
> > >>> the pool parameter and always get it from the current thread (obviously
> > >>> this is only possible if migration could in fact schedule the work on
> > >>> its current thread - if it schedules it on the main thread and then
> > >>> exits the migration thread (which destroys the thread pool), that
> > >>> wouldn't be good).
> > >>
> > >> Dumb question: why not extend the already-existing pool->lock to cover
> > >> also the necessary fields like pool->head that are accessed by other
> > >> threads (the only case I could find with thread_pool_submit_aio is the
> > >> one you pointed out above)?
> > >
> > > Other people are more familiar with this code, but I believe this could
> > > have performance implications. I seem to remember that this code is
> > > careful to avoid locking to synchronise between worker threads and the
> > > main thread.
> > >
> > > But looking at the patch again, I actually have a dumb question, too:
> > > the locking you're removing is in thread_pool_completion_bh(). As this
> > > is a BH, it's running in the ThreadPool's context either way, no matter
> > > which thread called thread_pool_submit_aio().
> > >
> > > I'm not sure what this aio_context_acquire/release pair is actually
> > > supposed to protect. Paolo's commit 1919631e6b5 introduced it. Was it
> > > just more careful than it needs to be?
> > 
> > I think the goal is still to protect pool->head, but if so the
> > AioContext lock is put in the wrong place, because as you said the BH
> > always runs in the thread pool's context. Otherwise it seems to make no
> > sense.
> > 
> > On the other hand, thread_pool_submit_aio could be called by other
> > threads on behalf of the main loop, which means pool->head could be
> > modified (an iothread calls thread_pool_submit_aio) while being read by
> > the main loop (another worker thread schedules thread_pool_completion_bh).
> > 
> > What are the performance implications? I mean, if the AioContext lock
> > in the BH is actually useful and the BH really has to wait to take it,
> > being taken in many more places throughout the block layer won't be any
> > better than extending pool->lock, I guess.
> 
> thread_pool_submit_aio() is missing documentation on how it is supposed
> to be called.
> 
> Taking pool->lock is conservative and fine in the short term.
> 
> In the longer term we need to clarify how thread_pool_submit_aio() is
> supposed to be used, and remove the locking that protects pool->head if
> possible.
> 
> A bunch of the event loop APIs are thread-safe (aio_set_fd_handler(),
> qemu_bh_schedule(), etc.) so it's somewhat natural to make
> thread_pool_submit_aio() thread-safe too. However, it would be nice to
> avoid synchronization: existing callers mostly call it from the same
> event loop thread that runs the BH, and we can avoid locking in that
> case.
> 
> Stefan

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
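As a concrete footnote to the pool->head discussion above, here is a
rough, untested sketch of the two directions that were mentioned:
taking pool->lock around the pool->head update, versus asserting that
the submitter runs in the pool's home context. The types are simplified
stand-ins and the names only echo util/thread-pool.c and this thread;
it is not the actual code.

    #include <assert.h>
    #include <pthread.h>

    typedef struct AioContext { int dummy; } AioContext;

    typedef struct Request {
        struct Request *next;       /* stand-in for the QLIST linkage */
    } Request;

    typedef struct ThreadPool {
        AioContext *ctx;            /* context whose BH completes requests */
        pthread_mutex_t lock;       /* the existing pool->lock */
        Request *head;              /* stand-in for pool->head */
    } ThreadPool;

    /* Stand-in for qemu_get_current_aio_context(): a thread-local that
     * the owning thread is assumed to have set up. */
    static __thread AioContext *current_ctx;

    /* Option 1 (conservative, short term): extend pool->lock so that
     * submissions from other threads cannot corrupt pool->head. */
    void thread_pool_submit_locked(ThreadPool *pool, Request *req)
    {
        pthread_mutex_lock(&pool->lock);
        req->next = pool->head;
        pool->head = req;
        pthread_mutex_unlock(&pool->lock);
    }

    /* Option 2: keep pool->head lock-free, but assert that the caller
     * really is running in the pool's home context. */
    void thread_pool_submit_asserted(ThreadPool *pool, Request *req)
    {
        assert(current_ctx == pool->ctx);
        req->next = pool->head;
        pool->head = req;
    }

The real submission path of course also queues the element for a worker
and kicks it; the sketch only shows where the lock or the assertion
would sit.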