On Tue, May 18, 2021 at 12:07:50PM +0200, Emanuele Giuseppe Esposito wrote: > This serie of patches aims to reduce the usage of the global > AioContexlock in block-copy, by introducing smaller granularity > locks thus on making the block layer thread safe. > > This serie depends on Paolo's coroutine_sleep API and my previous > serie that brings thread safety to the smaller API used by block-copy, > like ratelimit, progressmeter abd co-shared-resource. > > What's missing for block-copy to be fully thread-safe is fixing > the CoSleep API to allow cross-thread sleep and wakeup. > Paolo is working on it and will post the patches once his new > CoSleep API is accepted. > > Patch 1 introduces the .method field instead of .use_copy_range > and .copy_size, so that it can be later used as atomic. > Patch 2-3 provide comments and refactoring in preparation to > the locks added in patch 4 on BlockCopyTask, patch 5-6 on > BlockCopyCallState and 7 BlockCopyState. > > Based-on: <20210517100548.28806-1-pbonz...@redhat.com> > Based-on: <20210518094058.25952-1-eespo...@redhat.com> > Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com>
Here is my understanding of thread-safety in your https://gitlab.com/eesposit/qemu.git dataplane_new branch: Reading the code was much more satisfying than trying to review the patches. That's probably because I'm not familiar with the block-copy.c implementation and needed the context. I noticed you already addressed some of Vladimir's comments in your git branch, so that may have helped too. The backup block job and the backup-top filter driver have a BlockCopyState. QEMU threads that call bdrv_co_preadv/pwritev() invoke the block_copy() coroutine function, so BlockCopyState needs to be protected between threads. Additionally, the backup block job invokes block_copy_async() to perform a background copy operation. The relationships are as follows: BlockCopyState - shared state that any thread can access BlockCopyCallState - per-block_copy()/block_copy_async() state, not accessed by other coroutines/threads BlockCopyTask - per-aiotask state, find_conflicting_task_locked() accesses this for a given BlockCopyState What is the purpose of the BlockCopyState->calls list? Existing issue: the various block_copy_call_finished/succeeded/failed/cancelled/status() APIs are a bit extreme. They all end up being called by block/backup.c in succession when it seems like a single call should be enough to report the status. It's not immediately obvious to me that BlockCopyCallState needs to be thread-safe. So I wondered why finished needs to be atomic. Then I realized the set_speed/cancel code runs in the monitor, so at least block_copy_call_cancel() and block_copy_kick() need to be thread-safe. I guess the BlockCopyCallState status APIs were made thread-safe for consistency even though it's not needed at the moment? Please add doc comments to block-copy.h explaining the thread-safety of the APIs (it might be as simple as "all APIs are thread-safe" at the top of the file). Summarizing everything, this series adds BlockCopyState->lock to protect shared state and makes BlockCopyCallState's status atomic so it can be queried from threads other than the one performing the copy operation. Stefan
signature.asc
Description: PGP signature