On Mon, Jul 10, 2017 at 06:27:27PM +0200, Paolo Bonzini wrote: > On 10/07/2017 18:24, Stefan Hajnoczi wrote: > > On Thu, Jul 06, 2017 at 06:38:28PM +0200, Paolo Bonzini wrote: > >> Snapshots are only created/destroyed/loaded under the BQL, while no > >> other I/O is happening. Snapshot information could be accessed while > >> other I/O is happening, but also under the BQL so they cannot be > >> modified concurrently. The AioContext lock is overkill. If needed, > >> in the future the BQL could be split to a separate lock covering all > >> snapshot operations, and the create/destroy/goto callbacks changed > >> to run in a coroutine (so the driver can do mutual exclusion as usual). > >> > >> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com> > >> --- > >> block/snapshot.c | 28 +--------------------------- > >> blockdev.c | 43 ++++++++++++------------------------------- > >> hmp.c | 7 ------- > >> include/block/block_int.h | 5 +++++ > >> include/block/snapshot.h | 4 +--- > >> migration/savevm.c | 22 ---------------------- > >> monitor.c | 10 ++-------- > >> 7 files changed, 21 insertions(+), 98 deletions(-) > >> > >> diff --git a/block/snapshot.c b/block/snapshot.c > >> index a46564e7b7..08c59d6166 100644 > >> --- a/block/snapshot.c > >> +++ b/block/snapshot.c > >> @@ -384,9 +384,7 @@ int > >> bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs, > >> } > >> > >> > >> -/* Group operations. All block drivers are involved. > >> - * These functions will properly handle dataplane (take > >> aio_context_acquire > >> - * when appropriate for appropriate block drivers) */ > >> +/* Group operations. All block drivers are involved. */ > > > > Perhaps "These functions must be called under the BQL"? > > > > My concern with this patch series in general is that it will lead to > > bugs due to inconsistencies and lack of locking documentation: > > > > bdrv_all_delete_snapshot() is called by hmp_delvm() outside a > > bdrv_drained_begin() region. That's okay because internally > > bdrv_snapshot_delete() will call bdrv_drained_begin() for the crucial > > operations that require a quiesced BDS. > > > > Compare that with bdrv_all_goto_snapshot(), which is called inside a > > bdrv_drained_begin() region by load_snapshot(). Internally it doesn't > > drain. > > I think generally we should move bdrv_drained_begin/end calls _out_ of > block.c and into qmp_*. If you agree, I can add this before this patch.
Yes, I think drained regions should be in callers. That way callers can perform multiple operations in sequence and the result is atomic. > > Previously the bdrv_all_*() functions behaved consistently. We could > > say that they will acquire AioContexts themselves. Now they behave > > inconsistently and while the code currently happens to work, there is no > > structure that will keep it working as it is modified. > > > > I think we're reaching a point where every BlockDriver callback and > > every bdrv_*() function needs the following information: > > > > 1. Must be called under BQL? > > 2. Can I/O requests be in flight? > > 3. Is it thread-safe? > > > > Otherwise it will be a nightmare to modify the code since these > > constraints are not enforced by the compiler and undocumented. > > Good point. Are (1) and (3) different ways to say the same thing or do > you have other differences in mind? There is a difference between (1) and (3). If a function is thread-safe then callers can invoke it without any constraints. If a function requires a specific lock to be held then that it puts a constraint on the caller. We do have thread-safe functions in the block layer: simple functions that fetch a BDS field atomically. There is no need to hold the BQL for them.
signature.asc
Description: PGP signature