On 14.12.2018 at 12:54, Denis Plotnikov wrote:
> On 13.12.2018 15:20, Kevin Wolf wrote:
> > On 13.12.2018 at 12:07, Denis Plotnikov wrote:
> >> It sounds like it should be so, but it doesn't work that way, and
> >> here is why: when doing mirror, we may resume the postponed
> >> coroutines too early, while the underlying bs is still protected
> >> from writing, and thus we hit the assertion in
> >> bdrv_co_write_req_prepare() when a write request is executed on
> >> resuming the postponed coroutines.
> >>
> >> The thing is that the bs is protected for writing before the
> >> execution of bdrv_replace_node() in mirror_exit_common(), and
> >> bdrv_replace_node() calls bdrv_replace_child_noperm(), which, in
> >> turn, calls child->role->drained_end, where one of the callbacks
> >> is blk_root_drained_end(), which checks
> >> if (--blk->quiesce_counter == 0) and runs the postponed requests
> >> (coroutines) if the condition is true.
> >
> > Hm, so something is messed up with the drain sections in the mirror
> > driver. We have:
> >
> >     bdrv_drained_begin(target_bs);
> >     bdrv_replace_node(to_replace, target_bs, &local_err);
> >     bdrv_drained_end(target_bs);
> >
> > Obviously, the intention was to keep the BlockBackend drained during
> > bdrv_replace_node(). So how could blk->quiesce_counter ever get to 0
> > inside bdrv_replace_node() when target_bs is drained?
> >
> > Looking at bdrv_replace_child_noperm(), it seems that the function
> > has a bug: Even if old_bs and new_bs are both drained, the
> > quiesce_counter of the parent reaches 0 for a moment because we call
> > .drained_end for the old child first and .drained_begin for the new
> > one later.
> >
> > So it seems the fix would be to reverse the order and first call
> > .drained_begin for the new child and then .drained_end for the old
> > child. Sounds like a good new testcase for tests/test-bdrv-drain.c,
> > too.
>
> Yes, it's true, but it's not enough...
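
As a sketch of the reordering described above (simplified stand-in
types modeled loosely on QEMU's BdrvChild/BdrvChildRole, not the actual
code or patch):

    /* Minimal, self-contained sketch of the suggested reordering.
     * The types below are simplified stand-ins, not QEMU's real
     * definitions; the point is only the call order. */

    #include <stddef.h>

    typedef struct BlockDriverState {
        int quiesce_counter;          /* > 0 while the node is drained */
    } BlockDriverState;

    typedef struct BdrvChild BdrvChild;

    typedef struct BdrvChildRole {
        void (*drained_begin)(BdrvChild *child);
        void (*drained_end)(BdrvChild *child);
    } BdrvChildRole;

    struct BdrvChild {
        BlockDriverState *bs;
        const BdrvChildRole *role;
    };

    static void bdrv_replace_child_noperm(BdrvChild *child,
                                          BlockDriverState *new_bs)
    {
        BlockDriverState *old_bs = child->bs;

        /* Begin the drain through the new child first ... */
        if (new_bs && new_bs->quiesce_counter && child->role->drained_begin) {
            child->role->drained_begin(child);
        }

        child->bs = new_bs;

        /* ... and end the drain through the old child only afterwards,
         * so the parent's quiesce_counter never reaches 0 in between
         * and no postponed request can resume mid-replacement. */
        if (old_bs && old_bs->quiesce_counter && child->role->drained_end) {
            child->role->drained_end(child);
        }
    }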
Did you ever implement the changes suggested so far, so that we could
continue from there? Or should I try and come up with something myself?

> In mirror_exit_common() we actively manipulate block driver states.
> When we replace a node in the snippet you showed, we can't allow the
> postponed coroutines to run, because the block tree isn't ready to
> receive the requests yet.
>
> To be ready, we need to insert the proper block driver state into the
> block backend, which is done here:
>
>     blk_remove_bs(bjob->blk);
>     blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
>     blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);  <<<<<<<<
>
>     bs_opaque->job = NULL;
>
>     bdrv_drained_end(src);

Did you actually encounter a bug here, or is this just theory?
bjob->blk is the BlockBackend of the job and isn't in use at this
point any more. We only insert the old node into it again because
block_job_free() must set bs->job = NULL, and it gets bs with
blk_bs(bjob->blk).

So if there is an actual bug here, I don't understand it yet.

> If the tree isn't ready and we resume the coroutines, we'll end up
> with requests landing in the wrong block driver state.
>
> So we should explicitly stop all activity on all the driver states
> and their parents, and allow the activity again only when everything
> is ready to go.
>
> Why explicitly? Because the block driver states may belong to
> different block backends at the moment the manipulation begins.
>
> So it seems we need to disable all their contexts until the
> manipulation ends.

If there actually is a bug, it is certainly not solved by calling
aio_disable_external() (it is bad enough that this even exists), but by
keeping the node drained.

Kevin
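
A sketch of the "keep the node drained" alternative, reusing the names
from the mirror_exit_common() snippet quoted above (bjob, src,
mirror_top_bs, bs_opaque); an illustration of the idea, not the actual
patch:

    /* src is drained earlier in mirror_exit_common(), so the whole
     * graph manipulation happens inside one drained section. */
    bdrv_drained_begin(src);

    /* While the section is open, blk_root_drained_end() can never see
     * quiesce_counter drop to 0, so no postponed request can resume
     * against a half-built tree. */
    blk_remove_bs(bjob->blk);
    blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
    blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);

    bs_opaque->job = NULL;

    /* Only now, with the BlockBackend pointing at its final node, may
     * postponed coroutines resume; no aio_disable_external() needed. */
    bdrv_drained_end(src);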