On Thu 16 Nov 2017 05:09:59 PM CET, Anton Nefedov wrote: >>>> I have the impression that one major source of headaches is the >>>> fact that the reopen queue contains nodes that don't need to be >>>> reopened at all. Ideally this should be detected early on in >>>> bdrv_reopen_queue(), so there's no chance that the queue contains >>>> nodes used by a different block job. If we had that then op >>>> blockers should be enough to prevent these things. Or am I missing >>>> something? >>>> >>> After applying Max's patch I tried the similar approach; that is >>> keep BDSes referenced while they are in the reopen queue. Now I get >>> the stream job hanging. Somehow one blk_root_drained_begin() is not >>> paired with blk_root_drained_end(). So the job stays paused. >> >> I can see this if I apply Max's patch and keep refs to BDSs in the >> reopen queue: >> >> #0 block_job_pause (...) at blockjob.c:130 >> #1 0x000055c143cb586d in block_job_drained_begin (...) at blockjob.c:227 >> #2 0x000055c143d08067 in blk_set_dev_ops (...) at block/block-backend.c:887 >> #3 0x000055c143cb69db in block_job_create (...) at blockjob.c:678 >> #4 0x000055c143d17c0c in mirror_start_job (...) at block/mirror.c:1177 >> >> There's a ops->drained_begin(opaque) call in blk_set_dev_ops() that >> doesn't seem to be paired. > > My understanding for now is > > 1. in bdrv_drain_all_begin(), BDS gets drained with > bdrv_parent_drained_begin(), all the parents get > blk_root_drained_begin(), pause their jobs. > 2. one of the jobs finishes and deletes BB. > 3. in bdrv_drain_all_end(), bdrv_parent_drained_end() is never > called because even though BDS still exists (referenced in the > queue), it cannot be accessed as bdrv_next() takes BDS from the > global BB list (and the corresponding BB is deleted).
Yes, I was debugging this and I got to a similar conclusion. With test_stream_commit from iotest 030 I can see that 1. the block-stream job is created first, then stream_run begins and starts copying the data. 2. block-commit starts and tries to reopen the top image in read-write mode. This pauses the stream block job and drains all BDSs. 3. The draining causes the stream job to finish, it is deferred to the main loop, then stream_complete finishes and unrefs the block job, deleting it. At the point of deletion the pause count was still > 0 (because of step (2)) > Max's patch v1 could have helped: > http://lists.nongnu.org/archive/html/qemu-devel/2017-11/msg01835.html This doesn't solve the problem. > Or, perhaps another approach, keep BlockJob referenced while it is > paused (by block_job_pause/resume_all()). That should prevent it from > deleting the BB. Yes, I tried this and it actually solves the issue. But I still think that the problem is that block jobs are allowed to finish when they are paused. Adding block_job_pause_point(&s->common) at the end of stream_run() fixes the problem too. Berto