On Wed, Mar 13, 2024 at 04:30:00PM +0100, Kevin Wolf wrote: > Calling job_pause_point() while holding the graph reader lock > potentially results in a deadlock: bdrv_graph_wrlock() first drains > everything, including the mirror job, which pauses it. The job is only > unpaused at the end of the drain section, which is when the graph writer > lock has been successfully taken. However, if the job happens to be > paused at a pause point where it still holds the reader lock, the writer > lock can't be taken as long as the job is still paused. > > Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly. > > Cc: qemu-sta...@nongnu.org > Buglink: https://issues.redhat.com/browse/RHEL-28125 > Fixes: 004915a96a7a40e942ac85e6d22518cbcd283506 > Signed-off-by: Kevin Wolf <kw...@redhat.com> > --- > include/qemu/job.h | 2 +- > block/mirror.c | 10 ++++++---- > 2 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/include/qemu/job.h b/include/qemu/job.h > index 9ea98b5927..2b873f2576 100644 > --- a/include/qemu/job.h > +++ b/include/qemu/job.h > @@ -483,7 +483,7 @@ void job_enter(Job *job); > * > * Called with job_mutex *not* held. > */ > -void coroutine_fn job_pause_point(Job *job); > +void coroutine_fn GRAPH_UNLOCKED job_pause_point(Job *job); > > /** > * @job: The job that calls the function. > diff --git a/block/mirror.c b/block/mirror.c > index 5145eb53e1..1bdce3b657 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -479,9 +479,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t > offset, > return bytes_handled; > } > > -static void coroutine_fn GRAPH_RDLOCK mirror_iteration(MirrorBlockJob *s) > +static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s) > { > - BlockDriverState *source = s->mirror_top_bs->backing->bs; > + BlockDriverState *source; > MirrorOp *pseudo_op; > int64_t offset; > /* At least the first dirty chunk is mirrored in one iteration. */ > @@ -489,6 +489,10 @@ static void coroutine_fn GRAPH_RDLOCK > mirror_iteration(MirrorBlockJob *s) > bool write_zeroes_ok = > bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)); > int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES); > > + bdrv_graph_co_rdlock(); > + source = s->mirror_top_bs->backing->bs;
Is bdrv_ref(source) needed here so that source cannot go away if someone else write locks the graph and removes it? Or maybe something else protects against that. Either way, please add a comment that explains why this is safe. > + bdrv_graph_co_rdunlock(); > + > bdrv_dirty_bitmap_lock(s->dirty_bitmap); > offset = bdrv_dirty_iter_next(s->dbi); > if (offset < 0) { > @@ -1066,9 +1070,7 @@ static int coroutine_fn mirror_run(Job *job, Error > **errp) > mirror_wait_for_free_in_flight_slot(s); > continue; > } else if (cnt != 0) { > - bdrv_graph_co_rdlock(); > mirror_iteration(s); > - bdrv_graph_co_rdunlock(); > } > } > > -- > 2.44.0 >
signature.asc
Description: PGP signature