On Thu, Mar 24, 2022 at 10:09 AM Hanna Reitz <hre...@redhat.com> wrote: > > When the stream block job cuts out the nodes between top and base in > stream_prepare(), it does not drain the subtree manually; it fetches the > base node, and tries to insert it as the top node's backing node with > bdrv_set_backing_hd(). bdrv_set_backing_hd() however will drain, and so > the actual base node might change (because the base node is actually not > part of the stream job) before the old base node passed to > bdrv_set_backing_hd() is installed. > > This has two implications: > > First, the stream job does not keep a strong reference to the base node. > Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g. > because some other block job is drained to finish), we will get a > use-after-free. We should keep a strong reference to that node.
Does that introduce any possibility of deadlock if we're now keeping a strong reference? I guess not, the job can just delete its own reference and everything's ... fine, right? > > Second, even with such a strong reference, the problem remains that the > base node might change before bdrv_set_backing_hd() actually runs and as > a result the wrong base node is installed. ow > > Both effects can be seen in 030's TestParallelOps.test_overlapping_5() > case, which has five nodes, and simultaneously streams from the middle > node to the top node, and commits the middle node down to the base node. > As it is, this will sometimes crash, namely when we encounter the > above-described use-after-free. Is there a BZ# or is this an occasional failure in 030 you saw? What does failure look like? Does it require anything special to trigger? > > Taking a strong reference to the base node, we no longer get a crash, > but the resuling block graph is less than ideal: The expected result is > obviously that all middle nodes are cut out and the base node is the > immediate backing child of the top node. However, if stream_prepare() > takes a strong reference to its base node (the middle node), and then > the commit job finishes in bdrv_set_backing_hd(), supposedly dropping > that middle node, the stream job will just reinstall it again. > > Therefore, we need to keep the whole subtree drained in > stream_prepare(), so that the graph modification it performs is > effectively atomic, i.e. that the base node it fetches is still the base > node when bdrv_set_backing_hd() sets it as the top node's backing node. > > Verify this by asserting in said 030's test case that the base node is > always the top node's immediate backing child when both jobs are done. > (Off-topic: Sometimes I dream about having a block graph rendering tool that can update step-by-step, so I can visualize what these block operations look like. My "working memory" is kind of limited and I get sidetracked too easily tracing code. That we have the ability to render at a single point is pretty nice, but it's still hard to get images from intermediate steps when things happen in tight sequence without the chance for intervention.) > Signed-off-by: Hanna Reitz <hre...@redhat.com> > --- > v2: > - Oops, the base can be NULL. Would have noticed if I had ran all test > cases from 030, and not just test_overlapping_5()... > That means that keeping a strong reference to the base node must be > conditional, based on whether there even is a base node or not. > (I mean, technically we do not even need to keep a strong reference to > that node, given that we are in a drained section, but I believe it is > better style to do it anyway.) > --- > block/stream.c | 15 ++++++++++++++- > tests/qemu-iotests/030 | 5 +++++ > 2 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/block/stream.c b/block/stream.c > index 3acb59fe6a..694709bd25 100644 > --- a/block/stream.c > +++ b/block/stream.c > @@ -64,7 +64,13 @@ static int stream_prepare(Job *job) > bdrv_cor_filter_drop(s->cor_filter_bs); > s->cor_filter_bs = NULL; > > + bdrv_subtree_drained_begin(s->above_base); > + > base = bdrv_filter_or_cow_bs(s->above_base); > + if (base) { > + bdrv_ref(base); > + } > + > unfiltered_base = bdrv_skip_filters(base); > > if (bdrv_cow_child(unfiltered_bs)) { > @@ -75,14 +81,21 @@ static int stream_prepare(Job *job) > base_fmt = unfiltered_base->drv->format_name; > } > } > + > bdrv_set_backing_hd(unfiltered_bs, base, &local_err); > ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt, > false); > if (local_err) { > error_report_err(local_err); > - return -EPERM; > + ret = -EPERM; > + goto out; > } > } > > +out: > + if (base) { > + bdrv_unref(base); > + } > + bdrv_subtree_drained_end(s->above_base); > return ret; > } > > diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030 > index 567bf1da67..14112835ed 100755 > --- a/tests/qemu-iotests/030 > +++ b/tests/qemu-iotests/030 > @@ -436,6 +436,11 @@ class TestParallelOps(iotests.QMPTestCase): > self.vm.run_job(job='node4', auto_dismiss=True) > self.assert_no_active_block_jobs() > > + # Assert that node0 is now the backing node of node4 > + result = self.vm.qmp('query-named-block-nodes') > + node4 = next(node for node in result['return'] if node['node-name'] > == 'node4') > + self.assertEqual(node4['image']['backing-image']['filename'], > self.imgs[0]) > + > # Test a block-stream and a block-commit job in parallel > # Here the stream job is supposed to finish quickly in order to reproduce > # the scenario that triggers the bug fixed in 3d5d319e1221 and > 1a63a907507 > -- > 2.35.1 > Seems reasonable, but the best I can give right now is an ACK because I'm a little rusty with block graph ops ... --js