On Wed, Dec 15, 2021 at 04:31:26PM +0100, Kevin Wolf wrote: > Am 15.12.2021 um 12:28 hat Stefan Hajnoczi geschrieben: > > On Tue, Dec 14, 2021 at 03:59:49PM +0100, Kevin Wolf wrote: > > > Am 14.12.2021 um 15:35 hat Stefan Hajnoczi geschrieben: > > > > The BlockBackend root child can change when aio_poll() is invoked. This > > > > happens when a temporary filter node is removed upon blockjob > > > > completion, for example. > > > > > > > > Functions in block/block-backend.c must be aware of this when using a > > > > blk_bs() pointer across aio_poll() because the BlockDriverState refcnt > > > > may reach 0, resulting in a stale pointer. > > > > > > > > One example is scsi_device_purge_requests(), which calls blk_drain() to > > > > wait for in-flight requests to cancel. If the backup blockjob is active, > > > > then the BlockBackend root child is a temporary filter BDS owned by the > > > > blockjob. The blockjob can complete during bdrv_drained_begin() and the > > > > last reference to the BDS is released when the temporary filter node is > > > > removed. This results in a use-after-free when blk_drain() calls > > > > bdrv_drained_end(bs) on the dangling pointer. > > > > > > > > Explicitly hold a reference to bs across block APIs that invoke > > > > aio_poll(). > > > > > > > > Signed-off-by: Stefan Hajnoczi <[email protected]> > > > > --- > > > > v2: > > > > - Audit block/block-backend.c and fix additional cases > > > > --- > > > > block/block-backend.c | 11 +++++++++++ > > > > 1 file changed, 11 insertions(+) > > > > > > > > diff --git a/block/block-backend.c b/block/block-backend.c > > > > index 12ef80ea17..a40ad7fa92 100644 > > > > --- a/block/block-backend.c > > > > +++ b/block/block-backend.c > > > > @@ -828,10 +828,12 @@ void blk_remove_bs(BlockBackend *blk) > > > > notifier_list_notify(&blk->remove_bs_notifiers, blk); > > > > if (tgm->throttle_state) { > > > > bs = blk_bs(blk); > > > > + bdrv_ref(bs); > > > > bdrv_drained_begin(bs); > > > > throttle_group_detach_aio_context(tgm); > > > > throttle_group_attach_aio_context(tgm, qemu_get_aio_context()); > > > > bdrv_drained_end(bs); > > > > + bdrv_unref(bs); > > > > } > > > > > > > > blk_update_root_state(blk); > > > > > > This hunk is unnecessary, we still hold a reference that is only given > > > up a few lines down with bdrv_root_unref_child(root). > > > > That's not the only place where the reference can be dropped: > > bdrv_drop_filter() removes the filter node from the graph. > > > > Here is a case where it happens: block/backup.c:backup_clean() -> > > bdrv_cbw_drop() -> bdrv_drop_filter() -> bdrv_replace_node_common() -> > > bdrv_replace_child_commit(). After we reach this bdrv_unref() is called > > a few times and all references are dropped because the node is no longer > > in the graph. > > > > This happens during blk_remove_bs() -> bdrv_drained_begin(), so the bs > > pointer in the above hunk can be stale. > > Is the scenario that blk->root doesn't go away, but the node it > references changes?
Yes, blk->root remains non-NULL but its value changes across aio_poll(). > So the unref below will be for a different node than > we're draining here? Yes. > If so, let's add a comment that blk_bs(blk) can change after the drain, > and maybe move the BlockDriverState *bs declaration inside the if block > because the variable is invalid after it anyway. Sounds good. > Can it also happen that instead of removing a node from the chain, a new > one is inserted and we actually end up having drained the wrong node > before switching the context for tgm? I think a new node can theoretically be inserted. I don't know if that can happen in practice since insertions typically happen due to monitor commands and monitor commands don't execute during aio_poll(). Stefan
signature.asc
Description: PGP signature
