On Jun 4 14:07, Fiona Ebner wrote:
> The bdrv_refresh_limits() function and driver implementations are
> called with the graph lock held. The implementation for the 'compress'
> filter calls bdrv_get_info(), which is a generated coroutine wrapper
> and thus polls. This can lead to a deadlock when issuing a
> blockdev-snapshot QMP command, when bdrv_refresh_limits() is called in
> bdrv_append() while the graph lock is held exclusively. This deadlock
> was introduced with commit 5661a00d40 ("block: Call transaction
> callbacks with lock held").
>
> As a solution, this reverts commit 3d47eb0a2a ("block:
> Convert bdrv_get_info() to co_wrapper_mixed"). To do this, it is
> necessary to have callers of bdrv_get_info() take the graph lock
> themselves. None of the driver implementations rely on being run in
> coroutine context and none of the callers rely on the function being
> a coroutine.
>
> All callers except one either already hold the graph lock or can claim
> the graph lock via bdrv_graph_rdlock_main_loop(). As part of this,
> bdrv_get_default_bitmap_granularity() is annotated with GRAPH_RDLOCK
> and its callers adapted where necessary.
>
> The single exception is the caller nvme_ns_init_format(), which can
> run as a callback in an IO thread, but can also be reached via the QOM
> realize handler nvme_ns_realize(). For this caller, a
> bdrv_get_info_unlocked() coroutine wrapper is introduced that must be
> called with the graph unlocked.
>
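(For context, the caller-side pattern described above is roughly the
following. This is a minimal sketch with a made-up function name, not a
hunk from the series; it assumes a main-loop caller that does not
already hold the graph lock.)

/*
 * Sketch only: a main-loop caller takes the graph read lock around the
 * now non-coroutine bdrv_get_info(). Real callers in the series use the
 * existing helpers directly, possibly via GRAPH_RDLOCK_GUARD_MAINLOOP().
 */
static int64_t example_cluster_size(BlockDriverState *bs)
{
    BlockDriverInfo bdi;
    int ret;

    bdrv_graph_rdlock_main_loop();
    ret = bdrv_get_info(bs, &bdi);
    bdrv_graph_rdunlock_main_loop();

    return ret < 0 ? ret : bdi.cluster_size;
}
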
> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
> index 6df2e8e7c5..ee3eabb1aa 100644
> --- a/hw/nvme/ns.c
> +++ b/hw/nvme/ns.c
> @@ -50,7 +50,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
>
>      npdg = ns->blkconf.discard_granularity / ns->lbasz;
>
> -    ret = bdrv_get_info(blk_bs(ns->blkconf.blk), &bdi);
> +    ret = bdrv_get_info_unlocked(blk_bs(ns->blkconf.blk), &bdi);
>      if (ret >= 0 && bdi.cluster_size > ns->blkconf.discard_granularity) {
>          npdg = bdi.cluster_size / ns->lbasz;
>      }

Acked-by: Klaus Jensen <k.jen...@samsung.com>

FWIW, if there is a better way to get the cluster size, I'd very much
like to change what we do in hw/nvme for that. We need it to
infer/compute the "preferred deallocation granularity". But maybe it's
just not the correct way to do this, and we shouldn't try to do it at
all and simply not report a preferred deallocation granularity.

It does seem brittle, no?
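(For concreteness, the derivation in question is roughly the following;
this paraphrases the code around the hunk above with a made-up helper
name and signature, it is not the exact function in hw/nvme/ns.c.)

/*
 * Sketch: derive the preferred deallocation granularity (in logical
 * blocks) from the configured discard granularity, bumped to the image
 * cluster size when the format reports a larger one. The new
 * bdrv_get_info_unlocked() wrapper must be called with the graph
 * unlocked.
 */
static uint32_t example_npdg(BlockBackend *blk,
                             uint32_t discard_granularity, uint32_t lbasz)
{
    BlockDriverInfo bdi;
    uint32_t npdg = discard_granularity / lbasz;
    int ret;

    ret = bdrv_get_info_unlocked(blk_bs(blk), &bdi);
    if (ret >= 0 && bdi.cluster_size > discard_granularity) {
        npdg = bdi.cluster_size / lbasz;
    }

    /* the Identify Namespace NPDG/NPDA fields are then set 0's-based,
     * i.e. to npdg - 1 */
    return npdg;
}
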