On 14.02.19 03:25, Eric Blake wrote: > On 2/13/19 4:53 PM, Max Reitz wrote: >> What bs->file and bs->backing mean depends on the node. For filter >> nodes, both signify a node that will eventually receive all R/W >> accesses. For format nodes, bs->file contains metadata and data, and >> bs->backing will not receive writes -- instead, writes are COWed to >> bs->file. Usually. >> >> In any case, it is not trivial to guess what a child means exactly with >> our currently limited form of expression. It is better to introduce >> some functions that actually guarantee a meaning: >> >> - bdrv_filtered_cow_child() will return the child that receives requests >> filtered through COW. That is, reads may or may not be forwarded >> (depending on the overlay's allocation status), but writes never go to >> this child. >> >> - bdrv_filtered_rw_child() will return the child that receives requests >> filtered through some very plain process. Reads and writes issued to >> the parent will go to the child as well (although timing, etc. may be >> modified). >> >> - All drivers but quorum (but quorum is pretty opaque to the general >> block layer anyway) always only have one of these children: All read >> requests must be served from the filtered_rw_child (if it exists), so >> if there was a filtered_cow_child in addition, it would not receive >> any requests at all. >> (The closest here is mirror, where all requests are passed on to the >> source, but with write-blocking, write requests are "COWed" to the >> target. But that just means that the target is a special child that >> cannot be introspected by the generic block layer functions, and that >> source is a filtered_rw_child.) >> Therefore, we can also add bdrv_filtered_child() which returns that >> one child (or NULL, if there is no filtered child). >> >> Also, many places in the current block layer should be skipping filters >> (all filters or just the ones added implicitly, it depends) when going >> through a block node chain. They do not do that currently, but this >> patch makes them. >> >> One example for this is qemu-img map, which should skip filters and only >> look at the COW elements in the graph. The change to iotest 204's >> reference output shows how using blkdebug on top of a COW node used to >> make qemu-img map disregard the rest of the backing chain, but with this >> patch, the allocation in the base image is reported correctly. >> >> Furthermore, a note should be made that sometimes we do want to access >> bs->backing directly. This is whenever the operation in question is not >> about accessing the COW child, but the "backing" child, be it COW or >> not. This is the case in functions such as bdrv_open_backing_file() or >> whenever we have to deal with the special behavior of @backing as a >> blockdev option, which is that it does not default to null like all >> other child references do. >> >> Finally, the query functions (query-block and query-named-block-nodes) >> are modified to return any filtered child under "backing", not just >> bs->backing or COW children. This is so that filters do not interrupt >> the reported backing chain. This changes the output of iotest 184, as >> the throttled node now appears as a backing child. >> >> Signed-off-by: Max Reitz <mre...@redhat.com> >> --- > >> +++ b/qapi/block-core.json >> @@ -2417,6 +2417,10 @@ >> # On successful completion the image file is updated to drop the backing >> file >> # and the BLOCK_JOB_COMPLETED event is emitted. >> # >> +# In case @device is a filter node, block-stream modifies the first >> non-filter >> +# overlay node below it to point to base's backing node (or NULL if @base >> was >> +# not specified) instead of modifying @device itself. > > Maybe s/In case/If/
Works for me. >> +/* >> + * For a backing chain, return the first non-filter backing image. >> + */ >> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs) >> +{ >> + return >> bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs))); > > Quite a mouthful, but looks correct. > > >> +++ b/block/io.c >> @@ -118,8 +118,17 @@ static void bdrv_merge_limits(BlockLimits *dst, const >> BlockLimits *src) >> void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) >> { >> BlockDriver *drv = bs->drv; >> + BlockDriverState *storage_bs; >> + BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs); > > If the backing file is filtered by a blkdebug layer that intentionally > is trying to advertise alternative block sizes... > >> Error *local_err = NULL; >> >> + /* >> + * FIXME: There should be a function for this, and in fact there >> + * will be as of a follow-up patch. >> + */ >> + storage_bs = >> + child_bs(bs->file) ?: bdrv_filtered_rw_bs(bs); >> + >> memset(&bs->bl, 0, sizeof(bs->bl)); >> >> if (!drv) { >> @@ -131,13 +140,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error >> **errp) >> drv->bdrv_aio_preadv) ? 1 : 512; >> >> /* Take some limits from the children as a default */ >> - if (bs->file) { >> - bdrv_refresh_limits(bs->file->bs, &local_err); >> + if (storage_bs) { >> + bdrv_refresh_limits(storage_bs, &local_err); >> if (local_err) { >> error_propagate(errp, local_err); >> return; >> } >> - bdrv_merge_limits(&bs->bl, &bs->file->bs->bl); >> + bdrv_merge_limits(&bs->bl, &storage_bs->bl); >> } else { >> bs->bl.min_mem_alignment = 512; >> bs->bl.opt_mem_alignment = getpagesize(); >> @@ -146,13 +155,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error >> **errp) >> bs->bl.max_iov = IOV_MAX; >> } >> >> - if (bs->backing) { >> - bdrv_refresh_limits(bs->backing->bs, &local_err); >> + if (cow_bs) { >> + bdrv_refresh_limits(cow_bs, &local_err); > > ...then this change means that the active layer no longer picks up the > blkdebug block sizes, but the original COW layer sizes. How so? I don't see how this changes current behavior with just COW/blkdebug nodes at all. For blkdebug, bs->file is the storage_bs. For COW nodes, bs->file is the storage_bs and bs->backing is the cow_bs. So these code changes do nothing in that case. (They only change something for filter nodes that use bs->backing for their filtered node; because that is translated to storage_bs now.) Note that bdrv_filtered_cow_bs() does not skip filters (i.e. the blkdebug node). That would be bdrv_backing_chain_next(). > Is that > intentional? I don't think it is fatal to the patch (as blkdebug is not > used in production, but only in testing), but it may cause some > head-scratching when trying to test behaviors of a COW child with > different block sizes than the active layer by using blkdebug on top of > the COW child. I guess I'll find out soon enough (on my todo list is > fixing NBD to never split an NBD_CMD_READ or NBD_CMD_BLOCK_STATUS reply > below the granularity advertised at the active layer, even if the > backing file has a smaller granularity - and using blkdebug to force the > backing file granularity was my original plan of attack - but it is not > until this patch is applied that NBD can even locate the bitmap in a > backing file when a filter is interposed) > > Reviewed-by: Eric Blake <ebl...@redhat.com> Thanks! Max
signature.asc
Description: OpenPGP digital signature