Alberto Garcia <be...@igalia.com> writes:

> I've been debugging a couple of problems related to the recently
> merged bdrv_reopen() overhaul code.
>
> 1. bs->children is not updated correctly
> ----------------------------------------
> The problem is described in this e-mail:
>
>    https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg06813.html
>
> In short, changing an image's backing hd does not replace the pointer
> in the bs->children list.
>
> The children list is a feature added recently in 6e93e7c41fdfdee30.
> In addition to bs->backing_hd and bs->file it also includes other
> driver-specific children for cases like Quorum.

I was involved in discussions leading up to the children list, but I
couldn't find the time to review actual patches, so I only have
high-level remarks at this time.

Obvious: the BDS graph is linked together with pointers.

Two pointers are special: BDS members @backing_hd and @file.  By
convention:

1. COW drivers use @file for the delta and @backing_hd for the base.

   Example: "qcow2" uses them that way.

2. Non-COW drivers don't use @backing_hd (it remains null), and they may
   use @file.

   Example: "raw" uses @file to point to its underlying BDS.
   Example: "file" doesn't use @file (it remains null).

3. If a driver needs children that don't fit the above, it has suitable
   members in its private state.

   Example "blkverify" has a member @test_file.

Due to 3., generic code cannot find all children.  This is unfortunate.

Possible solutions:

A. Driver callbacks to help iterate over children.

B. Change the data structure so that generic code can iterate over
   children.

C. Augment the data structure so that generic code can iterate over
   children.

Solution A seems relatively straightforward to do, but clumsy.  B is a
lot of code churn, and the result might be less clear, say bs->child[0]
and bs->child[1] instead of bs->file and bs->backing_hd.  C begs
consistency questions.

Our children list is C.  How do we ensure consistency?

One way to make ensuring it easier is another level of indirection: have
the children list nodes point to the child pointer instead of the child.
Then we need to mess with the children list only initially, and when
child pointers get born or die.

Without that, we need to mess with it when child pointers change.  We
can require all changes to go through a helper function that updates the
children list.  Perhaps something like

    void bdrv_set_child_internal(BlockDriverState *bs, size_t offs,
                                 BlockDriverState *new_child)
    {
        BlockDriverState **child_ptr = (BlockDriverState **)((char *)bs + offs);

        if (*child_ptr) {
            remove *child_ptr from bs->children
        }
        *child_ptr = new_child
        if (new_child) {
            add *child_ptr to bs->children
        }
    }

    #define bdrv_set_child(bs, member, new_child) \
        bdrv_set_child_internal(bs, offsetof(bs, member), new_child)
    // build time assertion bs->member is of type BlockDriverState * omitted

Caution: when the same child occurs multiple times, this may destroy any
association of "actual child pointer" with "position in child list".

Aside: why is it a children list and not an array?  Are members deleted
or inserted that often?

> However there's no way to remove items from that list. It seems that
> this was discussed when the patch was first published, but no one saw
> a case where this could break:
>
>    https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02284.html
>
> The problem is that it can break: the block-stream command removes a
> BDS's backing image (optionally replacing it with a new one), so the
> pointer in bs->children becomes invalid.

Anything changing child pointers, be it @file, @backing_hd or in private
state, must update the children list consistently.

> I wrote a patch that updates bs->children when bdrv_set_backing_hd()
> is called.

Sounds like a correct approach for updating @backing_hd.  What about the
other child pointers?

>            It also makes sure that the same children is not added
> twice to the same parent (that can happen due to bdrv_set_backing_hd()
> being called in bdrv_open_backing_file()).

As Kevin pointed out, the same child can legitimately occur multiple
times in the children list.

> I think this is enough to solve this problem, but I haven't checked
> all other cases of chidren added using bdrv_attach_child(). Anyway the
> assumption that once a BDS is added to that list it will always be
> valid seems very broad to me.
>
>
> 2. bdrv_reopen_queue() includes the complete backing chain
> ----------------------------------------------------------
> Calling bdrv_reopen() on a particular BlockDriverState now adds its
> whole backing chain to the queue (formerly I think it was just
> bs->file).

I'm not sure I understand.  Can you give an example with before and
after behavior?

> I don't know why this behavior is necessary, but there are surely
> things that I'm overlooking.
>
> However this breaks one of the features of my intermediate block
> streaming patchset: the ability to start several block-stream
> operations in parallel as long as the affected chains don't overlap.
>
> This breaks iotest 030, as described here:
>
>    https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg06273.html
>
> Now, this feature was just a nice side effect of the ability to stream
> to intermediate images, and is of secondary importance to me; so if I
> can no longer assume that bdrv_reopen() is not going to touch the
> whole backing chain, I can just remove it very easily and still leave
> the main functionality intact.

Does your patch address this in any way?

> Comments are welcome.
>
> Thanks,
>
> Berto
>
> Alberto Garcia (1):
>   block: update BlockDriverState's children in bdrv_set_backing_hd()
>
>  block.c | 40 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 38 insertions(+), 2 deletions(-)

Reply via email to