Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-09-12 Thread Alberto Garcia
On Thu 21 Jun 2018 03:06:22 PM CEST, Kevin Wolf wrote:
>> > Actually, do we ever use bdrv_reopen() for flags other than
>> > read-only?  Maybe we should get rid of that flags nonsense and
>> > simply make it a bdrv_reopen_set_readonly() taking a boolean.
>> 
>> I think that's a good idea. There's however one case where other
>> flags are changed: reopen_backing_file() in block/replication.c that
>> also clears BDRV_O_INACTIVE. I'd need to take a closer look to that
>> code to see what we can do about it.
>
> Uh, and that works?
>
> Clearing BDRV_O_INACTIVE with bdrv_reopen() (which means, without
> calling bdrv_invalidate_cache()) is surely suspicious. Probably this
> code is buggy.
>
> Another reason for removing flags from the interface: We didn't do any
> sanity checks for the flag changes.

I'm working on removing the flags parameter from bdrv_reopen() and
friends, and I think this is the only clearly strange thing on the way.

The documentation is in https://wiki.qemu.org/Features/BlockReplication
or in docs/block-replication.txt.

As far as I can see there's the following backing chain:

  [secondary] <- [hidden] <- [active]

There's an NBD server writing on [secondary], a backup job copying data
from [secondary] to [hidden], and the VM writing to [active].

The reopen_backing_file() function in question simply puts the two
backing images in read-write mode (clearing BDRV_O_INACTIVE along the
way) so the NBD server and the backup job can write to them.

This is done by replication_start(), which can only be reached with the
QMP command "xen-set-replication". There's a comment that says "The
caller of the function MUST make sure vm stopped" but no one seems to
check that the vm is indeed stopped (?). This API was btw added half a
year after the replication driver, and before this point the only user
of replication_start() was a unit test.

Without having tested the COLO / replication code before, I don't see
any reason why it would make sense to clear the BDRV_O_INACTIVE flag on
the backing files, or why that flag would be there in the first place
(and I'm not even talking about whether that flag is supposed to be
set/cleared with a simple bdrv_reopen()). So I'm thinking to simply
remove it from there.

If anyone else has more information I'd be happy to hear it.

Berto



Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-21 Thread Kevin Wolf
Am 20.06.2018 um 14:35 hat Alberto Garcia geschrieben:
> On Wed 20 Jun 2018 12:58:55 PM CEST, Kevin Wolf wrote:
> > Am 19.06.2018 um 16:20 hat Alberto Garcia geschrieben:
> >> >> Wait, I think the description I gave is inaccurate:
> >> >> 
> >> >> commit_complete() calls bdrv_drop_intermediate(), and that updates the
> >> >> backing image name (c->role->update_filename()). If we're doing this in
> >> >> an intermediate node then it needs to be reopened in read-write mode,
> >> >> while keeping the rest of the options.
> >> >> 
> >> >> But then bdrv_reopen_commit() realizes that the backing file (from
> >> >> reopen_state->options) is not the current one (because there's a
> >> >> bdrv_mirror_top implicit filter) and attempts to remove it. And that's
> >> >> the root cause of the crash.
> >> >
> >> > How did the other (the real?) backing file get into
> >> > reopen_state->options? I don't think bdrv_drop_intermediate() wants to
> >> > change anything except the read-only flag, so we should surely have
> >> > the node name of bdrv_mirror_top there?
> >> 
> >> No, it doesn't (try to) change anything else. It cannot do it:
> >> bdrv_reopen() only takes flags, but keeps the current options. And the
> >> current options have 'backing' set to a thing different from the
> >> bdrv_mirror_top node.
> >
> > Okay, so in theory this is expected to just work.
> >
> > Actually, do we ever use bdrv_reopen() for flags other than read-only?
> > Maybe we should get rid of that flags nonsense and simply make it a
> > bdrv_reopen_set_readonly() taking a boolean.
> 
> I think that's a good idea. There's however one case where other flags
> are changed: reopen_backing_file() in block/replication.c that also
> clears BDRV_O_INACTIVE. I'd need to take a closer look to that code to
> see what we can do about it.

Uh, and that works?

Clearing BDRV_O_INACTIVE with bdrv_reopen() (which means, without
calling bdrv_invalidate_cache()) is surely suspicious. Probably this
code is buggy.

Another reason for removing flags from the interface: We didn't do any
sanity checks for the flag changes.

> And there's of course qemu-io's "reopen" command, which does take
> options but keeps the previous values.

It provides -r/-w for changing readonly and -c to change the cache mode
flags. Both should be easy to convert to QDict options.

> > I think we should already remove nested options of children from the
> > dicts in bdrv_open_inherit(). I'm less sure about node-name
> > references.  Maybe instead of keeing the dicts up-to-date each time we
> > modify the graph, we should just get rid of those in the dicts as
> > well, and instead add a function that reconstructs the full dict from
> > bs->options and the currently attached children. This requires that
> > the child name and the option name match, but I think that's
> > true. (Mostly at least - what about quorum? But the child node
> > handling of quorum is broken anyway.)
> 
> Yes, removing nested options sounds like a good idea. In what cases do
> we need the full qdict, though?

Not sure, maybe we don't even need them now that we decided that the
default is leaving the reference unchanged.

There's the creation of json: filenames, maybe we need it there. We'd
also certainly need to get the full QDict if we wanted to convert the
options to a QAPI object before we pass them to the drivers.

Kevin



Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-20 Thread Alberto Garcia
On Wed 20 Jun 2018 12:58:55 PM CEST, Kevin Wolf wrote:
> Am 19.06.2018 um 16:20 hat Alberto Garcia geschrieben:
>> >> Wait, I think the description I gave is inaccurate:
>> >> 
>> >> commit_complete() calls bdrv_drop_intermediate(), and that updates the
>> >> backing image name (c->role->update_filename()). If we're doing this in
>> >> an intermediate node then it needs to be reopened in read-write mode,
>> >> while keeping the rest of the options.
>> >> 
>> >> But then bdrv_reopen_commit() realizes that the backing file (from
>> >> reopen_state->options) is not the current one (because there's a
>> >> bdrv_mirror_top implicit filter) and attempts to remove it. And that's
>> >> the root cause of the crash.
>> >
>> > How did the other (the real?) backing file get into
>> > reopen_state->options? I don't think bdrv_drop_intermediate() wants to
>> > change anything except the read-only flag, so we should surely have
>> > the node name of bdrv_mirror_top there?
>> 
>> No, it doesn't (try to) change anything else. It cannot do it:
>> bdrv_reopen() only takes flags, but keeps the current options. And the
>> current options have 'backing' set to a thing different from the
>> bdrv_mirror_top node.
>
> Okay, so in theory this is expected to just work.
>
> Actually, do we ever use bdrv_reopen() for flags other than read-only?
> Maybe we should get rid of that flags nonsense and simply make it a
> bdrv_reopen_set_readonly() taking a boolean.

I think that's a good idea. There's however one case where other flags
are changed: reopen_backing_file() in block/replication.c that also
clears BDRV_O_INACTIVE. I'd need to take a closer look to that code to
see what we can do about it.

And there's of course qemu-io's "reopen" command, which does take
options but keeps the previous values.

>> > > So my first impression is that we should not try to change backing
>> > > files if a reopen was not requested by the user (blockdev-reopen)
>> > > and perhaps we should forbid it when there are implicit nodes
>> > > involved.
>> > Changing the behaviour depending on who the caller is sounds like a
>> > hack that relies on coincidence and may break sooner or later.
>> 
>> The main difference between the user calling blockdev-reopen and the
>> code doing bdrv_reopen() internally is that in the former case all
>> existing options are ignored (keep_old_opts = false) and in the latter
>> they are kept.
>> 
>> This latter case can have unintended consequences, and I think they're
>> all related to the fact that bs->explicit_options also keeps the options
>> of its children. So if you have
>> 
>>C <- B <- A
>> 
>> and A contains 'backing.driver=qcow2, backing.node-name=foo, ...', and
>> you reopen A (keeping its options) then everything goes fine. However if
>> you remove B from the chain (using e.g. block-stream) then you can't
>> reopen A anymore because backing.* no longer matches the current backing
>> file.
>> 
>> I suppose that we would need to remove the children options from
>> bs->explicit_options in that case? If this all happens because of the
>> user calling blockdev-reopen then we don't need to worry about it becase
>> bs->explicit_options are ignored.
>
> So the problem is that bs->explicit_options (and probably bs->options)
> aren't consistent with the actual state of the graph. The fix for that
> is likely not in bdrv_reopen().

Probably not because the graph can be changed by other means (e.g
block-stream as I just said).

> I think we should already remove nested options of children from the
> dicts in bdrv_open_inherit(). I'm less sure about node-name
> references.  Maybe instead of keeing the dicts up-to-date each time we
> modify the graph, we should just get rid of those in the dicts as
> well, and instead add a function that reconstructs the full dict from
> bs->options and the currently attached children. This requires that
> the child name and the option name match, but I think that's
> true. (Mostly at least - what about quorum? But the child node
> handling of quorum is broken anyway.)

Yes, removing nested options sounds like a good idea. In what cases do
we need the full qdict, though?

>> At the moment there's 
>> 
>> Berto
>
> And it's very good to have a Berto at the moment, but I think you
> wanted to add something else there? ;-)

I think it was just a leftover from a previous paragraph :-)

Berto



Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-20 Thread Kevin Wolf
Am 19.06.2018 um 16:20 hat Alberto Garcia geschrieben:
> >> Wait, I think the description I gave is inaccurate:
> >> 
> >> commit_complete() calls bdrv_drop_intermediate(), and that updates the
> >> backing image name (c->role->update_filename()). If we're doing this in
> >> an intermediate node then it needs to be reopened in read-write mode,
> >> while keeping the rest of the options.
> >> 
> >> But then bdrv_reopen_commit() realizes that the backing file (from
> >> reopen_state->options) is not the current one (because there's a
> >> bdrv_mirror_top implicit filter) and attempts to remove it. And that's
> >> the root cause of the crash.
> >
> > How did the other (the real?) backing file get into
> > reopen_state->options? I don't think bdrv_drop_intermediate() wants to
> > change anything except the read-only flag, so we should surely have
> > the node name of bdrv_mirror_top there?
> 
> No, it doesn't (try to) change anything else. It cannot do it:
> bdrv_reopen() only takes flags, but keeps the current options. And the
> current options have 'backing' set to a thing different from the
> bdrv_mirror_top node.

Okay, so in theory this is expected to just work.

Actually, do we ever use bdrv_reopen() for flags other than read-only?
Maybe we should get rid of that flags nonsense and simply make it a
bdrv_reopen_set_readonly() taking a boolean.

> > > So my first impression is that we should not try to change backing
> > > files if a reopen was not requested by the user (blockdev-reopen)
> > > and perhaps we should forbid it when there are implicit nodes
> > > involved.
> > Changing the behaviour depending on who the caller is sounds like a
> > hack that relies on coincidence and may break sooner or later.
> 
> The main difference between the user calling blockdev-reopen and the
> code doing bdrv_reopen() internally is that in the former case all
> existing options are ignored (keep_old_opts = false) and in the latter
> they are kept.
> 
> This latter case can have unintended consequences, and I think they're
> all related to the fact that bs->explicit_options also keeps the options
> of its children. So if you have
> 
>C <- B <- A
> 
> and A contains 'backing.driver=qcow2, backing.node-name=foo, ...', and
> you reopen A (keeping its options) then everything goes fine. However if
> you remove B from the chain (using e.g. block-stream) then you can't
> reopen A anymore because backing.* no longer matches the current backing
> file.
> 
> I suppose that we would need to remove the children options from
> bs->explicit_options in that case? If this all happens because of the
> user calling blockdev-reopen then we don't need to worry about it becase
> bs->explicit_options are ignored.

So the problem is that bs->explicit_options (and probably bs->options)
aren't consistent with the actual state of the graph. The fix for that
is likely not in bdrv_reopen().

I think we should already remove nested options of children from the
dicts in bdrv_open_inherit(). I'm less sure about node-name references.
Maybe instead of keeing the dicts up-to-date each time we modify the
graph, we should just get rid of those in the dicts as well, and instead
add a function that reconstructs the full dict from bs->options and the
currently attached children. This requires that the child name and the
option name match, but I think that's true. (Mostly at least - what
about quorum? But the child node handling of quorum is broken anyway.)

I'm almost sure Max has an opinion about this, too. :-)

> >> Ah, ok, I think that's related to the crash that I mentioned earlier
> >> with block-commit. Yes, it makes sense that we forbid that. I suppose
> >> that we can do this already with BLK_PERM_GRAPH_MOD ?
> >
> > Possibly. The thing with BLK_PERM_GRAPH_MOD is that nobody knows what
> > its meaning has to be so that it's actually useful, so we're not using
> > it in any meaningful way yet.
> >
> > I suspect BLK_PERM_GRAPH_MOD is the wrong tool because what we
> > actually want to protect against modification is not a BDS, but a
> > BdrvChild. But I may be wrong there.
> 
> I'll take a closer look and see what I come up with.

Okay. Maybe don't implement anything yet, but just try to come up with a
concept for discussion.

> At the moment there's 
> 
> Berto

And it's very good to have a Berto at the moment, but I think you wanted
to add something else there? ;-)

Kevin



Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-19 Thread Alberto Garcia
>> Wait, I think the description I gave is inaccurate:
>> 
>> commit_complete() calls bdrv_drop_intermediate(), and that updates the
>> backing image name (c->role->update_filename()). If we're doing this in
>> an intermediate node then it needs to be reopened in read-write mode,
>> while keeping the rest of the options.
>> 
>> But then bdrv_reopen_commit() realizes that the backing file (from
>> reopen_state->options) is not the current one (because there's a
>> bdrv_mirror_top implicit filter) and attempts to remove it. And that's
>> the root cause of the crash.
>
> How did the other (the real?) backing file get into
> reopen_state->options? I don't think bdrv_drop_intermediate() wants to
> change anything except the read-only flag, so we should surely have
> the node name of bdrv_mirror_top there?

No, it doesn't (try to) change anything else. It cannot do it:
bdrv_reopen() only takes flags, but keeps the current options. And the
current options have 'backing' set to a thing different from the
bdrv_mirror_top node.

> > So my first impression is that we should not try to change backing
> > files if a reopen was not requested by the user (blockdev-reopen)
> > and perhaps we should forbid it when there are implicit nodes
> > involved.
> Changing the behaviour depending on who the caller is sounds like a
> hack that relies on coincidence and may break sooner or later.

The main difference between the user calling blockdev-reopen and the
code doing bdrv_reopen() internally is that in the former case all
existing options are ignored (keep_old_opts = false) and in the latter
they are kept.

This latter case can have unintended consequences, and I think they're
all related to the fact that bs->explicit_options also keeps the options
of its children. So if you have

   C <- B <- A

and A contains 'backing.driver=qcow2, backing.node-name=foo, ...', and
you reopen A (keeping its options) then everything goes fine. However if
you remove B from the chain (using e.g. block-stream) then you can't
reopen A anymore because backing.* no longer matches the current backing
file.

I suppose that we would need to remove the children options from
bs->explicit_options in that case? If this all happens because of the
user calling blockdev-reopen then we don't need to worry about it becase
bs->explicit_options are ignored.

>> >> >> 2) If the 'backing' option is omitted altogether then the existing
>> >> >> backing file (if any) is kept. Unlike blockdev-add this will _not_
>> >> >> open the default backing file specified in the image header.
>> >> >
>> >> > Hm... This is certainly an inconsistency. Is it unavoidable?
>> >> 
>> >> I don't think we want to open new nodes on reopen(), but one easy way
>> >> to avoid this behavior is simply to return an error if the user omits
>> >> the 'backing' option.
>> >
>> > Hm, yes, not opening new nodes is a good point.
>> >
>> > As long as find a good solution that can be consistently applied to
>> > all BlockdevRef, it should be okay. So I don't necessarily object to
>> > the special casing and just leaving them as they are by default.
>> >
>> >> But this raises a few questions. Let's say you have an image with a
>> >> backing file and you open it with blockdev-add omitting the 'backing'
>> >> option. That means the default backing file is opened.
>> >> 
>> >>   - Do you still have to specify 'backing' on reopen? I suppose you
>> >> don't have to.
>> >
>> > You would have to. I agree it's ugly (not the least because you probably
>> > didn't assign an explicit node name, though -drive allows specifying
>> > only the node name, but still using the filename from the image file).
>> 
>> Yes it's a bit ugly, but we wouldn't be having a special case with the
>> 'backing' option.
>
> I think files I'm meanwhile tending towards your current solution
> (i.e.  default is leaving the reference as it is and you must use null
> to get rid of a backing file).

It's not necessarily a bad option. The only one that I don't like is
opening the default backing file if 'backing' is omitted.

>> >> > Do we need some way for e.g. block jobs to forbid changing the
>> >> > backing file of the subchain they are operating on?
>> >> 
>> >> Are you thinking of any particular scenario?
>> >
>> > Not specifically, but I think e.g. the commit job might get confused
>> > if you break its backing chain because it assumes that base is
>> > somewhere in the backing chain of top, and also that it called
>> > block_job_add_bdrv() on everything in the subchain it is working on.
>> >
>> > It just feels like we'd allow to break any such assumptions if we
>> > don't block graph changes there.
>> 
>> Ah, ok, I think that's related to the crash that I mentioned earlier
>> with block-commit. Yes, it makes sense that we forbid that. I suppose
>> that we can do this already with BLK_PERM_GRAPH_MOD ?
>
> Possibly. The thing with BLK_PERM_GRAPH_MOD is that nobody knows what
> its meaning has to be so that it's actually useful, so 

Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-19 Thread Kevin Wolf
Am 19.06.2018 um 14:27 hat Alberto Garcia geschrieben:
> On Mon 18 Jun 2018 06:12:29 PM CEST, Kevin Wolf wrote:
> >> >> This patch allows the user to change the backing file of an image
> >> >> that is being reopened. Here's what it does:
> >> >> 
> >> >>  - In bdrv_reopen_queue_child(): if the 'backing' option points to an
> >> >>image different from the current backing file then it means that
> >> >>the latter is going be detached so it must not be put in the reopen
> >> >>queue.
> >> >> 
> >> >>  - In bdrv_reopen_prepare(): check that the value of 'backing' points
> >> >>to an existing node or is null. If it points to an existing node it
> >> >>also needs to make sure that replacing the backing file will not
> >> >>create a cycle in the node graph (i.e. you cannot reach the parent
> >> >>from the new backing file).
> >> >> 
> >> >>  - In bdrv_reopen_commit(): perform the actual node replacement by
> >> >>calling bdrv_set_backing_hd(). It may happen that there are
> >> >>temporary implicit nodes between the BDS that we are reopening and
> >> >>the backing file that we want to replace (e.g. a commit fiter node),
> >> >>so we must skip them.
> >> >
> >> > I think we shouldn't do this. blockdev-reopen is an advanced command
> >> > that requires knowledge of the graph at the node level. Therefore we can
> >> > expect the management tool to consider filter nodes when it reconfigures
> >> > the graph. This is important because it makes a difference whether a
> >> > new node is inserted above or below the filter node.
> >> 
> >> But those nodes that the management tool knows about would not be
> >> implicit, or would they?
> >
> > Hm, you're right, they are only implicit if no node name was given. So
> > it's not as bad as I thought.
> >
> > I still think of bs->implicit as a hack to maintain compatibility for
> > legacy commands rather than something to use in new commands.
> 
> Yes, and I'm still not 100% convinced that skipping the implicit nodes
> as I'm doing here is the proper solution, so maybe I'll need to come up
> with something else.
> 
> >> The reason why I did this is because there's several places in the
> >> QEMU codebase where bdrv_reopen() is called while there's a temporary
> >> implicit node. For example, on exit bdrv_commit() needs to call
> >> bdrv_set_backing_hd() to remove intermediate nodes from the
> >> chain. This happens while bdrv_mirror_top is still there, so if we
> >> don't skip it then QEMU crashes.
> >
> > But isn't that bdrv_set_backing_hd() a separate operation? I would
> > understand that it matters if you changed the code so that it would
> > use bdrv_reopen() in order to change the backing file, but that's not
> > what it does, is it?
> 
> Wait, I think the description I gave is inaccurate:
> 
> commit_complete() calls bdrv_drop_intermediate(), and that updates the
> backing image name (c->role->update_filename()). If we're doing this in
> an intermediate node then it needs to be reopened in read-write mode,
> while keeping the rest of the options.
> 
> But then bdrv_reopen_commit() realizes that the backing file (from
> reopen_state->options) is not the current one (because there's a
> bdrv_mirror_top implicit filter) and attempts to remove it. And that's
> the root cause of the crash.

How did the other (the real?) backing file get into
reopen_state->options? I don't think bdrv_drop_intermediate() wants to
change anything except the read-only flag, so we should surely have the
node name of bdrv_mirror_top there?

> So my first impression is that we should not try to change backing files
> if a reopen was not requested by the user (blockdev-reopen) and perhaps
> we should forbid it when there are implicit nodes involved.

Changing the behaviour depending on who the caller is sounds like a hack
that relies on coincidence and may break sooner or later.

> >> >> 2) If the 'backing' option is omitted altogether then the existing
> >> >> backing file (if any) is kept. Unlike blockdev-add this will _not_
> >> >> open the default backing file specified in the image header.
> >> >
> >> > Hm... This is certainly an inconsistency. Is it unavoidable?
> >> 
> >> I don't think we want to open new nodes on reopen(), but one easy way
> >> to avoid this behavior is simply to return an error if the user omits
> >> the 'backing' option.
> >
> > Hm, yes, not opening new nodes is a good point.
> >
> > As long as find a good solution that can be consistently applied to
> > all BlockdevRef, it should be okay. So I don't necessarily object to
> > the special casing and just leaving them as they are by default.
> >
> >> But this raises a few questions. Let's say you have an image with a
> >> backing file and you open it with blockdev-add omitting the 'backing'
> >> option. That means the default backing file is opened.
> >> 
> >>   - Do you still have to specify 'backing' on reopen? I suppose you
> >> don't have to.
> >
> > You would have to. I 

Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-19 Thread Alberto Garcia
On Mon 18 Jun 2018 06:12:29 PM CEST, Kevin Wolf wrote:
>> >> This patch allows the user to change the backing file of an image
>> >> that is being reopened. Here's what it does:
>> >> 
>> >>  - In bdrv_reopen_queue_child(): if the 'backing' option points to an
>> >>image different from the current backing file then it means that
>> >>the latter is going be detached so it must not be put in the reopen
>> >>queue.
>> >> 
>> >>  - In bdrv_reopen_prepare(): check that the value of 'backing' points
>> >>to an existing node or is null. If it points to an existing node it
>> >>also needs to make sure that replacing the backing file will not
>> >>create a cycle in the node graph (i.e. you cannot reach the parent
>> >>from the new backing file).
>> >> 
>> >>  - In bdrv_reopen_commit(): perform the actual node replacement by
>> >>calling bdrv_set_backing_hd(). It may happen that there are
>> >>temporary implicit nodes between the BDS that we are reopening and
>> >>the backing file that we want to replace (e.g. a commit fiter node),
>> >>so we must skip them.
>> >
>> > I think we shouldn't do this. blockdev-reopen is an advanced command
>> > that requires knowledge of the graph at the node level. Therefore we can
>> > expect the management tool to consider filter nodes when it reconfigures
>> > the graph. This is important because it makes a difference whether a
>> > new node is inserted above or below the filter node.
>> 
>> But those nodes that the management tool knows about would not be
>> implicit, or would they?
>
> Hm, you're right, they are only implicit if no node name was given. So
> it's not as bad as I thought.
>
> I still think of bs->implicit as a hack to maintain compatibility for
> legacy commands rather than something to use in new commands.

Yes, and I'm still not 100% convinced that skipping the implicit nodes
as I'm doing here is the proper solution, so maybe I'll need to come up
with something else.

>> The reason why I did this is because there's several places in the
>> QEMU codebase where bdrv_reopen() is called while there's a temporary
>> implicit node. For example, on exit bdrv_commit() needs to call
>> bdrv_set_backing_hd() to remove intermediate nodes from the
>> chain. This happens while bdrv_mirror_top is still there, so if we
>> don't skip it then QEMU crashes.
>
> But isn't that bdrv_set_backing_hd() a separate operation? I would
> understand that it matters if you changed the code so that it would
> use bdrv_reopen() in order to change the backing file, but that's not
> what it does, is it?

Wait, I think the description I gave is inaccurate:

commit_complete() calls bdrv_drop_intermediate(), and that updates the
backing image name (c->role->update_filename()). If we're doing this in
an intermediate node then it needs to be reopened in read-write mode,
while keeping the rest of the options.

But then bdrv_reopen_commit() realizes that the backing file (from
reopen_state->options) is not the current one (because there's a
bdrv_mirror_top implicit filter) and attempts to remove it. And that's
the root cause of the crash.

So my first impression is that we should not try to change backing files
if a reopen was not requested by the user (blockdev-reopen) and perhaps
we should forbid it when there are implicit nodes involved.

>> >> 2) If the 'backing' option is omitted altogether then the existing
>> >> backing file (if any) is kept. Unlike blockdev-add this will _not_
>> >> open the default backing file specified in the image header.
>> >
>> > Hm... This is certainly an inconsistency. Is it unavoidable?
>> 
>> I don't think we want to open new nodes on reopen(), but one easy way
>> to avoid this behavior is simply to return an error if the user omits
>> the 'backing' option.
>
> Hm, yes, not opening new nodes is a good point.
>
> As long as find a good solution that can be consistently applied to
> all BlockdevRef, it should be okay. So I don't necessarily object to
> the special casing and just leaving them as they are by default.
>
>> But this raises a few questions. Let's say you have an image with a
>> backing file and you open it with blockdev-add omitting the 'backing'
>> option. That means the default backing file is opened.
>> 
>>   - Do you still have to specify 'backing' on reopen? I suppose you
>> don't have to.
>
> You would have to. I agree it's ugly (not the least because you probably
> didn't assign an explicit node name, though -drive allows specifying
> only the node name, but still using the filename from the image file).

Yes it's a bit ugly, but we wouldn't be having a special case with the
'backing' option.

>>   - If you don't have to, can QEMU tell if bs->backing is the original
>> backing file or a new one? I suppose it can, by checking for
>> 'backing / backing.*' in bs->options.
>> 
>>   - If you omit 'backing', does the backing file still get reopened? And
>> more importantly, does it keep its 

Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-18 Thread Kevin Wolf
Am 18.06.2018 um 17:06 hat Alberto Garcia geschrieben:
> On Mon 18 Jun 2018 04:38:01 PM CEST, Kevin Wolf wrote:
> > Am 14.06.2018 um 17:49 hat Alberto Garcia geschrieben:
> >> This patch allows the user to change the backing file of an image that
> >> is being reopened. Here's what it does:
> >> 
> >>  - In bdrv_reopen_queue_child(): if the 'backing' option points to an
> >>image different from the current backing file then it means that
> >>the latter is going be detached so it must not be put in the reopen
> >>queue.
> >> 
> >>  - In bdrv_reopen_prepare(): check that the value of 'backing' points
> >>to an existing node or is null. If it points to an existing node it
> >>also needs to make sure that replacing the backing file will not
> >>create a cycle in the node graph (i.e. you cannot reach the parent
> >>from the new backing file).
> >> 
> >>  - In bdrv_reopen_commit(): perform the actual node replacement by
> >>calling bdrv_set_backing_hd(). It may happen that there are
> >>temporary implicit nodes between the BDS that we are reopening and
> >>the backing file that we want to replace (e.g. a commit fiter node),
> >>so we must skip them.
> >
> > I think we shouldn't do this. blockdev-reopen is an advanced command
> > that requires knowledge of the graph at the node level. Therefore we can
> > expect the management tool to consider filter nodes when it reconfigures
> > the graph. This is important because it makes a difference whether a
> > new node is inserted above or below the filter node.
> 
> But those nodes that the management tool knows about would not be
> implicit, or would they?

Hm, you're right, they are only implicit if no node name was given. So
it's not as bad as I thought.

I still think of bs->implicit as a hack to maintain compatibility for
legacy commands rather than something to use in new commands.

> The reason why I did this is because there's several places in the QEMU
> codebase where bdrv_reopen() is called while there's a temporary
> implicit node. For example, on exit bdrv_commit() needs to call
> bdrv_set_backing_hd() to remove intermediate nodes from the chain. This
> happens while bdrv_mirror_top is still there, so if we don't skip it
> then QEMU crashes.

But isn't that bdrv_set_backing_hd() a separate operation? I would
understand that it matters if you changed the code so that it would use
bdrv_reopen() in order to change the backing file, but that's not what
it does, is it?

> >> Although x-blockdev-reopen is meant to be used like blockdev-add,
> >> there's two important things that must be taken into account:
> >> 
> >> 1) The only way to set a new backing file is by using a reference to
> >>an existing node (previously added with e.g. blockdev-add).
> >>If 'backing' contains a dictionary with a new set of options
> >>({"driver": "qcow2", "file": { ... }}) then it is interpreted that
> >>the _existing_ backing file must be reopened with those options.
> >
> > This sounds a bit odd at first, but it makes perfect sense in fact.
> >
> > Maybe we should enforce that child nodes that were openend by reference
> > first must be reopened with a reference, and only those that were defined
> > inline (and therefore inherit options from the parent) can be reopened
> > inline?
> 
> I think that should be doable, yes.
> 
> >> 2) If the 'backing' option is omitted altogether then the existing
> >>backing file (if any) is kept. Unlike blockdev-add this will _not_
> >>open the default backing file specified in the image header.
> >
> > Hm... This is certainly an inconsistency. Is it unavoidable?
> 
> I don't think we want to open new nodes on reopen(), but one easy way to
> avoid this behavior is simply to return an error if the user omits the
> 'backing' option.

Hm, yes, not opening new nodes is a good point.

As long as find a good solution that can be consistently applied to all
BlockdevRef, it should be okay. So I don't necessarily object to the
special casing and just leaving them as they are by default.

> But this raises a few questions. Let's say you have an image with a
> backing file and you open it with blockdev-add omitting the 'backing'
> option. That means the default backing file is opened.
> 
>   - Do you still have to specify 'backing' on reopen? I suppose you
> don't have to.

You would have to. I agree it's ugly (not the least because you probably
didn't assign an explicit node name, though -drive allows specifying
only the node name, but still using the filename from the image file).

>   - If you don't have to, can QEMU tell if bs->backing is the original
> backing file or a new one? I suppose it can, by checking for
> 'backing / backing.*' in bs->options.
> 
>   - If you omit 'backing', does the backing file still get reopened? And
> more importantly, does it keep its current options or are they
> supposed to be reset to their default values?

If you make omitting it an 

Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-18 Thread Alberto Garcia
On Mon 18 Jun 2018 04:38:01 PM CEST, Kevin Wolf wrote:
> Am 14.06.2018 um 17:49 hat Alberto Garcia geschrieben:
>> This patch allows the user to change the backing file of an image that
>> is being reopened. Here's what it does:
>> 
>>  - In bdrv_reopen_queue_child(): if the 'backing' option points to an
>>image different from the current backing file then it means that
>>the latter is going be detached so it must not be put in the reopen
>>queue.
>> 
>>  - In bdrv_reopen_prepare(): check that the value of 'backing' points
>>to an existing node or is null. If it points to an existing node it
>>also needs to make sure that replacing the backing file will not
>>create a cycle in the node graph (i.e. you cannot reach the parent
>>from the new backing file).
>> 
>>  - In bdrv_reopen_commit(): perform the actual node replacement by
>>calling bdrv_set_backing_hd(). It may happen that there are
>>temporary implicit nodes between the BDS that we are reopening and
>>the backing file that we want to replace (e.g. a commit fiter node),
>>so we must skip them.
>
> I think we shouldn't do this. blockdev-reopen is an advanced command
> that requires knowledge of the graph at the node level. Therefore we can
> expect the management tool to consider filter nodes when it reconfigures
> the graph. This is important because it makes a difference whether a
> new node is inserted above or below the filter node.

But those nodes that the management tool knows about would not be
implicit, or would they?

The reason why I did this is because there's several places in the QEMU
codebase where bdrv_reopen() is called while there's a temporary
implicit node. For example, on exit bdrv_commit() needs to call
bdrv_set_backing_hd() to remove intermediate nodes from the chain. This
happens while bdrv_mirror_top is still there, so if we don't skip it
then QEMU crashes.

>> Although x-blockdev-reopen is meant to be used like blockdev-add,
>> there's two important things that must be taken into account:
>> 
>> 1) The only way to set a new backing file is by using a reference to
>>an existing node (previously added with e.g. blockdev-add).
>>If 'backing' contains a dictionary with a new set of options
>>({"driver": "qcow2", "file": { ... }}) then it is interpreted that
>>the _existing_ backing file must be reopened with those options.
>
> This sounds a bit odd at first, but it makes perfect sense in fact.
>
> Maybe we should enforce that child nodes that were openend by reference
> first must be reopened with a reference, and only those that were defined
> inline (and therefore inherit options from the parent) can be reopened
> inline?

I think that should be doable, yes.

>> 2) If the 'backing' option is omitted altogether then the existing
>>backing file (if any) is kept. Unlike blockdev-add this will _not_
>>open the default backing file specified in the image header.
>
> Hm... This is certainly an inconsistency. Is it unavoidable?

I don't think we want to open new nodes on reopen(), but one easy way to
avoid this behavior is simply to return an error if the user omits the
'backing' option.

But this raises a few questions. Let's say you have an image with a
backing file and you open it with blockdev-add omitting the 'backing'
option. That means the default backing file is opened.

  - Do you still have to specify 'backing' on reopen? I suppose you
don't have to.

  - If you don't have to, can QEMU tell if bs->backing is the original
backing file or a new one? I suppose it can, by checking for
'backing / backing.*' in bs->options.

  - If you omit 'backing', does the backing file still get reopened? And
more importantly, does it keep its current options or are they
supposed to be reset to their default values?

>> +/* If the 'backing' option is set and it points to a different
>> + * node then it means that we want to replace the current one,
>> + * so we shouldn't put it in the reopen queue */
>> +if (child->role == _backing && qdict_haskey(options, 
>> "backing")) {
>
> I think checking child->name for "backing" makes more sense when we
> also look for the "backing" key in the options QDict. This would make
> generalising it for other children easier (which we probably should
> do, whether in this patch or a follow-up).

Sounds reasonable.

> Do we need some way for e.g. block jobs to forbid changing the backing
> file of the subchain they are operating on?

Are you thinking of any particular scenario?

Berto



Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-18 Thread Kevin Wolf
Am 14.06.2018 um 17:49 hat Alberto Garcia geschrieben:
> This patch allows the user to change the backing file of an image that
> is being reopened. Here's what it does:
> 
>  - In bdrv_reopen_queue_child(): if the 'backing' option points to an
>image different from the current backing file then it means that
>the latter is going be detached so it must not be put in the reopen
>queue.
> 
>  - In bdrv_reopen_prepare(): check that the value of 'backing' points
>to an existing node or is null. If it points to an existing node it
>also needs to make sure that replacing the backing file will not
>create a cycle in the node graph (i.e. you cannot reach the parent
>from the new backing file).
> 
>  - In bdrv_reopen_commit(): perform the actual node replacement by
>calling bdrv_set_backing_hd(). It may happen that there are
>temporary implicit nodes between the BDS that we are reopening and
>the backing file that we want to replace (e.g. a commit fiter node),
>so we must skip them.

I think we shouldn't do this. blockdev-reopen is an advanced command
that requires knowledge of the graph at the node level. Therefore we can
expect the management tool to consider filter nodes when it reconfigures
the graph. This is important because it makes a difference whether a
new node is inserted above or below the filter node.

> Although x-blockdev-reopen is meant to be used like blockdev-add,
> there's two important things that must be taken into account:
> 
> 1) The only way to set a new backing file is by using a reference to
>an existing node (previously added with e.g. blockdev-add).
>If 'backing' contains a dictionary with a new set of options
>({"driver": "qcow2", "file": { ... }}) then it is interpreted that
>the _existing_ backing file must be reopened with those options.

This sounds a bit odd at first, but it makes perfect sense in fact.

Maybe we should enforce that child nodes that were openend by reference
first must be reopened with a reference, and only those that were defined
inline (and therefore inherit options from the parent) can be reopened
inline?

> 2) If the 'backing' option is omitted altogether then the existing
>backing file (if any) is kept. Unlike blockdev-add this will _not_
>open the default backing file specified in the image header.

Hm... This is certainly an inconsistency. Is it unavoidable?

> Signed-off-by: Alberto Garcia 
> ---
>  block.c | 86 
> +++--
>  1 file changed, 84 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 0b9268a48d..91fff6361f 100644
> --- a/block.c
> +++ b/block.c
> @@ -2823,6 +2823,27 @@ BlockDriverState *bdrv_open(const char *filename, 
> const char *reference,
>  }
>  
>  /*
> + * Returns true if @child can be reached recursively from @bs
> + */
> +static bool bdrv_recurse_has_child(BlockDriverState *bs,
> +   BlockDriverState *child)
> +{
> +BdrvChild *c;
> +
> +if (bs == child) {
> +return true;
> +}
> +
> +QLIST_FOREACH(c, >children, next) {
> +if (bdrv_recurse_has_child(c->bs, child)) {
> +return true;
> +}
> +}
> +
> +return false;
> +}
> +
> +/*
>   * Adds a BlockDriverState to a simple queue for an atomic, transactional
>   * reopen of multiple devices.
>   *
> @@ -2957,6 +2978,7 @@ static BlockReopenQueue 
> *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
>  QLIST_FOREACH(child, >children, next) {
>  QDict *new_child_options;
>  char *child_key_dot;
> +bool child_keep_old = keep_old_opts;
>  
>  /* reopen can only change the options of block devices that were
>   * implicitly created and inherited options. For other (referenced)
> @@ -2965,12 +2987,28 @@ static BlockReopenQueue 
> *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
>  continue;
>  }
>  
> +/* If the 'backing' option is set and it points to a different
> + * node then it means that we want to replace the current one,
> + * so we shouldn't put it in the reopen queue */
> +if (child->role == _backing && qdict_haskey(options, 
> "backing")) {

I think checking child->name for "backing" makes more sense when we also
look for the "backing" key in the options QDict. This would make
generalising it for other children easier (which we probably should do,
whether in this patch or a follow-up).

> +const char *backing = qdict_get_try_str(options, "backing");
> +if (g_strcmp0(backing, child->bs->node_name)) {
> +continue;
> +}
> +}
> +
>  child_key_dot = g_strdup_printf("%s.", child->name);
>  qdict_extract_subqdict(options, _child_options, child_key_dot);
>  g_free(child_key_dot);
>  
> +/* If the 'backing' option was omitted then we keep the old
> + * set of 

[Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-14 Thread Alberto Garcia
This patch allows the user to change the backing file of an image that
is being reopened. Here's what it does:

 - In bdrv_reopen_queue_child(): if the 'backing' option points to an
   image different from the current backing file then it means that
   the latter is going be detached so it must not be put in the reopen
   queue.

 - In bdrv_reopen_prepare(): check that the value of 'backing' points
   to an existing node or is null. If it points to an existing node it
   also needs to make sure that replacing the backing file will not
   create a cycle in the node graph (i.e. you cannot reach the parent
   from the new backing file).

 - In bdrv_reopen_commit(): perform the actual node replacement by
   calling bdrv_set_backing_hd(). It may happen that there are
   temporary implicit nodes between the BDS that we are reopening and
   the backing file that we want to replace (e.g. a commit fiter node),
   so we must skip them.

Although x-blockdev-reopen is meant to be used like blockdev-add,
there's two important things that must be taken into account:

1) The only way to set a new backing file is by using a reference to
   an existing node (previously added with e.g. blockdev-add).
   If 'backing' contains a dictionary with a new set of options
   ({"driver": "qcow2", "file": { ... }}) then it is interpreted that
   the _existing_ backing file must be reopened with those options.

2) If the 'backing' option is omitted altogether then the existing
   backing file (if any) is kept. Unlike blockdev-add this will _not_
   open the default backing file specified in the image header.

Signed-off-by: Alberto Garcia 
---
 block.c | 86 +++--
 1 file changed, 84 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 0b9268a48d..91fff6361f 100644
--- a/block.c
+++ b/block.c
@@ -2823,6 +2823,27 @@ BlockDriverState *bdrv_open(const char *filename, const 
char *reference,
 }
 
 /*
+ * Returns true if @child can be reached recursively from @bs
+ */
+static bool bdrv_recurse_has_child(BlockDriverState *bs,
+   BlockDriverState *child)
+{
+BdrvChild *c;
+
+if (bs == child) {
+return true;
+}
+
+QLIST_FOREACH(c, >children, next) {
+if (bdrv_recurse_has_child(c->bs, child)) {
+return true;
+}
+}
+
+return false;
+}
+
+/*
  * Adds a BlockDriverState to a simple queue for an atomic, transactional
  * reopen of multiple devices.
  *
@@ -2957,6 +2978,7 @@ static BlockReopenQueue 
*bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
 QLIST_FOREACH(child, >children, next) {
 QDict *new_child_options;
 char *child_key_dot;
+bool child_keep_old = keep_old_opts;
 
 /* reopen can only change the options of block devices that were
  * implicitly created and inherited options. For other (referenced)
@@ -2965,12 +2987,28 @@ static BlockReopenQueue 
*bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
 continue;
 }
 
+/* If the 'backing' option is set and it points to a different
+ * node then it means that we want to replace the current one,
+ * so we shouldn't put it in the reopen queue */
+if (child->role == _backing && qdict_haskey(options, "backing")) 
{
+const char *backing = qdict_get_try_str(options, "backing");
+if (g_strcmp0(backing, child->bs->node_name)) {
+continue;
+}
+}
+
 child_key_dot = g_strdup_printf("%s.", child->name);
 qdict_extract_subqdict(options, _child_options, child_key_dot);
 g_free(child_key_dot);
 
+/* If the 'backing' option was omitted then we keep the old
+ * set of options */
+if (child->role == _backing && !qdict_size(new_child_options)) {
+child_keep_old = true;
+}
+
 bdrv_reopen_queue_child(bs_queue, child->bs, new_child_options, 0,
-child->role, options, flags, keep_old_opts);
+child->role, options, flags, child_keep_old);
 }
 
 return bs_queue;
@@ -3262,7 +3300,34 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
  * the user can simply omit options which cannot be changed anyway,
  * so they will stay unchanged.
  */
-if (!qobject_is_equal(new, old)) {
+/*
+ * Allow changing the 'backing' option. The new value can be
+ * either a reference to an existing node (using its node name)
+ * or NULL to simply detach the current backing file.
+ */
+if (!strcmp(entry->key, "backing")) {
+switch (qobject_type(new)) {
+case QTYPE_QNULL:
+break;
+case QTYPE_QSTRING: {
+const char *str =