Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
On Thu 21 Jun 2018 03:06:22 PM CEST, Kevin Wolf wrote: >> > Actually, do we ever use bdrv_reopen() for flags other than >> > read-only? Maybe we should get rid of that flags nonsense and >> > simply make it a bdrv_reopen_set_readonly() taking a boolean. >> >> I think that's a good idea. There's however one case where other >> flags are changed: reopen_backing_file() in block/replication.c that >> also clears BDRV_O_INACTIVE. I'd need to take a closer look to that >> code to see what we can do about it. > > Uh, and that works? > > Clearing BDRV_O_INACTIVE with bdrv_reopen() (which means, without > calling bdrv_invalidate_cache()) is surely suspicious. Probably this > code is buggy. > > Another reason for removing flags from the interface: We didn't do any > sanity checks for the flag changes. I'm working on removing the flags parameter from bdrv_reopen() and friends, and I think this is the only clearly strange thing on the way. The documentation is in https://wiki.qemu.org/Features/BlockReplication or in docs/block-replication.txt. As far as I can see there's the following backing chain: [secondary] <- [hidden] <- [active] There's an NBD server writing on [secondary], a backup job copying data from [secondary] to [hidden], and the VM writing to [active]. The reopen_backing_file() function in question simply puts the two backing images in read-write mode (clearing BDRV_O_INACTIVE along the way) so the NBD server and the backup job can write to them. This is done by replication_start(), which can only be reached with the QMP command "xen-set-replication". There's a comment that says "The caller of the function MUST make sure vm stopped" but no one seems to check that the vm is indeed stopped (?). This API was btw added half a year after the replication driver, and before this point the only user of replication_start() was a unit test. Without having tested the COLO / replication code before, I don't see any reason why it would make sense to clear the BDRV_O_INACTIVE flag on the backing files, or why that flag would be there in the first place (and I'm not even talking about whether that flag is supposed to be set/cleared with a simple bdrv_reopen()). So I'm thinking to simply remove it from there. If anyone else has more information I'd be happy to hear it. Berto
Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
Am 20.06.2018 um 14:35 hat Alberto Garcia geschrieben: > On Wed 20 Jun 2018 12:58:55 PM CEST, Kevin Wolf wrote: > > Am 19.06.2018 um 16:20 hat Alberto Garcia geschrieben: > >> >> Wait, I think the description I gave is inaccurate: > >> >> > >> >> commit_complete() calls bdrv_drop_intermediate(), and that updates the > >> >> backing image name (c->role->update_filename()). If we're doing this in > >> >> an intermediate node then it needs to be reopened in read-write mode, > >> >> while keeping the rest of the options. > >> >> > >> >> But then bdrv_reopen_commit() realizes that the backing file (from > >> >> reopen_state->options) is not the current one (because there's a > >> >> bdrv_mirror_top implicit filter) and attempts to remove it. And that's > >> >> the root cause of the crash. > >> > > >> > How did the other (the real?) backing file get into > >> > reopen_state->options? I don't think bdrv_drop_intermediate() wants to > >> > change anything except the read-only flag, so we should surely have > >> > the node name of bdrv_mirror_top there? > >> > >> No, it doesn't (try to) change anything else. It cannot do it: > >> bdrv_reopen() only takes flags, but keeps the current options. And the > >> current options have 'backing' set to a thing different from the > >> bdrv_mirror_top node. > > > > Okay, so in theory this is expected to just work. > > > > Actually, do we ever use bdrv_reopen() for flags other than read-only? > > Maybe we should get rid of that flags nonsense and simply make it a > > bdrv_reopen_set_readonly() taking a boolean. > > I think that's a good idea. There's however one case where other flags > are changed: reopen_backing_file() in block/replication.c that also > clears BDRV_O_INACTIVE. I'd need to take a closer look to that code to > see what we can do about it. Uh, and that works? Clearing BDRV_O_INACTIVE with bdrv_reopen() (which means, without calling bdrv_invalidate_cache()) is surely suspicious. Probably this code is buggy. Another reason for removing flags from the interface: We didn't do any sanity checks for the flag changes. > And there's of course qemu-io's "reopen" command, which does take > options but keeps the previous values. It provides -r/-w for changing readonly and -c to change the cache mode flags. Both should be easy to convert to QDict options. > > I think we should already remove nested options of children from the > > dicts in bdrv_open_inherit(). I'm less sure about node-name > > references. Maybe instead of keeing the dicts up-to-date each time we > > modify the graph, we should just get rid of those in the dicts as > > well, and instead add a function that reconstructs the full dict from > > bs->options and the currently attached children. This requires that > > the child name and the option name match, but I think that's > > true. (Mostly at least - what about quorum? But the child node > > handling of quorum is broken anyway.) > > Yes, removing nested options sounds like a good idea. In what cases do > we need the full qdict, though? Not sure, maybe we don't even need them now that we decided that the default is leaving the reference unchanged. There's the creation of json: filenames, maybe we need it there. We'd also certainly need to get the full QDict if we wanted to convert the options to a QAPI object before we pass them to the drivers. Kevin
Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
On Wed 20 Jun 2018 12:58:55 PM CEST, Kevin Wolf wrote: > Am 19.06.2018 um 16:20 hat Alberto Garcia geschrieben: >> >> Wait, I think the description I gave is inaccurate: >> >> >> >> commit_complete() calls bdrv_drop_intermediate(), and that updates the >> >> backing image name (c->role->update_filename()). If we're doing this in >> >> an intermediate node then it needs to be reopened in read-write mode, >> >> while keeping the rest of the options. >> >> >> >> But then bdrv_reopen_commit() realizes that the backing file (from >> >> reopen_state->options) is not the current one (because there's a >> >> bdrv_mirror_top implicit filter) and attempts to remove it. And that's >> >> the root cause of the crash. >> > >> > How did the other (the real?) backing file get into >> > reopen_state->options? I don't think bdrv_drop_intermediate() wants to >> > change anything except the read-only flag, so we should surely have >> > the node name of bdrv_mirror_top there? >> >> No, it doesn't (try to) change anything else. It cannot do it: >> bdrv_reopen() only takes flags, but keeps the current options. And the >> current options have 'backing' set to a thing different from the >> bdrv_mirror_top node. > > Okay, so in theory this is expected to just work. > > Actually, do we ever use bdrv_reopen() for flags other than read-only? > Maybe we should get rid of that flags nonsense and simply make it a > bdrv_reopen_set_readonly() taking a boolean. I think that's a good idea. There's however one case where other flags are changed: reopen_backing_file() in block/replication.c that also clears BDRV_O_INACTIVE. I'd need to take a closer look to that code to see what we can do about it. And there's of course qemu-io's "reopen" command, which does take options but keeps the previous values. >> > > So my first impression is that we should not try to change backing >> > > files if a reopen was not requested by the user (blockdev-reopen) >> > > and perhaps we should forbid it when there are implicit nodes >> > > involved. >> > Changing the behaviour depending on who the caller is sounds like a >> > hack that relies on coincidence and may break sooner or later. >> >> The main difference between the user calling blockdev-reopen and the >> code doing bdrv_reopen() internally is that in the former case all >> existing options are ignored (keep_old_opts = false) and in the latter >> they are kept. >> >> This latter case can have unintended consequences, and I think they're >> all related to the fact that bs->explicit_options also keeps the options >> of its children. So if you have >> >>C <- B <- A >> >> and A contains 'backing.driver=qcow2, backing.node-name=foo, ...', and >> you reopen A (keeping its options) then everything goes fine. However if >> you remove B from the chain (using e.g. block-stream) then you can't >> reopen A anymore because backing.* no longer matches the current backing >> file. >> >> I suppose that we would need to remove the children options from >> bs->explicit_options in that case? If this all happens because of the >> user calling blockdev-reopen then we don't need to worry about it becase >> bs->explicit_options are ignored. > > So the problem is that bs->explicit_options (and probably bs->options) > aren't consistent with the actual state of the graph. The fix for that > is likely not in bdrv_reopen(). Probably not because the graph can be changed by other means (e.g block-stream as I just said). > I think we should already remove nested options of children from the > dicts in bdrv_open_inherit(). I'm less sure about node-name > references. Maybe instead of keeing the dicts up-to-date each time we > modify the graph, we should just get rid of those in the dicts as > well, and instead add a function that reconstructs the full dict from > bs->options and the currently attached children. This requires that > the child name and the option name match, but I think that's > true. (Mostly at least - what about quorum? But the child node > handling of quorum is broken anyway.) Yes, removing nested options sounds like a good idea. In what cases do we need the full qdict, though? >> At the moment there's >> >> Berto > > And it's very good to have a Berto at the moment, but I think you > wanted to add something else there? ;-) I think it was just a leftover from a previous paragraph :-) Berto
Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
Am 19.06.2018 um 16:20 hat Alberto Garcia geschrieben: > >> Wait, I think the description I gave is inaccurate: > >> > >> commit_complete() calls bdrv_drop_intermediate(), and that updates the > >> backing image name (c->role->update_filename()). If we're doing this in > >> an intermediate node then it needs to be reopened in read-write mode, > >> while keeping the rest of the options. > >> > >> But then bdrv_reopen_commit() realizes that the backing file (from > >> reopen_state->options) is not the current one (because there's a > >> bdrv_mirror_top implicit filter) and attempts to remove it. And that's > >> the root cause of the crash. > > > > How did the other (the real?) backing file get into > > reopen_state->options? I don't think bdrv_drop_intermediate() wants to > > change anything except the read-only flag, so we should surely have > > the node name of bdrv_mirror_top there? > > No, it doesn't (try to) change anything else. It cannot do it: > bdrv_reopen() only takes flags, but keeps the current options. And the > current options have 'backing' set to a thing different from the > bdrv_mirror_top node. Okay, so in theory this is expected to just work. Actually, do we ever use bdrv_reopen() for flags other than read-only? Maybe we should get rid of that flags nonsense and simply make it a bdrv_reopen_set_readonly() taking a boolean. > > > So my first impression is that we should not try to change backing > > > files if a reopen was not requested by the user (blockdev-reopen) > > > and perhaps we should forbid it when there are implicit nodes > > > involved. > > Changing the behaviour depending on who the caller is sounds like a > > hack that relies on coincidence and may break sooner or later. > > The main difference between the user calling blockdev-reopen and the > code doing bdrv_reopen() internally is that in the former case all > existing options are ignored (keep_old_opts = false) and in the latter > they are kept. > > This latter case can have unintended consequences, and I think they're > all related to the fact that bs->explicit_options also keeps the options > of its children. So if you have > >C <- B <- A > > and A contains 'backing.driver=qcow2, backing.node-name=foo, ...', and > you reopen A (keeping its options) then everything goes fine. However if > you remove B from the chain (using e.g. block-stream) then you can't > reopen A anymore because backing.* no longer matches the current backing > file. > > I suppose that we would need to remove the children options from > bs->explicit_options in that case? If this all happens because of the > user calling blockdev-reopen then we don't need to worry about it becase > bs->explicit_options are ignored. So the problem is that bs->explicit_options (and probably bs->options) aren't consistent with the actual state of the graph. The fix for that is likely not in bdrv_reopen(). I think we should already remove nested options of children from the dicts in bdrv_open_inherit(). I'm less sure about node-name references. Maybe instead of keeing the dicts up-to-date each time we modify the graph, we should just get rid of those in the dicts as well, and instead add a function that reconstructs the full dict from bs->options and the currently attached children. This requires that the child name and the option name match, but I think that's true. (Mostly at least - what about quorum? But the child node handling of quorum is broken anyway.) I'm almost sure Max has an opinion about this, too. :-) > >> Ah, ok, I think that's related to the crash that I mentioned earlier > >> with block-commit. Yes, it makes sense that we forbid that. I suppose > >> that we can do this already with BLK_PERM_GRAPH_MOD ? > > > > Possibly. The thing with BLK_PERM_GRAPH_MOD is that nobody knows what > > its meaning has to be so that it's actually useful, so we're not using > > it in any meaningful way yet. > > > > I suspect BLK_PERM_GRAPH_MOD is the wrong tool because what we > > actually want to protect against modification is not a BDS, but a > > BdrvChild. But I may be wrong there. > > I'll take a closer look and see what I come up with. Okay. Maybe don't implement anything yet, but just try to come up with a concept for discussion. > At the moment there's > > Berto And it's very good to have a Berto at the moment, but I think you wanted to add something else there? ;-) Kevin
Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
>> Wait, I think the description I gave is inaccurate: >> >> commit_complete() calls bdrv_drop_intermediate(), and that updates the >> backing image name (c->role->update_filename()). If we're doing this in >> an intermediate node then it needs to be reopened in read-write mode, >> while keeping the rest of the options. >> >> But then bdrv_reopen_commit() realizes that the backing file (from >> reopen_state->options) is not the current one (because there's a >> bdrv_mirror_top implicit filter) and attempts to remove it. And that's >> the root cause of the crash. > > How did the other (the real?) backing file get into > reopen_state->options? I don't think bdrv_drop_intermediate() wants to > change anything except the read-only flag, so we should surely have > the node name of bdrv_mirror_top there? No, it doesn't (try to) change anything else. It cannot do it: bdrv_reopen() only takes flags, but keeps the current options. And the current options have 'backing' set to a thing different from the bdrv_mirror_top node. > > So my first impression is that we should not try to change backing > > files if a reopen was not requested by the user (blockdev-reopen) > > and perhaps we should forbid it when there are implicit nodes > > involved. > Changing the behaviour depending on who the caller is sounds like a > hack that relies on coincidence and may break sooner or later. The main difference between the user calling blockdev-reopen and the code doing bdrv_reopen() internally is that in the former case all existing options are ignored (keep_old_opts = false) and in the latter they are kept. This latter case can have unintended consequences, and I think they're all related to the fact that bs->explicit_options also keeps the options of its children. So if you have C <- B <- A and A contains 'backing.driver=qcow2, backing.node-name=foo, ...', and you reopen A (keeping its options) then everything goes fine. However if you remove B from the chain (using e.g. block-stream) then you can't reopen A anymore because backing.* no longer matches the current backing file. I suppose that we would need to remove the children options from bs->explicit_options in that case? If this all happens because of the user calling blockdev-reopen then we don't need to worry about it becase bs->explicit_options are ignored. >> >> >> 2) If the 'backing' option is omitted altogether then the existing >> >> >> backing file (if any) is kept. Unlike blockdev-add this will _not_ >> >> >> open the default backing file specified in the image header. >> >> > >> >> > Hm... This is certainly an inconsistency. Is it unavoidable? >> >> >> >> I don't think we want to open new nodes on reopen(), but one easy way >> >> to avoid this behavior is simply to return an error if the user omits >> >> the 'backing' option. >> > >> > Hm, yes, not opening new nodes is a good point. >> > >> > As long as find a good solution that can be consistently applied to >> > all BlockdevRef, it should be okay. So I don't necessarily object to >> > the special casing and just leaving them as they are by default. >> > >> >> But this raises a few questions. Let's say you have an image with a >> >> backing file and you open it with blockdev-add omitting the 'backing' >> >> option. That means the default backing file is opened. >> >> >> >> - Do you still have to specify 'backing' on reopen? I suppose you >> >> don't have to. >> > >> > You would have to. I agree it's ugly (not the least because you probably >> > didn't assign an explicit node name, though -drive allows specifying >> > only the node name, but still using the filename from the image file). >> >> Yes it's a bit ugly, but we wouldn't be having a special case with the >> 'backing' option. > > I think files I'm meanwhile tending towards your current solution > (i.e. default is leaving the reference as it is and you must use null > to get rid of a backing file). It's not necessarily a bad option. The only one that I don't like is opening the default backing file if 'backing' is omitted. >> >> > Do we need some way for e.g. block jobs to forbid changing the >> >> > backing file of the subchain they are operating on? >> >> >> >> Are you thinking of any particular scenario? >> > >> > Not specifically, but I think e.g. the commit job might get confused >> > if you break its backing chain because it assumes that base is >> > somewhere in the backing chain of top, and also that it called >> > block_job_add_bdrv() on everything in the subchain it is working on. >> > >> > It just feels like we'd allow to break any such assumptions if we >> > don't block graph changes there. >> >> Ah, ok, I think that's related to the crash that I mentioned earlier >> with block-commit. Yes, it makes sense that we forbid that. I suppose >> that we can do this already with BLK_PERM_GRAPH_MOD ? > > Possibly. The thing with BLK_PERM_GRAPH_MOD is that nobody knows what > its meaning has to be so that it's actually useful, so
Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
Am 19.06.2018 um 14:27 hat Alberto Garcia geschrieben: > On Mon 18 Jun 2018 06:12:29 PM CEST, Kevin Wolf wrote: > >> >> This patch allows the user to change the backing file of an image > >> >> that is being reopened. Here's what it does: > >> >> > >> >> - In bdrv_reopen_queue_child(): if the 'backing' option points to an > >> >>image different from the current backing file then it means that > >> >>the latter is going be detached so it must not be put in the reopen > >> >>queue. > >> >> > >> >> - In bdrv_reopen_prepare(): check that the value of 'backing' points > >> >>to an existing node or is null. If it points to an existing node it > >> >>also needs to make sure that replacing the backing file will not > >> >>create a cycle in the node graph (i.e. you cannot reach the parent > >> >>from the new backing file). > >> >> > >> >> - In bdrv_reopen_commit(): perform the actual node replacement by > >> >>calling bdrv_set_backing_hd(). It may happen that there are > >> >>temporary implicit nodes between the BDS that we are reopening and > >> >>the backing file that we want to replace (e.g. a commit fiter node), > >> >>so we must skip them. > >> > > >> > I think we shouldn't do this. blockdev-reopen is an advanced command > >> > that requires knowledge of the graph at the node level. Therefore we can > >> > expect the management tool to consider filter nodes when it reconfigures > >> > the graph. This is important because it makes a difference whether a > >> > new node is inserted above or below the filter node. > >> > >> But those nodes that the management tool knows about would not be > >> implicit, or would they? > > > > Hm, you're right, they are only implicit if no node name was given. So > > it's not as bad as I thought. > > > > I still think of bs->implicit as a hack to maintain compatibility for > > legacy commands rather than something to use in new commands. > > Yes, and I'm still not 100% convinced that skipping the implicit nodes > as I'm doing here is the proper solution, so maybe I'll need to come up > with something else. > > >> The reason why I did this is because there's several places in the > >> QEMU codebase where bdrv_reopen() is called while there's a temporary > >> implicit node. For example, on exit bdrv_commit() needs to call > >> bdrv_set_backing_hd() to remove intermediate nodes from the > >> chain. This happens while bdrv_mirror_top is still there, so if we > >> don't skip it then QEMU crashes. > > > > But isn't that bdrv_set_backing_hd() a separate operation? I would > > understand that it matters if you changed the code so that it would > > use bdrv_reopen() in order to change the backing file, but that's not > > what it does, is it? > > Wait, I think the description I gave is inaccurate: > > commit_complete() calls bdrv_drop_intermediate(), and that updates the > backing image name (c->role->update_filename()). If we're doing this in > an intermediate node then it needs to be reopened in read-write mode, > while keeping the rest of the options. > > But then bdrv_reopen_commit() realizes that the backing file (from > reopen_state->options) is not the current one (because there's a > bdrv_mirror_top implicit filter) and attempts to remove it. And that's > the root cause of the crash. How did the other (the real?) backing file get into reopen_state->options? I don't think bdrv_drop_intermediate() wants to change anything except the read-only flag, so we should surely have the node name of bdrv_mirror_top there? > So my first impression is that we should not try to change backing files > if a reopen was not requested by the user (blockdev-reopen) and perhaps > we should forbid it when there are implicit nodes involved. Changing the behaviour depending on who the caller is sounds like a hack that relies on coincidence and may break sooner or later. > >> >> 2) If the 'backing' option is omitted altogether then the existing > >> >> backing file (if any) is kept. Unlike blockdev-add this will _not_ > >> >> open the default backing file specified in the image header. > >> > > >> > Hm... This is certainly an inconsistency. Is it unavoidable? > >> > >> I don't think we want to open new nodes on reopen(), but one easy way > >> to avoid this behavior is simply to return an error if the user omits > >> the 'backing' option. > > > > Hm, yes, not opening new nodes is a good point. > > > > As long as find a good solution that can be consistently applied to > > all BlockdevRef, it should be okay. So I don't necessarily object to > > the special casing and just leaving them as they are by default. > > > >> But this raises a few questions. Let's say you have an image with a > >> backing file and you open it with blockdev-add omitting the 'backing' > >> option. That means the default backing file is opened. > >> > >> - Do you still have to specify 'backing' on reopen? I suppose you > >> don't have to. > > > > You would have to. I
Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
On Mon 18 Jun 2018 06:12:29 PM CEST, Kevin Wolf wrote: >> >> This patch allows the user to change the backing file of an image >> >> that is being reopened. Here's what it does: >> >> >> >> - In bdrv_reopen_queue_child(): if the 'backing' option points to an >> >>image different from the current backing file then it means that >> >>the latter is going be detached so it must not be put in the reopen >> >>queue. >> >> >> >> - In bdrv_reopen_prepare(): check that the value of 'backing' points >> >>to an existing node or is null. If it points to an existing node it >> >>also needs to make sure that replacing the backing file will not >> >>create a cycle in the node graph (i.e. you cannot reach the parent >> >>from the new backing file). >> >> >> >> - In bdrv_reopen_commit(): perform the actual node replacement by >> >>calling bdrv_set_backing_hd(). It may happen that there are >> >>temporary implicit nodes between the BDS that we are reopening and >> >>the backing file that we want to replace (e.g. a commit fiter node), >> >>so we must skip them. >> > >> > I think we shouldn't do this. blockdev-reopen is an advanced command >> > that requires knowledge of the graph at the node level. Therefore we can >> > expect the management tool to consider filter nodes when it reconfigures >> > the graph. This is important because it makes a difference whether a >> > new node is inserted above or below the filter node. >> >> But those nodes that the management tool knows about would not be >> implicit, or would they? > > Hm, you're right, they are only implicit if no node name was given. So > it's not as bad as I thought. > > I still think of bs->implicit as a hack to maintain compatibility for > legacy commands rather than something to use in new commands. Yes, and I'm still not 100% convinced that skipping the implicit nodes as I'm doing here is the proper solution, so maybe I'll need to come up with something else. >> The reason why I did this is because there's several places in the >> QEMU codebase where bdrv_reopen() is called while there's a temporary >> implicit node. For example, on exit bdrv_commit() needs to call >> bdrv_set_backing_hd() to remove intermediate nodes from the >> chain. This happens while bdrv_mirror_top is still there, so if we >> don't skip it then QEMU crashes. > > But isn't that bdrv_set_backing_hd() a separate operation? I would > understand that it matters if you changed the code so that it would > use bdrv_reopen() in order to change the backing file, but that's not > what it does, is it? Wait, I think the description I gave is inaccurate: commit_complete() calls bdrv_drop_intermediate(), and that updates the backing image name (c->role->update_filename()). If we're doing this in an intermediate node then it needs to be reopened in read-write mode, while keeping the rest of the options. But then bdrv_reopen_commit() realizes that the backing file (from reopen_state->options) is not the current one (because there's a bdrv_mirror_top implicit filter) and attempts to remove it. And that's the root cause of the crash. So my first impression is that we should not try to change backing files if a reopen was not requested by the user (blockdev-reopen) and perhaps we should forbid it when there are implicit nodes involved. >> >> 2) If the 'backing' option is omitted altogether then the existing >> >> backing file (if any) is kept. Unlike blockdev-add this will _not_ >> >> open the default backing file specified in the image header. >> > >> > Hm... This is certainly an inconsistency. Is it unavoidable? >> >> I don't think we want to open new nodes on reopen(), but one easy way >> to avoid this behavior is simply to return an error if the user omits >> the 'backing' option. > > Hm, yes, not opening new nodes is a good point. > > As long as find a good solution that can be consistently applied to > all BlockdevRef, it should be okay. So I don't necessarily object to > the special casing and just leaving them as they are by default. > >> But this raises a few questions. Let's say you have an image with a >> backing file and you open it with blockdev-add omitting the 'backing' >> option. That means the default backing file is opened. >> >> - Do you still have to specify 'backing' on reopen? I suppose you >> don't have to. > > You would have to. I agree it's ugly (not the least because you probably > didn't assign an explicit node name, though -drive allows specifying > only the node name, but still using the filename from the image file). Yes it's a bit ugly, but we wouldn't be having a special case with the 'backing' option. >> - If you don't have to, can QEMU tell if bs->backing is the original >> backing file or a new one? I suppose it can, by checking for >> 'backing / backing.*' in bs->options. >> >> - If you omit 'backing', does the backing file still get reopened? And >> more importantly, does it keep its
Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
Am 18.06.2018 um 17:06 hat Alberto Garcia geschrieben: > On Mon 18 Jun 2018 04:38:01 PM CEST, Kevin Wolf wrote: > > Am 14.06.2018 um 17:49 hat Alberto Garcia geschrieben: > >> This patch allows the user to change the backing file of an image that > >> is being reopened. Here's what it does: > >> > >> - In bdrv_reopen_queue_child(): if the 'backing' option points to an > >>image different from the current backing file then it means that > >>the latter is going be detached so it must not be put in the reopen > >>queue. > >> > >> - In bdrv_reopen_prepare(): check that the value of 'backing' points > >>to an existing node or is null. If it points to an existing node it > >>also needs to make sure that replacing the backing file will not > >>create a cycle in the node graph (i.e. you cannot reach the parent > >>from the new backing file). > >> > >> - In bdrv_reopen_commit(): perform the actual node replacement by > >>calling bdrv_set_backing_hd(). It may happen that there are > >>temporary implicit nodes between the BDS that we are reopening and > >>the backing file that we want to replace (e.g. a commit fiter node), > >>so we must skip them. > > > > I think we shouldn't do this. blockdev-reopen is an advanced command > > that requires knowledge of the graph at the node level. Therefore we can > > expect the management tool to consider filter nodes when it reconfigures > > the graph. This is important because it makes a difference whether a > > new node is inserted above or below the filter node. > > But those nodes that the management tool knows about would not be > implicit, or would they? Hm, you're right, they are only implicit if no node name was given. So it's not as bad as I thought. I still think of bs->implicit as a hack to maintain compatibility for legacy commands rather than something to use in new commands. > The reason why I did this is because there's several places in the QEMU > codebase where bdrv_reopen() is called while there's a temporary > implicit node. For example, on exit bdrv_commit() needs to call > bdrv_set_backing_hd() to remove intermediate nodes from the chain. This > happens while bdrv_mirror_top is still there, so if we don't skip it > then QEMU crashes. But isn't that bdrv_set_backing_hd() a separate operation? I would understand that it matters if you changed the code so that it would use bdrv_reopen() in order to change the backing file, but that's not what it does, is it? > >> Although x-blockdev-reopen is meant to be used like blockdev-add, > >> there's two important things that must be taken into account: > >> > >> 1) The only way to set a new backing file is by using a reference to > >>an existing node (previously added with e.g. blockdev-add). > >>If 'backing' contains a dictionary with a new set of options > >>({"driver": "qcow2", "file": { ... }}) then it is interpreted that > >>the _existing_ backing file must be reopened with those options. > > > > This sounds a bit odd at first, but it makes perfect sense in fact. > > > > Maybe we should enforce that child nodes that were openend by reference > > first must be reopened with a reference, and only those that were defined > > inline (and therefore inherit options from the parent) can be reopened > > inline? > > I think that should be doable, yes. > > >> 2) If the 'backing' option is omitted altogether then the existing > >>backing file (if any) is kept. Unlike blockdev-add this will _not_ > >>open the default backing file specified in the image header. > > > > Hm... This is certainly an inconsistency. Is it unavoidable? > > I don't think we want to open new nodes on reopen(), but one easy way to > avoid this behavior is simply to return an error if the user omits the > 'backing' option. Hm, yes, not opening new nodes is a good point. As long as find a good solution that can be consistently applied to all BlockdevRef, it should be okay. So I don't necessarily object to the special casing and just leaving them as they are by default. > But this raises a few questions. Let's say you have an image with a > backing file and you open it with blockdev-add omitting the 'backing' > option. That means the default backing file is opened. > > - Do you still have to specify 'backing' on reopen? I suppose you > don't have to. You would have to. I agree it's ugly (not the least because you probably didn't assign an explicit node name, though -drive allows specifying only the node name, but still using the filename from the image file). > - If you don't have to, can QEMU tell if bs->backing is the original > backing file or a new one? I suppose it can, by checking for > 'backing / backing.*' in bs->options. > > - If you omit 'backing', does the backing file still get reopened? And > more importantly, does it keep its current options or are they > supposed to be reset to their default values? If you make omitting it an
Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
On Mon 18 Jun 2018 04:38:01 PM CEST, Kevin Wolf wrote: > Am 14.06.2018 um 17:49 hat Alberto Garcia geschrieben: >> This patch allows the user to change the backing file of an image that >> is being reopened. Here's what it does: >> >> - In bdrv_reopen_queue_child(): if the 'backing' option points to an >>image different from the current backing file then it means that >>the latter is going be detached so it must not be put in the reopen >>queue. >> >> - In bdrv_reopen_prepare(): check that the value of 'backing' points >>to an existing node or is null. If it points to an existing node it >>also needs to make sure that replacing the backing file will not >>create a cycle in the node graph (i.e. you cannot reach the parent >>from the new backing file). >> >> - In bdrv_reopen_commit(): perform the actual node replacement by >>calling bdrv_set_backing_hd(). It may happen that there are >>temporary implicit nodes between the BDS that we are reopening and >>the backing file that we want to replace (e.g. a commit fiter node), >>so we must skip them. > > I think we shouldn't do this. blockdev-reopen is an advanced command > that requires knowledge of the graph at the node level. Therefore we can > expect the management tool to consider filter nodes when it reconfigures > the graph. This is important because it makes a difference whether a > new node is inserted above or below the filter node. But those nodes that the management tool knows about would not be implicit, or would they? The reason why I did this is because there's several places in the QEMU codebase where bdrv_reopen() is called while there's a temporary implicit node. For example, on exit bdrv_commit() needs to call bdrv_set_backing_hd() to remove intermediate nodes from the chain. This happens while bdrv_mirror_top is still there, so if we don't skip it then QEMU crashes. >> Although x-blockdev-reopen is meant to be used like blockdev-add, >> there's two important things that must be taken into account: >> >> 1) The only way to set a new backing file is by using a reference to >>an existing node (previously added with e.g. blockdev-add). >>If 'backing' contains a dictionary with a new set of options >>({"driver": "qcow2", "file": { ... }}) then it is interpreted that >>the _existing_ backing file must be reopened with those options. > > This sounds a bit odd at first, but it makes perfect sense in fact. > > Maybe we should enforce that child nodes that were openend by reference > first must be reopened with a reference, and only those that were defined > inline (and therefore inherit options from the parent) can be reopened > inline? I think that should be doable, yes. >> 2) If the 'backing' option is omitted altogether then the existing >>backing file (if any) is kept. Unlike blockdev-add this will _not_ >>open the default backing file specified in the image header. > > Hm... This is certainly an inconsistency. Is it unavoidable? I don't think we want to open new nodes on reopen(), but one easy way to avoid this behavior is simply to return an error if the user omits the 'backing' option. But this raises a few questions. Let's say you have an image with a backing file and you open it with blockdev-add omitting the 'backing' option. That means the default backing file is opened. - Do you still have to specify 'backing' on reopen? I suppose you don't have to. - If you don't have to, can QEMU tell if bs->backing is the original backing file or a new one? I suppose it can, by checking for 'backing / backing.*' in bs->options. - If you omit 'backing', does the backing file still get reopened? And more importantly, does it keep its current options or are they supposed to be reset to their default values? >> +/* If the 'backing' option is set and it points to a different >> + * node then it means that we want to replace the current one, >> + * so we shouldn't put it in the reopen queue */ >> +if (child->role == _backing && qdict_haskey(options, >> "backing")) { > > I think checking child->name for "backing" makes more sense when we > also look for the "backing" key in the options QDict. This would make > generalising it for other children easier (which we probably should > do, whether in this patch or a follow-up). Sounds reasonable. > Do we need some way for e.g. block jobs to forbid changing the backing > file of the subchain they are operating on? Are you thinking of any particular scenario? Berto
Re: [Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
Am 14.06.2018 um 17:49 hat Alberto Garcia geschrieben: > This patch allows the user to change the backing file of an image that > is being reopened. Here's what it does: > > - In bdrv_reopen_queue_child(): if the 'backing' option points to an >image different from the current backing file then it means that >the latter is going be detached so it must not be put in the reopen >queue. > > - In bdrv_reopen_prepare(): check that the value of 'backing' points >to an existing node or is null. If it points to an existing node it >also needs to make sure that replacing the backing file will not >create a cycle in the node graph (i.e. you cannot reach the parent >from the new backing file). > > - In bdrv_reopen_commit(): perform the actual node replacement by >calling bdrv_set_backing_hd(). It may happen that there are >temporary implicit nodes between the BDS that we are reopening and >the backing file that we want to replace (e.g. a commit fiter node), >so we must skip them. I think we shouldn't do this. blockdev-reopen is an advanced command that requires knowledge of the graph at the node level. Therefore we can expect the management tool to consider filter nodes when it reconfigures the graph. This is important because it makes a difference whether a new node is inserted above or below the filter node. > Although x-blockdev-reopen is meant to be used like blockdev-add, > there's two important things that must be taken into account: > > 1) The only way to set a new backing file is by using a reference to >an existing node (previously added with e.g. blockdev-add). >If 'backing' contains a dictionary with a new set of options >({"driver": "qcow2", "file": { ... }}) then it is interpreted that >the _existing_ backing file must be reopened with those options. This sounds a bit odd at first, but it makes perfect sense in fact. Maybe we should enforce that child nodes that were openend by reference first must be reopened with a reference, and only those that were defined inline (and therefore inherit options from the parent) can be reopened inline? > 2) If the 'backing' option is omitted altogether then the existing >backing file (if any) is kept. Unlike blockdev-add this will _not_ >open the default backing file specified in the image header. Hm... This is certainly an inconsistency. Is it unavoidable? > Signed-off-by: Alberto Garcia > --- > block.c | 86 > +++-- > 1 file changed, 84 insertions(+), 2 deletions(-) > > diff --git a/block.c b/block.c > index 0b9268a48d..91fff6361f 100644 > --- a/block.c > +++ b/block.c > @@ -2823,6 +2823,27 @@ BlockDriverState *bdrv_open(const char *filename, > const char *reference, > } > > /* > + * Returns true if @child can be reached recursively from @bs > + */ > +static bool bdrv_recurse_has_child(BlockDriverState *bs, > + BlockDriverState *child) > +{ > +BdrvChild *c; > + > +if (bs == child) { > +return true; > +} > + > +QLIST_FOREACH(c, >children, next) { > +if (bdrv_recurse_has_child(c->bs, child)) { > +return true; > +} > +} > + > +return false; > +} > + > +/* > * Adds a BlockDriverState to a simple queue for an atomic, transactional > * reopen of multiple devices. > * > @@ -2957,6 +2978,7 @@ static BlockReopenQueue > *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, > QLIST_FOREACH(child, >children, next) { > QDict *new_child_options; > char *child_key_dot; > +bool child_keep_old = keep_old_opts; > > /* reopen can only change the options of block devices that were > * implicitly created and inherited options. For other (referenced) > @@ -2965,12 +2987,28 @@ static BlockReopenQueue > *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, > continue; > } > > +/* If the 'backing' option is set and it points to a different > + * node then it means that we want to replace the current one, > + * so we shouldn't put it in the reopen queue */ > +if (child->role == _backing && qdict_haskey(options, > "backing")) { I think checking child->name for "backing" makes more sense when we also look for the "backing" key in the options QDict. This would make generalising it for other children easier (which we probably should do, whether in this patch or a follow-up). > +const char *backing = qdict_get_try_str(options, "backing"); > +if (g_strcmp0(backing, child->bs->node_name)) { > +continue; > +} > +} > + > child_key_dot = g_strdup_printf("%s.", child->name); > qdict_extract_subqdict(options, _child_options, child_key_dot); > g_free(child_key_dot); > > +/* If the 'backing' option was omitted then we keep the old > + * set of
[Qemu-devel] [RFC PATCH 06/10] block: Allow changing the backing file on reopen
This patch allows the user to change the backing file of an image that is being reopened. Here's what it does: - In bdrv_reopen_queue_child(): if the 'backing' option points to an image different from the current backing file then it means that the latter is going be detached so it must not be put in the reopen queue. - In bdrv_reopen_prepare(): check that the value of 'backing' points to an existing node or is null. If it points to an existing node it also needs to make sure that replacing the backing file will not create a cycle in the node graph (i.e. you cannot reach the parent from the new backing file). - In bdrv_reopen_commit(): perform the actual node replacement by calling bdrv_set_backing_hd(). It may happen that there are temporary implicit nodes between the BDS that we are reopening and the backing file that we want to replace (e.g. a commit fiter node), so we must skip them. Although x-blockdev-reopen is meant to be used like blockdev-add, there's two important things that must be taken into account: 1) The only way to set a new backing file is by using a reference to an existing node (previously added with e.g. blockdev-add). If 'backing' contains a dictionary with a new set of options ({"driver": "qcow2", "file": { ... }}) then it is interpreted that the _existing_ backing file must be reopened with those options. 2) If the 'backing' option is omitted altogether then the existing backing file (if any) is kept. Unlike blockdev-add this will _not_ open the default backing file specified in the image header. Signed-off-by: Alberto Garcia --- block.c | 86 +++-- 1 file changed, 84 insertions(+), 2 deletions(-) diff --git a/block.c b/block.c index 0b9268a48d..91fff6361f 100644 --- a/block.c +++ b/block.c @@ -2823,6 +2823,27 @@ BlockDriverState *bdrv_open(const char *filename, const char *reference, } /* + * Returns true if @child can be reached recursively from @bs + */ +static bool bdrv_recurse_has_child(BlockDriverState *bs, + BlockDriverState *child) +{ +BdrvChild *c; + +if (bs == child) { +return true; +} + +QLIST_FOREACH(c, >children, next) { +if (bdrv_recurse_has_child(c->bs, child)) { +return true; +} +} + +return false; +} + +/* * Adds a BlockDriverState to a simple queue for an atomic, transactional * reopen of multiple devices. * @@ -2957,6 +2978,7 @@ static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, QLIST_FOREACH(child, >children, next) { QDict *new_child_options; char *child_key_dot; +bool child_keep_old = keep_old_opts; /* reopen can only change the options of block devices that were * implicitly created and inherited options. For other (referenced) @@ -2965,12 +2987,28 @@ static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue, continue; } +/* If the 'backing' option is set and it points to a different + * node then it means that we want to replace the current one, + * so we shouldn't put it in the reopen queue */ +if (child->role == _backing && qdict_haskey(options, "backing")) { +const char *backing = qdict_get_try_str(options, "backing"); +if (g_strcmp0(backing, child->bs->node_name)) { +continue; +} +} + child_key_dot = g_strdup_printf("%s.", child->name); qdict_extract_subqdict(options, _child_options, child_key_dot); g_free(child_key_dot); +/* If the 'backing' option was omitted then we keep the old + * set of options */ +if (child->role == _backing && !qdict_size(new_child_options)) { +child_keep_old = true; +} + bdrv_reopen_queue_child(bs_queue, child->bs, new_child_options, 0, -child->role, options, flags, keep_old_opts); +child->role, options, flags, child_keep_old); } return bs_queue; @@ -3262,7 +3300,34 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue, * the user can simply omit options which cannot be changed anyway, * so they will stay unchanged. */ -if (!qobject_is_equal(new, old)) { +/* + * Allow changing the 'backing' option. The new value can be + * either a reference to an existing node (using its node name) + * or NULL to simply detach the current backing file. + */ +if (!strcmp(entry->key, "backing")) { +switch (qobject_type(new)) { +case QTYPE_QNULL: +break; +case QTYPE_QSTRING: { +const char *str =