Re: [Qemu-block] [RFC] Intermediate block mirroring
On Mon 11 Jun 2018 02:20:06 PM CEST, Kevin Wolf wrote: > Am 01.06.2018 um 12:51 hat Alberto Garcia geschrieben: >> On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote: >> >> > Were the (more or less) exact requirements of QMP blockdev-reopen >> >> > discussed? How is it different from qemu-io's "reopen" command? >> >> > What are the options that you can and can not change? >> >> >> >> I can't quite remember, I'm afraid. I think it was supposed to be >> >> pretty much qemu-io's reopen (so just bdrv_reopen()). I suppose you >> >> cannot change the driver (obviously) or probably the node name, because >> >> either would result in the node being replaced by a completely new one. >> >> >> >> Other than that, it probably depends on what the block driver supports, >> >> but ideally you should be able to change everything. >> > >> > Honestly the design of bdrv_reopen() is quite broken because of the >> > way it tries to maintain old options if they aren't specified, and >> > guesses what you might mean when you add flags to the mix. The exact >> > semantics are quite complicated and I'd rather avoid them in a stable >> > API. >> > >> > A clean QMP command would probably apply the same defaults as >> > blockdev-add, so you just get to specify the full options again. >> >> I have a prototype of this working and almost ready to be published, but >> there's a tricky thing with this part: >> >> If we want blockdev-reopen to apply the defaults for all options except >> from the ones expliclity specified by the user, then it means that we >> need to check not just the options that are present, but also the ones >> that are omitted. >> >> For example: >> >>{ "execute": "blockdev-add", >> "arguments": { "driver": "null-aio", >> "node-name": "root", >> "size": 1024 } >> >> This adds a null-aio block device with the "size" option set to 1024 >> (the default is 1 << 30). >> >> null_reopen_prepare() allows reopening that block device, but it does >> not allow changing any of its options. Attempting to change the value of >> "size" is detected by the loop that checks unhandled options at the end >> of bdrv_reopen_prepare() and returns "Cannot change the option 'size'". >> >> So far, so good. We have this generic check for all options that works >> with all drivers, so as long as we only specify options that we know >> that can be changed, everything is fine. >> >> However if we want blockdev-reopen to apply the default values for all >> omitted options, then omitting "size" would be equivalent to setting it >> to its default value (1 << 30). And if "size" cannot be changed then >> QEMU should complain unless we explicitly set "size" to 1024 again on >> reopen. >> >> This complicates things a bit, because we would go from "the options >> that can't be changed are the ones that are not handled by each driver's >> _prepare() function" to "options that are absent can also produce an >> error". > > To be honest, I think this is fine. If the user can specify the size > once (in blockdev-add), they can do it again (in blockdev-reopen). > > We just need to make sure that we don't break existing bdrv_reopen*() > calls that come from places other than the monitor. Good. I have the first revision of the series almost ready, expect it this week. Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
Am 01.06.2018 um 12:51 hat Alberto Garcia geschrieben: > On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote: > >> > Were the (more or less) exact requirements of QMP blockdev-reopen > >> > discussed? How is it different from qemu-io's "reopen" command? > >> > What are the options that you can and can not change? > >> > >> I can't quite remember, I'm afraid. I think it was supposed to be > >> pretty much qemu-io's reopen (so just bdrv_reopen()). I suppose you > >> cannot change the driver (obviously) or probably the node name, because > >> either would result in the node being replaced by a completely new one. > >> > >> Other than that, it probably depends on what the block driver supports, > >> but ideally you should be able to change everything. > > > > Honestly the design of bdrv_reopen() is quite broken because of the > > way it tries to maintain old options if they aren't specified, and > > guesses what you might mean when you add flags to the mix. The exact > > semantics are quite complicated and I'd rather avoid them in a stable > > API. > > > > A clean QMP command would probably apply the same defaults as > > blockdev-add, so you just get to specify the full options again. > > I have a prototype of this working and almost ready to be published, but > there's a tricky thing with this part: > > If we want blockdev-reopen to apply the defaults for all options except > from the ones expliclity specified by the user, then it means that we > need to check not just the options that are present, but also the ones > that are omitted. > > For example: > >{ "execute": "blockdev-add", > "arguments": { "driver": "null-aio", > "node-name": "root", > "size": 1024 } > > This adds a null-aio block device with the "size" option set to 1024 > (the default is 1 << 30). > > null_reopen_prepare() allows reopening that block device, but it does > not allow changing any of its options. Attempting to change the value of > "size" is detected by the loop that checks unhandled options at the end > of bdrv_reopen_prepare() and returns "Cannot change the option 'size'". > > So far, so good. We have this generic check for all options that works > with all drivers, so as long as we only specify options that we know > that can be changed, everything is fine. > > However if we want blockdev-reopen to apply the default values for all > omitted options, then omitting "size" would be equivalent to setting it > to its default value (1 << 30). And if "size" cannot be changed then > QEMU should complain unless we explicitly set "size" to 1024 again on > reopen. > > This complicates things a bit, because we would go from "the options > that can't be changed are the ones that are not handled by each driver's > _prepare() function" to "options that are absent can also produce an > error". To be honest, I think this is fine. If the user can specify the size once (in blockdev-add), they can do it again (in blockdev-reopen). We just need to make sure that we don't break existing bdrv_reopen*() calls that come from places other than the monitor. Kevin
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote: >> > Were the (more or less) exact requirements of QMP blockdev-reopen >> > discussed? How is it different from qemu-io's "reopen" command? >> > What are the options that you can and can not change? >> >> I can't quite remember, I'm afraid. I think it was supposed to be >> pretty much qemu-io's reopen (so just bdrv_reopen()). I suppose you >> cannot change the driver (obviously) or probably the node name, because >> either would result in the node being replaced by a completely new one. >> >> Other than that, it probably depends on what the block driver supports, >> but ideally you should be able to change everything. > > Honestly the design of bdrv_reopen() is quite broken because of the > way it tries to maintain old options if they aren't specified, and > guesses what you might mean when you add flags to the mix. The exact > semantics are quite complicated and I'd rather avoid them in a stable > API. > > A clean QMP command would probably apply the same defaults as > blockdev-add, so you just get to specify the full options again. I have a prototype of this working and almost ready to be published, but there's a tricky thing with this part: If we want blockdev-reopen to apply the defaults for all options except from the ones expliclity specified by the user, then it means that we need to check not just the options that are present, but also the ones that are omitted. For example: { "execute": "blockdev-add", "arguments": { "driver": "null-aio", "node-name": "root", "size": 1024 } This adds a null-aio block device with the "size" option set to 1024 (the default is 1 << 30). null_reopen_prepare() allows reopening that block device, but it does not allow changing any of its options. Attempting to change the value of "size" is detected by the loop that checks unhandled options at the end of bdrv_reopen_prepare() and returns "Cannot change the option 'size'". So far, so good. We have this generic check for all options that works with all drivers, so as long as we only specify options that we know that can be changed, everything is fine. However if we want blockdev-reopen to apply the default values for all omitted options, then omitting "size" would be equivalent to setting it to its default value (1 << 30). And if "size" cannot be changed then QEMU should complain unless we explicitly set "size" to 1024 again on reopen. This complicates things a bit, because we would go from "the options that can't be changed are the ones that are not handled by each driver's _prepare() function" to "options that are absent can also produce an error". Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote: > A clean QMP command would probably apply the same defaults as > blockdev-add, so you just get to specify the full options again. > >> reopen: You call it like blockdev-add, that is: >> { "node-name": "overlay", >> "backing": "new-backing" } >> In theory every option you do not specify is unchanged, so I suppose >> you can leave out e.g. the "driver" option here. > > That already doesn't work because "driver" is the discriminator for > the QAPI union, so it must be mandatory. I'd go further and reuse > BlockdevOptions, which makes everything mandatory that is mandatory > when creating the image. > > blockdev-reopen would basically mean "replace the whole set of options > of this node". There's at least one case which would need special consideration. Let's say we want to reopen an image ("top") that has a backing file ("base"). When specifying the options there are at least these alternatives for the backing image: 1) "backing" is a dict with the options of the backing image. { ..., "backing": { "driver": "qcow2", "node-name": "base", ... } In this case I suppose the backing image would be reopened with the new options. Any attempt to change the node name or the file name should fail. The graph itself doesn't change, only the options of each node. 2) "backing" is reference to another node. { ..., "backing": "base2" } In this case this should replace "base" with "base2" (which must have been opened already). 3) "backing" is null { ..., "backing": null } This is similar to (2). "base" would be removed from the backing chain and "top" would no longer have a backing image. 4) "backing" is omitted altogether In the case of 'blockdev-add' this means that the default backing image (specified in the header) is opened with the default options. What would it mean for 'blockdev-reopen' ? We keep the backing image with the current options (e.g we remove it from the BlockReopenQueue)? There are similar considerations with other children. In the case of "file" the child is mandatory and non-null, so options (3) and (4) are not possible. In the case of Quorum (where there are no default children) we should keep the children that are specified and remove all others. Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote: > Am 02.05.2018 um 16:12 hat Max Reitz geschrieben: >> On 2018-05-02 15:07, Alberto Garcia wrote: >> > Were the (more or less) exact requirements of QMP blockdev-reopen >> > discussed? How is it different from qemu-io's "reopen" command? What are >> > the options that you can and can not change? >> >> I can't quite remember, I'm afraid. I think it was supposed to be >> pretty much qemu-io's reopen (so just bdrv_reopen()). I suppose you >> cannot change the driver (obviously) or probably the node name, because >> either would result in the node being replaced by a completely new one. >> >> Other than that, it probably depends on what the block driver supports, >> but ideally you should be able to change everything. > > Honestly the design of bdrv_reopen() is quite broken because of the way > it tries to maintain old options if they aren't specified, and guesses > what you might mean when you add flags to the mix. The exact semantics > are quite complicated and I'd rather avoid them in a stable API. > > A clean QMP command would probably apply the same defaults as > blockdev-add, so you just get to specify the full options again. Ok, so the end result would be more or less equivalent to "open the new block device with blockdev-add and the user-specified options, then replace the old one with the new one", but implemented with reopen instead of open + replace. Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
Am 02.05.2018 um 16:12 hat Max Reitz geschrieben: > On 2018-05-02 15:07, Alberto Garcia wrote: > > Were the (more or less) exact requirements of QMP blockdev-reopen > > discussed? How is it different from qemu-io's "reopen" command? What are > > the options that you can and can not change? > > I can't quite remember, I'm afraid. I think it was supposed to be > pretty much qemu-io's reopen (so just bdrv_reopen()). I suppose you > cannot change the driver (obviously) or probably the node name, because > either would result in the node being replaced by a completely new one. > > Other than that, it probably depends on what the block driver supports, > but ideally you should be able to change everything. Honestly the design of bdrv_reopen() is quite broken because of the way it tries to maintain old options if they aren't specified, and guesses what you might mean when you add flags to the mix. The exact semantics are quite complicated and I'd rather avoid them in a stable API. A clean QMP command would probably apply the same defaults as blockdev-add, so you just get to specify the full options again. > reopen: You call it like blockdev-add, that is: > { "node-name": "overlay", > "backing": "new-backing" } > In theory every option you do not specify is unchanged, so I suppose you > can leave out e.g. the "driver" option here. That already doesn't work because "driver" is the discriminator for the QAPI union, so it must be mandatory. I'd go further and reuse BlockdevOptions, which makes everything mandatory that is mandatory when creating the image. blockdev-reopen would basically mean "replace the whole set of options of this node". > Sure, with reopen you can also respecify the whole tree if you so desire > (so you can get away without blockdev-add'ing new-backing before), but > you don't have to. It might be reasonable to forbid inline declarations for BlockdevRef in blockdev-reopen, so that always have to specify node names instead. On the other hand, maybe support for inline declarations falls out almost automatically. Kevin signature.asc Description: PGP signature
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Wed 02 May 2018 04:12:49 PM CEST, Max Reitz wrote: > On 2018-05-02 15:07, Alberto Garcia wrote: >> On Wed 25 Apr 2018 04:03:22 PM CEST, Max Reitz wrote: > But the question stands whether we need simple node replacement when > we want bdrv_reopen() anyway. In addition, we don't need just > replacement, we also need addition and removal (e.g. for backing > files or quorum children) -- and especially in the case of quorum, > that is going to be a pain (mostly naming the children). > > With bdrv_reopen(), we can just require the user to respecify all > children, so we don't run into the issue of how to name things at > least. So in this example, if you want to replace [B] with [E] you would reopen [C] specifying the new backing file? [A] <- [B] <- [C] <- [D] [E] >>> >>> You can just use child node name references like in blockdev-add. >> >> Were the (more or less) exact requirements of QMP blockdev-reopen >> discussed? How is it different from qemu-io's "reopen" command? What are >> the options that you can and can not change? > > I can't quite remember, I'm afraid. I think it was supposed to be > pretty much qemu-io's reopen (so just bdrv_reopen()). I suppose you > cannot change the driver (obviously) or probably the node name, > because either would result in the node being replaced by a completely > new one. I was testing qemu-io's reopen and it didn't seem like you could change the backing chain at all, so this would probably need to do more things. > Other than that, it probably depends on what the block driver > supports, but ideally you should be able to change everything. I agree. >> Also, you could replace an element in the backing chain using reopen, >> but can you replace a node that is not a backing image of another >> one? > > Node replacement only makes sense if the node is used as the child of > something. Otherwise it's just a blockdev-add + blockdev-del. But how do you do blockdev-del of a root node if it's being used by the guest? > But it doesn't have to be in a backing chain, of course. It would > work for Quorum children, too. Sure, I meant "child" when I said "backing image" :-) >> With QMP blockdev-replace you could use the existing blockdev-add >> command and create an arbitrary tree and then simply do >> blockdev-replace so I suppose that for some use cases at least it >> would be pretty straightforward. > > You can do the same with reopen. Just use a reference for the child > name. Yes, yes, I agree. I mean that the implementation with blockdev-replace seems much simpler. Although there are cases where blockdev-replace would probably not work, e.g. if you want to change the options of an image opened in read-write mode. In this case you cannot open the image again with the new options and replace the old with the new one because of the image locks. So you would need to reopen the image instead. Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On 2018-05-02 15:07, Alberto Garcia wrote: > On Wed 25 Apr 2018 04:03:22 PM CEST, Max Reitz wrote: But the question stands whether we need simple node replacement when we want bdrv_reopen() anyway. In addition, we don't need just replacement, we also need addition and removal (e.g. for backing files or quorum children) -- and especially in the case of quorum, that is going to be a pain (mostly naming the children). With bdrv_reopen(), we can just require the user to respecify all children, so we don't run into the issue of how to name things at least. >>> >>> So in this example, if you want to replace [B] with [E] you would >>> reopen [C] specifying the new backing file? >>> >>>[A] <- [B] <- [C] <- [D] >>> >>> [E] >> >> You can just use child node name references like in blockdev-add. > > Were the (more or less) exact requirements of QMP blockdev-reopen > discussed? How is it different from qemu-io's "reopen" command? What are > the options that you can and can not change? I can't quite remember, I'm afraid. I think it was supposed to be pretty much qemu-io's reopen (so just bdrv_reopen()). I suppose you cannot change the driver (obviously) or probably the node name, because either would result in the node being replaced by a completely new one. Other than that, it probably depends on what the block driver supports, but ideally you should be able to change everything. > Also, you could replace an element in the backing chain using reopen, > but can you replace a node that is not a backing image of another one? Node replacement only makes sense if the node is used as the child of something. Otherwise it's just a blockdev-add + blockdev-del. But it doesn't have to be in a backing chain, of course. It would work for Quorum children, too. > With QMP blockdev-replace you could use the existing blockdev-add > command and create an arbitrary tree and then simply do blockdev-replace > so I suppose that for some use cases at least it would be pretty > straightforward. You can do the same with reopen. Just use a reference for the child name. Say you have a qcow2 node "overlay" with a backing child. Now you want to replace that with a different child "new-backing". For both blockdev-replace and reopen the first thing you can do is blockdev-add the new child. Then it depends: blockdev-replace: You call it with parent=overlay, child=backing, node=new-backing. reopen: You call it like blockdev-add, that is: { "node-name": "overlay", "backing": "new-backing" } In theory every option you do not specify is unchanged, so I suppose you can leave out e.g. the "driver" option here. Sure, with reopen you can also respecify the whole tree if you so desire (so you can get away without blockdev-add'ing new-backing before), but you don't have to. Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Wed 25 Apr 2018 04:03:22 PM CEST, Max Reitz wrote: >>> But the question stands whether we need simple node replacement when >>> we want bdrv_reopen() anyway. In addition, we don't need just >>> replacement, we also need addition and removal (e.g. for backing >>> files or quorum children) -- and especially in the case of quorum, >>> that is going to be a pain (mostly naming the children). >>> >>> With bdrv_reopen(), we can just require the user to respecify all >>> children, so we don't run into the issue of how to name things at >>> least. >> >> So in this example, if you want to replace [B] with [E] you would >> reopen [C] specifying the new backing file? >> >>[A] <- [B] <- [C] <- [D] >> >> [E] > > You can just use child node name references like in blockdev-add. Were the (more or less) exact requirements of QMP blockdev-reopen discussed? How is it different from qemu-io's "reopen" command? What are the options that you can and can not change? Also, you could replace an element in the backing chain using reopen, but can you replace a node that is not a backing image of another one? With QMP blockdev-replace you could use the existing blockdev-add command and create an arbitrary tree and then simply do blockdev-replace so I suppose that for some use cases at least it would be pretty straightforward. Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On 2018-04-25 15:42, Alberto Garcia wrote: > On Wed 25 Apr 2018 03:06:34 PM CEST, Max Reitz wrote: > One other way is to have a more generic replace-node command > which would call bdrv_replace_node(), but I don't know if we > want to expose that and I don't have any other use case for it > at the moment. I think we do want to expose that. As far as I'm informed, for graph manipulation we want a command that can replace nodes (including replacing nothing by a new node or replacing an existing node by nothing, if the parent supports that). >>> >>> Was there any discussion about this in the mailing list? >>> (proposed name for this function, features, etc.) >> >> Well, there is x-blockdev-change. But we probably want to expose >> bdrv_reopen() in the long run. > > Exposing bdrv_reopen() itself shouldn't be too much work (it takes > just two node names, or am I missing something?). >>> >>> I just realized that I meant "Exposing bdrv_replace_node()" here. >>> So from my perspective... If you think it's easy, why don't you try it and then we'll see? *cough* >>> >>> I'm doing it :-) >>> > There's still the question of how to update the backing file string > (on the overlay). stream and commit do it automatically, but if we do > bdrv_reopen() or the aforementioned modification to blockdev-mirror >>> >>> bdrv_replace_node() again >> >> But the question stands whether we need simple node replacement when >> we want bdrv_reopen() anyway. In addition, we don't need just >> replacement, we also need addition and removal (e.g. for backing files >> or quorum children) -- and especially in the case of quorum, that is >> going to be a pain (mostly naming the children). >> >> With bdrv_reopen(), we can just require the user to respecify all >> children, so we don't run into the issue of how to name things at >> least. > > So in this example, if you want to replace [B] with [E] you would reopen > [C] specifying the new backing file? > >[A] <- [B] <- [C] <- [D] > > [E] You can just use child node name references like in blockdev-add. Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Wed 25 Apr 2018 03:06:34 PM CEST, Max Reitz wrote: One other way is to have a more generic replace-node command which would call bdrv_replace_node(), but I don't know if we want to expose that and I don't have any other use case for it at the moment. >>> >>> I think we do want to expose that. As far as I'm informed, for >>> graph manipulation we want a command that can replace nodes >>> (including replacing nothing by a new node or replacing an >>> existing node by nothing, if the parent supports that). >> >> Was there any discussion about this in the mailing list? >> (proposed name for this function, features, etc.) > > Well, there is x-blockdev-change. But we probably want to expose > bdrv_reopen() in the long run. Exposing bdrv_reopen() itself shouldn't be too much work (it takes just two node names, or am I missing something?). >> >> I just realized that I meant "Exposing bdrv_replace_node()" here. >> >>> So from my perspective... If you think it's easy, why don't you try >>> it and then we'll see? *cough* >> >> I'm doing it :-) >> There's still the question of how to update the backing file string (on the overlay). stream and commit do it automatically, but if we do bdrv_reopen() or the aforementioned modification to blockdev-mirror >> >> bdrv_replace_node() again > > But the question stands whether we need simple node replacement when > we want bdrv_reopen() anyway. In addition, we don't need just > replacement, we also need addition and removal (e.g. for backing files > or quorum children) -- and especially in the case of quorum, that is > going to be a pain (mostly naming the children). > > With bdrv_reopen(), we can just require the user to respecify all > children, so we don't run into the issue of how to name things at > least. So in this example, if you want to replace [B] with [E] you would reopen [C] specifying the new backing file? [A] <- [B] <- [C] <- [D] [E] Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On 2018-04-25 14:58, Alberto Garcia wrote: > On Fri 20 Apr 2018 03:13:49 PM CEST, Max Reitz wrote: >>> One other way is to have a more generic replace-node command >>> which would call bdrv_replace_node(), but I don't know if we want >>> to expose that and I don't have any other use case for it at the >>> moment. >> >> I think we do want to expose that. As far as I'm informed, for >> graph manipulation we want a command that can replace nodes >> (including replacing nothing by a new node or replacing an >> existing node by nothing, if the parent supports that). > > Was there any discussion about this in the mailing list? (proposed > name for this function, features, etc.) Well, there is x-blockdev-change. But we probably want to expose bdrv_reopen() in the long run. >>> >>> Exposing bdrv_reopen() itself shouldn't be too much work (it takes >>> just two node names, or am I missing something?). > > I just realized that I meant "Exposing bdrv_replace_node()" here. > >> So from my perspective... If you think it's easy, why don't you try >> it and then we'll see? *cough* > > I'm doing it :-) > >>> There's still the question of how to update the backing file string >>> (on the overlay). stream and commit do it automatically, but if we do >>> bdrv_reopen() or the aforementioned modification to blockdev-mirror > > bdrv_replace_node() again But the question stands whether we need simple node replacement when we want bdrv_reopen() anyway. In addition, we don't need just replacement, we also need addition and removal (e.g. for backing files or quorum children) -- and especially in the case of quorum, that is going to be a pain (mostly naming the children). With bdrv_reopen(), we can just require the user to respecify all children, so we don't run into the issue of how to name things at least. (Other things we we'd want bdrv_reopen() are listed in the meeting notes.) >>> we'd need to do it explicitly with change-backing-file, or extend the >>> API to allow for that. >> >> Well, the December notes do say that we probably want an option to >> specify the target's filename which is what will be written into the >> overlays of @to_replace as their backing file string, so I think >> extending the API should be fine. > > Yes, but how do you find out what the overlays are (without an > additional parameter, that is)? I thought we didn't want to support > walking the backing chain in reverse direction. We already have an .update_filename() BdrvChildRole callback, though. Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Fri 20 Apr 2018 03:13:49 PM CEST, Max Reitz wrote: >> One other way is to have a more generic replace-node command >> which would call bdrv_replace_node(), but I don't know if we want >> to expose that and I don't have any other use case for it at the >> moment. > > I think we do want to expose that. As far as I'm informed, for > graph manipulation we want a command that can replace nodes > (including replacing nothing by a new node or replacing an > existing node by nothing, if the parent supports that). Was there any discussion about this in the mailing list? (proposed name for this function, features, etc.) >>> >>> Well, there is x-blockdev-change. But we probably want to expose >>> bdrv_reopen() in the long run. >> >> Exposing bdrv_reopen() itself shouldn't be too much work (it takes >> just two node names, or am I missing something?). I just realized that I meant "Exposing bdrv_replace_node()" here. > So from my perspective... If you think it's easy, why don't you try > it and then we'll see? *cough* I'm doing it :-) >> There's still the question of how to update the backing file string >> (on the overlay). stream and commit do it automatically, but if we do >> bdrv_reopen() or the aforementioned modification to blockdev-mirror bdrv_replace_node() again >> we'd need to do it explicitly with change-backing-file, or extend the >> API to allow for that. > > Well, the December notes do say that we probably want an option to > specify the target's filename which is what will be written into the > overlays of @to_replace as their backing file string, so I think > extending the API should be fine. Yes, but how do you find out what the overlays are (without an additional parameter, that is)? I thought we didn't want to support walking the backing chain in reverse direction. Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On 2018-04-18 17:34, Alberto Garcia wrote: > On Mon 16 Apr 2018 05:15:21 PM CEST, Max Reitz wrote: >> On 2018-04-16 16:59, Alberto Garcia wrote: >>> On Fri 13 Apr 2018 04:23:07 PM CEST, Max Reitz wrote: On 2018-04-12 19:07, Alberto Garcia wrote: > Hello, > > I mentioned this some time ago, but I'd like to retake it now: I'm > checking how to copy arbitrary nodes on a backing chain, so if I have > e.g. > >[A] <- [B] <- [C] <- [D] > > I'd like to end up with > >[A] <- [E] <- [C] <- [D] > > where [E] is a copy of [B]. The most obvious use case is to move [B] > to a different storage backend. > > At the moment we can already copy [B] into [E] (outside QEMU) and then > do > >change-backing-file device=[D] image-node-name=[C] backing-file=[E] > > However this only changes the image on disk and the bs->backing_file > string in memory. QEMU keeps using the [B] BlockDriverState, and we > need to make it switch to [E]. > > One possible way to do this would be to modify blockdev-mirror so the > source image can be located anywhere on a chain. Currently there's > bs->backing_blocker preventing that, plus qmp_blockdev_mirror() > refuses to take any non-root source node. Other than permission > changes I don't think the algorithm itself needs any additional > modification, although I suppose we'd like to change the backing file > as part of the same job, and that would require API changes. Another way would be to write blockdev-copy. :-) >>> >>> Something like blockdev-backup but without copying data from backing >>> files? blockdev-mirror already allows configuring that using the >>> MirrorSyncMode parameter. >> >> No, a block job that unites blockdev-backup, blockdev-mirror, and >> block-commit. > > Yeah, I was reading again the notes from December. This would then unify > all those existing commands. > > Is there any other plan or roadmap for this command, or was just an idea > that was discussed as nice to have? Mostly the latter, I think. Well, there definitely isn't a roadmap and there is no plan apart from what's in the December notes. > Also, does that mean that the old > commands should not be changed now? I'm asking because there's a big > difference between updating blockdev-mirror to allow non-root source > nodes and creating this new command that would support all features of > the old ones :-) No, changing old commands is OK. We just should think about how to incorporate those changes into blockdev-copy (so don't introduce some way of doing the same thing with two different interfaces to both mirror and backup). Especially adding things to mirror should be OK, because I suppose mirror is going to be our blockdev-copy (and we'll just extend it to cover backup and non-active commit). Of course, things usually don't get done because they're "nice to have" but because someone actually needs them. So that's the reason why I mentioned blockdev-copy. O:-) >> I think we wanted to exclude streaming, because that works >> differently. But, well, streaming would be just installing a COR >> filter node and doing a blockdev-copy to null-co://, right? :-) > > Yeah I suppose, but I don't think you can make the filter step part of > the blockdev-copy API. No, definitely not. Or, well, actually... If you could start block jobs by installing their filter nodes via blockdev-add, then maybe you could in a sense, but let's not get into that rabbit hole now. O:-) [...] > One other way is to have a more generic replace-node command which > would call bdrv_replace_node(), but I don't know if we want to > expose that and I don't have any other use case for it at the > moment. I think we do want to expose that. As far as I'm informed, for graph manipulation we want a command that can replace nodes (including replacing nothing by a new node or replacing an existing node by nothing, if the parent supports that). >>> >>> Was there any discussion about this in the mailing list? (proposed >>> name for this function, features, etc.) >> >> Well, there is x-blockdev-change. But we probably want to expose >> bdrv_reopen() in the long run. > > Exposing bdrv_reopen() itself shouldn't be too much work (it takes just > two node names, or am I missing something?). I suppose Kevin knows more about this. I guess there may be things in the block layer that cannot cope with an arbitrary reopen from the user, but that's the only thing that would come to my mind. So from my perspective... If you think it's easy, why don't you try it and then we'll see? *cough* > There's still the question of how to update the backing file string (on > the overlay). stream and commit do it automatically, but if we do > bdrv_reopen() or the aforementioned modification to blockdev-mirror we'd > need to do it explicitly wi
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Mon 16 Apr 2018 05:15:21 PM CEST, Max Reitz wrote: > On 2018-04-16 16:59, Alberto Garcia wrote: >> On Fri 13 Apr 2018 04:23:07 PM CEST, Max Reitz wrote: >>> On 2018-04-12 19:07, Alberto Garcia wrote: Hello, I mentioned this some time ago, but I'd like to retake it now: I'm checking how to copy arbitrary nodes on a backing chain, so if I have e.g. [A] <- [B] <- [C] <- [D] I'd like to end up with [A] <- [E] <- [C] <- [D] where [E] is a copy of [B]. The most obvious use case is to move [B] to a different storage backend. At the moment we can already copy [B] into [E] (outside QEMU) and then do change-backing-file device=[D] image-node-name=[C] backing-file=[E] However this only changes the image on disk and the bs->backing_file string in memory. QEMU keeps using the [B] BlockDriverState, and we need to make it switch to [E]. One possible way to do this would be to modify blockdev-mirror so the source image can be located anywhere on a chain. Currently there's bs->backing_blocker preventing that, plus qmp_blockdev_mirror() refuses to take any non-root source node. Other than permission changes I don't think the algorithm itself needs any additional modification, although I suppose we'd like to change the backing file as part of the same job, and that would require API changes. >>> >>> Another way would be to write blockdev-copy. :-) >> >> Something like blockdev-backup but without copying data from backing >> files? blockdev-mirror already allows configuring that using the >> MirrorSyncMode parameter. > > No, a block job that unites blockdev-backup, blockdev-mirror, and > block-commit. Yeah, I was reading again the notes from December. This would then unify all those existing commands. Is there any other plan or roadmap for this command, or was just an idea that was discussed as nice to have? Also, does that mean that the old commands should not be changed now? I'm asking because there's a big difference between updating blockdev-mirror to allow non-root source nodes and creating this new command that would support all features of the old ones :-) > I think we wanted to exclude streaming, because that works > differently. But, well, streaming would be just installing a COR > filter node and doing a blockdev-copy to null-co://, right? :-) Yeah I suppose, but I don't think you can make the filter step part of the blockdev-copy API. > I just now remembered a bitmap issue, but that probably shouldn't stop > you. There is an issue with bitmaps on nodes with shared-writing > children. > > Imagine you have some qcow2 node or whatever, and you have a guest > running on it. Then you attach a filter to it (which allows shared > writing) and put a mirror job on top of the filter. Now whenever the > guest writes to the qcow2 node, the bitmap the mirror job installed on > the filter won't reflect that. So that's an issue, but I think that > case isn't forbidden by the current permission system, so we're > already screwed there. > > So on the contrary, I'd suspect that running mirror inside of some > backing chain or whatever may be less problematic than running it on > top, actually... Yeah I don't think there's a problem there. One other way is to have a more generic replace-node command which would call bdrv_replace_node(), but I don't know if we want to expose that and I don't have any other use case for it at the moment. >>> >>> I think we do want to expose that. As far as I'm informed, for >>> graph manipulation we want a command that can replace nodes >>> (including replacing nothing by a new node or replacing an existing >>> node by nothing, if the parent supports that). >> >> Was there any discussion about this in the mailing list? (proposed >> name for this function, features, etc.) > > Well, there is x-blockdev-change. But we probably want to expose > bdrv_reopen() in the long run. Exposing bdrv_reopen() itself shouldn't be too much work (it takes just two node names, or am I missing something?). There's still the question of how to update the backing file string (on the overlay). stream and commit do it automatically, but if we do bdrv_reopen() or the aforementioned modification to blockdev-mirror we'd need to do it explicitly with change-backing-file, or extend the API to allow for that. Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On 2018-04-16 16:59, Alberto Garcia wrote: > On Fri 13 Apr 2018 04:23:07 PM CEST, Max Reitz wrote: >> On 2018-04-12 19:07, Alberto Garcia wrote: >>> Hello, >>> >>> I mentioned this some time ago, but I'd like to retake it now: I'm >>> checking how to copy arbitrary nodes on a backing chain, so if I have >>> e.g. >>> >>>[A] <- [B] <- [C] <- [D] >>> >>> I'd like to end up with >>> >>>[A] <- [E] <- [C] <- [D] >>> >>> where [E] is a copy of [B]. The most obvious use case is to move [B] >>> to a different storage backend. >>> >>> At the moment we can already copy [B] into [E] (outside QEMU) and then >>> do >>> >>>change-backing-file device=[D] image-node-name=[C] backing-file=[E] >>> >>> However this only changes the image on disk and the bs->backing_file >>> string in memory. QEMU keeps using the [B] BlockDriverState, and we >>> need to make it switch to [E]. >>> >>> One possible way to do this would be to modify blockdev-mirror so the >>> source image can be located anywhere on a chain. Currently there's >>> bs->backing_blocker preventing that, plus qmp_blockdev_mirror() >>> refuses to take any non-root source node. Other than permission >>> changes I don't think the algorithm itself needs any additional >>> modification, although I suppose we'd like to change the backing file >>> as part of the same job, and that would require API changes. >> >> Another way would be to write blockdev-copy. :-) > > Something like blockdev-backup but without copying data from backing > files? blockdev-mirror already allows configuring that using the > MirrorSyncMode parameter. No, a block job that unites blockdev-backup, blockdev-mirror, and block-commit. Active commit already is just mirror, non-active commit probably can easily be implemented with mirror (we just need an auto-complete), and it should be possible to add a backup mode to mirror (like active mirroring). So we'd just have a single blockdev-copy job that can simply... Copy data from any node to any other. I think we wanted to exclude streaming, because that works differently. But, well, streaming would be just installing a COR filter node and doing a blockdev-copy to null-co://, right? :-) >> I don't think there should be a limitation which nodes can be used as >> the source of blockdev-mirror, and I think the hardest part about >> lifting that limitation is about thinking what might break... Maybe >> we should just drop the limitation and deal with the fallout later? I >> can't imagine a single node configuration where we'd want to disallow >> copying off a node, so I suppose if that breaks in some edge case that >> wouldn't be something we'd have to disallow again but we'd just have >> to fix it. > > I can't think of any problem either. The important question is what to > do if the source node is read/write, but that's what blockdev-mirror is > already about. In all other cases the source node is going to be read > only, isn't it? I guess so. I just now remembered a bitmap issue, but that probably shouldn't stop you. There is an issue with bitmaps on nodes with shared-writing children. Imagine you have some qcow2 node or whatever, and you have a guest running on it. Then you attach a filter to it (which allows shared writing) and put a mirror job on top of the filter. Now whenever the guest writes to the qcow2 node, the bitmap the mirror job installed on the filter won't reflect that. So that's an issue, but I think that case isn't forbidden by the current permission system, so we're already screwed there. So on the contrary, I'd suspect that running mirror inside of some backing chain or whatever may be less problematic than running it on top, actually... >>> One other way is to have a more generic replace-node command which >>> would call bdrv_replace_node(), but I don't know if we want to expose >>> that and I don't have any other use case for it at the moment. >> >> I think we do want to expose that. As far as I'm informed, for graph >> manipulation we want a command that can replace nodes (including >> replacing nothing by a new node or replacing an existing node by >> nothing, if the parent supports that). > > Was there any discussion about this in the mailing list? (proposed name > for this function, features, etc.) Well, there is x-blockdev-change. But we probably want to expose bdrv_reopen() in the long run. http://lists.nongnu.org/archive/html/qemu-block/2017-12/msg00522.html is what I can offer you. That also has some info on blockdev-copy and the bitmap issue ("omniscient bitmaps" is the keyword). Max >> Also, well, there is blockdev-mirror already has a @replaces >> parameter, right? That was supposed to work with quorum, but it seems >> obvious to use it here, too. > > Yes, it could be used for this. > > Berto > signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Fri 13 Apr 2018 04:23:07 PM CEST, Max Reitz wrote: > On 2018-04-12 19:07, Alberto Garcia wrote: >> Hello, >> >> I mentioned this some time ago, but I'd like to retake it now: I'm >> checking how to copy arbitrary nodes on a backing chain, so if I have >> e.g. >> >>[A] <- [B] <- [C] <- [D] >> >> I'd like to end up with >> >>[A] <- [E] <- [C] <- [D] >> >> where [E] is a copy of [B]. The most obvious use case is to move [B] >> to a different storage backend. >> >> At the moment we can already copy [B] into [E] (outside QEMU) and then >> do >> >>change-backing-file device=[D] image-node-name=[C] backing-file=[E] >> >> However this only changes the image on disk and the bs->backing_file >> string in memory. QEMU keeps using the [B] BlockDriverState, and we >> need to make it switch to [E]. >> >> One possible way to do this would be to modify blockdev-mirror so the >> source image can be located anywhere on a chain. Currently there's >> bs->backing_blocker preventing that, plus qmp_blockdev_mirror() >> refuses to take any non-root source node. Other than permission >> changes I don't think the algorithm itself needs any additional >> modification, although I suppose we'd like to change the backing file >> as part of the same job, and that would require API changes. > > Another way would be to write blockdev-copy. :-) Something like blockdev-backup but without copying data from backing files? blockdev-mirror already allows configuring that using the MirrorSyncMode parameter. > I don't think there should be a limitation which nodes can be used as > the source of blockdev-mirror, and I think the hardest part about > lifting that limitation is about thinking what might break... Maybe > we should just drop the limitation and deal with the fallout later? I > can't imagine a single node configuration where we'd want to disallow > copying off a node, so I suppose if that breaks in some edge case that > wouldn't be something we'd have to disallow again but we'd just have > to fix it. I can't think of any problem either. The important question is what to do if the source node is read/write, but that's what blockdev-mirror is already about. In all other cases the source node is going to be read only, isn't it? >> One other way is to have a more generic replace-node command which >> would call bdrv_replace_node(), but I don't know if we want to expose >> that and I don't have any other use case for it at the moment. > > I think we do want to expose that. As far as I'm informed, for graph > manipulation we want a command that can replace nodes (including > replacing nothing by a new node or replacing an existing node by > nothing, if the parent supports that). Was there any discussion about this in the mailing list? (proposed name for this function, features, etc.) > Also, well, there is blockdev-mirror already has a @replaces > parameter, right? That was supposed to work with quorum, but it seems > obvious to use it here, too. Yes, it could be used for this. Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On 2018-04-12 19:07, Alberto Garcia wrote: > Hello, > > I mentioned this some time ago, but I'd like to retake it now: I'm > checking how to copy arbitrary nodes on a backing chain, so if I have > e.g. > >[A] <- [B] <- [C] <- [D] > > I'd like to end up with > >[A] <- [E] <- [C] <- [D] > > where [E] is a copy of [B]. The most obvious use case is to move [B] > to a different storage backend. > > At the moment we can already copy [B] into [E] (outside QEMU) and then > do > >change-backing-file device=[D] image-node-name=[C] backing-file=[E] > > However this only changes the image on disk and the bs->backing_file > string in memory. QEMU keeps using the [B] BlockDriverState, and we > need to make it switch to [E]. > > One possible way to do this would be to modify blockdev-mirror so the > source image can be located anywhere on a chain. Currently there's > bs->backing_blocker preventing that, plus qmp_blockdev_mirror() > refuses to take any non-root source node. Other than permission > changes I don't think the algorithm itself needs any additional > modification, although I suppose we'd like to change the backing file > as part of the same job, and that would require API changes. Another way would be to write blockdev-copy. :-) I don't think there should be a limitation which nodes can be used as the source of blockdev-mirror, and I think the hardest part about lifting that limitation is about thinking what might break... Maybe we should just drop the limitation and deal with the fallout later? I can't imagine a single node configuration where we'd want to disallow copying off a node, so I suppose if that breaks in some edge case that wouldn't be something we'd have to disallow again but we'd just have to fix it. > One other way is to have a more generic replace-node command which > would call bdrv_replace_node(), but I don't know if we want to expose > that and I don't have any other use case for it at the moment. I think we do want to expose that. As far as I'm informed, for graph manipulation we want a command that can replace nodes (including replacing nothing by a new node or replacing an existing node by nothing, if the parent supports that). Also, well, there is blockdev-mirror already has a @replaces parameter, right? That was supposed to work with quorum, but it seems obvious to use it here, too. Max
[Qemu-block] [RFC] Intermediate block mirroring
Hello, I mentioned this some time ago, but I'd like to retake it now: I'm checking how to copy arbitrary nodes on a backing chain, so if I have e.g. [A] <- [B] <- [C] <- [D] I'd like to end up with [A] <- [E] <- [C] <- [D] where [E] is a copy of [B]. The most obvious use case is to move [B] to a different storage backend. At the moment we can already copy [B] into [E] (outside QEMU) and then do change-backing-file device=[D] image-node-name=[C] backing-file=[E] However this only changes the image on disk and the bs->backing_file string in memory. QEMU keeps using the [B] BlockDriverState, and we need to make it switch to [E]. One possible way to do this would be to modify blockdev-mirror so the source image can be located anywhere on a chain. Currently there's bs->backing_blocker preventing that, plus qmp_blockdev_mirror() refuses to take any non-root source node. Other than permission changes I don't think the algorithm itself needs any additional modification, although I suppose we'd like to change the backing file as part of the same job, and that would require API changes. One other way is to have a more generic replace-node command which would call bdrv_replace_node(), but I don't know if we want to expose that and I don't have any other use case for it at the moment. Opinions, comments? Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Thu, Apr 09, 2015 at 11:39:28AM +0100, Stefan Hajnoczi wrote: > > The goal would be to convert this: > > > >[A] -> [B] -> [C] -> [D] > > > > into this: > > > >[A] -> [B] -> [X] -> [D] > > > > where [D] is the active image and [X] would be a copy of [C]. The > > latter would be unlinked from the chain. > > > > A use case would be to move disk images across different storage > > backends. > > The simple solution to that problem is: > > Assumption: backing files are read only. (True in most cases.) > > 1. Copy the backing files using cp(1) or another method. > 2. Issue QMP 'change-backing-file' command so that [D] uses [X] instead >of [C]. > > So it can be done today already. So do you think it would be better to implement this somewhere else? The code that I have for QEMU is quite simple, the actual algorithm doesn't change, it only needs to do the changing of backing files in mirror_exit(). Berto
Re: [Qemu-block] [RFC] Intermediate block mirroring
On Thu, Apr 02, 2015 at 03:28:57PM +0200, Alberto Garcia wrote: > Hi, > > I'm interested in adding the possibility to mirror an intermediate > node in a disk image chain, but I would like to have some feedback > before sending any patches. > > The goal would be to convert this: > >[A] -> [B] -> [C] -> [D] > > into this: > >[A] -> [B] -> [X] -> [D] > > where [D] is the active image and [X] would be a copy of [C]. The > latter would be unlinked from the chain. > > A use case would be to move disk images across different storage > backends. The simple solution to that problem is: Assumption: backing files are read only. (True in most cases.) 1. Copy the backing files using cp(1) or another method. 2. Issue QMP 'change-backing-file' command so that [D] uses [X] instead of [C]. So it can be done today already. Note that the management tool needs to enforce the assumption since QEMU cannot know whether other programs or QEMU instances are modifying one of the backing files. Stefan pgpY_SXfLhXbM.pgp Description: PGP signature
[Qemu-block] [RFC] Intermediate block mirroring
Hi, I'm interested in adding the possibility to mirror an intermediate node in a disk image chain, but I would like to have some feedback before sending any patches. The goal would be to convert this: [A] -> [B] -> [C] -> [D] into this: [A] -> [B] -> [X] -> [D] where [D] is the active image and [X] would be a copy of [C]. The latter would be unlinked from the chain. A use case would be to move disk images across different storage backends. My idea is to extend the drive-mirror command. Similar to what we discussed in the case of the intermediate block streaming, I can reuse the 'device' parameter to refer to a node name. So the API doesn't need any changes other than the extended semantics for this parameter. One difference with the current functionality is that once the block job is completed, the node above the mirrored one would have to change its backing image to point to the new one. One solution is to iterate over all devices (bdrv_next()) and check which ones are connected directly or indirectly to the mirrored node (bdrv_find_overlay()). drive-mirror has three different sync modes: top, full and none. This would be the chain from the example using each one of these modes: top: [A] -> [B] -> [X] -> [D] full: [X] -> [D] none: [A] -> [B] -> [C] -> [X] -> [D] My understanding is that in the 'sync=full' case, [A] and [B] would also need to be blocked during the operation since they are going to disappear from the chain. I have some code and in principle everything seems to be working fine, but I'd like to test it a bit more. What's anyway your opinion about this proposal? Thanks, Berto