Re: [Qemu-devel] Image streaming and live block copy
Am 26.06.2011 14:50, schrieb Dor Laor: On 06/24/2011 12:28 PM, Stefan Hajnoczi wrote: On Sun, Jun 19, 2011 at 5:02 PM, Dor Laordl...@redhat.com wrote: On 06/18/2011 12:17 PM, Stefan Hajnoczi wrote: On Sat, Jun 18, 2011 at 10:15 AM, Stefan Hajnoczistefa...@gmail.com wrote: On Fri, Jun 17, 2011 at 1:31 PM, Marcelo Tosattimtosa...@redhat.com wrote: On Thu, Jun 16, 2011 at 04:30:18PM +0100, Stefan Hajnoczi wrote: On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote: This approach does not use the backing file feature? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. We open the remote file read-only for image streaming and do not want to mirror writes. Why not? Is there any disadvantage of mirroring writes? Think of the use case with a Fedora master image over NFS. You want a local clone of that master image and use the stream command to copy the data from the master image into the local clone. You cannot modify that master image because other VMs are using it too and/or you want to be able to clone new VMs from it in the future. BTW the workaround is to create two local images: 1. Local clone with master image as a backing file. This is the live block copy source image. 2. Local image without a backing file. This is the live block copy destination image. But this is not very elegant. Writes get mirrored so that crash recovery works. 
There is an easier work around for image streaming using live block copy (mirror approach): - Create the dst VM as an empty new COW image of the src (even over the non shared storage, use some protocol tag for the src location like nbd://original_path/src_file_name) Migration and non-shared storage has come up a few times in this discussion. But both live block copy and image streaming need access to source and destination - they do not have explicit non-shared storage support. I think non-shared and using nbd:// is orthogonal to the discussion. Just want to check that you agree and I haven't missed something? You're right, I was mainly trying to be as general as possible. I think there is one important point to consider for using NBD: You always see a single image on the NBD client, which could in fact have a backing file chain on the source. So bdrv_is_allocated() doesn't work over NBD, which becomes interesting when you want to share a backing file with the new copy. Kevin
Re: [Qemu-devel] Image streaming and live block copy
On 06/27/2011 10:48 AM, Kevin Wolf wrote: Am 26.06.2011 14:50, schrieb Dor Laor: On 06/24/2011 12:28 PM, Stefan Hajnoczi wrote: On Sun, Jun 19, 2011 at 5:02 PM, Dor Laordl...@redhat.com wrote: On 06/18/2011 12:17 PM, Stefan Hajnoczi wrote: On Sat, Jun 18, 2011 at 10:15 AM, Stefan Hajnoczistefa...@gmail.com wrote: On Fri, Jun 17, 2011 at 1:31 PM, Marcelo Tosattimtosa...@redhat.com wrote: On Thu, Jun 16, 2011 at 04:30:18PM +0100, Stefan Hajnoczi wrote: On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote: This approach does not use the backing file feature? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. We open the remote file read-only for image streaming and do not want to mirror writes. Why not? Is there any disadvantage of mirroring writes? Think of the use case with a Fedora master image over NFS. You want a local clone of that master image and use the stream command to copy the data from the master image into the local clone. You cannot modify that master image because other VMs are using it too and/or you want to be able to clone new VMs from it in the future. BTW the workaround is to create two local images: 1. Local clone with master image as a backing file. This is the live block copy source image. 2. Local image without a backing file. This is the live block copy destination image. But this is not very elegant. Writes get mirrored so that crash recovery works. 
There is an easier workaround for image streaming using live block copy (mirror approach):
- Create the dst VM as an empty new COW image of the src (even over non-shared storage, use some protocol tag for the src location like nbd://original_path/src_file_name)

Migration and non-shared storage have come up a few times in this discussion. But both live block copy and image streaming need access to source and destination - they do not have explicit non-shared storage support. I think non-shared and using nbd:// is orthogonal to the discussion. Just want to check that you agree and I haven't missed something?

You're right, I was mainly trying to be as general as possible.

I think there is one important point to consider for using NBD: You always see a single image on the NBD client, which could in fact have a backing file chain on the source. So bdrv_is_allocated() doesn't work over NBD, which becomes interesting when you want to share a backing file with the new copy.

What if we use iSCSI? Would the client have a matching iSCSI command to detect whether a given block is in fact unallocated, so that we could use this to our benefit? Of course, it would make the non-shared storage case more complex.

Kevin
Re: [Qemu-devel] Image streaming and live block copy
On 06/24/2011 12:28 PM, Stefan Hajnoczi wrote: On Sun, Jun 19, 2011 at 5:02 PM, Dor Laordl...@redhat.com wrote: On 06/18/2011 12:17 PM, Stefan Hajnoczi wrote: On Sat, Jun 18, 2011 at 10:15 AM, Stefan Hajnoczistefa...@gmail.com wrote: On Fri, Jun 17, 2011 at 1:31 PM, Marcelo Tosattimtosa...@redhat.com wrote: On Thu, Jun 16, 2011 at 04:30:18PM +0100, Stefan Hajnoczi wrote: On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote: This approach does not use the backing file feature? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. We open the remote file read-only for image streaming and do not want to mirror writes. Why not? Is there any disadvantage of mirroring writes? Think of the use case with a Fedora master image over NFS. You want a local clone of that master image and use the stream command to copy the data from the master image into the local clone. You cannot modify that master image because other VMs are using it too and/or you want to be able to clone new VMs from it in the future. BTW the workaround is to create two local images: 1. Local clone with master image as a backing file. This is the live block copy source image. 2. Local image without a backing file. This is the live block copy destination image. But this is not very elegant. Writes get mirrored so that crash recovery works. 
There is an easier work around for image streaming using live block copy (mirror approach): - Create the dst VM as an empty new COW image of the src (even over the non shared storage, use some protocol tag for the src location like nbd://original_path/src_file_name) Migration and non-shared storage has come up a few times in this discussion. But both live block copy and image streaming need access to source and destination - they do not have explicit non-shared storage support. I think non-shared and using nbd:// is orthogonal to the discussion. Just want to check that you agree and I haven't missed something? You're right, I was mainly trying to be as general as possible. - Run the usual live block copy of src image (master read only OS template) to the destination. - Use a -src-read-only flag that will make the copy skip the src writing. Voila - no duplicate writes, crash recovery works since we reference the original image and we share the code. So the running guest is using the destination image since the source is read-only? Yes. The image will run on the destination not because of the source in RO state but because that is what we look for. This approach makes sense to me. Stefan
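The mirror-with-read-only-source idea above can be sketched as a toy model. This is only an illustration, not QEMU code: the `Mirror` class, its dictionary-backed images, and the `src_read_only` parameter (standing in for the proposed -src-read-only flag) are all hypothetical names chosen for this sketch.

```python
class Mirror:
    """Toy block mirror: images are dicts mapping block index -> data."""

    def __init__(self, src, dst, src_read_only=False):
        self.src = src
        self.dst = dst
        # Hypothetical stand-in for the proposed -src-read-only flag.
        self.src_read_only = src_read_only

    def guest_write(self, idx, data):
        # The destination always receives the write.
        self.dst[idx] = data
        # The source is only updated when mirroring is allowed to touch it.
        if not self.src_read_only:
            self.src[idx] = data

    def guest_read(self, idx):
        # Prefer the destination; fall back to the source for blocks
        # that have not been copied or written yet.
        return self.dst.get(idx, self.src.get(idx))


# Shared master template must stay untouched.
template = {0: "os"}
clone = {}
m = Mirror(template, clone, src_read_only=True)
m.guest_write(1, "x")

# Classic mirroring for comparison: both sides see the write.
a, b = {}, {}
Mirror(a, b).guest_write(0, "y")
```

With the flag set, the guest effectively runs on the destination while the template is only ever read, which matches the "no duplicate writes" claim in the message.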
Re: [Qemu-devel] Image streaming gives live block copy for free (and vice versa)
On 06/24/2011 12:35 PM, Stefan Hajnoczi wrote: On Sun, Jun 19, 2011 at 5:12 PM, Dor Laordl...@redhat.com wrote: On 06/17/2011 08:53 AM, Stefan Hajnoczi wrote: Perhaps someone has been saying this all along but I want to spell it out that image streaming and live block copy are equivalent in theory. I just realized this last night. In practice we might choose one implementation or two different ones for performance reasons. If any of these are unclear please let me know and I'll try to post diagrams. Live block copy using image streaming - 1. Create the destination image file and use the source image as the backing file. 2. Quiesce I/O and pause VM. 3. Switch to destination image. 4. Resume VM. 5. Start streaming destination image in order to copy source image data into destination file. 6. Streaming completes and disables the backing file, leaving the live copied destination image that no longer depends on the source image. Well streaming is copying just using post-copy approach. Both pre and post are required: - post-copy (aka streaming) for fast live block migration of the VM for cpu load balance. - pre-copy (aka live block copy) If you manage to get a network outage between the source and the destination storage systems, you will manage to keep running the VM on the source side. Out of interest, are you brainstorming using live block copy and image streaming for pre- and post-copy live migration or do you have concrete plans to update libvirt to use these mechanisms? Adding Eric, Daniel and Dave from the libvirt team. So far I've been focussing on the fast provisioning use case where image streaming helps, but eventually I would like to improve the state of storage migration too. In general it seems to me that we need to reuse the design so fast provision, live block migration, live snapshot (merge) and non shared storage option will all use the same code and the same interfaces. Stefan
Re: [Qemu-devel] Image streaming and live block copy
On Sun, Jun 19, 2011 at 5:02 PM, Dor Laor dl...@redhat.com wrote: On 06/18/2011 12:17 PM, Stefan Hajnoczi wrote: On Sat, Jun 18, 2011 at 10:15 AM, Stefan Hajnoczistefa...@gmail.com wrote: On Fri, Jun 17, 2011 at 1:31 PM, Marcelo Tosattimtosa...@redhat.com wrote: On Thu, Jun 16, 2011 at 04:30:18PM +0100, Stefan Hajnoczi wrote: On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote: This approach does not use the backing file feature? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. We open the remote file read-only for image streaming and do not want to mirror writes. Why not? Is there any disadvantage of mirroring writes? Think of the use case with a Fedora master image over NFS. You want a local clone of that master image and use the stream command to copy the data from the master image into the local clone. You cannot modify that master image because other VMs are using it too and/or you want to be able to clone new VMs from it in the future. BTW the workaround is to create two local images: 1. Local clone with master image as a backing file. This is the live block copy source image. 2. Local image without a backing file. This is the live block copy destination image. But this is not very elegant. Writes get mirrored so that crash recovery works. There is an easier work around for image streaming using live block copy (mirror approach): - Create the dst VM as an empty new COW image of the src (even over the non shared storage, use some protocol tag for the src location like nbd://original_path/src_file_name) Migration and non-shared storage has come up a few times in this discussion. 
But both live block copy and image streaming need access to source and destination - they do not have explicit non-shared storage support. I think non-shared and using nbd:// is orthogonal to the discussion. Just want to check that you agree and I haven't missed something? - Run the usual live block copy of src image (master read only OS template) to the destination. - Use a -src-read-only flag that will make the copy skip the src writing. Voila - no duplicate writes, crash recovery works since we reference the original image and we share the code. So the running guest is using the destination image since the source is read-only? This approach makes sense to me. Stefan
Re: [Qemu-devel] Image streaming gives live block copy for free (and vice versa)
On Sun, Jun 19, 2011 at 5:12 PM, Dor Laor dl...@redhat.com wrote: On 06/17/2011 08:53 AM, Stefan Hajnoczi wrote: Perhaps someone has been saying this all along but I want to spell it out that image streaming and live block copy are equivalent in theory. I just realized this last night. In practice we might choose one implementation or two different ones for performance reasons. If any of these are unclear please let me know and I'll try to post diagrams. Live block copy using image streaming - 1. Create the destination image file and use the source image as the backing file. 2. Quiesce I/O and pause VM. 3. Switch to destination image. 4. Resume VM. 5. Start streaming destination image in order to copy source image data into destination file. 6. Streaming completes and disables the backing file, leaving the live copied destination image that no longer depends on the source image. Well streaming is copying just using post-copy approach. Both pre and post are required: - post-copy (aka streaming) for fast live block migration of the VM for cpu load balance. - pre-copy (aka live block copy) If you manage to get a network outage between the source and the destination storage systems, you will manage to keep running the VM on the source side. Out of interest, are you brainstorming using live block copy and image streaming for pre- and post-copy live migration or do you have concrete plans to update libvirt to use these mechanisms? So far I've been focussing on the fast provisioning use case where image streaming helps, but eventually I would like to improve the state of storage migration too. Stefan
Re: [Qemu-devel] Image streaming and live block copy
On 06/18/2011 12:17 PM, Stefan Hajnoczi wrote: On Sat, Jun 18, 2011 at 10:15 AM, Stefan Hajnoczistefa...@gmail.com wrote: On Fri, Jun 17, 2011 at 1:31 PM, Marcelo Tosattimtosa...@redhat.com wrote: On Thu, Jun 16, 2011 at 04:30:18PM +0100, Stefan Hajnoczi wrote: On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote: This approach does not use the backing file feature? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. We open the remote file read-only for image streaming and do not want to mirror writes. Why not? Is there any disadvantage of mirroring writes? Think of the use case with a Fedora master image over NFS. You want a local clone of that master image and use the stream command to copy the data from the master image into the local clone. You cannot modify that master image because other VMs are using it too and/or you want to be able to clone new VMs from it in the future. BTW the workaround is to create two local images: 1. Local clone with master image as a backing file. This is the live block copy source image. 2. Local image without a backing file. This is the live block copy destination image. But this is not very elegant. Writes get mirrored so that crash recovery works. There is an easier work around for image streaming using live block copy (mirror approach): - Create the dst VM as an empty new COW image of the src (even over the non shared storage, use some protocol tag for the src location like nbd://original_path/src_file_name) - Run the usual live block copy of src image (master read only OS template) to the destination. - Use a -src-read-only flag that will make the copy skip the src writing. 
Voila - no duplicate writes, crash recovery works since we reference the original image and we share the code. Compare that to image streaming, which works across crash and doesn't duplicate I/O. Stefan
Re: [Qemu-devel] Image streaming gives live block copy for free (and vice versa)
On 06/17/2011 08:53 AM, Stefan Hajnoczi wrote: Perhaps someone has been saying this all along but I want to spell it out that image streaming and live block copy are equivalent in theory. I just realized this last night. In practice we might choose one implementation or two different ones for performance reasons. If any of these are unclear please let me know and I'll try to post diagrams. Live block copy using image streaming - 1. Create the destination image file and use the source image as the backing file. 2. Quiesce I/O and pause VM. 3. Switch to destination image. 4. Resume VM. 5. Start streaming destination image in order to copy source image data into destination file. 6. Streaming completes and disables the backing file, leaving the live copied destination image that no longer depends on the source image. Well streaming is copying just using post-copy approach. Both pre and post are required: - post-copy (aka streaming) for fast live block migration of the VM for cpu load balance. - pre-copy (aka live block copy) If you manage to get a network outage between the source and the destination storage systems, you will manage to keep running the VM on the source side. Like Kevin noted, the implementation and the functionality of these two features are almost identical, that's why we can have a single implementation with some options to enable pre or post copy. There is no need for dirty block tracking because image streaming will only copy over unallocated clusters. There are no phases to the copy process because the guest is writing to the destination file already and does not dirty the source file. Implementing live block copy without switch is also possible using the block-mirror driver to update both the source image and the destination image. This would require making the backing file writable though. Image streaming using live block copy - 1. Create destination image. 2. Start a live block copy to copy the source image data into destination file. 3. 
When live block copy reaches switch state, quiesce I/O and pause VM.
4. Switch to destination image, which now contains the flattened source image.
5. Resume VM.
6. Delete source image.

This approach copies the contents of the source image (and its backing file) into the destination file. You need to have two times the disk space, since towards the end of live block copy you have two copies of the image. The QED image streaming patches I posted do not need twice the disk space because they work in-place.

Call now! We'll give you live snapshot merge using image streaming for FREE

1. Quiesce I/O and pause VM.
2. Free clusters in the snapshot that are allocated in the copy-on-write delta file.
3. Make the copy-on-write delta file the backing file of the snapshot. (Now the snapshot image has most clusters allocated except those which were COWed due to a write after the snapshot was taken)
4. Resume VM.
5. Start streaming the snapshot image in order to copy the COW data back into the snapshot file.
6. Streaming completes and disables the backing file, leaving the merged snapshot.
7. Delete the COW file.

This approach is much more handwavy. We needed to invert the backing file relationship between snapshot and COW file, as well as freeing clusters in order to make image streaming copy the data back into the snapshot file.

Stefan
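The six steps of "live block copy using image streaming" can be walked through with a small model. This is a sketch under stated assumptions, not the QED implementation: the `Image` class and the `stream` function are hypothetical, clusters are dictionary entries, and pausing or resuming the VM is reduced to a variable assignment.

```python
class Image:
    """Toy image: only allocated clusters are stored; reads fall through
    to the backing file when a cluster is unallocated."""

    def __init__(self, clusters, backing=None):
        self.clusters = dict(clusters)
        self.backing = backing

    def read(self, idx):
        img = self
        while img is not None:
            if idx in img.clusters:
                return img.clusters[idx]
            img = img.backing
        return 0  # unallocated in the whole chain

    def write(self, idx, data):
        self.clusters[idx] = data


def stream(dest, n_clusters):
    # Copy only clusters that are still unallocated in the destination,
    # so no dirty tracking is needed and guest data is never overwritten.
    for idx in range(n_clusters):
        if idx not in dest.clusters:
            dest.clusters[idx] = dest.backing.read(idx)
    # Once everything is local, the backing file reference is dropped.
    dest.backing = None


source = Image({0: "s0", 1: "s1", 2: "s2"})
dest = Image({}, backing=source)  # step 1: destination backed by source
active = dest                     # steps 2-4: pause, switch, resume
active.write(1, "g1")             # guest keeps writing to the destination
stream(dest, 3)                   # steps 5-6: stream, then drop backing file
```

After streaming, the destination holds the guest's write for cluster 1 and the source's data everywhere else, and the source itself was never modified.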
Re: [Qemu-devel] Image streaming and live block copy
On Fri, Jun 17, 2011 at 1:31 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Thu, Jun 16, 2011 at 04:30:18PM +0100, Stefan Hajnoczi wrote: On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote: This approach does not use the backing file feature? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. We open the remote file read-only for image streaming and do not want to mirror writes. Why not? Is there any disadvantage of mirroring writes? Think of the use case with a Fedora master image over NFS. You want a local clone of that master image and use the stream command to copy the data from the master image into the local clone. You cannot modify that master image because other VMs are using it too and/or you want to be able to clone new VMs from it in the future. Stefan
Re: [Qemu-devel] Image streaming and live block copy
On Sat, Jun 18, 2011 at 10:15 AM, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Jun 17, 2011 at 1:31 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Thu, Jun 16, 2011 at 04:30:18PM +0100, Stefan Hajnoczi wrote: On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote: This approach does not use the backing file feature? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. We open the remote file read-only for image streaming and do not want to mirror writes. Why not? Is there any disadvantage of mirroring writes? Think of the use case with a Fedora master image over NFS. You want a local clone of that master image and use the stream command to copy the data from the master image into the local clone. You cannot modify that master image because other VMs are using it too and/or you want to be able to clone new VMs from it in the future. BTW the workaround is to create two local images: 1. Local clone with master image as a backing file. This is the live block copy source image. 2. Local image without a backing file. This is the live block copy destination image. But this is not very elegant. Writes get mirrored so that crash recovery works. Compare that to image streaming, which works across crash and doesn't duplicate I/O. Stefan
Re: [Qemu-devel] Image streaming and live block copy
On 16.06.2011 16:38, Marcelo Tosatti wrote: On Thu, Jun 16, 2011 at 02:35:37PM +0200, Kevin Wolf wrote: On 14.06.2011 20:18, Stefan Hajnoczi wrote:

Overview

This patch series adds image streaming support for QED image files. Other image formats can also be supported in the future. Image streaming populates the file in the background while the guest is running. This makes it possible to start the guest before its image file has been fully provisioned.

Example use cases include:
* Providing small virtual appliances for download that can be launched immediately but provision themselves in the background.
* Reducing guest provisioning time by creating local image files but backing them with shared master images which will be streamed.

When image streaming is enabled, the unallocated regions of the image file are populated with the data from the backing file. This occurs in the background and the guest can perform regular I/O in the meantime. Once the entire backing file has been streamed, the image no longer requires a backing file and will drop its reference.

Long CC list and Kevin wearing his upstream hat - this might be an unpleasant email. :-)

So yesterday I had separate discussions with Stefan about image streaming and Marcelo, Avi and some other folks about live block copy. The conclusion was in both cases that, yes, both things are pretty similar and, yes, the current implementations don't reflect that but duplicate everything.

To summarise what both things are about:
* Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image
* Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image

The right solution is probably to implement COR and the background task in generic block layer code (no reason to restrict it to QED) and use it for both image streaming and live block copy.
(This is a bit more complicated than it may sound here because guest writes must always take precedence over a copy - but doing complicated things is an even better reason to do it in a common place instead of duplicating) People seem to agree on this, and the reason that I've heard why we should merge the existing code instead is downstream time pressure. That may be a valid reason for downstreams to add such code, but is taking such code really the best option for upstream (and therefore long-term for downstreams)? If we take these patches as they are, I doubt that we'll ever get a rewrite to implement the code as it should have been done in the first place. That (a newer, unified mechanism) is just a matter of allocating resources to the implementation. At least in block copy's case the interface can be reused, so it can be seen as an incremental approach (read: advocating in favour of merging live block copy patchset). Right. However, my point was that I'm afraid that this resource allocation and therefore the incremental improvement won't happen because for most people (who don't care about the code) the result will be good enough and the problem will only be visible for block layer people who must work with the code. Kevin
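The point that guest writes must always take precedence over a copy can be illustrated with a two-phase copy. This is a minimal sketch with hypothetical names (`CopyTarget`, `begin_copy`, `finish_copy`), assuming a destination where "unallocated" is represented by `None`; a real block layer would also have to serialise overlapping in-flight requests.

```python
class CopyTarget:
    """Toy model of a background copy racing with guest writes."""

    def __init__(self, source):
        self.source = list(source)
        self.dest = [None] * len(source)  # None means not yet allocated

    def guest_write(self, idx, data):
        # Guest data always lands in the destination immediately.
        self.dest[idx] = data

    def begin_copy(self, idx):
        # Phase 1: read from the source. In practice this can take a
        # while, so the guest may write to the same block meanwhile.
        return idx, self.source[idx]

    def finish_copy(self, pending):
        # Phase 2: only store the copied data if the block is still
        # unallocated; otherwise the copy is stale and gets discarded.
        idx, data = pending
        if self.dest[idx] is None:
            self.dest[idx] = data
            return True
        return False


t = CopyTarget(["old0", "old1"])
pending = t.begin_copy(0)
t.guest_write(0, "new0")          # guest write arrives mid-copy
applied = t.finish_copy(pending)  # stale copy must not win
copied_other = t.finish_copy(t.begin_copy(1))
```

Keeping this check in one shared place is the kind of logic the message argues should not be duplicated between image streaming and live block copy.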
Re: [Qemu-devel] Image streaming and live block copy
Am 16.06.2011 16:52, schrieb Marcelo Tosatti: On Thu, Jun 16, 2011 at 03:08:30PM +0200, Kevin Wolf wrote: Am 16.06.2011 14:49, schrieb Avi Kivity: On 06/16/2011 03:35 PM, Kevin Wolf wrote: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image Or a block-mirror started in degraded mode. At least not in the same configuration as with live block copy: You don't want to write to the source, you only want to read from it when the destination doesn't have the data yet. * Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image = block-mirror started in degraded mode The right solution is probably to implement COR and the background task in generic block layer code (no reason to restrict it to QED) and use it for both image streaming and live block copy. (This is a bit more complicated than it may sound here because guest writes must always take precedence over a copy - but doing complicated things is an even better reason to do it in a common place instead of duplicating) Or in a block-mirror block format driver - generic code need not be involved. Might be an option. In this case generic code is only involved with the stacking of BlockDriverStates, which is already implemented (but requires -blockdev for a sane way to configure things). Kevin What are the disadvantages of such an approach for image streaming, versus the current QED approach? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. Why maintain it in memory? We already have mechanisms to track this in COW image formats, so that you can even continue after a crash. We can still add a raw-cow driver that maintains the COW data in memory for allowing raw copies, if this is needed. 
- Once all blocks are local, notify mgmt so it can switch to local copy.
- Writes are mirrored to source and destination, minding guest writes over copy writes.

Image streaming shouldn't write to the source. But adding a flag for this isn't a major problem.

Over this scheme, you'd have:
1) Block copy. Reopen image to be copied with blkstream:/path/to/current-image:/path/to/destination-image, background read sectors 0...N.
2) Image stream: blkstream:remote-image:/path/to/local-image, background read sectors 0...N. Where remote-image is remote accessible image such as NBD.

I think that should work. By the way, we'll get problems with the colon syntax. Without -blockdev we'll have to invent a new syntax, maybe with brackets: blkstream:[nbd:localhost]:out.qcow2

Kevin
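To make the colon problem concrete: a naive split on ":" would break nbd:localhost into two fields. The bracket idea floated above could be parsed roughly as follows. This is purely a sketch of a hypothetical syntax; `split_spec` is not an existing QEMU function, and the thread does not settle on these rules.

```python
def split_spec(spec):
    """Split a driver spec on ':' while keeping bracketed parts intact.

    The outermost pair of brackets is removed from the result; nested
    brackets are preserved so inner specs can be parsed recursively.
    """
    parts, buf, depth = [], [], 0
    for ch in spec:
        if ch == "[":
            depth += 1
            if depth == 1:
                continue  # drop the outermost opening bracket
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ']'")
            if depth == 0:
                continue  # drop the outermost closing bracket
        elif ch == ":" and depth == 0:
            # A top-level colon ends the current field.
            parts.append("".join(buf))
            buf = []
            continue
        buf.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '['")
    parts.append("".join(buf))
    return parts


fields = split_spec("blkstream:[nbd:localhost]:out.qcow2")
plain = split_spec("blkstream:/path/to/current-image:/path/to/destination-image")
```

Even with such a parser, every nested driver needs its own quoting rules, which is part of why the thread keeps coming back to -blockdev as the sane way to express stacked drivers.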
Re: [Qemu-devel] Image streaming and live block copy
On Fri, Jun 17, 2011 at 9:36 AM, Kevin Wolf kw...@redhat.com wrote: By the way, we'll get problems with the colon syntax. Without -blockdev we'll have to invent a new syntax, maybe with brackets: blkstream:[nbd:localhost]:out.qcow2 Embedding block driver options in filenames is getting worse as time goes on. I recently tried to refactor and eliminate QEMUOptionParameter so that we only use QemuOpts instead of two different option APIs. Part of that involves keeping separate per-block driver (i.e. -blockdev) options lists, which would allow us to pass proper options to block drivers instead of embedding them in the filename. I got stuck because today the protocol and format QEMUOptionParameters get concatenated in some cases. Concatenation is not really supported by QemuOpts :). Anyway, here's the current state if anyone is interested: http://repo.or.cz/w/qemu/stefanha.git/commitdiff/b49babb2c8b476a36357cfd7276ca45a11039ca5 Stefan
Re: [Qemu-devel] Image streaming and live block copy
On 17.06.2011 10:57, Stefan Hajnoczi wrote: On Fri, Jun 17, 2011 at 9:36 AM, Kevin Wolf kw...@redhat.com wrote:

By the way, we'll get problems with the colon syntax. Without -blockdev we'll have to invent a new syntax, maybe with brackets: blkstream:[nbd:localhost]:out.qcow2

Embedding block driver options in filenames is getting worse as time goes on.

Well, yes. We need -blockdev for a sane way to express complex relations between BlockDriverStates. But then, we'll also want to have convenient shortcuts for manual use, and that may be something like the existing colon syntax. I really don't feel like typing three full -blockdev parameters for qcow2 on blockdbg on raw.

I recently tried to refactor and eliminate QEMUOptionParameter so that we only use QemuOpts instead of two different option APIs. Part of that involves keeping separate per-block driver (i.e. -blockdev) options lists, which would allow us to pass proper options to block drivers instead of embedding them in the filename.

Aren't these completely independent things? QEMUOptionParameter is used for image creation, whereas filenames are used for opening images. I think you can change one without changing the other.

Kevin
Re: [Qemu-devel] Image streaming and live block copy
On 06/17/2011 03:36 AM, Kevin Wolf wrote: Am 16.06.2011 16:52, schrieb Marcelo Tosatti: On Thu, Jun 16, 2011 at 03:08:30PM +0200, Kevin Wolf wrote: Over this scheme, you'd have: 1) Block copy. Reopen image to be copied with blkstream:/path/to/current-image:/path/to/destination-image, background read sectors 0...N. 2) Image stream: blkstream:remote-image:/path/to/local-image, background read sectors 0...N. Where remote-image is remote accessible image such as NBD. I think that should work. By the way, we'll get problems with the colon syntax. Without -blockdev we'll have to invent a new syntax, maybe with brackets: blkstream:[nbd:localhost]:out.qcow2 So what's the main issue with -blockdev today? Just need someone to spend some time implementing it? Also, it would be a useful exercise to try and capture some of this in the wiki. That makes it a bit easier to reference as we move forward. Regards, Anthony Liguori Kevin
Re: [Qemu-devel] Image streaming and live block copy
On Thu, Jun 16, 2011 at 04:30:18PM +0100, Stefan Hajnoczi wrote:
On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote:
This approach does not use the backing file feature?

Not directly. For block copy, the destination (or source) images might share the same backing file (this is a requirement for live snapshot merge).
Re: [Qemu-devel] Image streaming and live block copy
On Fri, Jun 17, 2011 at 10:36:21AM +0200, Kevin Wolf wrote: Am 16.06.2011 16:52, schrieb Marcelo Tosatti: On Thu, Jun 16, 2011 at 03:08:30PM +0200, Kevin Wolf wrote: Am 16.06.2011 14:49, schrieb Avi Kivity: On 06/16/2011 03:35 PM, Kevin Wolf wrote: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image Or a block-mirror started in degraded mode. At least not in the same configuration as with live block copy: You don't want to write to the source, you only want to read from it when the destination doesn't have the data yet. * Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image = block-mirror started in degraded mode The right solution is probably to implement COR and the background task in generic block layer code (no reason to restrict it to QED) and use it for both image streaming and live block copy. (This is a bit more complicated than it may sound here because guest writes must always take precedence over a copy - but doing complicated things is an even better reason to do it in a common place instead of duplicating) Or in a block-mirror block format driver - generic code need not be involved. Might be an option. In this case generic code is only involved with the stacking of BlockDriverStates, which is already implemented (but requires -blockdev for a sane way to configure things). Kevin What are the disadvantages of such an approach for image streaming, versus the current QED approach? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. Why maintain it in memory? We already have mechanisms to track this in COW image formats, so that you can even continue after a crash. Which mechanism is that? 
You'd need separate metadata for image streaming purposes, since you might want the destination-image to use a backing image itself. We can still add a raw-cow driver that maintains the COW data in memory for allowing raw copies, if this is needed.
Re: [Qemu-devel] Image streaming and live block copy
On Fri, Jun 17, 2011 at 10:22 AM, Kevin Wolf kw...@redhat.com wrote: Am 17.06.2011 10:57, schrieb Stefan Hajnoczi: On Fri, Jun 17, 2011 at 9:36 AM, Kevin Wolf kw...@redhat.com wrote: By the way, we'll get problems with the colon syntax. Without -blockdev we'll have to invent a new syntax, maybe with brackets: blkstream:[nbd:localhost]:out.qcow2 Embedding block driver options in filenames is getting worse as time goes on. Well, yes. We need -blockdev for a sane way to express complex relations between BlockDriverStates. But then, we'll also want to have convenient shortcuts for manual use, and that may be something like the existing colon syntax. I really don't feel like typing three full -blockdev parameters for qcow2 on blockdbg on raw. I recently tried to refactor and eliminate QEMUOptionParameter so that we only use QemuOpts instead of two different option APIs. Part of that involves keeping separate per-block driver (i.e. -blockdev) options lists, which would allow us to pass proper options to block drivers instead of embedding them in the filename. Aren't these completely independent things? QEMUOptionParameter is used for image creation, whereas filenames are used for opening images. I think you can change one without changing the other. Yeah but I'd rather not spread two different APIs to do the same thing. Most of QEMU uses QemuOpts but image creation uses QEMUOptionParameter. The longer we leave that the more quirks like the concatenation behavior will fragment the two. Stefan
Re: [Qemu-devel] Image streaming and live block copy
On Thu, Jun 16, 2011 at 04:30:18PM +0100, Stefan Hajnoczi wrote: On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote: This approach does not use the backing file feature? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. We open the remote file read-only for image streaming and do not want to mirror writes. Why not? Is there any disadvantage of mirroring writes? If QEMU crashes or there is a power failure we need to restart the streaming process carefully - local blocks must not be overwritten. Perhaps this is the tricky part. Under the proposed scheme, if QEMU crashes you'd restart streaming from zero. In that case, the remote image is consistent due to mirrored writes. That is one disadvantage of keeping the local/remote status of blocks in memory: in case of a crash you'd have to restart from zero. But this should be an uncommon case (and there is not much of an option for generic-format image streaming without keeping metadata). Do you see a problem with that? Over this scheme, you'd have: 1) Block copy. Reopen image to be copied with blkstream:/path/to/current-image:/path/to/destination-image, background read sectors 0...N. 2) Image stream: blkstream:remote-image:/path/to/local-image, background read sectors 0...N. Stefan
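The trade-off discussed above (local/remote status kept only in memory, guest writes mirrored to both sides) can be illustrated with a toy model. This is only a sketch in Python, not QEMU code: BlkStream, is_local and the list-based "images" are hypothetical stand-ins for the proposed blkstream driver and its two underlying block drivers. It models the mirrored-write variant; for image streaming from a read-only master image the mirroring would have to be switched off.

```python
class BlkStream:
    """Toy model of the proposed blkstream driver (hypothetical names)."""

    def __init__(self, remote, local):
        self.remote = remote                    # source image, one entry per block
        self.local = local                      # destination image, same length
        self.is_local = [False] * len(remote)   # kept in memory only, lost on crash

    def guest_write(self, block, data):
        # Writes are mirrored, so the remote image stays consistent
        # even if the in-memory map is lost.
        self.remote[block] = data
        self.local[block] = data
        self.is_local[block] = True

    def read(self, block):
        # Not yet local: read from remote, write to local, mark as local.
        if not self.is_local[block]:
            self.local[block] = self.remote[block]
            self.is_local[block] = True
        return self.local[block]

    def stream_all(self):
        # Background task: touch sectors 0...N; True once all blocks are local.
        for block in range(len(self.remote)):
            self.read(block)
        return all(self.is_local)
```

After a simulated crash a fresh BlkStream starts with an empty map and recopies everything from zero. Because the guest's writes were mirrored to the remote side, recopying does not lose them; the cost is only the repeated transfer.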
Re: [Qemu-devel] Image streaming and live block copy
On Fri, Jun 17, 2011 at 10:36:21AM +0200, Kevin Wolf wrote: Am 16.06.2011 16:52, schrieb Marcelo Tosatti: On Thu, Jun 16, 2011 at 03:08:30PM +0200, Kevin Wolf wrote: Am 16.06.2011 14:49, schrieb Avi Kivity: On 06/16/2011 03:35 PM, Kevin Wolf wrote: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image Or a block-mirror started in degraded mode. At least not in the same configuration as with live block copy: You don't want to write to the source, you only want to read from it when the destination doesn't have the data yet. * Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image = block-mirror started in degraded mode The right solution is probably to implement COR and the background task in generic block layer code (no reason to restrict it to QED) and use it for both image streaming and live block copy. (This is a bit more complicated than it may sound here because guest writes must always take precedence over a copy - but doing complicated things is an even better reason to do it in a common place instead of duplicating) Or in a block-mirror block format driver - generic code need not be involved. Might be an option. In this case generic code is only involved with the stacking of BlockDriverStates, which is already implemented (but requires -blockdev for a sane way to configure things). Kevin What are the disadvantages of such an approach for image streaming, versus the current QED approach? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. Why maintain it in memory? We already have mechanisms to track this in COW image formats, so that you can even continue after a crash. 
We can still add a raw-cow driver that maintains the COW data in memory for allowing raw copies, if this is needed. Well, then image streaming is not for generic-format anymore. OK, the uptodate information can live in disk if supported by the lower level format. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. Image streaming shouldn't write to the source. But adding a flag for this isn't a major problem. OK, block copy does write to the source. Over this scheme, you'd have: 1) Block copy. Reopen image to be copied with blkstream:/path/to/current-image:/path/to/destination-image, background read sectors 0...N. 2) Image stream: blkstream:remote-image:/path/to/local-image, background read sectors 0...N. Where remote-image is remote accessible image such as NBD. I think that should work. By the way, we'll get problems with the colon syntax. Without -blockdev we'll have to invent a new syntax, maybe with brackets: blkstream:[nbd:localhost]:out.qcow2 Kevin
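To make the colon problem concrete: a plain split on ":" cannot tell that "nbd:localhost" is one nested filename. The sketch below shows one way the bracket idea could be parsed. It is purely illustrative Python; split_filename is a hypothetical helper, and the bracket syntax itself is only a proposal in this thread, not an existing QEMU feature.

```python
def split_filename(spec):
    """Split spec on ':' outside brackets; strip one level of brackets."""
    parts, current, depth = [], [], 0
    for ch in spec:
        if ch == "[":
            if depth > 0:
                current.append(ch)      # keep inner brackets for the nested driver
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced brackets in %r" % spec)
            if depth > 0:
                current.append(ch)
        elif ch == ":" and depth == 0:
            parts.append("".join(current))   # top-level separator
            current = []
        else:
            current.append(ch)               # includes ':' inside brackets
    if depth != 0:
        raise ValueError("unbalanced brackets in %r" % spec)
    parts.append("".join(current))
    return parts
```

With this, "blkstream:[nbd:localhost]:out.qcow2" yields the driver name, the nested source filename and the destination as three fields, whereas the unbracketed form splits into four ambiguous pieces.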
Re: [Qemu-devel] Image streaming and live block copy
On 06/16/2011 07:35 AM, Kevin Wolf wrote: Am 14.06.2011 20:18, schrieb Stefan Hajnoczi: Overview This patch series adds image streaming support for QED image files. Other image formats can also be supported in the future. Image streaming populates the file in the background while the guest is running. This makes it possible to start the guest before its image file has been fully provisioned. Example use cases include: * Providing small virtual appliances for download that can be launched immediately but provision themselves in the background. * Reducing guest provisioning time by creating local image files but backing them with shared master images which will be streamed. When image streaming is enabled, the unallocated regions of the image file are populated with the data from the backing file. This occurs in the background and the guest can perform regular I/O in the meantime. Once the entire backing file has been streamed, the image no longer requires a backing file and will drop its reference. Long CC list and Kevin wearing his upstream hat - this might be an unpleasant email. :-) So yesterday I had separate discussions with Stefan about image streaming and Marcelo, Avi and some other folks about live block copy. The conclusion was in both cases that, yes, both things are pretty similar and, yes, the current implementations don't reflect that but duplicate everything. To summarise what both things are about: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image * Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image Is this correct in practice? Image streaming has the following semantics for A - B where B is the backing file of A. 1) All writes go to A. 2) If a read can be satisfied by A, read from A, else read from B, copy to A, then return. Block copy has the following semantics where A is the source and B is the destination. 
1) All reads and writes go to A 2) Copy data from A to B in the background 3) When B matches the content of A, switch over to B Other than at a hand wave, they both do copies, I'm not sure I see the overlap in implementations. Regards, Anthony Liguori
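The image streaming semantics listed above (all writes go to A; reads come from A when possible, otherwise from B with a copy into A) can be modelled in a few lines. This is a rough Python sketch for illustration only: StreamingImage and its methods are hypothetical names, and the dict/list "images" stand in for real block drivers.

```python
class StreamingImage:
    """Toy model of image A with backing file B (copy-on-read)."""

    def __init__(self, backing):
        self.backing = backing   # B: block contents, never written here
        self.local = {}          # A: block index -> data, allocated blocks only

    def write(self, block, data):
        # 1) All writes go to A.
        self.local[block] = data

    def read(self, block):
        # 2) Serve from A if allocated; else read B, copy to A, then return.
        if block not in self.local:
            self.local[block] = self.backing[block]
        return self.local[block]

    def fully_streamed(self):
        # Once every block is allocated in A, B is no longer needed.
        return len(self.local) == len(self.backing)
```

Note that B is only ever read in this model, which is the property that matters for the shared master image use case.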
Re: [Qemu-devel] Image streaming and live block copy
Am 16.06.2011 14:49, schrieb Avi Kivity: On 06/16/2011 03:35 PM, Kevin Wolf wrote: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image Or a block-mirror started in degraded mode. At least not in the same configuration as with live block copy: You don't want to write to the source, you only want to read from it when the destination doesn't have the data yet. * Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image = block-mirror started in degraded mode The right solution is probably to implement COR and the background task in generic block layer code (no reason to restrict it to QED) and use it for both image streaming and live block copy. (This is a bit more complicated than it may sound here because guest writes must always take precedence over a copy - but doing complicated things is an even better reason to do it in a common place instead of duplicating) Or in a block-mirror block format driver - generic code need not be involved. Might be an option. In this case generic code is only involved with the stacking of BlockDriverStates, which is already implemented (but requires -blockdev for a sane way to configure things). Kevin
Re: [Qemu-devel] Image streaming and live block copy
On 06/16/2011 03:35 PM, Kevin Wolf wrote: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image Or a block-mirror started in degraded mode. * Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image = block-mirror started in degraded mode The right solution is probably to implement COR and the background task in generic block layer code (no reason to restrict it to QED) and use it for both image streaming and live block copy. (This is a bit more complicated than it may sound here because guest writes must always take precedence over a copy - but doing complicated things is an even better reason to do it in a common place instead of duplicating) Or in a block-mirror block format driver - generic code need not be involved. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Image streaming and live block copy
On 06/16/2011 04:08 PM, Kevin Wolf wrote: Am 16.06.2011 14:49, schrieb Avi Kivity: On 06/16/2011 03:35 PM, Kevin Wolf wrote: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image Or a block-mirror started in degraded mode. At least not in the same configuration as with live block copy: You don't want to write to the source, you only want to read from it when the destination doesn't have the data yet. You're right, it's not exactly the same. But I think it can use much of the same code. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Image streaming and live block copy
Am 16.06.2011 15:10, schrieb Anthony Liguori: On 06/16/2011 07:35 AM, Kevin Wolf wrote: To summarise what both things are about: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image * Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image Is this correct in practice? Image streaming has the following semantics for A - B where B is the backing file of A. That B is a backing file of A is an implementation detail and not a requirement. 1) All writes go to A. 2) If a read can be satisfied by A, read from A, else read from B, copy to A, then return. Block copy has the following semantics where A is the source and B is the destination. 1) All reads and writes go to A 2) Copy data from A to B in the background 3) When B matches the content of A, switch over to B 3) is optional, it would be like adding an item for image streaming that it drops the backing file as soon as everything has been copied. Other than at a hand wave, they both do copies, I'm not sure I see the overlap in implementations. One thing is handling concurrent requests. If there's a concurrent guest write request, it must always have precedence over the background copy/COR. And even if we couldn't use a common implementation for live block copy and image streaming, I think it's something that shouldn't be duplicated for copy on read in each image format driver. I think it's possible to have a generic COR implementation that would not only work for QED, but also for qcow2, VMDK and any other format implementing backing files without adding code to each driver. Kevin
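The precedence rule (a concurrent guest write must always win over the background copy/COR) is the part most worth sharing between the two features. A minimal single-threaded sketch in Python, with hypothetical names throughout: the "between" callback stands in for guest I/O that completes while the copy's read is still in flight, and a real implementation would need proper request serialisation rather than a set lookup.

```python
def guest_write(block, data, dest, allocated):
    """Guest write: always lands in the destination and marks the block."""
    dest[block] = data
    allocated.add(block)


def copy_block(block, source, dest, allocated, between=lambda: None):
    """Background copy of one block; returns True if the copy was applied."""
    if block in allocated:
        return False                 # destination already has newer data
    data = source[block]             # slow read, e.g. over NFS or NBD
    between()                        # guest requests may complete here
    if block in allocated:
        return False                 # guest write won the race; drop stale data
    dest[block] = data
    allocated.add(block)
    return True
```

The second check after the read is what keeps stale source data from overwriting a guest write that arrived in the meantime.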
Re: [Qemu-devel] Image streaming and live block copy
On Thu, Jun 16, 2011 at 11:52:43AM -0300, Marcelo Tosatti wrote: This approach does not use the backing file feature? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. We open the remote file read-only for image streaming and do not want to mirror writes. If QEMU crashes or there is a power failure we need to restart the streaming process carefully - local blocks must not be overwritten. Perhaps this is the tricky part. Over this scheme, you'd have: 1) Block copy. Reopen image to be copied with blkstream:/path/to/current-image:/path/to/destination-image, background read sectors 0...N. 2) Image stream: blkstream:remote-image:/path/to/local-image, background read sectors 0...N. Stefan
Re: [Qemu-devel] Image streaming and live block copy
On Thu, Jun 16, 2011 at 03:08:30PM +0200, Kevin Wolf wrote: Am 16.06.2011 14:49, schrieb Avi Kivity: On 06/16/2011 03:35 PM, Kevin Wolf wrote: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image Or a block-mirror started in degraded mode. At least not in the same configuration as with live block copy: You don't want to write to the source, you only want to read from it when the destination doesn't have the data yet. * Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image = block-mirror started in degraded mode The right solution is probably to implement COR and the background task in generic block layer code (no reason to restrict it to QED) and use it for both image streaming and live block copy. (This is a bit more complicated than it may sound here because guest writes must always take precedence over a copy - but doing complicated things is an even better reason to do it in a common place instead of duplicating) Or in a block-mirror block format driver - generic code need not be involved. Might be an option. In this case generic code is only involved with the stacking of BlockDriverStates, which is already implemented (but requires -blockdev for a sane way to configure things). Kevin What are the disadvantages of such an approach for image streaming, versus the current QED approach? blkstream block driver: - Maintain in memory whether given block is allocated in local image, if not, read from remote, write to local. Set block as local. Local and remote simply two block drivers from image streaming driver POV. - Once all blocks are local, notify mgmt so it can switch to local copy. - Writes are mirrored to source and destination, minding guest writes over copy writes. Over this scheme, you'd have: 1) Block copy. 
Reopen image to be copied with blkstream:/path/to/current-image:/path/to/destination-image, background read sectors 0...N. 2) Image stream: blkstream:remote-image:/path/to/local-image, background read sectors 0...N. Where remote-image is remote accessible image such as NBD.
Re: [Qemu-devel] Image streaming and live block copy (was: [PATCH 00/13] QED image streaming)
On Thu, Jun 16, 2011 at 02:35:37PM +0200, Kevin Wolf wrote: Am 14.06.2011 20:18, schrieb Stefan Hajnoczi: Overview This patch series adds image streaming support for QED image files. Other image formats can also be supported in the future. Image streaming populates the file in the background while the guest is running. This makes it possible to start the guest before its image file has been fully provisioned. Example use cases include: * Providing small virtual appliances for download that can be launched immediately but provision themselves in the background. * Reducing guest provisioning time by creating local image files but backing them with shared master images which will be streamed. When image streaming is enabled, the unallocated regions of the image file are populated with the data from the backing file. This occurs in the background and the guest can perform regular I/O in the meantime. Once the entire backing file has been streamed, the image no longer requires a backing file and will drop its reference. Long CC list and Kevin wearing his upstream hat - this might be an unpleasant email. :-) So yesterday I had separate discussions with Stefan about image streaming and Marcelo, Avi and some other folks about live block copy. The conclusion was in both cases that, yes, both things are pretty similar and, yes, the current implementations don't reflect that but duplicate everything. To summarise what both things are about: * Image streaming is a normal image file plus copy-on-read plus a background task that copies data from the source image * Live block copy is a block-mirror of two normal image files plus a background task that copies data from the source image The right solution is probably to implement COR and the background task in generic block layer code (no reason to restrict it to QED) and use it for both image streaming and live block copy. 
(This is a bit more complicated than it may sound here because guest writes must always take precedence over a copy - but doing complicated things is an even better reason to do it in a common place instead of duplicating) People seem to agree on this, and the reason that I've heard why we should merge the existing code instead is downstream time pressure. That may be a valid reason for downstreams to add such code, but is taking such code really the best option for upstream (and therefore long-term for downstreams)? If we take these patches as they are, I doubt that we'll ever get a rewrite to implement the code as it should have been done in the first place. That (a newer, unified mechanism) is just a matter of allocating resources to the implementation. At least in block copy's case the interface can be reused, so it can be seen as an incremental approach (read: advocating in favour of merging live block copy patchset). So I'm tempted to reject the current versions of both the image streaming and live block copy series and leave it to downstreams to use these as temporary solutions if the time pressure is too high. I know that maintaining things downstream is painful, but that's the whole point: I want to see the real implementation one day, and I'm afraid this might be the only way to get it. Kevin
Re: [Qemu-devel] Image streaming and live block copy (was: [PATCH 00/13] QED image streaming)
On Thu, Jun 16, 2011 at 11:38:48AM -0300, Marcelo Tosatti wrote: People seem to agree on this, and the reason that I've heard why we should merge the existing code instead is downstream time pressure. That may be a valid reason for downstreams to add such code, but is taking such code really the best option for upstream (and therefore long-term for downstreams)? If we take these patches as they are, I doubt that we'll ever get a rewrite to implement the code as it should have been done in the first place. That (a newer, unified mechanism) is just a matter of allocating resources to the implementation. At least in block copy's case the interface can be reused, so it can be seen as an incremental approach (read: advocating in favour of merging live block copy patchset). So I'm tempted to reject the current versions of both the image streaming and live block copy series and leave it to downstreams to use these as temporary solutions if the time pressure is too high. I know that maintaining things downstream is painful, but that's the whole point: I want to see the real implementation one day, and I'm afraid this might be the only way to get it. Again: there is no need to reject current live block copy implementation, as image streaming implemented for generic image formats is a strong motivator behind the unified implementation.