Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-27 Thread Stefan Hajnoczi
On Mon, May 23, 2011 at 2:02 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Sun, May 22, 2011 at 10:52 AM, Dor Laor dl...@redhat.com wrote:
 On 05/20/2011 03:19 PM, Stefan Hajnoczi wrote:

 I'm interested in what the API for snapshots would look like.
 Specifically how does user software do the following:
 1. Create a snapshot
 2. Delete a snapshot
 3. List snapshots
 4. Access data from a snapshot

 There are plenty of options there:
  - Run an (unrelated) VM and hotplug the snapshot as an additional disk

 This is the backup appliance VM model and makes it possible to move
 the backup application to where the data is (or not, if you have a SAN
 and decide to spin up the appliance VM on another host).  This should
 be perfectly doable if snapshots are volumes at the libvirt level.

 A special-case of the backup appliance VM is using libguestfs to
 access the snapshot from the host.  This includes both block-level and
 file system-level access along with OS detection APIs that libguestfs
 provides.

 If snapshots are volumes at the libvirt level, then it is also
 possible to use virStorageVolDownload() to stream the entire snapshot
 through libvirt:
 http://libvirt.org/html/libvirt-libvirt.html#virStorageVolDownload

 Summarizing, here are three access methods that integrate with libvirt
 and cover many use cases:

 1. Backup appliance VM.  Add a readonly snapshot volume to a backup
 appliance VM.  If shared storage (e.g. SAN) is available then the
 appliance can be run on any host.  Otherwise the appliance must run on
 the same host that the snapshot resides on.

 2. Libguestfs client on host.  Launch libguestfs with the readonly
 snapshot volume.  The backup application runs directly on the host, it
 has both block and file system access to the snapshot.

 3. Download the snapshot to a remote host for backup processing.  Use
 the virStorageVolDownload() API to download the snapshot onto a
 libvirt client machine.  Dirty block tracking is still useful here
 since the virStorageVolDownload() API supports offset, length
 arguments.
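Dirty block tracking pairs naturally with those offset, length arguments: the client can coalesce the dirty blocks into extents and issue a ranged read per extent instead of pulling the whole volume. A minimal sketch, with `download_range` as a hypothetical stand-in for a virStorageVolDownload()-style ranged transfer:

```python
# Sketch: pull only the changed regions of a snapshot volume using
# offset/length ranged reads, in the style virStorageVolDownload() allows.
# `download_range` is a hypothetical stand-in for the real transfer call.

def merge_extents(dirty_blocks, block_size):
    """Coalesce sorted dirty block numbers into (offset, length) extents."""
    extents = []
    for blk in dirty_blocks:
        off = blk * block_size
        if extents and extents[-1][0] + extents[-1][1] == off:
            last_off, last_len = extents[-1]
            extents[-1] = (last_off, last_len + block_size)  # extend the run
        else:
            extents.append((off, block_size))                # start a new run
    return extents

def incremental_pull(volume, dirty_blocks, block_size, download_range):
    """Fetch each dirty extent with one ranged read; return bytes moved."""
    total = 0
    for offset, length in merge_extents(sorted(dirty_blocks), block_size):
        download_range(volume, offset, length)
        total += length
    return total
```

The coalescing step matters: one ranged read per contiguous run keeps the request count proportional to the number of dirty regions, not dirty blocks.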

Jagane,
What do you think about these access methods?  What does your custom
protocol integrate with today - do you have a custom non-libvirt KVM
management stack?

Stefan



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-27 Thread Jagane Sundar

On 5/27/2011 9:46 AM, Stefan Hajnoczi wrote:


Jagane,
What do you think about these access methods?  What does your custom
protocol integrate with today - do you have a custom non-libvirt KVM
management stack?

Stefan

Hello Stefan,

The current livebackup_client simply creates a backup of the VM on the
backup server. It can save the backup image as a complete image, for a
quick start of the VM on the backup server, or as 'full + n
incremental backup redo files'. The 'full + n incremental redo' form is
useful if you want to store the backup on tape.
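Restoring from the 'full + n incremental redo' form just means replaying the redo chain in order. A minimal sketch, where an image is modeled as a dict of block number to data purely for illustration (this is not livebackup's real on-disk format):

```python
# Sketch of restoring from 'full + n incremental redo files'. Images are
# modeled as dicts of block_number -> data; illustrative only, not
# livebackup's actual on-disk layout.

def restore(full, redos):
    """Rebuild the latest disk state: start from the full backup, then
    apply each incremental redo file in the order it was taken."""
    image = dict(full)
    for redo in redos:
        image.update(redo)  # blocks in later redo files win
    return image
```

Because each redo file holds only the blocks changed since the previous one, the chain stays compact, which is what makes the tape-archival case practical.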

I don't have a full backup management stack yet. If livebackup_client
were available as part of KVM, it would become the command line
utility that the backup management stack uses.
My own interest is in using livebackup_client to integrate all
management functions into OpenStack; any management built
into OpenStack would be designed for self service.
However, other enterprise backup management stacks, such as
Symantec's, could be enhanced to use livebackup_client to
extract the backup from the VM host.

How does it apply to the above access mechanisms? Hmm. Let me see.

1. Backup appliance VM. A backup appliance VM can be started
up and the livebackup images connected to it. The
limitation is that the backup appliance VM must be started
on the backup server, where the livebackup image resides on
a local disk.

2. Libguestfs client on host. This too is possible. The
restriction is that libguestfs must run on the backup
server, not on the VM Host.

3. Download the snapshot to a remote host for backup processing.
This is the native method for livebackup.


Thanks,
Jagane




Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-23 Thread Stefan Hajnoczi
On Sun, May 22, 2011 at 10:52 AM, Dor Laor dl...@redhat.com wrote:
 On 05/20/2011 03:19 PM, Stefan Hajnoczi wrote:

 I'm interested in what the API for snapshots would look like.
 Specifically how does user software do the following:
 1. Create a snapshot
 2. Delete a snapshot
 3. List snapshots
 4. Access data from a snapshot

 There are plenty of options there:
  - Run an (unrelated) VM and hotplug the snapshot as an additional disk

This is the backup appliance VM model and makes it possible to move
the backup application to where the data is (or not, if you have a SAN
and decide to spin up the appliance VM on another host).  This should
be perfectly doable if snapshots are volumes at the libvirt level.

A special-case of the backup appliance VM is using libguestfs to
access the snapshot from the host.  This includes both block-level and
file system-level access along with OS detection APIs that libguestfs
provides.

If snapshots are volumes at the libvirt level, then it is also
possible to use virStorageVolDownload() to stream the entire snapshot
through libvirt:
http://libvirt.org/html/libvirt-libvirt.html#virStorageVolDownload

Summarizing, here are three access methods that integrate with libvirt
and cover many use cases:

1. Backup appliance VM.  Add a readonly snapshot volume to a backup
appliance VM.  If shared storage (e.g. SAN) is available then the
appliance can be run on any host.  Otherwise the appliance must run on
the same host that the snapshot resides on.

2. Libguestfs client on host.  Launch libguestfs with the readonly
snapshot volume.  The backup application runs directly on the host, it
has both block and file system access to the snapshot.

3. Download the snapshot to a remote host for backup processing.  Use
the virStorageVolDownload() API to download the snapshot onto a
libvirt client machine.  Dirty block tracking is still useful here
since the virStorageVolDownload() API supports offset, length
arguments.

 5. Restore a VM from a snapshot

Simplest option: virStorageVolUpload().

 6. Get the dirty blocks list (for incremental backup)

 It might be needed for additional purposes like efficient delta sync across
 sites or any other storage operation (dedup, etc.)


 We've discussed image format-level approaches but I think the scope of
 the API should cover several levels at which snapshots are
 implemented:
 1. Image format - image file snapshot (Jes, Jagane)
 2. Host file system - ext4 and btrfs snapshots
 3. Storage system - LVM or SAN volume snapshots

 It will be hard to take advantage of more efficient host file system
 or storage system snapshots if they are not designed in now.

 I agree but it can also be a chicken-and-egg problem.
 Actually 1/2/3/5 are already working today regardless of live snapshots.

 Is anyone familiar enough with the libvirt storage APIs to draft an
 extension that adds snapshot support?  I will take a stab at it if no
 one else wants to try it.

 I added libvirt-list and Ayal Baron from vdsm.
 What you're asking for goes beyond snapshots; it's the whole management of VM
 images. Doing the above operations is simple, but for an enterprise
 virtualization solution you'll need to lock the NFS/SAN images, handle
 failures of VM/SAN/Mgmt, keep the snapshot info in a mgmt DB, etc.

 Today it is managed by a combination of rhev-m/vdsm and libvirt.
 I agree it would have been nice to have such a common, single-entry-point
 interface.

Okay, the user API seems to be one layer above libvirt.

Stefan



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-22 Thread Dor Laor

On 05/20/2011 03:19 PM, Stefan Hajnoczi wrote:

I'm interested in what the API for snapshots would look like.
Specifically how does user software do the following:
1. Create a snapshot
2. Delete a snapshot
3. List snapshots
4. Access data from a snapshot


There are plenty of options there:
 - Run an (unrelated) VM and hotplug the snapshot as an additional disk
 - Use v2v (libguestfs)
 - Boot the VM with the disks read-only
 - Plenty more


5. Restore a VM from a snapshot
6. Get the dirty blocks list (for incremental backup)


It might be needed for additional purposes like efficient delta sync 
across sites or any other storage operation (dedup, etc.)




We've discussed image format-level approaches but I think the scope of
the API should cover several levels at which snapshots are
implemented:
1. Image format - image file snapshot (Jes, Jagane)
2. Host file system - ext4 and btrfs snapshots
3. Storage system - LVM or SAN volume snapshots

It will be hard to take advantage of more efficient host file system
or storage system snapshots if they are not designed in now.


I agree but it can also be a chicken-and-egg problem.
Actually 1/2/3/5 are already working today regardless of live snapshots.


Is anyone familiar enough with the libvirt storage APIs to draft an
extension that adds snapshot support?  I will take a stab at it if no
one else wants to try it.


I added libvirt-list and Ayal Baron from vdsm.
What you're asking for goes beyond snapshots; it's the whole management 
of VM images. Doing the above operations is simple, but for an enterprise 
virtualization solution you'll need to lock the NFS/SAN images, handle 
failures of VM/SAN/Mgmt, keep the snapshot info in a mgmt DB, etc.


Today it is managed by a combination of rhev-m/vdsm and libvirt.
I agree it would have been nice to have such a common, single-entry-point 
interface.




Stefan





Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-22 Thread Jagane Sundar

Hello Stefan,

I have been thinking about this since you sent out this message.
A quick look at the libvirt API indicates that their notion of a
snapshot often refers to a disk+memory snapshot. It would
be good to provide feedback to the libvirt developers to make
sure that proper support for a 'disk only snapshot' capability is
included.

You might have already seen this, but here's an email thread from
the libvirt mailing list that's relevant:

http://www.redhat.com/archives/libvir-list/2010-March/msg01389.html

I am very interested in enhancing libvirt to support
the Livebackup semantics, for the following reason:
if libvirt can be enhanced to support all the constructs
required for full Livebackup functionality, then I would like to
remove the built-in livebackup network protocol and rewrite
the client as a native program on the VM host, linked with
libvirt, that can perform a full or incremental backup through
libvirt. If a remote backup needs to be performed, I would
require the remote client to ssh into the VM host, run the
local backup, and pipe it back to the remote backup host.
This way I would not need to deal with authentication of the
livebackup client and server, or encryption of the network
connection.

Please see my feedback regarding the specific operations below:

On 5/20/2011 5:19 AM, Stefan Hajnoczi wrote:

I'm interested in what the API for snapshots would look like.
Specifically how does user software do the following:
1. Create a snapshot

For livebackup, one required parameter is 'full' or
'incremental'. If the param is 'incremental',
then only the blocks that were modified since the last snapshot
command was issued are part of the snapshot. If the param
is 'full', then the snapshot includes all the blocks of all the disks
in the VM.

2. Delete a snapshot

Simple for livebackup, since no more than one snapshot is
allowed. Hence naming is a non-issue, as is deletion.

3. List snapshots

Again, simple for livebackup, on account of the one
active snapshot restriction.

4. Access data from a snapshot

In traditional terms, access could mean many
things. Some examples:
1. Access lists a set of files on the local
file system of the VM Host. A small VM
may be started up to mount these
snapshot files as a set of secondary drives.
2. Publish the snapshot drives as iSCSI LUNs.
3. If the origin drives are on a NetApp filer,
perhaps a filer snapshot is created, and
a URL describing that snapshot is printed
out.

Access, in Livebackup terms, is merely copying
dirty blocks over from qemu. Livebackup does
not provide a random access mode - i.e. one
where a VM could be started using the snapshot.

Currently, Livebackup uses 4K clusters of 512-byte
blocks. Dirty clusters are transferred over by the
client supplying a 'cluster number' param, and qemu
returning the next 'n' contiguous dirty
clusters. At the end, qemu returns a 'no-more-dirty'
error.
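That cluster-number handshake can be sketched as a small client/server loop. The names and framing here are illustrative, not the actual livebackup wire protocol:

```python
# Sketch of the dirty-cluster handshake: the client supplies a cluster
# number, the server answers with the next run of contiguous dirty
# clusters, and a no-more-dirty reply ends the transfer. Illustrative
# names only; not livebackup's real protocol.

NO_MORE_DIRTY = None  # stands in for the 'no-more-dirty' reply

def next_dirty_run(dirty, start, max_clusters):
    """Server side: the next run of contiguous dirty clusters at or after
    `start`, as (first_cluster, count), or NO_MORE_DIRTY when exhausted."""
    candidates = sorted(c for c in dirty if c >= start)
    if not candidates:
        return NO_MORE_DIRTY
    present = set(candidates)
    first = candidates[0]
    count = 1
    while count < max_clusters and first + count in present:
        count += 1
    return (first, count)

def pull_all(dirty, max_clusters=64):
    """Client side: iterate from cluster 0 until no-more-dirty is seen."""
    transferred = []
    cursor = 0
    while True:
        reply = next_dirty_run(dirty, cursor, max_clusters)
        if reply is NO_MORE_DIRTY:
            return transferred
        first, count = reply
        transferred.extend(range(first, first + count))  # copy those clusters
        cursor = first + count
```

The cursor-based loop is what makes the protocol resumable: the client owns the position, so it can stop and continue at will.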

5. Restore a VM from a snapshot


Additional info for re-creating the VM needs to be
saved when a snapshot is taken. The origin VM's
libvirt XML descriptor should probably be saved
along with the snapshot.


6. Get the dirty blocks list (for incremental backup)

Either a complete dump of the dirty blocks, or a way
to iterate through the dirty blocks and fetch them,
needs to be provided. My preference is the
iterate-through-the-dirty-blocks approach, since
that will enable the client to pace the backup
process and provide guarantees such as 'no more
than 10% of the network b/w will be utilized for
backup'.
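That bandwidth guarantee is a scheduling calculation on the client side. A minimal sketch under a simulated clock (the helper name is hypothetical; a real client would sleep until each chunk's earliest send time):

```python
# Sketch of pacing the dirty-block transfer under a bandwidth budget.
# The schedule is computed against a simulated clock; a real client
# would sleep between requests instead of advancing a counter.

def paced_schedule(chunk_sizes, max_bytes_per_sec):
    """Earliest send time (seconds) for each chunk so that the average
    rate never exceeds max_bytes_per_sec."""
    t, times = 0.0, []
    for size in chunk_sizes:
        times.append(t)
        t += size / max_bytes_per_sec  # time this chunk 'occupies' the link
    return times
```

With a 10% cap, `max_bytes_per_sec` would simply be one tenth of the link rate; the iterate-and-fetch protocol makes this possible because the client decides when to ask for the next run.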

We've discussed image format-level approaches but I think the scope of
the API should cover several levels at which snapshots are
implemented:
1. Image format - image file snapshot (Jes, Jagane)

Livebackup uses qcow2 to save the Copy-On-Write blocks
that are dirtied by the VM when the snapshot is active.

2. Host file system - ext4 and btrfs snapshots

I have tested with ext4 and raw LVM volumes for the origin
virtual disk files. The qcow2 COW files have only resided on
ext4.

3. Storage system - LVM or SAN volume snapshots




It will be hard to take advantage of more efficient host file system
or storage system snapshots if they are not designed in now.


I agree. A snapshot and restore from backup should not result in
the virtual disk file getting inflated (going from sparse to fully
allocated, for example).

Is anyone familiar enough with the libvirt storage APIs to draft an
extension that adds snapshot support?  I will take a stab at it if no
one else wants to try it.


I have only looked at it briefly, after getting your email message.
If you can take a deeper look at it, I would be willing to work with
you to iron out details.

Thanks,
Jagane




Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-20 Thread Stefan Hajnoczi
I'm interested in what the API for snapshots would look like.
Specifically how does user software do the following:
1. Create a snapshot
2. Delete a snapshot
3. List snapshots
4. Access data from a snapshot
5. Restore a VM from a snapshot
6. Get the dirty blocks list (for incremental backup)

We've discussed image format-level approaches but I think the scope of
the API should cover several levels at which snapshots are
implemented:
1. Image format - image file snapshot (Jes, Jagane)
2. Host file system - ext4 and btrfs snapshots
3. Storage system - LVM or SAN volume snapshots

It will be hard to take advantage of more efficient host file system
or storage system snapshots if they are not designed in now.

Is anyone familiar enough with the libvirt storage APIs to draft an
extension that adds snapshot support?  I will take a stab at it if no
one else wants to try it.

Stefan



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-20 Thread Jes Sorensen
On 05/20/11 14:19, Stefan Hajnoczi wrote:
 I'm interested in what the API for snapshots would look like.

I presume you're talking about external snapshots here? The API is really
what should be defined by libvirt, so you get a unified API that can work
on QEMU-level snapshots as well as enterprise storage, host file
system snapshots, etc.

 Specifically how does user software do the following:
 1. Create a snapshot

There's a QMP patch out already that is still not applied, but it is
pretty simple, similar to the HMP command.

Alternatively you can do it the evil way by pre-creating the snapshot
image file and feeding that to the snapshot command. In this case QEMU
won't create the snapshot file.

 2. Delete a snapshot

This is still to be defined.

 3. List snapshots

Again this is tricky as it depends on the type of snapshot. For
QEMU-level ones they are files, so 'ls' is your friend :)

 4. Access data from a snapshot

You boot the snapshot file.

 5. Restore a VM from a snapshot

We're talking snapshots, not checkpointing, here, so you cannot restore a
VM from a snapshot.

 6. Get the dirty blocks list (for incremental backup)

Good question

 We've discussed image format-level approaches but I think the scope of
 the API should cover several levels at which snapshots are
 implemented:
 1. Image format - image file snapshot (Jes, Jagane)
 2. Host file system - ext4 and btrfs snapshots
 3. Storage system - LVM or SAN volume snapshots
 
 It will be hard to take advantage of more efficient host file system
 or storage system snapshots if they are not designed in now.
 
 Is anyone familiar enough with the libvirt storage APIs to draft an
 extension that adds snapshot support?  I will take a stab at it if no
 one else wants to try it.

I believe the libvirt guys are already looking at this. Adding to the CC
list.

Cheers,
Jes



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-20 Thread Stefan Hajnoczi
On Fri, May 20, 2011 at 1:39 PM, Jes Sorensen jes.soren...@redhat.com wrote:
 On 05/20/11 14:19, Stefan Hajnoczi wrote:
 I'm interested in what the API for snapshots would look like.

 I presume you're talking external snapshots here? The API is really what
 should be defined by libvirt, so you get a unified API that can work
 both on QEMU level snapshots as well as enterprise storage, host file
 system snapshots etc.

Thanks for the pointers on external snapshots using image files.  I'm
really thinking about the libvirt API.

Basically I'm not sure we'll implement the right things if we don't
think through the API that the user sees first.

Stefan



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-20 Thread Jes Sorensen
On 05/20/11 14:49, Stefan Hajnoczi wrote:
 On Fri, May 20, 2011 at 1:39 PM, Jes Sorensen jes.soren...@redhat.com wrote:
 On 05/20/11 14:19, Stefan Hajnoczi wrote:
 I'm interested in what the API for snapshots would look like.

 I presume you're talking external snapshots here? The API is really what
 should be defined by libvirt, so you get a unified API that can work
 both on QEMU level snapshots as well as enterprise storage, host file
 system snapshots etc.
 
 Thanks for the pointers on external snapshots using image files.  I'm
 really thinking about the libvirt API.
 
 Basically I'm not sure we'll implement the right things if we don't
 think through the API that the user sees first.

Right, I agree. There are a lot of variables there, and they are not
necessarily easy to map into a single namespace. I am not sure it should
be done either.

Cheers,
Jes



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-18 Thread Jagane Sundar

Hello Dor,

I'm glad I could convince you of the value of Livebackup. I
think Livesnapshot/Livemerge, Livebackup and Block
Migration all have very interesting use cases. For example:

- Livesnapshot/Livemerge is very useful in development/QA
  environments where one might want to create a snapshot
  before trying out some new software and then committing.
- Livebackup is useful in cloud environments where the
  Cloud Service Provider may want to offer regularly scheduled
  backups of VMs with no effort on the part of the customer.
- Block Migration with COR is useful in Cloud Service Provider
  environments where an arbitrary VM may need to be
  migrated over to another VM server, even though the VM
  is on direct-attached storage.

The above is by no means an exhaustive list of use cases. I
am sure qemu/qemu-kvm users can come up with more.

Although there are some common concepts in these three
technologies, I think we should support all three in base
qemu. This would make qemu/qemu-kvm more feature-rich
than VMware, Xen, and Hyper-V.

Thanks,
Jagane


Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-17 Thread Dor Laor

On 05/16/2011 11:23 AM, Jagane Sundar wrote:

Hello Dor,

Let me see if I understand live snapshot correctly:
If I want to configure a VM for daily backup, then I would do
the following:
- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- On day 1, I would create a new snapshot s2, then
copy over the snapshot s1, which is the incremental
backup image from s0 to s1.
- After copying s1 over, I do not need that snapshot, so
I would live merge s1 with s0, to create a new merged
read-only image s1'.
- On day 2, I would create a new snapshot s3, then
copy over s2, which is the incremental backup from
s1' to s2
- And so on...

With this sequence of operations, I would need to keep a
snapshot active at all times, in order to enable the
incremental backup capability, right?


No and yes ;-)

For regular, non-incremental backup you can have no snapshot active most 
of the time:


 - Create a snapshot s1. s0 is marked read-only.
 - Do a full backup of s0 on day 0.
 - Once backup is finished, live merge s1 into s0 and make s0 writeable
   again.

So this way there is no performance penalty here.
Here we need an option to track dirty block bits (either in the internal 
format or in an external file). This will be both efficient and get the job done.
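The dirty block bits Dor mentions can be sketched as an in-memory bitmap that is swapped out at the start of each backup cycle. The class names are hypothetical, and a real implementation would persist the bitmap (internal format or external file) across restarts:

```python
# Sketch of dirty-block-bit tracking: every guest write marks its block,
# and starting a backup atomically swaps in a fresh bitmap so writes
# landing during the backup are caught by the next cycle. Hypothetical
# names; in-memory only for illustration.

class DirtyBitmap:
    def __init__(self, nb_blocks):
        self.bits = bytearray((nb_blocks + 7) // 8)  # one bit per block

    def mark(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def is_dirty(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

class DiskTracker:
    def __init__(self, nb_blocks):
        self.nb_blocks = nb_blocks
        self.bitmap = DirtyBitmap(nb_blocks)

    def write(self, block):
        self.bitmap.mark(block)  # called on every guest write

    def start_backup(self):
        """Freeze the bitmap for this cycle and start a fresh one;
        return the dirty block numbers to transfer."""
        frozen, self.bitmap = self.bitmap, DirtyBitmap(self.nb_blocks)
        return [b for b in range(self.nb_blocks) if frozen.is_dirty(b)]
```

A bitmap costs one bit per block, which is why tracking stays cheap even on large disks: a 1 TB disk at 4K granularity needs only 32 MB of bits.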


But in order to be efficient in storage we'll need to ask the snapshot 
creation to only refer to these dirt blocks.

Well, thinking out load, it turned out to your solution :)

Ok, I do see the value there is with incremental backups.

I'm aware that there were requirements that the backup itself 
be done from the guest file system level, where incremental backup 
would be done at the FS layer.


Still I do see the value in your solution.

Another option for us would be to keep the latest snapshots around 
and let the guest IO go through them all the time. There is some 
performance cost but as the newer image formats develop, this cost is 
relatively low.




If the base image is s0 and there is a single snapshot s1, then a
read operation from the VM will first look in s1. if the block is
not present in s1, then it will read the block from s0, right?
So most reads from the VM will effectively translate into two
reads, right?

Isn't this a continuous performance penalty for the VM,
amounting to almost doubling the read I/O from the VM?
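The two-level read path being asked about can be sketched directly: a read tries the newest layer first and falls back through the chain, which is exactly why a miss in s1 costs a second lookup in s0. Each layer is modeled as a dict holding only the blocks written while it was topmost; illustrative only:

```python
# Sketch of the backing-chain read path: try the newest snapshot layer
# first, fall back to older ones. Layers are dicts of block -> data,
# holding only blocks written while that layer was the active image.

def chain_read(chain, block):
    """`chain` is ordered newest-first, e.g. [s1, s0]."""
    for layer in chain:
        if block in layer:
            return layer[block]
    return b"\0"  # unallocated in every layer reads back as zeros
```

In practice formats like qcow2 answer the "is this block allocated here?" question from in-memory metadata, so a miss in s1 usually costs a table lookup rather than a second disk read; the penalty is real but much smaller than a doubled I/O count.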

Please read below for more comments:

2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.

Since the base images and any snapshot which is not a leaf are marked as
read-only there is no such risk.


What happens when a VM host reboots while a live merge of s0
and s1 is being done?


Live merge uses live copy, which duplicates each write IO.
On a host crash, the merge will continue from the same point where it 
stopped.


I think I answered your other good comments above.
Thanks,
Dor


The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.

Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say 15 minutes a day, for a
daily backup schedule VM.

In case there were lots of changes, for example an additional 50GB of
changes, it will take more time and there will be a performance hit.


Of course, the performance hit is proportional to the amount of data
being copied over. However, the performance penalty is paid during
the backup operation, and not during normal VM operation.


One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
most sense. As an example, let me state this use case:
- An IaaS cloud, where VMs are always on, running off of a local
disk, and need to be backed up once a day or so.

Can you list some of the other use cases that live snapshot and
live merge were designed to solve. Perhaps we can put up a
single wiki page that describes all of these proposals.

Both solutions can serve for the same scenario:
With live snapshot the backup is done as follows:

1. Take a live snapshot (s1) of image s0.
2. Newer writes go to the snapshot s1 while s0 is read-only.
3. Backup software processes the s0 image.
There are multiple ways of doing that -
1. Use qemu-img and get the dirty blocks since the former backup.
- Currently qemu-img does not support it.
- Nevertheless, such a mechanism will work for LVM, btrfs, NetApp.
2. Mount the s0 image in another guest that runs traditional backup
software at the file system level and let it do the backup.
4. Live merge s1 into s0.
We'll use live copy for that so each write is duplicated (like your
live backup solution).
5. Delete s1
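The five steps above can be modeled as a minimal sketch, with images as dicts of block to data and the merge folding the snapshot's writes back into the base. This illustrates the workflow only, not qemu's implementation:

```python
# The snapshot-backup-merge cycle as a minimal model. Images are dicts of
# block -> data; live merge is modeled as replaying s1's writes into s0.
# Workflow illustration only, not qemu's implementation.

def take_snapshot(s0):
    """Step 1: create an empty COW layer; s0 is now read-only."""
    return {}

def live_merge(s0, s1):
    """Step 4: replay every write captured in s1 back into s0."""
    s0.update(s1)
    return s0

def backup_cycle(s0, writes_during_backup):
    s1 = take_snapshot(s0)
    backup = dict(s0)                 # step 3: backup reads the frozen s0
    for block, data in writes_during_backup:
        s1[block] = data              # step 2: new writes land in s1
    live_merge(s0, s1)                # step 4
    return backup                     # step 5: s1 can now be deleted
```

The key property the model shows is that the backup sees s0 exactly as it was at snapshot time, while no guest write is ever lost: everything written during the backup ends up back in s0 after the merge.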

As you can see, both approaches are very similar, while live snapshot is
more general and not tied to backup specifically.



As I explained at the head of this email, I believe that live snapshot
results in the VM read I/O paying a 

Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-16 Thread Dor Laor

On 05/16/2011 12:38 AM, Jagane Sundar wrote:

Hello Dor,

One important advantage of live snapshot over live backup is support for
multiple (consecutive) live snapshots, while there can be only a single
live backup at one time.

This is why I tend to think that although live backup carries some benefit
(no merge required), live snapshot + live merge is the more robust
mechanism.



The two things that concern me regarding the
live snapshot/live merge approach are:
1. Performance considerations of having
multiple active snapshots?


My description above was inaccurate: I only hinted that multiple
snapshots are possible, but they are done consecutively.
A live snapshot takes practically no time, just the time for the
guest virtagent to freeze the guest FS and to create the snapshot
(for qcow2 it is immediate).


So if you would like to have multiple snapshots, say 5 minutes after
you issued the first one, there is no problem.


The new writes will go to the snapshot while the former base is marked
as read-only.
Eventually you will want to (live) merge the snapshots together. This
can be done at any point in time.



2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.


Since the base image and any snapshot which is not a leaf are marked
as read-only, there is no such risk.




The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.

Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say, 15 minutes a day for a
VM on a daily backup schedule.


In case there are lots of changes, for example an additional 50GB of
changed data, it will take more time and there will be a performance hit.




One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
most sense. As an example, let me state this use case:
- An IaaS cloud, where VMs are always on, running off of a local
disk, and need to be backed up once a day or so.

Can you list some of the other use cases that live snapshot and
live merge were designed to solve? Perhaps we can put up a
single wiki page that describes all of these proposals.


Both solutions can serve the same scenario.
With live snapshot the backup is done as follows:

1. Take a live snapshot (s1) of image s0.
2. Newer writes go to the snapshot s1 while s0 is read-only.
3. Backup software processes the s0 image.
   There are multiple ways of doing that:
   1. Use qemu-img and get the dirty blocks since the former backup.
      - Currently qemu-img does not support it.
      - Nevertheless, such a mechanism will work for LVM, btrfs, NetApp.
   2. Mount the s0 image in another guest that runs traditional backup
      software at the file system level and let it do the backup.
4. Live merge s1 into s0.
   We'll use live copy for that, so each write is duplicated (like your
   live backup solution).
5. Delete s1.

As you can see, both approaches are very similar, while live snapshot is 
more general and not tied to backup specifically.
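The five-step cycle above can be sketched with a toy block-device model
(dicts standing in for images; an illustration of the idea only, not
QEMU code):

```python
# Toy model of the snapshot backup cycle: snapshot, redirect writes,
# back up the frozen base, live merge, delete the overlay.
# Names s0/s1 follow the email; block devices are dicts of block -> data.

def take_snapshot(base):
    """Freeze 'base' as read-only; return an empty COW overlay (s1)."""
    return {}

def guest_write(overlay, block, data):
    overlay[block] = data          # step 2: newer writes land in s1

def backup(base):
    return dict(base)              # step 3: backup reads the frozen s0

def live_merge(base, overlay):
    base.update(overlay)           # step 4: merge s1 back into s0
    overlay.clear()                # step 5: delete s1

s0 = {0: b"A", 1: b"B"}
s1 = take_snapshot(s0)
guest_write(s1, 1, b"B'")          # guest keeps writing during backup
saved = backup(s0)                 # consistent, pre-snapshot state
live_merge(s0, s1)

assert saved == {0: b"A", 1: b"B"}     # backup saw the frozen state
assert s0 == {0: b"A", 1: b"B'"}       # merged image has the new write
```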




Thanks,
Jagane






Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-16 Thread Jagane Sundar

Hello Dor,

Let me see if I understand live snapshot correctly:
If I want to configure a VM for daily backup, then I would do
the following:
- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- On day 1, I would create a new snapshot s2, then
  copy over the snapshot s1, which is the incremental
   backup image from s0 to s1.
- After copying s1 over, I do not need that snapshot, so
  I would live merge s1 with s0, to create a new merged
  read-only image s1'.
- On day 2, I would create a new snapshot s3, then
   copy over s2, which is the incremental backup from
   s1' to s2
- And so on...

With this sequence of operations, I would need to keep a
snapshot active at all times, in order to enable the
incremental backup capability, right?

If the base image is s0 and there is a single snapshot s1, then a
read operation from the VM will first look in s1. If the block is
not present in s1, then it will read the block from s0, right?
So most reads from the VM will effectively translate into two
reads, right?

Isn't this a continuous performance penalty for the VM,
amounting to almost doubling the read I/O from the VM?
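The read path in question can be sketched like this (a toy model; in
practice qcow2 resolves the lookup through cached L1/L2 metadata rather
than a second full data read, but the extra lookup step is the point):

```python
# Toy COW read path: probe the leaf snapshot (s1) first, fall back to
# the base (s0) on a miss. Counters show the extra base lookup.

lookups = {"s1": 0, "s0": 0}

def cow_read(s1, s0, block):
    lookups["s1"] += 1
    if block in s1:                # block rewritten after the snapshot
        return s1[block]
    lookups["s0"] += 1             # unmodified block: go to the base
    return s0[block]

s0 = {0: b"A", 1: b"B"}
s1 = {1: b"B'"}                    # only block 1 changed since snapshot

assert cow_read(s1, s0, 1) == b"B'"    # served from the leaf
assert cow_read(s1, s0, 0) == b"A"     # needs the extra base lookup
assert lookups == {"s1": 2, "s0": 1}
```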

Please read below for more comments:

2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.

Since the base image and any snapshot which is not a leaf are marked
as read-only, there is no such risk.


What happens when a VM host reboots while a live merge of s0
and s1 is being done?

The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.

Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say, 15 minutes a day for a
VM on a daily backup schedule.

In case there are lots of changes, for example an additional 50GB of
changed data, it will take more time and there will be a performance hit.


Of course, the performance hit is proportional to the amount of data
being copied over. However, the performance penalty is paid during
the backup operation, and not during normal VM operation.


One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
most sense. As an example, let me state this use case:
- An IaaS cloud, where VMs are always on, running off of a local
disk, and need to be backed up once a day or so.

Can you list some of the other use cases that live snapshot and
live merge were designed to solve? Perhaps we can put up a
single wiki page that describes all of these proposals.

Both solutions can serve the same scenario.
With live snapshot the backup is done as follows:

1. Take a live snapshot (s1) of image s0.
2. Newer writes go to the snapshot s1 while s0 is read-only.
3. Backup software processes the s0 image.
   There are multiple ways of doing that:
   1. Use qemu-img and get the dirty blocks since the former backup.
      - Currently qemu-img does not support it.
      - Nevertheless, such a mechanism will work for LVM, btrfs, NetApp.
   2. Mount the s0 image in another guest that runs traditional backup
      software at the file system level and let it do the backup.
4. Live merge s1 into s0.
   We'll use live copy for that, so each write is duplicated (like your
   live backup solution).
5. Delete s1.

As you can see, both approaches are very similar, while live snapshot is
more general and not tied to backup specifically.



As I explained at the head of this email, I believe that live snapshot
results in the VM read I/O paying a high penalty during normal operation
of the VM, whereas Livebackup results in this penalty being paid only
during the backup dirty block transfer operation.

Finally, I would like to bring up considerations of disk space. To expand on
my use case further, consider a Cloud Compute service with 100 VMs
running on a host. If live snapshot is used to create snapshot COW files,
then potentially each VM could grow the COW snapshot file to the size
of the base file, which means the VM host needs to reserve space for
the snapshots equal to the size of the VMs: an 8GB VM would
require an additional 8GB of space to be reserved for the snapshot,
so that the service provider could safely guarantee that the snapshot
will not run out of space.
Contrast this with livebackup, wherein the COW files are kept only when
the dirty block transfers are being done. This means that for a host with
100 VMs, if the backup server connects to each of the 100 qemu
instances one by one and does a livebackup, the service provider needs
to provision spare disk for at most the COW size of one VM.
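The arithmetic behind this reservation argument, using the numbers from
the example above (100 VMs of 8GB each):

```python
# Worst-case spare-disk reservation: live snapshot vs livebackup.
# The 100 VM / 8GB figures are the email's example, not measurements.

vm_count, vm_size_gb = 100, 8

# Live snapshot: every VM's COW file may grow to the full image size,
# and snapshots can be open on all VMs at once.
snapshot_reserve_gb = vm_count * vm_size_gb

# Livebackup: a COW file exists only while one VM's dirty blocks are
# being transferred, so at most one worst-case COW file at a time.
livebackup_reserve_gb = 1 * vm_size_gb

assert snapshot_reserve_gb == 800
assert livebackup_reserve_gb == 8
```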

Thanks,
Jagane




Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-15 Thread Dor Laor

On 05/13/2011 06:16 AM, Jagane Sundar wrote:

On 5/12/2011 8:33 AM, Jes Sorensen wrote:

On 05/09/11 15:40, Dor Laor wrote:

Summary:
* We need Marcelo's new (to come) block copy implementation
* should work in parallel to migration and hotplug
* General copy on read is desirable
* Live snapshot merge to be implemented using block copy
* Need to utilize a remote block access protocol (iscsi/nbd/other)
Which one is the best?
* Keep qemu-img the single interface for dirty block mappings.
* Live block migration pre copy == live copy + block access protocol
+ live migration
* Live block migration post copy == live migration + block access
protocol/copy on read.

Comments?

I think we should add Jagane Sundar's Livebackup to the watch list here.
It looks very interesting as an alternative way to reach some of the
same goals.

Cheers,
Jes

Thanks for the intro, Jes. I am very interested in garnering support for
Livebackup.

You are correct in that Livebackup solves some, but not all, problems in
the same space.

Some comments about my code: It took me about two months of development
before I connected with you on the list.
Initially, I started off by doing a dynamic block transfer such that
fewer and fewer blocks are dirty till there are no more dirty blocks and
we declare the backup complete. The problem with this approach was that
there was no real way to plug in a guest file system quiesce function. I
then moved on to the snapshot technique. With this snapshot technique I
am also able to test the livebackup function very thoroughly - I use a
technique where I create an LVM snapshot simultaneously, and do a cmp of 
the LVM snapshot and the livebackup backup image.
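That verification step amounts to a block-wise compare of the two
images, as cmp would do; a minimal sketch (the file names here are
placeholders, not Livebackup's actual paths):

```python
# Block-wise image comparison: read both files in fixed-size chunks and
# report the first mismatch. Mirrors what 'cmp' does for two raw images.

import os
import tempfile

def images_match(path_a, path_b, block_size=64 * 1024):
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            ba, bb = a.read(block_size), b.read(block_size)
            if ba != bb:
                return False       # mismatch, or one image is shorter
            if not ba:
                return True        # both ended together: identical

# Demonstrate on two small throwaway "images".
with tempfile.TemporaryDirectory() as d:
    p1, p2 = os.path.join(d, "lvm.img"), os.path.join(d, "backup.img")
    for p in (p1, p2):
        with open(p, "wb") as f:
            f.write(b"\x00" * 4096 + b"data")
    assert images_match(p1, p2)
    with open(p2, "r+b") as f:
        f.seek(10)
        f.write(b"X")              # corrupt a single byte
    assert not images_match(p1, p2)
```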

With this mode of testing, I am very confident of the integrity of my
solution.

I chose to invent a new protocol that is very simple, and custom to
livebackup, because I needed livebackup specific functions such as
'create snapshot', 'delete snapshot', etc. Also, I am currently
implementing SSL-based encryption, with the client authenticating to the
server and the server authenticating to the client using self-signed
certificates.
iSCSI or NBD would be more standards compliant, I suppose.


+1, iSCSI/NBD have better potential.



My high level goal is to make this a natural solution for
Infrastructure as a Service (IaaS) clouds. I am looking carefully at
integrating the management of the Livebackup function into OpenStack.


One important advantage of live snapshot over live backup is support of
multiple (consecutive) live snapshots, while there can be only a single
live backup at a time.

This is why I tend to think that although live backup carries some
benefit (no merge required), live snapshot + live merge are the more
robust mechanism.




I would like to help in any way I can to make KVM the *best* VM
technology for IaaS clouds.


:)



Thanks,
Jagane









Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-15 Thread Jagane Sundar

Hello Dor,

One important advantage of live snapshot over live backup is support of
multiple (consecutive) live snapshots, while there can be only a single
live backup at a time.

This is why I tend to think that although live backup carries some
benefit (no merge required), live snapshot + live merge are the more
robust mechanism.



The two things that concern me regarding the
live snapshot/live merge approach are:
1. Performance considerations of having
multiple active snapshots?
2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.

The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.

Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say, 15 minutes a day for a
VM on a daily backup schedule.

One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
most sense. As an example, let me state this use case:
- An IaaS cloud, where VMs are always on, running off of a local
  disk, and need to be backed up once a day or so.

Can you list some of the other use cases that live snapshot and
live merge were designed to solve? Perhaps we can put up a
single wiki page that describes all of these proposals.

Thanks,
Jagane




Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-12 Thread Marcelo Tosatti
On Mon, May 09, 2011 at 10:23:03AM -0500, Anthony Liguori wrote:
 On 05/09/2011 08:40 AM, Dor Laor wrote:
 No patch here (sorry) but collection of thoughts about these features
 and their potential building blocks. Please review (also on
 http://wiki.qemu.org/Features/LiveBlockMigration)
 
 Future qemu is expected to support these features (some already
 implemented):
 
 * Live block copy
 
 Ability to copy 1+ virtual disk from the source backing file/block
 device to a new target that is accessible by the host. The copy is
 supposed to be executed transparently while the VM runs.
 
 Status: code exists (by Marcelo) today in qemu but needs refactoring
 due to a race condition at the end of the copy operation. We agreed
 that a re-implementation of the copy operation should take place
 that makes sure the image is completely mirrored until management
 decides what copy to keep.
 
 Live block copy is growing on me.  It can actually be used (with an
 intermediate network storage) to do live block migration.
 
 
 * Live snapshots and live snapshot merge
 
 Live snapshot is already incorporated (by Jes) in qemu (still need
 qemu-agent work to freeze the guest FS).
 
 Live snapshot is unfortunately not really live.  It runs a lot of
 operations synchronously which will cause the guest to incur
 downtime.
 
 We really need to refactor it to truly be live.
 
 
 * Copy on read (image streaming)
 Ability to start guest execution while the parent image resides
 remotely and each block access is replicated to a local copy (image
 format snapshot)
 
 It would be nice to have a general mechanism that will be used for
 all image formats. What about the protocol to access these blocks
 over the net? We can reuse existing ones (nbd/iscsi).
 
 I think the image format is really the best place to have this
 logic. Of course, if we have live snapshot merge, we could use a
 temporary QED/QCOW2 file and then merge afterwards.
 
 * Using external dirty block bitmap
 
 FVD has an option to use external dirty block bitmap file in
 addition to the regular mapping/data files.
 
 We can consider using it for live block migration and live merge too.
 It can also allow additional usages of 3rd party tools to calculate
 diffs between the snapshots.
 There is a big downside though, since it will make management
 complicated and there is the risk of the image and its bitmap file
 getting out of sync. It's a much better choice to have the qemu-img
 tool be the single interface to the dirty block bitmap data.
 
 Does the dirty block bitmap need to exist outside of QEMU?
 
 IOW, if it goes away after a guest shuts down, is that problematic?
 
 I think it potentially greatly simplifies the problem which makes it
 appealing from my perspective.

One limitation of block copy is the need to rewrite data that differs
from the base image on every merge. But this is a limitation of qcow2
external snapshots represented as files, not block copy itself (with
external qcow2 snapshots, even a live block merge would require
potentially copying large amounts of data).

Only with snapshots internal to an image can data copying be avoided
(and depending on the scenario, this can be a nasty limitation).

 
 Regards,
 
 Anthony Liguori



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-12 Thread Jes Sorensen
On 05/09/11 15:40, Dor Laor wrote:
 Summary:
   * We need Marcelo's new (to come) block copy implementation
 * should work in parallel to migration and hotplug
   * General copy on read is desirable
   * Live snapshot merge to be implemented using block copy
   * Need to utilize a remote block access protocol (iscsi/nbd/other)
 Which one is the best?
   * Keep qemu-img the single interface for dirty block mappings.
   * Live block migration pre copy == live copy + block access protocol
 + live migration
   * Live block migration post copy == live migration + block access
 protocol/copy on read.
 
 Comments?

I think we should add Jagane Sundar's Livebackup to the watch list here.
It looks very interesting as an alternative way to reach some of the
same goals.

Cheers,
Jes



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-12 Thread Jes Sorensen
On 05/09/11 17:23, Anthony Liguori wrote:

 * Live snapshots and live snapshot merge

 Live snapshot is already incorporated (by Jes) in qemu (still need
 qemu-agent work to freeze the guest FS).
 
 Live snapshot is unfortunately not really live.  It runs a lot of
 operations synchronously which will cause the guest to incur downtime.
 
 We really need to refactor it to truly be live.

We keep having this discussion, but as pointed out in my last reply on
this, you can pre-create your image if you so desire. The actual
snapshot then comes down to a single command. Yes, we can make it even
nicer, but what we have now is far less bad than you make it out to be.

Cheers,
Jes




Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-12 Thread Jagane Sundar

On 5/12/2011 8:33 AM, Jes Sorensen wrote:

On 05/09/11 15:40, Dor Laor wrote:

Summary:
   * We need Marcelo's new (to come) block copy implementation
 * should work in parallel to migration and hotplug
   * General copy on read is desirable
   * Live snapshot merge to be implemented using block copy
   * Need to utilize a remote block access protocol (iscsi/nbd/other)
 Which one is the best?
   * Keep qemu-img the single interface for dirty block mappings.
   * Live block migration pre copy == live copy + block access protocol
 + live migration
   * Live block migration post copy == live migration + block access
 protocol/copy on read.

Comments?

I think we should add Jagane Sundar's Livebackup to the watch list here.
It looks very interesting as an alternative way to reach some of the
same goals.

Cheers,
Jes
Thanks for the intro, Jes. I am very interested in garnering support for 
Livebackup.


You are correct in that Livebackup solves some, but not all, problems in 
the same space.


Some comments about my code: It took me about two months of development 
before I connected with you on the list.
Initially, I started off by doing a dynamic block transfer such that 
fewer and fewer blocks are dirty till there are no more dirty blocks and 
we declare the backup complete. The problem with this approach was that 
there was no real way to plug in a guest file system quiesce function. I 
then moved on to the snapshot technique. With this snapshot technique I 
am also able to test the livebackup function very thoroughly - I use a 
technique where I create an LVM snapshot simultaneously, and do a cmp of 
the LVM snapshot and the livebackup backup image.


With this mode of testing, I am very confident of the integrity of my 
solution.


I chose to invent a new protocol that is very simple, and custom to 
livebackup, because I needed livebackup specific functions such as 
'create snapshot', 'delete snapshot', etc. Also, I am currently 
implementing SSL-based encryption, with the client authenticating to the
server and the server authenticating to the client using self-signed
certificates.

iSCSI or NBD would be more standards compliant, I suppose.

My high level goal is to make this a natural solution for
Infrastructure as a Service (IaaS) clouds. I am looking carefully at
integrating the management of the Livebackup function into OpenStack.


I would like to help in any way I can to make KVM the *best* VM
technology for IaaS clouds.


Thanks,
Jagane






Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-10 Thread Marcelo Tosatti
On Mon, May 09, 2011 at 04:40:00PM +0300, Dor Laor wrote:
 No patch here (sorry) but collection of thoughts about these
 features and their potential building blocks. Please review (also on
 http://wiki.qemu.org/Features/LiveBlockMigration)
 
 Future qemu is expected to support these features (some already
 implemented):
 
  * Live block copy
 
Ability to copy 1+ virtual disk from the source backing file/block
device to a new target that is accessible by the host. The copy is
supposed to be executed transparently while the VM runs.
 
Status: code exists (by Marcelo) today in qemu but needs refactoring
due to a race condition at the end of the copy operation. We agreed
that a re-implementation of the copy operation should take place
that makes sure the image is completely mirrored until management
decides what copy to keep.
 
  * Live snapshots and live snapshot merge
 
Live snapshot is already incorporated (by Jes) in qemu (still need
qemu-agent work to freeze the guest FS).
 
Live snapshot merge is required in order to reduce the overhead
caused by the additional snapshots (sometimes over a raw device).
Currently it is not implemented for a live running guest.
 
Possibility: enhance live copy to be used for live snapshot merge.
 It is almost the same mechanism.

The idea is to use live block copy to perform snapshot live merges.
The advantage is the simplicity, since there is no need to synchronize
between live merge writes and guest writes.

With live copy the guest is either using the old image or the new copy,
so crash handling is relatively simple.
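The mirrored-write idea can be sketched like this (a toy model of the
scheme, not Marcelo's actual implementation):

```python
# Toy model of live copy with mirrored writes: while the copy is in
# progress, every guest write goes to both the old image and the new
# copy, so either image is a consistent place to continue from.

def start_live_copy(src):
    return dict(src)                  # bulk copy of the existing blocks

def guest_write(src, dst, block, data):
    src[block] = data                 # write to the old image...
    dst[block] = data                 # ...and mirror it to the new copy

old = {0: b"A", 1: b"B"}
new = start_live_copy(old)
guest_write(old, new, 1, b"B'")       # a write arriving mid-copy

# The images stay identical, so management can pick either one; after a
# crash the guest simply keeps using the old image.
assert old == new == {0: b"A", 1: b"B'"}
```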

  * Copy on read (image streaming)
Ability to start guest execution while the parent image resides
remotely and each block access is replicated to a local copy (image
format snapshot)
 
It would be nice to have a general mechanism that will be used for
all image formats. What about the protocol to access these blocks
over the net? We can reuse existing ones (nbd/iscsi).
 
Such functionality can be hooked together with live block migration
instead of the 'post copy' method.
 
  * Live block migration (pre/post)
 
Beyond live block copy we'll sometimes need to move both the storage
and the guest. There are two main approaches here:
- pre copy
  First live copy the image and only then live migration the VM.
  It is simple but if the purpose of the whole live block migration
  was to balance the cpu load, it won't be practical to use since
  copying an image of 100GB will take too long.
- post copy
  First live migrate the VM, then live copy its blocks.
  It's a better approach for HA/load balancing but it might make
  management complex (need to keep the source VM alive, what happens
  on failures?)
  Using copy on read might simplify it -
  post copy = live snapshot + copy on read.
 
In addition there are two cases for the storage access:
1. The source block device is shared and can be easily accessed by
   the destination qemu-kvm process.
   That's the easy case, no special protocol needed for the block
   devices copying.
2. There is no shared storage at all.
   This means we should implement a block access protocol over the
   live migration fd :(
 
   We need to choose whether to implement a new one or re-use NBD or
   iSCSI (target/initiator).
 
  * Using external dirty block bitmap
 
FVD has an option to use external dirty block bitmap file in
addition to the regular mapping/data files.
 
We can consider using it for live block migration and live merge too.
It can also allow additional usages of 3rd party tools to calculate
diffs between the snapshots.
There is a big downside though, since it will make management
complicated and there is the risk of the image and its bitmap file
getting out of sync. It's a much better choice to have the qemu-img
tool be the single interface to the dirty block bitmap data.
 
 Summary:
   * We need Marcelo's new (to come) block copy implementation
 * should work in parallel to migration and hotplug
   * General copy on read is desirable
   * Live snapshot merge to be implemented using block copy
   * Need to utilize a remote block access protocol (iscsi/nbd/other)
 Which one is the best?
   * Keep qemu-img the single interface for dirty block mappings.
   * Live block migration pre copy == live copy + block access protocol
 + live migration
   * Live block migration post copy == live migration + block access
 protocol/copy on read.
 
 Comments?
 
 Regards,
 Dor



[Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-09 Thread Dor Laor
No patch here (sorry) but collection of thoughts about these features 
and their potential building blocks. Please review (also on 
http://wiki.qemu.org/Features/LiveBlockMigration)


Future qemu is expected to support these features (some already 
implemented):


 * Live block copy

   Ability to copy 1+ virtual disk from the source backing file/block
   device to a new target that is accessible by the host. The copy is
   supposed to be executed transparently while the VM runs.

   Status: code exists (by Marcelo) today in qemu but needs refactoring
   due to a race condition at the end of the copy operation. We agreed
   that a re-implementation of the copy operation should take place
   that makes sure the image is completely mirrored until management
   decides what copy to keep.

 * Live snapshots and live snapshot merge

   Live snapshot is already incorporated (by Jes) in qemu (still need
   qemu-agent work to freeze the guest FS).

   Live snapshot merge is required in order to reduce the overhead
   caused by the additional snapshots (sometimes over a raw device).
   Currently it is not implemented for a live running guest.

   Possibility: enhance live copy to be used for live snapshot merge.
It is almost the same mechanism.

 * Copy on read (image streaming)
   Ability to start guest execution while the parent image resides
   remotely and each block access is replicated to a local copy (image
   format snapshot)

   It would be nice to have a general mechanism that will be used for
   all image formats. What about the protocol to access these blocks
   over the net? We can reuse existing ones (nbd/iscsi).

   Such functionality can be hooked together with live block migration
   instead of the 'post copy' method.
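A minimal sketch of the copy-on-read idea ('remote' stands in for an
NBD/iSCSI-backed parent image; this is a toy model, not QEMU's block
layer):

```python
# Toy copy on read (image streaming): the guest runs immediately against
# a remote parent; each block missing locally is fetched once from the
# remote and replicated into the local image, which gradually fills up.

remote = {0: b"A", 1: b"B", 2: b"C"}   # parent image, e.g. over NBD
local = {}                             # local streaming target
fetches = []                           # record of remote accesses

def read(block):
    if block not in local:             # miss: stream it from the parent
        fetches.append(block)
        local[block] = remote[block]   # copy on read
    return local[block]

assert read(1) == b"B"
assert read(1) == b"B"                 # second read is served locally
assert fetches == [1]                  # the remote was hit only once
```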

 * Live block migration (pre/post)

   Beyond live block copy we'll sometimes need to move both the storage
   and the guest. There are two main approaches here:
   - pre copy
 First live copy the image and only then live migration the VM.
 It is simple but if the purpose of the whole live block migration
 was to balance the cpu load, it won't be practical to use since
 copying an image of 100GB will take too long.
   - post copy
  First live migrate the VM, then live copy its blocks.
  It's a better approach for HA/load balancing but it might make
 management complex (need to keep the source VM alive, what happens
 on failures?)
 Using copy on read might simplify it -
 post copy = live snapshot + copy on read.

   In addition there are two cases for the storage access:
   1. The source block device is shared and can be easily accessed by
  the destination qemu-kvm process.
  That's the easy case, no special protocol needed for the block
  devices copying.
   2. There is no shared storage at all.
  This means we should implement a block access protocol over the
  live migration fd :(

  We need to choose whether to implement a new one or re-use NBD or
  iSCSI (target/initiator).

 * Using external dirty block bitmap

   FVD has an option to use external dirty block bitmap file in
   addition to the regular mapping/data files.

   We can consider using it for live block migration and live merge too.
   It can also allow additional usages of 3rd party tools to calculate
   diffs between the snapshots.
   There is a big downside though, since it will make management
   complicated and there is the risk of the image and its bitmap file
   getting out of sync. It's a much better choice to have the qemu-img
   tool be the single interface to the dirty block bitmap data.
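The dirty-bitmap idea can be sketched as follows (a toy model of a
bitmap kept inside qemu driving an incremental transfer; not the FVD
file format or any qemu-img interface):

```python
# Toy dirty block bitmap: the block layer marks each written block, and
# a backup pass transfers only the marked blocks, then clears the map.

dirty = set()                          # stand-in for a per-image bitmap

def guest_write(image, block, data):
    image[block] = data
    dirty.add(block)                   # record the block as dirty

def incremental_backup(image, target):
    for block in sorted(dirty):        # transfer only the dirty blocks
        target[block] = image[block]
    dirty.clear()                      # next backup starts from here

# State just after a full backup: image and backup copy are identical.
img = {0: b"A", 1: b"B"}
bak = {0: b"A", 1: b"B"}

guest_write(img, 1, b"B'")             # one block changes
incremental_backup(img, bak)           # only that block is sent

assert bak == img
assert dirty == set()
```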

Summary:
  * We need Marcelo's new (to come) block copy implementation
* should work in parallel to migration and hotplug
  * General copy on read is desirable
  * Live snapshot merge to be implemented using block copy
  * Need to utilize a remote block access protocol (iscsi/nbd/other)
Which one is the best?
  * Keep qemu-img the single interface for dirty block mappings.
  * Live block migration pre copy == live copy + block access protocol
+ live migration
  * Live block migration post copy == live migration + block access
protocol/copy on read.

Comments?

Regards,
Dor



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-09 Thread Anthony Liguori

On 05/09/2011 08:40 AM, Dor Laor wrote:

No patch here (sorry) but collection of thoughts about these features
and their potential building blocks. Please review (also on
http://wiki.qemu.org/Features/LiveBlockMigration)

Future qemu is expected to support these features (some already
implemented):

* Live block copy

Ability to copy 1+ virtual disk from the source backing file/block
device to a new target that is accessible by the host. The copy is
supposed to be executed transparently while the VM runs.

Status: code exists (by Marcelo) today in qemu but needs refactoring
due to a race condition at the end of the copy operation. We agreed
that a re-implementation of the copy operation should take place
that makes sure the image is completely mirrored until management
decides what copy to keep.


Live block copy is growing on me.  It can actually be used (with an 
intermediate network storage) to do live block migration.




* Live snapshots and live snapshot merge

Live snapshot is already incorporated (by Jes) in qemu (still need
qemu-agent work to freeze the guest FS).


Live snapshot is unfortunately not really live.  It runs a lot of 
operations synchronously which will cause the guest to incur downtime.


We really need to refactor it to truly be live.



* Copy on read (image streaming)
Ability to start guest execution while the parent image resides
remotely and each block access is replicated to a local copy (image
format snapshot)

It would be nice to have a general mechanism that will be used for
all image formats. What about the protocol to access these blocks
over the net? We can reuse existing ones (nbd/iscsi).


I think the image format is really the best place to have this logic. 
Of course, if we have live snapshot merge, we could use a temporary 
QED/QCOW2 file and then merge afterwards.



* Using external dirty block bitmap

FVD has an option to use external dirty block bitmap file in
addition to the regular mapping/data files.

We can consider using it for live block migration and live merge too.
It can also allow additional usages of 3rd party tools to calculate
diffs between the snapshots.
There is a big downside though, since it will make management
complicated and there is the risk of the image and its bitmap file
getting out of sync. It's a much better choice to have the qemu-img
tool be the single interface to the dirty block bitmap data.


Does the dirty block bitmap need to exist outside of QEMU?

IOW, if it goes away after a guest shuts down, is that problematic?

I think it potentially greatly simplifies the problem which makes it 
appealing from my perspective.


Regards,

Anthony Liguori



Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

2011-05-09 Thread Dor Laor

On 05/09/2011 06:23 PM, Anthony Liguori wrote:

On 05/09/2011 08:40 AM, Dor Laor wrote:

No patch here (sorry) but collection of thoughts about these features
and their potential building blocks. Please review (also on
http://wiki.qemu.org/Features/LiveBlockMigration)

Future qemu is expected to support these features (some already
implemented):

* Live block copy

Ability to copy 1+ virtual disk from the source backing file/block
device to a new target that is accessible by the host. The copy is
supposed to be executed transparently while the VM runs.

Status: code exists (by Marcelo) today in qemu but needs refactoring
due to a race condition at the end of the copy operation. We agreed
that a re-implementation of the copy operation should take place
that makes sure the image is completely mirrored until management
decides what copy to keep.


Live block copy is growing on me. It can actually be used (with an
intermediate network storage) to do live block migration.


I'm not sure that we can rely on such storage. While it looks like 
anyone can get such temporary storage, it makes failure cases complex, 
and it will need additional locking, security permissions, etc.


That said, the main gap is the block copy protocol, and using qemu as an 
iSCSI target/initiator might be a good solution.






* Live snapshots and live snapshot merge

Live snapshot is already incorporated (by Jes) in qemu (it still needs
qemu-agent work to freeze the guest FS).


Live snapshot is unfortunately not really live. It runs a lot of
operations synchronously which will cause the guest to incur downtime.

We really need to refactor it to truly be live.


Well, live migration is not really live either.
This can be treated as an implementation detail and improved later on.
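For readers unfamiliar with external snapshots: taking one amounts to
freezing the current image as a read-only backing file and redirecting
new writes to a fresh top image. A minimal Python sketch of those
semantics (illustrative names, not QEMU's implementation):

```python
class Image:
    """Backing-file semantics: reads of unallocated clusters fall
    through to the backing image; writes always go to the top image."""

    def __init__(self, backing=None):
        self.data = {}            # cluster index -> bytes (allocated clusters)
        self.backing = backing    # read-only parent image, if any

    def write(self, cluster, data):
        self.data[cluster] = data

    def read(self, cluster):
        if cluster in self.data:
            return self.data[cluster]
        if self.backing is not None:
            return self.backing.read(cluster)
        return b"\0"              # unallocated everywhere: reads as zero


def snapshot(image):
    # A live snapshot: the old image becomes the (frozen) backing file
    # and a new empty top image receives all subsequent writes.
    return Image(backing=image)
```

Live snapshot merge is then the reverse: copying the top image's
allocated clusters back into the backing file, which is why it pairs
naturally with live block copy.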




* Copy on read (image streaming)
Ability to start guest execution while the parent image resides
remotely; each block access is replicated to a local copy (image
format snapshot).

It would be nice to have a general mechanism that can be used for
all image formats. What about the protocol to access these blocks
over the net? We can reuse existing ones (NBD/iSCSI).


I think the image format is really the best place to have this logic. Of
course, if we have live snapshot merge, we could use a temporary
QED/QCOW2 file and then merge afterwards.
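The copy-on-read idea itself is simple enough to sketch. Again not QEMU
code, just an illustrative Python model: the guest runs immediately
against a remote parent, and every cluster fetched over the wire (e.g.
via NBD) is also populated into the local image so later reads are
served locally.

```python
class CopyOnReadImage:
    """Image streaming: serve reads locally when possible, otherwise
    fetch from the remote parent once and keep the copy."""

    def __init__(self, remote_read):
        self.remote_read = remote_read   # callable: cluster -> bytes (the network protocol)
        self.local = {}                  # locally populated clusters
        self.remote_fetches = 0          # counts round-trips to the parent

    def read(self, cluster):
        if cluster not in self.local:
            # First access: fetch over the net and populate the local copy.
            self.remote_fetches += 1
            self.local[cluster] = self.remote_read(cluster)
        return self.local[cluster]
```

A background streaming pass can simply call `read()` on every cluster;
once all clusters are local, the dependency on the remote parent can be
dropped.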


* Using external dirty block bitmap

FVD has an option to use an external dirty block bitmap file in
addition to the regular mapping/data files.

We can consider using it for live block migration and live merge too.
It would also allow 3rd party tools to calculate diffs between the
snapshots.
There is a big downside, though, since it will make management
complicated and there is the risk of the image and its bitmap file
getting out of sync. It's a much better choice to have the qemu-img
tool be the single interface to the dirty block bitmap data.
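For concreteness, a dirty block bitmap is just one bit per block, set
on every guest write; diffing two snapshots reduces to listing the bits
set since the snapshot was taken. A minimal sketch (illustrative, not
FVD's or QEMU's on-disk format):

```python
class DirtyBitmap:
    """One bit per block; a set bit means the block was written since
    the bitmap was created (or last cleared)."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)   # packed, 8 blocks per byte

    def mark(self, block):
        # Called on every guest write covering this block.
        self.bits[block // 8] |= 1 << (block % 8)

    def is_dirty(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def dirty_blocks(self):
        # The diff since the snapshot: exactly the blocks to copy for
        # live block migration or live merge.
        return [b for b in range(self.nblocks) if self.is_dirty(b)]
```

Whether this lives in an external file (the FVD approach) or only in
memory inside QEMU is exactly the management-complexity trade-off
discussed below.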


Does the dirty block bitmap need to exist outside of QEMU?

IOW, if it goes away after a guest shuts down, is that problematic?


I admit I didn't give it enough thought; I think that sharing the code 
with qemu-img should be enough for us. If we have a live block operation 
and the guest suddenly shuts down in the middle, we need to finish the 
block copy.




I think it potentially greatly simplifies the problem which makes it
appealing from my perspective.

Regards,

Anthony Liguori