Re: [libvirt] Overview of libvirt incremental backup API, part 2 (incremental/differential pull mode)
hi,

after watching John's slides from the KVM Forum (thanks for that) I had a quick
look at the backup-v3 branch. Just to provide some feedback for you guys, and
some questions.

My main question is about the NBD part of the backup. By default, what you get
from reading all the NBD data is a thick-provisioned image of the domain's
disk. One can use the `qemu-img map' function to get detailed information
about the used blocks in the image, in case one wants to create a
thin-provisioned backup. As a third-party backup vendor you cannot always
depend on the qemu tools, because you might not even install any software on
the host you are taking a backup from. So is there, or will there be, any way
to get the same information as the map function provides in the backup XML
description via the libvirt API? Would it make sense to provide this
information in the `backup-dumpxml' output? From what I know of the Citrix Xen
implementation, they provide a way to read this information via the API,
because they do not want the backup vendor to install any component on the
host systems.

Another thing I came across is that libvirt currently seems to forget about a
running backup job if the domain is destroyed and started again after the
backup job was created:

  [root@x ~]# virsh backup-begin centos backup-pull.xml
  Backup id 1 started
  backup used description from 'backup-pull.xml'
  [root@x ~]# virsh destroy centos && virsh start centos
  [root@x ~]# virsh backup-end --id 1 centos
  error: Requested operation is not valid: No active block job 'tmp-hda'
  [root@x ~]# virsh backup-dumpxml --id 1 centos

thanks for your hard work on this!

bye,
- michael

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
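Until libvirt exposes block-status information itself, a vendor that does have access to the qemu tools could derive the allocated extents the way Michael describes. A minimal sketch of parsing `qemu-img map --output=json` to find the ranges worth copying into a thin-provisioned backup (the sample JSON is illustrative, not from a real run):

```python
import json

def allocated_extents(map_json: str):
    """Return (start, length) ranges that actually hold data, from the
    JSON produced by `qemu-img map --output=json IMAGE`."""
    return [(e["start"], e["length"])
            for e in json.loads(map_json) if e["data"]]

# Illustrative map output for a mostly-sparse image (not from a real run):
sample = '''[
  {"start": 0, "length": 65536, "depth": 0, "zero": false,
   "data": true, "offset": 327680},
  {"start": 65536, "length": 66977792, "depth": 0, "zero": true,
   "data": false}
]'''

print(allocated_extents(sample))   # → [(0, 65536)]
```

A pull-mode client would read only those ranges from the NBD export instead of the whole thick image.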
On 10/9/18 8:29 AM, Nir Soffer wrote:
> On Fri, Oct 5, 2018 at 7:58 AM Eric Blake wrote:
> > On 10/4/18 12:05 AM, Eric Blake wrote:
> > > The following (long) email describes a portion of the work-flow of
> > > how my proposed incremental backup APIs will work, along with the
> > > backend QMP commands that each one executes. I will reply to this
> > > thread with further examples (the first example is long enough to
> > > be its own email). This is an update to a thread last posted here:
> > > https://www.redhat.com/archives/libvir-list/2018-June/msg01066.html
> > >
> > > More to come in part 2.
> >
> > - Second example: a sequence of incremental backups via pull model
> >
> > In the first example, we did not create a checkpoint at the time of
> > the full pull. That means we have no way to track a delta of changes
> > since that point in time.
>
> Why do we want to support backup without creating a checkpoint?

Fleecing. If you want to examine a portion of the disk at a given point in
time, then kicking off a pull model backup gives you access to the state of
the disk at that time, and your actions are transient. Ending the job when you
are done with the fleece cleans up everything needed to perform the fleece
operation; and since you did not intend to capture a full (well, a complete)
incremental backup, but were rather grabbing just a subset of the disk, you
really don't want that point in time to be recorded as a new checkpoint.

Also, incremental backups (which are what require checkpoints) are limited to
qcow2 disks, but full backups can be performed on any format (including raw
disks). If you have a guest that does not use qcow2 disks, you can perform a
full backup, but cannot create a checkpoint.

> If we don't have any real use case, I suggest to always require a
> checkpoint.

But we do have real use cases for backup without a checkpoint.

> > Let's repeat the full backup (reusing the same backup.xml from
> > before), but this time, we'll add a new parameter, a second XML file
> > for describing the checkpoint we want to create.
> > Actually, it was easy enough to get virsh to write the XML for me
> > (because it was very similar to existing code in virsh that creates
> > XML for snapshot creation):
> >
> > $ $virsh checkpoint-create-as --print-xml $dom check1 testing \
> >     --diskspec sdc --diskspec sdd | tee check1.xml
> > <domaincheckpoint>
> >   <name>check1</name>
>
> We should use an id, not a name, even if name is also unique like in
> most libvirt APIs. In RHV we will always use a UUID for this.

Nothing prevents you from using a UUID as your name. But this particular
choice of XML (<name>) matches what already exists in the snapshot XML.

> >   <description>testing</description>
> >   <disks>
> >     <disk name='sdc'/>
> >     <disk name='sdd'/>
> >   </disks>
> > </domaincheckpoint>
> >
> > I had to supply two --diskspec arguments to virsh to select just the
> > two qcow2 disks that I am using in my example (rather than every disk
> > in the domain, which is the default when <disks> is not present).
>
> So is <disks/> valid configuration, selecting all disks, or does not
> having a "disks" element select all disks?

It's about a one-line change to get whichever behavior you find more useful.
Right now, I'm leaning towards: <disks> omitted == backup all disks; <disks>
present: you MUST have at least one <disk> subelement that explicitly requests
a checkpoint (because any disk omitted when <disks> is present is skipped). A
checkpoint only makes sense as long as there is at least one disk to create a
checkpoint with.

But I could also go with: <disks> omitted == backup all disks; <disks> present
but subelements missing: the missing elements default to being backed up, and
you have to explicitly provide <disk name='...' checkpoint='no'/> to skip a
particular disk.

Or even: <disks> omitted, or present but subelements missing: the missing
elements defer to the hypervisor for their default state, and the qemu
hypervisor defaults to qcow2 disks being backed up/checkpointed and to
non-qcow2 disks being omitted. But this latter one feels like more magic,
which is harder to document and liable to go wrong.

A stricter version would be: <disks> is mandatory, and no <disk> subelement
can be missing (or else the API fails because you weren't explicit in your
choice).
But that's rather strict, especially since existing snapshot XML handling is
not that strict.

> > I also picked a name (mandatory) and description (optional) to be
> > associated with the checkpoint.
> >
> > The backup.xml file that we plan to reuse still mentions scratch1.img
> > and scratch2.img as files needed for staging the pull request.
> > However, any contents in those files could interfere with our second
> > backup (after all, every cluster written into that file from the
> > first backup represents a point in time that was frozen at the first
> > backup; but our second backup will want to read the data as the guest
> > sees it now rather than what it was at the first backup), so we MUST
> > regenerate the scratch files. (Perhaps I should have just deleted
> > them at the end of example 1 in my previous email, had I remembered
> > when typing that mail.)
> >
> > $ $qemu_img create -f qcow2 -b $orig1 -F qcow2 scratch1.img
> > $ $qemu_img create -f qcow2 -b $orig2 -F qcow2 scratch2.img
> >
> > Now, to
On Fri, Oct 5, 2018 at 7:58 AM Eric Blake wrote:
> On 10/4/18 12:05 AM, Eric Blake wrote:
> > The following (long) email describes a portion of the work-flow of
> > how my proposed incremental backup APIs will work, along with the
> > backend QMP commands that each one executes. I will reply to this
> > thread with further examples (the first example is long enough to be
> > its own email). This is an update to a thread last posted here:
> > https://www.redhat.com/archives/libvir-list/2018-June/msg01066.html
> >
> > More to come in part 2.
>
> - Second example: a sequence of incremental backups via pull model
>
> In the first example, we did not create a checkpoint at the time of the
> full pull. That means we have no way to track a delta of changes since
> that point in time.

Why do we want to support backup without creating a checkpoint?

If we don't have any real use case, I suggest to always require a checkpoint.

> Let's repeat the full backup (reusing the same backup.xml from before),
> but this time, we'll add a new parameter, a second XML file for
> describing the checkpoint we want to create.
>
> Actually, it was easy enough to get virsh to write the XML for me
> (because it was very similar to existing code in virsh that creates XML
> for snapshot creation):
>
> $ $virsh checkpoint-create-as --print-xml $dom check1 testing \
>     --diskspec sdc --diskspec sdd | tee check1.xml
> <domaincheckpoint>
>   <name>check1</name>

We should use an id, not a name, even if name is also unique like in most
libvirt APIs. In RHV we will always use a UUID for this.

>   <description>testing</description>
>   <disks>
>     <disk name='sdc'/>
>     <disk name='sdd'/>
>   </disks>
> </domaincheckpoint>
>
> I had to supply two --diskspec arguments to virsh to select just the
> two qcow2 disks that I am using in my example (rather than every disk
> in the domain, which is the default when <disks> is not present).

So is <disks/> valid configuration, selecting all disks, or does not having a
"disks" element select all disks?

> I also picked a name (mandatory) and description (optional) to be
> associated with the checkpoint.
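One way to pin down the "<disks> omitted vs present" semantics being asked about here is to write the policy down as code. A hypothetical resolver (not libvirt code, just a sketch) implementing the reading where an omitted <disks> selects every disk and a present <disks> requires explicit opt-in:

```python
def disks_to_checkpoint(domain_disks, spec):
    """Resolve which disks join a checkpoint.

    domain_disks: set of disk targets defined in the domain.
    spec: None when the <disks> element is omitted, otherwise a dict
          {disk_name: wants_checkpoint} built from the <disk> subelements;
          a disk not listed in spec is skipped (hypothetical policy)."""
    if spec is None:
        return sorted(domain_disks)      # <disks> omitted: all disks
    chosen = sorted(d for d, want in spec.items()
                    if want and d in domain_disks)
    if not chosen:
        # a checkpoint needs at least one participating disk
        raise ValueError("a checkpoint needs at least one disk")
    return chosen
```

Whether libvirt should behave this way is exactly what the thread is debating; the sketch only makes the alternatives concrete.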
> The backup.xml file that we plan to reuse still mentions scratch1.img
> and scratch2.img as files needed for staging the pull request. However,
> any contents in those files could interfere with our second backup
> (after all, every cluster written into that file from the first backup
> represents a point in time that was frozen at the first backup; but our
> second backup will want to read the data as the guest sees it now
> rather than what it was at the first backup), so we MUST regenerate the
> scratch files. (Perhaps I should have just deleted them at the end of
> example 1 in my previous email, had I remembered when typing that
> mail.)
>
> $ $qemu_img create -f qcow2 -b $orig1 -F qcow2 scratch1.img
> $ $qemu_img create -f qcow2 -b $orig2 -F qcow2 scratch2.img
>
> Now, to begin the full backup and create a checkpoint at the same time.
> Also, this time around, it would be nice if the guest had a chance to
> freeze I/O to the disks prior to the point chosen as the checkpoint.
> Assuming the guest is trusted, and running the qemu guest agent (qga),
> we can do that with:
>
> $ $virsh fsfreeze $dom
> $ $virsh backup-begin $dom backup.xml check1.xml
> Backup id 1 started
> backup used description from 'backup.xml'
> checkpoint used description from 'check1.xml'
> $ $virsh fsthaw $dom

Great, this answers my (unsent) question about freeze/thaw from part 1 :-)

> and eventually, we may decide to add a VIR_DOMAIN_BACKUP_BEGIN_QUIESCE
> flag to combine those three steps into a single API (matching what
> we've done on some other existing APIs). In other words, the sequence
> of QMP operations performed during virDomainBackupBegin is quick enough
> that it won't stall a freeze operation (at least Windows is picky if
> you stall a freeze operation longer than 10 seconds).

We use fsFreeze/fsThaw directly in RHV since we need to support external
snapshots (e.g. ceph), so we don't need this functionality, but it sounds like
a good idea to make it work like snapshots.
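The freeze/backup-begin/thaw sequence quoted above is easy to get wrong if backup-begin fails while the guest is frozen. A sketch of a safe wrapper around the three virsh calls (assuming virsh is on PATH and the guest agent responds; note that virsh spells the commands domfsfreeze/domfsthaw, which the thread abbreviates):

```python
import subprocess

def quiesced_backup_begin(dom, backup_xml, checkpoint_xml,
                          run=subprocess.run):
    """Bracket backup-begin with a filesystem freeze/thaw so the
    checkpoint lands on a quiesced guest. The thaw runs even if
    backup-begin fails, so the guest is never left frozen."""
    run(["virsh", "domfsfreeze", dom], check=True)
    try:
        run(["virsh", "backup-begin", dom, backup_xml, checkpoint_xml],
            check=True)
    finally:
        run(["virsh", "domfsthaw", dom], check=True)
```

The `run` parameter is only there to make the ordering testable; a proposed VIR_DOMAIN_BACKUP_BEGIN_QUIESCE flag would fold all three steps into one API call.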
> The tweaked $virsh backup-begin now results in a call to:
>
>   virDomainBackupBegin(dom, "<domainbackup ...>",
>       "<domaincheckpoint ...>", 0)
>
> and in turn libvirt makes a similar sequence of QMP calls as before,
> with a slight modification in the middle:
>
> {"execute":"nbd-server-start",...
> {"execute":"blockdev-add",...

This does not work yet for network disks like "rbd" and "glusterfs" - does it
mean that they will not be supported for backup?

> {"execute":"transaction",
>  "arguments":{"actions":[
>    {"type":"blockdev-backup", "data":{
>     "device":"$node1", "target":"backup-sdc", "sync":"none",
>     "job-id":"backup-sdc" }},
>    {"type":"blockdev-backup", "data":{
>     "device":"$node2", "target":"backup-sdd", "sync":"none",
>     "job-id":"backup-sdd" }},
>    {"type":"block-dirty-bitmap-add", "data":{
>     "node":"$node1", "name":"check1", "persistent":true}},
>    {"type":"block-dirty-bitmap-add", "data":{
>     "node":"$node2", "name":"check1", "persistent":true
On 10/4/18 12:05 AM, Eric Blake wrote:
> The following (long) email describes a portion of the work-flow of how
> my proposed incremental backup APIs will work, along with the backend
> QMP commands that each one executes. I will reply to this thread with
> further examples (the first example is long enough to be its own
> email). This is an update to a thread last posted here:
> https://www.redhat.com/archives/libvir-list/2018-June/msg01066.html
>
> More to come in part 2.

- Second example: a sequence of incremental backups via pull model

In the first example, we did not create a checkpoint at the time of the full
pull. That means we have no way to track a delta of changes since that point
in time. Let's repeat the full backup (reusing the same backup.xml from
before), but this time, we'll add a new parameter, a second XML file for
describing the checkpoint we want to create.

Actually, it was easy enough to get virsh to write the XML for me (because it
was very similar to existing code in virsh that creates XML for snapshot
creation):

$ $virsh checkpoint-create-as --print-xml $dom check1 testing \
    --diskspec sdc --diskspec sdd | tee check1.xml
<domaincheckpoint>
  <name>check1</name>
  <description>testing</description>
  <disks>
    <disk name='sdc'/>
    <disk name='sdd'/>
  </disks>
</domaincheckpoint>

I had to supply two --diskspec arguments to virsh to select just the two qcow2
disks that I am using in my example (rather than every disk in the domain,
which is the default when <disks> is not present). I also picked a name
(mandatory) and description (optional) to be associated with the checkpoint.

The backup.xml file that we plan to reuse still mentions scratch1.img and
scratch2.img as files needed for staging the pull request. However, any
contents in those files could interfere with our second backup (after all,
every cluster written into that file from the first backup represents a point
in time that was frozen at the first backup; but our second backup will want
to read the data as the guest sees it now rather than what it was at the first
backup), so we MUST regenerate the scratch files.
(Perhaps I should have just deleted them at the end of example 1 in my
previous email, had I remembered when typing that mail.)

$ $qemu_img create -f qcow2 -b $orig1 -F qcow2 scratch1.img
$ $qemu_img create -f qcow2 -b $orig2 -F qcow2 scratch2.img

Now, to begin the full backup and create a checkpoint at the same time. Also,
this time around, it would be nice if the guest had a chance to freeze I/O to
the disks prior to the point chosen as the checkpoint. Assuming the guest is
trusted, and running the qemu guest agent (qga), we can do that with:

$ $virsh fsfreeze $dom
$ $virsh backup-begin $dom backup.xml check1.xml
Backup id 1 started
backup used description from 'backup.xml'
checkpoint used description from 'check1.xml'
$ $virsh fsthaw $dom

and eventually, we may decide to add a VIR_DOMAIN_BACKUP_BEGIN_QUIESCE flag to
combine those three steps into a single API (matching what we've done on some
other existing APIs). In other words, the sequence of QMP operations performed
during virDomainBackupBegin is quick enough that it won't stall a freeze
operation (at least Windows is picky if you stall a freeze operation longer
than 10 seconds).

The tweaked $virsh backup-begin now results in a call to:

  virDomainBackupBegin(dom, "<domainbackup ...>",
      "<domaincheckpoint ...>", 0)

and in turn libvirt makes a similar sequence of QMP calls as before, with a
slight modification in the middle:

{"execute":"nbd-server-start",...
{"execute":"blockdev-add",...
{"execute":"transaction",
 "arguments":{"actions":[
   {"type":"blockdev-backup", "data":{
    "device":"$node1", "target":"backup-sdc", "sync":"none",
    "job-id":"backup-sdc" }},
   {"type":"blockdev-backup", "data":{
    "device":"$node2", "target":"backup-sdd", "sync":"none",
    "job-id":"backup-sdd" }},
   {"type":"block-dirty-bitmap-add", "data":{
    "node":"$node1", "name":"check1", "persistent":true}},
   {"type":"block-dirty-bitmap-add", "data":{
    "node":"$node2", "name":"check1", "persistent":true}}
 ]}}
{"execute":"nbd-server-add",...
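The transaction shown above is regular enough to be generated programmatically for any set of disks. A sketch that builds the same action list (node and target names are placeholders, as in the thread; this is not libvirt's internal code):

```python
def backup_transaction(disks, checkpoint):
    """Build the arguments for a QMP 'transaction' that atomically starts
    a pull-mode fleece job per disk AND creates a persistent dirty bitmap
    per disk, mirroring the sequence shown in the thread.

    disks: list of (node_name, target_name) pairs,
           e.g. [("$node1", "backup-sdc"), ("$node2", "backup-sdd")]."""
    actions = []
    for node, target in disks:          # fleece into the scratch nodes
        actions.append({"type": "blockdev-backup",
                        "data": {"device": node, "target": target,
                                 "sync": "none", "job-id": target}})
    for node, _ in disks:               # start tracking writes from now on
        actions.append({"type": "block-dirty-bitmap-add",
                        "data": {"node": node, "name": checkpoint,
                                 "persistent": True}})
    return {"execute": "transaction", "arguments": {"actions": actions}}
```

Putting both action types in one transaction is the point: the bitmap starts tracking at exactly the instant the fleece freezes, so no write can fall between the backup's point in time and the checkpoint's.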
The only change was adding more actions to the "transaction" command - in
addition to kicking off the fleece image in the scratch nodes, it ALSO added a
persistent bitmap to each of the original images, to track all changes made
after the point of the transaction. The bitmaps are persistent - at this point
(well, it's better if you wait until after backup-end), you could shut the
guest down and restart it, and libvirt will still remember that the checkpoint
exists, and qemu will continue to track guest writes via the bitmap. However,
the backup job itself is currently live-only, and shutting down the guest
while a backup operation is in effect will lose track of the backup job. What
that really means is that if the guest shuts down, your current backup job is
hosed (you cannot ever get back the point-in-time data from your API request -
as your next API request will be a new point in time) - but you have not
permanently ruined the guest, and your recov
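Because the bitmaps are persistent, they live in the qcow2 metadata and can be inspected offline. A sketch of reading them back out of `qemu-img info --output=json` (recent qemu reports bitmaps under the format-specific data; the sample below is an illustrative fragment, not from a real run):

```python
import json

def persistent_bitmaps(info_json: str):
    """List the dirty-bitmap names recorded in a qcow2 image, taken from
    the JSON produced by `qemu-img info --output=json IMAGE`."""
    data = json.loads(info_json).get("format-specific", {}).get("data", {})
    return [b["name"] for b in data.get("bitmaps", [])]

# Illustrative info output fragment (not from a real run):
sample = '''{
  "format": "qcow2",
  "format-specific": {
    "type": "qcow2",
    "data": {
      "bitmaps": [
        {"flags": ["auto"], "name": "check1", "granularity": 65536}
      ]
    }
  }
}'''

print(persistent_bitmaps(sample))   # → ['check1']
```

Seeing the checkpoint's bitmap survive a guest shutdown this way is exactly the persistence property described above, even though the backup job itself is live-only.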