Re: [libvirt] [RFC v3] external (pull) backup API
On 05/17/2018 05:43 PM, Eric Blake wrote:
> Here's my updated counterproposal for a backup API.
>
> /**
>  * virDomainBackupBegin:
>  * @domain: a domain object
>  * @diskXml: description of storage to utilize and expose during
>  *     the backup, or NULL
>  * @checkpointXml: description of a checkpoint to create, or NULL
>  * @flags: not used yet, pass 0

Actually, since I'm taking two XML documents, this should really have a
VIR_DOMAIN_BACKUP_VALIDATE flag for comparison of the XML against the
schema.

> /**
>  * virDomainCheckpointCreateXML:
>  * @domain: a domain object
>  * @xmlDesc: description of the checkpoint to create
>  * @flags: bitwise-OR of supported virDomainCheckpointCreateFlags
>  */
> virDomainCheckpointPtr
> virDomainCheckpointCreateXML(virDomainPtr domain,
>                              const char *xmlDesc,
>                              unsigned int flags);

Ditto. And this was copied from virDomainSnapshotCreateXML, which should
also gain a VALIDATE flag.
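As a usage sketch, this is roughly how a client would opt in to
validation; note that everything below is a proposal from this thread
(virDomainCheckpointCreateXML() itself is not yet committed, and the
flag and virDomainCheckpointFree() names are guesses by analogy with
the snapshot API):

    #include <stdio.h>
    #include <libvirt/libvirt.h>

    /* Hypothetical sketch: the checkpoint API and the VALIDATE flag
     * are proposals from this thread, not released libvirt API. */
    static int
    create_validated_checkpoint(virDomainPtr dom, const char *xml)
    {
        virDomainCheckpointPtr chk;

        /* Ask libvirt to compare @xml against the checkpoint schema
         * before acting on it. */
        chk = virDomainCheckpointCreateXML(dom, xml,
                                           VIR_DOMAIN_CHECKPOINT_CREATE_VALIDATE);
        if (!chk) {
            fprintf(stderr, "checkpoint XML rejected or creation failed\n");
            return -1;
        }
        virDomainCheckpointFree(chk); /* name guessed from virDomainSnapshotFree */
        return 0;
    }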
Re: [libvirt] [RFC v3] external (pull) backup API
On 05/17/2018 05:43 PM, Eric Blake wrote:
> Here's my updated counterproposal for a backup API.
>
> /**
>  * virDomainBackupBegin:
>  *
>  * There are two fundamental backup approaches. The first, called a
>  * push model, instructs the hypervisor to copy the state of the guest
>  * disk to the designated storage destination (which may be on the
>  * local file system or a network device); in this mode, the
>  * hypervisor writes the content of the guest disk to the destination,
>  * then emits VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 when the backup is
>  * either complete or failed (the backup image is invalid if the job
>  * is ended prior to the event being emitted).

Better is VIR_DOMAIN_EVENT_ID_JOB_COMPLETED (BLOCK_JOB can only report
status for a single disk, while this event is meant to report on
multiple disks handled in a single transaction).

I'm a bit depressed at our technical debt in this area:
virDomainGetJobStats() and virDomainAbortJob() don't take a job id, but
only operate on the most recently started job. But I did mention
elsewhere in my plans:

I think that it should be possible to run multiple backup operations in
parallel in the long run. But in the interest of getting a
proof-of-concept implementation out quickly, it's easier to state that
for the initial implementation, libvirt supports at most one backup
operation at a time (to do another backup, you have to wait for the
current one to complete, or else abort and abandon the current one). As
there is only one backup job running at a time, the existing
virDomainGetJobInfo()/virDomainGetJobStats() will be able to report
statistics about the job (insofar as such statistics are available).
But in preparation for the future, when libvirt does add parallel job
support, starting a backup job will return a job id; and presumably
we'd add a new virDomainGetJobStatsByID() for grabbing statistics of an
arbitrary (rather than the most recently started) job.

Since live migration also acts as a job visible through
virDomainGetJobStats(), I'm going to treat an active backup job and
live migration as mutually exclusive. This is particularly true when we
have a pull model backup ongoing: if qemu on the source is acting as an
NBD server, you can't migrate away from that qemu and tell the NBD
client to reconnect to the NBD server on the migration destination. So,
to perform a migration, you have to cancel any pending backup
operations. Conversely, if a migration job is underway, it will not be
possible to start a new backup job until migration completes. However,
we DO need to modify migration to ensure that any persistent bitmaps
are migrated.

Yes, this means that virDomainBackupEnd() (which takes a job id) and
virDomainAbortJob() (which does not, but until we support parallel
backup jobs or a mix of backup and migration at once, it does not
matter) can initially both do the work of aborting a backup job.
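To illustrate how a push-model client might tie these pieces together:
the JOB_COMPLETED event and virConnectDomainEventRegisterAny() are
existing libvirt API, while virDomainBackupBegin() and its job-id
return are still proposals from this thread:

    #include <stdio.h>
    #include <libvirt/libvirt.h>

    /* Existing event: fires once per job and covers all disks in the
     * transaction, unlike the per-disk BLOCK_JOB events. */
    static void
    job_completed_cb(virConnectPtr conn, virDomainPtr dom,
                     virTypedParameterPtr params, int nparams, void *opaque)
    {
        printf("backup job on '%s' finished; %d stat params available\n",
               virDomainGetName(dom), nparams);
    }

    static int
    start_push_backup(virConnectPtr conn, virDomainPtr dom,
                      const char *diskXml)
    {
        if (virConnectDomainEventRegisterAny(conn, dom,
                VIR_DOMAIN_EVENT_ID_JOB_COMPLETED,
                VIR_DOMAIN_EVENT_CALLBACK(job_completed_cb),
                NULL, NULL) < 0)
            return -1;

        /* Proposed API: no checkpoint requested, flags 0; in the
         * parallel-job future this return value is the job id. */
        return virDomainBackupBegin(dom, diskXml, NULL, 0);
    }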
Re: [libvirt] [RFC v3] external (pull) backup API
On 05/17/2018 05:43 PM, Eric Blake wrote:
> Here's my updated counterproposal for a backup API.
>
> /**
>  * virDomainBackupEnd:
>  * @domain: a domain object
>  * @id: the id of an active backup job previously started with
>  *      virDomainBackupBegin()
>  * @flags: bitwise-OR of supported virDomainBackupEndFlags
>  *
>  * Conclude a point-in-time backup job @id on the given domain.
>  *
>  * If the backup job uses the push model, but the event marking that
>  * all data has been copied has not yet been emitted, then the command
>  * fails unless @flags includes VIR_DOMAIN_BACKUP_END_ABORT. If the
>  * event has been issued, or if the backup uses the pull model, the
>  * flag has no effect.
>  *
>  * Returns 0 on success and -1 on failure.
>  */
> int
> virDomainBackupEnd(virDomainPtr domain, int id, unsigned int flags);

For this API, I'm considering a tri-state return: 1 if the backup job
completed successfully (in the push model, the backup destination file
is usable); 0 if the backup job was aborted (only possible if
VIR_DOMAIN_BACKUP_END_ABORT was passed; the backup destination file is
untrustworthy); and -1 on failure.
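A sketch of what a caller handling that tri-state convention might look
like (the function, the flag, and the return convention are all still
proposals):

    #include <stdio.h>
    #include <libvirt/libvirt.h>

    /* Hypothetical: virDomainBackupEnd() and VIR_DOMAIN_BACKUP_END_ABORT
     * are proposals from this thread. */
    static void
    end_backup(virDomainPtr dom, int job_id)
    {
        int ret = virDomainBackupEnd(dom, job_id, VIR_DOMAIN_BACKUP_END_ABORT);

        if (ret == 1)
            printf("job %d completed; push-model destination is usable\n",
                   job_id);
        else if (ret == 0)
            printf("job %d aborted; destination file is untrustworthy\n",
                   job_id);
        else
            fprintf(stderr, "ending job %d failed\n", job_id);
    }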
Re: [libvirt] [RFC v3] external (pull) backup API
On Fri, May 25, 2018 at 10:26:12 -0500, Eric Blake wrote:
> On 05/17/2018 05:43 PM, Eric Blake wrote:
>> Here's my updated counterproposal for a backup API.
>>
>> In comparison to v2 posted by Nikolay:
>> https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
>> - changed terminology a bit: Nikolay's "BlockSnapshot" is now called a
>>   "Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"
>> - flesh out more API descriptions
>> - better documentation of proposed XML, for both checkpoints and backup
>>
>> Barring any major issues turned up during review, I've already started
>> to code this into libvirt with a goal of getting an implementation
>> ready for review this month.
>>
>> // Many additional functions copying heavily from virDomainSnapshot*:
>>
>> int
>> virDomainCheckpointList(virDomainPtr domain,
>>                         virDomainCheckpointPtr **checkpoints,
>>                         unsigned int flags);
>>
>> int
>> virDomainCheckpointListChildren(virDomainCheckpointPtr checkpoint,
>>                                 virDomainCheckpointPtr **children,
>>                                 unsigned int flags);
>>
>> Notably, none of the older racy list functions, like
>> virDomainSnapshotNum, virDomainSnapshotNumChildren, or
>> virDomainSnapshotListChildrenNames; also, for now, there is no revert
>> support like virDomainSnapshotRevert.
>
> I'm finding it easier to understand if I name these:
>
> virDomainListCheckpoints() (find checkpoints relative to a domain)
> virDomainCheckpointListChildren() (find children relative to a checkpoint)

If you are going to name them "checkpoints" here, we should first rename
"snapshots with memory" in our docs, since we currently refer to those
as checkpoints. We refer to disk-only snapshots as snapshots and wanted
to emphasize the difference.

> The counterpart Snapshot API used virDomainListAllSnapshots(); the term
> 'All' was present because it was added after the initial racy
> virDomainSnapshotNum(), but as we are avoiding the racy API here we can
> skip it from the beginning.
Re: [libvirt] [RFC v3] external (pull) backup API
On 05/17/2018 05:43 PM, Eric Blake wrote:
> Here's my updated counterproposal for a backup API.
>
> In comparison to v2 posted by Nikolay:
> https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
> - changed terminology a bit: Nikolay's "BlockSnapshot" is now called a
>   "Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"
> - flesh out more API descriptions
> - better documentation of proposed XML, for both checkpoints and backup
>
> Barring any major issues turned up during review, I've already started
> to code this into libvirt with a goal of getting an implementation
> ready for review this month.
>
> // Many additional functions copying heavily from virDomainSnapshot*:
>
> int
> virDomainCheckpointList(virDomainPtr domain,
>                         virDomainCheckpointPtr **checkpoints,
>                         unsigned int flags);
>
> int
> virDomainCheckpointListChildren(virDomainCheckpointPtr checkpoint,
>                                 virDomainCheckpointPtr **children,
>                                 unsigned int flags);
>
> Notably, none of the older racy list functions, like
> virDomainSnapshotNum, virDomainSnapshotNumChildren, or
> virDomainSnapshotListChildrenNames; also, for now, there is no revert
> support like virDomainSnapshotRevert.

I'm finding it easier to understand if I name these:

virDomainListCheckpoints() (find checkpoints relative to a domain)
virDomainCheckpointListChildren() (find children relative to a checkpoint)

The counterpart Snapshot API used virDomainListAllSnapshots(); the term
'All' was present because it was added after the initial racy
virDomainSnapshotNum(), but as we are avoiding the racy API here we can
skip it from the beginning.
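For comparison, a sketch of the non-racy listing pattern under the names
above; every function here is a proposal, and
virDomainCheckpointGetName()/virDomainCheckpointFree() are guesses by
analogy with the snapshot API:

    #include <stdio.h>
    #include <stdlib.h>
    #include <libvirt/libvirt.h>

    static void
    print_checkpoints(virDomainPtr dom)
    {
        virDomainCheckpointPtr *chks = NULL;
        int n, i;

        /* Proposed: returns the count and an allocated array in one
         * call, avoiding the count-then-list race of
         * virDomainSnapshotNum(). */
        n = virDomainListCheckpoints(dom, &chks, 0);
        if (n < 0)
            return;

        for (i = 0; i < n; i++) {
            printf("checkpoint: %s\n", virDomainCheckpointGetName(chks[i]));
            virDomainCheckpointFree(chks[i]);
        }
        free(chks);
    }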
Re: [libvirt] [RFC v3] external (pull) backup API
22.05.2018 01:03, Eric Blake wrote:
> On 05/21/2018 10:52 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 18.05.2018 01:43, Eric Blake wrote:
>>> Here's my updated counterproposal for a backup API.
>>
>> [...]
>>
>>> Representing things on a timeline, when a guest is first created,
>>> there is no dirty bitmap; later, the checkpoint "check1" is created,
>>> which in turn creates "bitmap1" in the qcow2 image for all changes
>>> past that point; when a second checkpoint "check2" is created, a qemu
>>> transaction is used to create and enable the new "bitmap2" bitmap at
>>> the same time as disabling the "bitmap1" bitmap. (Actually, it's
>>> probably easier to name the bitmap in the qcow2 file with the same
>>> name as the Checkpoint object being tracked in libvirt, but for
>>> discussion purposes, it's less confusing if I use separate names for
>>> now.)
>>>
>>> creation ... check1 ...... check2 ..... active
>>> no bitmap    bitmap1      bitmap2
>>>
>>> When a user wants to create a backup, they select which point in time
>>> the backup starts from; the default value NULL represents a full
>>> backup (all content since disk creation to the point in time of the
>>> backup call; no bitmap is needed; use sync=full for the push model or
>>> sync=none for the pull model); any other value represents the name of
>>> a checkpoint to use as an incremental backup (all content from the
>>> checkpoint to the point in time of the backup call; libvirt forms a
>>> temporary bitmap as needed, then uses sync=incremental for the push
>>> model or sync=none plus exporting the bitmap for the pull model). For
>>> example, requesting an incremental backup from "check2" can just
>>> reuse "bitmap2", but requesting an incremental backup from "check1"
>>> requires the computation of the bitmap containing the union of
>>> "bitmap1" and "bitmap2".
>>
>> I have a bit of criticism on this part, specifically on the ability to
>> create a backup not from the last checkpoint but from any checkpoint
>> in the past. For this ability we are implementing the whole API with
>> checkpoints, we are going to store several bitmaps in Qemu (and
>> possibly going to implement checkpoints in Qemu in future). But
>> personally, I don't know any real and adequate use cases for this
>> ability. I heard about the following cases:
>>
>> 1. Incremental restore: we want to roll back to some point in time
>> (some element in the incremental backup chain), and don't want to copy
>> all the data, but only what changed. - It's not a real case, because
>> the information about dirtiness is already in the backup chain: we
>> just need to find the allocated areas and copy them, plus we should
>> copy the areas corresponding to dirty bits in the active dirty bitmap
>> in Qemu.
>
> If you do a pull mode backup (where the dirty bitmaps were exported
> over NBD), then yes, you can assume that the third-party app reading
> the backup data also saved the dirty bitmap in whatever form it likes,
> so that it only ever has to pull data from the most recent checkpoint
> and can reconstruct the union of changes from an earlier checkpoint
> offline without qemu help. But for a push mode backup (where qemu does
> the pushing), there is no way to expose the dirty bitmap of what the
> backup contains, unless you back up to something like a qcow2 image and
> track which clusters in the backup image were allocated as a result of
> the backup operation. So having a way in the libvirt API to grab an
> incremental backup from earlier than the most recent checkpoint may not
> be needed by everyone, but I don't see a problem in implementing it
> either.

If we have a chain of incrementals from push backup, it should be
possible to analyze their block status, so it should be something like
qcow2. Otherwise we can't create a chain of backing files. If we back up
incrementals to the same file, we can't restore to any previous point
except the last one anyway.

>> 2. Several backup solutions backing up the same vm - Ok, if we
>> implement checkpoints, instead of maintaining several active dirty
>> bitmaps, we can have only one active bitmap and the others disabled,
>> which leads to a performance gain and the possibility to save RAM
>> space (if we unload disabled bitmaps from RAM to qcow2). But what are
>> the real cases? What is the real benefit? I doubt that somebody will
>> use more than 2 - 3 different backup providers on the same vm, so is
>> it worth implementing such a big feature for this? It is of course
>> worth doing if we have 100 independent backup providers. Note: the
>> word "independent" is important here. For example it may be two
>> external backup tools, managed by different subsystems or different
>> people or something like this. If we are just doing a backup weekly +
>> daily, we can actually synchronize them, so that the weekly backup
>> will be a merge of the last 7 daily backups; then the weekly backup
>> doesn't need its own active dirty bitmap or even a backup operation.
>
> I'm not sure if this is a complaint that libvirt should allow more than
> one active bitmap at a time, vs. having exactly one active bitmap at a
> time and then reconstructing bitmaps over larger sequences of time as
> needed. But does that change the API that libvirt should expose to end
> users, or can it just be an implementation detail?
Re: [libvirt] [RFC v3] external (pull) backup API
On 05/21/2018 10:52 AM, Vladimir Sementsov-Ogievskiy wrote:
> 18.05.2018 01:43, Eric Blake wrote:
>> Here's my updated counterproposal for a backup API.
>
> [...]
>
>> Representing things on a timeline, when a guest is first created,
>> there is no dirty bitmap; later, the checkpoint "check1" is created,
>> which in turn creates "bitmap1" in the qcow2 image for all changes
>> past that point; when a second checkpoint "check2" is created, a qemu
>> transaction is used to create and enable the new "bitmap2" bitmap at
>> the same time as disabling the "bitmap1" bitmap. (Actually, it's
>> probably easier to name the bitmap in the qcow2 file with the same
>> name as the Checkpoint object being tracked in libvirt, but for
>> discussion purposes, it's less confusing if I use separate names for
>> now.)
>>
>> creation ... check1 ...... check2 ..... active
>> no bitmap    bitmap1      bitmap2
>>
>> When a user wants to create a backup, they select which point in time
>> the backup starts from; the default value NULL represents a full
>> backup (all content since disk creation to the point in time of the
>> backup call; no bitmap is needed; use sync=full for the push model or
>> sync=none for the pull model); any other value represents the name of
>> a checkpoint to use as an incremental backup (all content from the
>> checkpoint to the point in time of the backup call; libvirt forms a
>> temporary bitmap as needed, then uses sync=incremental for the push
>> model or sync=none plus exporting the bitmap for the pull model). For
>> example, requesting an incremental backup from "check2" can just
>> reuse "bitmap2", but requesting an incremental backup from "check1"
>> requires the computation of the bitmap containing the union of
>> "bitmap1" and "bitmap2".
>
> I have a bit of criticism on this part, specifically on the ability to
> create a backup not from the last checkpoint but from any checkpoint in
> the past. For this ability we are implementing the whole API with
> checkpoints, we are going to store several bitmaps in Qemu (and
> possibly going to implement checkpoints in Qemu in future). But
> personally, I don't know any real and adequate use cases for this
> ability. I heard about the following cases:
>
> 1. Incremental restore: we want to roll back to some point in time
> (some element in the incremental backup chain), and don't want to copy
> all the data, but only what changed. - It's not a real case, because
> the information about dirtiness is already in the backup chain: we just
> need to find the allocated areas and copy them, plus we should copy the
> areas corresponding to dirty bits in the active dirty bitmap in Qemu.

If you do a pull mode backup (where the dirty bitmaps were exported over
NBD), then yes, you can assume that the third-party app reading the
backup data also saved the dirty bitmap in whatever form it likes, so
that it only ever has to pull data from the most recent checkpoint and
can reconstruct the union of changes from an earlier checkpoint offline
without qemu help. But for a push mode backup (where qemu does the
pushing), there is no way to expose the dirty bitmap of what the backup
contains, unless you back up to something like a qcow2 image and track
which clusters in the backup image were allocated as a result of the
backup operation. So having a way in the libvirt API to grab an
incremental backup from earlier than the most recent checkpoint may not
be needed by everyone, but I don't see a problem in implementing it
either.

> 2. Several backup solutions backing up the same vm - Ok, if we
> implement checkpoints, instead of maintaining several active dirty
> bitmaps, we can have only one active bitmap and the others disabled,
> which leads to a performance gain and the possibility to save RAM space
> (if we unload disabled bitmaps from RAM to qcow2). But what are the
> real cases? What is the real benefit? I doubt that somebody will use
> more than 2 - 3 different backup providers on the same vm, so is it
> worth implementing such a big feature for this? It is of course worth
> doing if we have 100 independent backup providers. Note: the word
> "independent" is important here. For example it may be two external
> backup tools, managed by different subsystems or different people or
> something like this. If we are just doing a backup weekly + daily, we
> can actually synchronize them, so that the weekly backup will be a
> merge of the last 7 daily backups; then the weekly backup doesn't need
> its own active dirty bitmap or even a backup operation.

I'm not sure if this is a complaint that libvirt should allow more than
one active bitmap at a time, vs. having exactly one active bitmap at a
time and then reconstructing bitmaps over larger sequences of time as
needed. But does that change the API that libvirt should expose to end
users, or can it just be an implementation detail?

> 3. Some of the backups in an incremental backup chain are lost, and we
> want to recreate part of the chain as a new backup, instead of just
> dropping the whole chain and creating a full backup. In this case, I
> can say the following: disabled bitmaps (~ all checkpoints except the
> last one) are constant metadata, related to the backup chain, not to
> the vm.
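To make the "union of bitmap1 and bitmap2" reconstruction concrete, here
is a toy model in C. Real dirty bitmaps live inside qemu and would be
merged via QMP, so this only illustrates the arithmetic, not any actual
libvirt or qemu interface:

    #include <stdint.h>
    #include <stddef.h>

    /* Toy model: one bit per cluster. A backup starting from check1
     * must copy every cluster dirtied since check1, i.e. the OR of
     * bitmap1 (changes between check1 and check2) and bitmap2 (changes
     * since check2). */
    static void
    bitmap_union(const uint64_t *bitmap1, const uint64_t *bitmap2,
                 uint64_t *result, size_t words)
    {
        size_t i;

        for (i = 0; i < words; i++)
            result[i] = bitmap1[i] | bitmap2[i];
    }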
Re: [libvirt] [RFC v3] external (pull) backup API
On 05/18/2018 02:56 AM, Daniel P. Berrangé wrote:
> On Thu, May 17, 2018 at 05:43:37PM -0500, Eric Blake wrote:
>> Here's my updated counterproposal for a backup API.
>>
>> In comparison to v2 posted by Nikolay:
>> https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
>> - changed terminology a bit: Nikolay's "BlockSnapshot" is now called a
>>   "Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"
>> - flesh out more API descriptions
>> - better documentation of proposed XML, for both checkpoints and backup
>>
>> Barring any major issues turned up during review, I've already started
>> to code this into libvirt with a goal of getting an implementation
>> ready for review this month.
>
> I think the key thing missing from the docs is some kind of explanation
> about the difference between a backup, a checkpoint, and a snapshot.
> I'll admit I've not read the mail in detail, but at a high level it is
> not immediately obvious what the difference is & thus which APIs I
> would want to be using for a given scenario.

Indeed, and that's a fair complaint. Here's a first draft, which I'll
have to polish into a formal html document that both the snapshot and
checkpoint/backup pages refer to (or maybe I merge the snapshot and
checkpoint descriptions into a single html page, although I'm not quite
sure what to name the page then).

One of the features made possible with virtual machines is live
migration: transferring all state related to the guest from one host to
another, with minimal interruption to the guest's activity. A clever
observer will then note that if all state is available for live
migration, there is nothing stopping a user from saving that state at a
given point of time, to be able to later rewind guest execution back to
the state it previously had.

There are several different libvirt APIs associated with capturing the
state of a guest, such that the captured state can later be used to
rewind that guest to the conditions it was in earlier. But since there
are multiple APIs, it is best to understand the tradeoffs and
differences between them, in order to choose the best API for a given
task.

Timing: Capturing state can be a lengthy process, so while the captured
state ideally represents an atomic point in time corresponding to
something the guest was actually executing, some interfaces require
up-front preparation (the state captured is not complete until the API
ends, which may be some time after the command was first started),
while other interfaces track the state at the point the command was
first issued, even if it takes some time to finish capturing the state.
While it is possible to freeze guest I/O around either point in time
(so that the captured state is fully consistent, rather than just
crash-consistent), knowing whether the state is captured at the start
or end of the command may determine which approach to use. A related
concept is the amount of downtime the guest will experience during the
capture, particularly since freezing guest I/O has time constraints.

Amount of state: For an offline guest, only the contents of the guest
disks need to be captured; restoring that state is merely a fresh boot
with the disks restored to that state. But for an online guest, there
is a choice between storing the guest's memory (all that is needed
during live migration, where the storage is shared between source and
destination), the guest's disk state (all that is needed if there are
no pending guest I/O transactions that would be lost without the
corresponding memory state), or both together. Unless guest I/O is
quiesced prior to capturing state, reverting to the captured disk state
of a live guest without the corresponding memory state is comparable to
booting a machine that previously lost power without a clean shutdown;
but for a guest that uses appropriate journaling methods, this
crash-consistent state may be sufficient to avoid the additional
storage and time needed to capture memory state.

Quantity of files: When capturing state, some approaches store all
state within the same file (internal), while others expand a chain of
related files that must be used together (external), meaning more files
that a management application must track. There are also differences
depending on whether the state is captured in the same file in use by a
running guest, or whether the state is captured to a distinct file
without impacting the files used to run the guest.

Third-party integration: When capturing state, particularly for a
running guest, there are tradeoffs to how much of the process must be
done directly by the hypervisor, and how much can be off-loaded to
third-party software. Since capturing state is not instantaneous, it is
essential that any third-party integration see consistent data even if
the running guest continues to modify that data after the point in time
of the capture.

Full vs. partial: When capturing state, it is useful to minimize the
amount of state that must be captured in relation to a previous capture.
Re: [libvirt] [RFC v3] external (pull) backup API
18.05.2018 01:43, Eric Blake wrote:
> Here's my updated counterproposal for a backup API.

[...]

> Representing things on a timeline, when a guest is first created, there
> is no dirty bitmap; later, the checkpoint "check1" is created, which in
> turn creates "bitmap1" in the qcow2 image for all changes past that
> point; when a second checkpoint "check2" is created, a qemu transaction
> is used to create and enable the new "bitmap2" bitmap at the same time
> as disabling the "bitmap1" bitmap. (Actually, it's probably easier to
> name the bitmap in the qcow2 file with the same name as the Checkpoint
> object being tracked in libvirt, but for discussion purposes, it's less
> confusing if I use separate names for now.)
>
> creation ... check1 ...... check2 ..... active
> no bitmap    bitmap1      bitmap2
>
> When a user wants to create a backup, they select which point in time
> the backup starts from; the default value NULL represents a full backup
> (all content since disk creation to the point in time of the backup
> call; no bitmap is needed; use sync=full for the push model or
> sync=none for the pull model); any other value represents the name of a
> checkpoint to use as an incremental backup (all content from the
> checkpoint to the point in time of the backup call; libvirt forms a
> temporary bitmap as needed, then uses sync=incremental for the push
> model or sync=none plus exporting the bitmap for the pull model). For
> example, requesting an incremental backup from "check2" can just reuse
> "bitmap2", but requesting an incremental backup from "check1" requires
> the computation of the bitmap containing the union of "bitmap1" and
> "bitmap2".

I have a bit of criticism on this part, specifically on the ability to
create a backup not from the last checkpoint but from any checkpoint in
the past. For this ability we are implementing the whole API with
checkpoints, we are going to store several bitmaps in Qemu (and possibly
going to implement checkpoints in Qemu in future). But personally, I
don't know any real and adequate use cases for this ability. I heard
about the following cases:

1. Incremental restore: we want to roll back to some point in time (some
element in the incremental backup chain), and don't want to copy all the
data, but only what changed. - It's not a real case, because the
information about dirtiness is already in the backup chain: we just need
to find the allocated areas and copy them, plus we should copy the areas
corresponding to dirty bits in the active dirty bitmap in Qemu.

2. Several backup solutions backing up the same vm - Ok, if we implement
checkpoints, instead of maintaining several active dirty bitmaps, we can
have only one active bitmap and the others disabled, which leads to a
performance gain and the possibility to save RAM space (if we unload
disabled bitmaps from RAM to qcow2). But what are the real cases? What
is the real benefit? I doubt that somebody will use more than 2 - 3
different backup providers on the same vm, so is it worth implementing
such a big feature for this? It is of course worth doing if we have 100
independent backup providers. Note: the word "independent" is important
here. For example it may be two external backup tools, managed by
different subsystems or different people or something like this. If we
are just doing a backup weekly + daily, we can actually synchronize
them, so that the weekly backup will be a merge of the last 7 daily
backups; then the weekly backup doesn't need its own active dirty bitmap
or even a backup operation.

3. Some of the backups in an incremental backup chain are lost, and we
want to recreate part of the chain as a new backup, instead of just
dropping the whole chain and creating a full backup. In this case, I can
say the following: disabled bitmaps (~ all checkpoints except the last
one) are constant metadata, related to the backup chain, not to the vm.
And they should be stored as constant data: maybe on the same server as
the backup chain, maybe on another one, maybe in some database, but not
in the vm. The vm is a dynamic structure, and I don't see any reason for
storing (almost) unrelated constant metadata in it. Also, saving this
constant backup-related metadata separately from the vm will allow
checking its consistency with the help of checksums or something like
this. Finally, I'm not a specialist in storing constant data, but I
think that the vm is not the best place.

Note: Hmm, does someone have real examples of such use cases? Why are
backups lost, and is that a frequent case? (I heard an assumption that
it may be a tool checking backups (for example, creating a vm over the
backup and checking that it at least can start) running in the
background. But I'm not sure that we must drop a backup if it failed;
maybe it's enough to merge it up.)

3.1 About external backup: we have even already exported this metadata
to the third-party backup tool. So, this tool should store this
information for future use, instead of exporting it from Qemu again.

To summarize:
1. I doubt that the discussed ability is really needed.
2. If it is needed, I doubt that it
Re: [libvirt] [RFC v3] external (pull) backup API
On Thu, May 17, 2018 at 05:43:37PM -0500, Eric Blake wrote:
> Here's my updated counterproposal for a backup API.
>
> In comparison to v2 posted by Nikolay:
> https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
> - changed terminology a bit: Nikolay's "BlockSnapshot" is now called a
>   "Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"
> - flesh out more API descriptions
> - better documentation of proposed XML, for both checkpoints and backup
>
> Barring any major issues turned up during review, I've already started
> to code this into libvirt with a goal of getting an implementation
> ready for review this month.

I think the key thing missing from the docs is some kind of explanation
about the difference between a backup, a checkpoint, and a snapshot.
I'll admit I've not read the mail in detail, but at a high level it is
not immediately obvious what the difference is & thus which APIs I would
want to be using for a given scenario.

Regards,
Daniel
[libvirt] [RFC v3] external (pull) backup API
Here's my updated counterproposal for a backup API.

In comparison to v2 posted by Nikolay:
https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
- changed terminology a bit: Nikolay's "BlockSnapshot" is now called a
  "Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"
- flesh out more API descriptions
- better documentation of proposed XML, for both checkpoints and backup

Barring any major issues turned up during review, I've already started
to code this into libvirt with a goal of getting an implementation ready
for review this month.

Each domain will gain the ability to track a tree of Checkpoint objects
(we've previously mentioned the term "system checkpoint" in the XML as
the combination of disk and RAM state, so I'll use the term "disk
checkpoint" in prose as needed, to make it obvious that the checkpoints
described here do not include RAM state). I will use the
virDomainSnapshot API as a guide, meaning that we will track a tree of
checkpoints where each checkpoint can have 0 or 1 parent checkpoints, in
part because I plan to reuse a lot of the snapshot code as a starting
point for implementing checkpoint tracking.

Qemu does NOT track a relationship between internal snapshots, so
libvirt has to manage the backing tree all by itself; by the same
argument, if qemu does not add a parent relationship to dirty bitmaps,
libvirt can probably manage everything itself by copying how it manages
parent relationships between internal snapshots. However, I think it
will be far easier for libvirt to exploit qemu dirty bitmaps if qemu
DOES add bitmap tracking, particularly if qemu adds ways to easily
compose a temporary bitmap that is the union of one bitmap plus a fixed
number of its parents.

Design-wise, libvirt will manage things so that there is only one
enabled dirty bitmap per qcow2 image at a time, when no backup operation
is in effect. There is a notion of a current (or most recent)
checkpoint; when a new checkpoint is created, that becomes the current
one and the former checkpoint becomes the parent of the new one. If
there is no current checkpoint, then there is no active dirty bitmap
managed by libvirt.

Representing things on a timeline, when a guest is first created, there
is no dirty bitmap; later, the checkpoint "check1" is created, which in
turn creates "bitmap1" in the qcow2 image for all changes past that
point; when a second checkpoint "check2" is created, a qemu transaction
is used to create and enable the new "bitmap2" bitmap at the same time
as disabling the "bitmap1" bitmap. (Actually, it's probably easier to
name the bitmap in the qcow2 file with the same name as the Checkpoint
object being tracked in libvirt, but for discussion purposes, it's less
confusing if I use separate names for now.)

creation ... check1 ...... check2 ..... active
no bitmap    bitmap1      bitmap2

When a user wants to create a backup, they select which point in time
the backup starts from; the default value NULL represents a full backup
(all content since disk creation to the point in time of the backup
call; no bitmap is needed; use sync=full for the push model or sync=none
for the pull model); any other value represents the name of a checkpoint
to use as an incremental backup (all content from the checkpoint to the
point in time of the backup call; libvirt forms a temporary bitmap as
needed, then uses sync=incremental for the push model or sync=none plus
exporting the bitmap for the pull model).
For example, requesting an incremental backup from "check2" can just
reuse "bitmap2", but requesting an incremental backup from "check1"
requires the computation of the bitmap containing the union of "bitmap1"
and "bitmap2".

Libvirt will always create a new bitmap when starting a backup
operation, whether or not the user requests that a checkpoint be
created. Most users that want incremental backup sequences will create a
new checkpoint every time they do a backup; the new bitmap that libvirt
creates is then associated with that new checkpoint, and even after the
backup operation completes, the new bitmap remains in the qcow2 file.
But it is also possible to request a backup without a new checkpoint (it
merely means that it is not possible to create a subsequent incremental
backup from the backup just started); in that case, libvirt will have to
take care of merging the new bitmap back into the previous one at the
end of the backup operation.

I think that it should be possible to run multiple backup operations in
parallel in the long run. But in the interest of getting a
proof-of-concept implementation out quickly, it's easier to state that
for the initial implementation, libvirt supports at most one backup
operation at a time (to do another backup, you have to wait for the
current one to complete, or else abort and abandon the current one). As
there is only one backup job running at a time, the existing
virDomainGetJobInfo()/virDomainGetJobStats() will be able to report
statistics about the job (insofar as such statistics are available).
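As a closing sketch, one full backup/checkpoint cycle over the proposed
calls; the XML arguments are opaque placeholders (their schema is
exactly what this RFC is defining), and the job-id return of
virDomainBackupBegin() is the forward-looking behavior described above:

    #include <libvirt/libvirt.h>

    /* Hypothetical sketch built on the proposed API. */
    static int
    incremental_cycle(virDomainPtr dom, const char *diskXml,
                      const char *checkpointXml)
    {
        int job;

        /* Begin the backup and atomically create the next checkpoint,
         * so a later backup can be incremental relative to this point
         * in time; pass NULL for @checkpointXml to skip checkpoint
         * creation (libvirt then merges the temporary bitmap back into
         * the previous one afterwards). */
        job = virDomainBackupBegin(dom, diskXml, checkpointXml, 0);
        if (job < 0)
            return -1;

        /* ... push completes, or a pull client scrapes the NBD export ... */

        return virDomainBackupEnd(dom, job, 0);
    }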