Re: [libvirt] [RFC v3] external (pull) backup API

2018-06-13 Thread Eric Blake

On 05/17/2018 05:43 PM, Eric Blake wrote:

Here's my updated counterproposal for a backup API.




/**
  * virDomainBackupBegin:
  * @domain: a domain object
  * @diskXml: description of storage to utilize and expose during
  *   the backup, or NULL
  * @checkpointXml: description of a checkpoint to create, or NULL
  * @flags: not used yet, pass 0
  *


Actually, since I'm taking two XML documents, this should really have a 
VIR_DOMAIN_BACKUP_VALIDATE flag to validate the XML against the schema.
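
As an illustration only (not part of the proposal), a caller sketch for
starting a backup might look like the following.  It assumes, per the job-id
discussion elsewhere in this thread, that virDomainBackupBegin() returns the
job id on success; the helper name is hypothetical, and the VALIDATE flag is
the one proposed just above (it does not exist yet).

#include <stdio.h>
#include <libvirt/libvirt.h>

/* Sketch only: start a backup job and report its id. */
static int
start_backup(virDomainPtr dom, const char *diskXml, const char *checkpointXml)
{
    /* NULL for either XML argument takes the defaults described in the
     * RFC; VIR_DOMAIN_BACKUP_VALIDATE is merely proposed in this message. */
    int job = virDomainBackupBegin(dom, diskXml, checkpointXml,
                                   VIR_DOMAIN_BACKUP_VALIDATE);

    if (job < 0) {
        fprintf(stderr, "failed to start backup job\n");
        return -1;
    }
    printf("backup job %d started\n", job);
    return job;
}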




/**
  * virDomainCheckpointCreateXML:
  * @domain: a domain object
  * @xmlDesc: description of the checkpoint to create
  * @flags: bitwise-OR of supported virDomainCheckpointCreateFlags
  *



  */
virDomainCheckpointPtr
virDomainCheckpointCreateXML(virDomainPtr domain, const char *xmlDesc,
                             unsigned int flags);


Ditto.  And this was copied from virDomainSnapshotCreateXML, which 
should also gain a VALIDATE flag.
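
For illustration, a hedged sketch of creating a checkpoint.  The checkpoint
XML schema is described elsewhere in the RFC and is not quoted here, so it is
taken as a parameter, and VIR_DOMAIN_CHECKPOINT_CREATE_VALIDATE is only a
hypothetical name for the VALIDATE flag suggested above.

#include <stdio.h>
#include <libvirt/libvirt.h>

/* Sketch only: create a checkpoint from which later incremental backups
 * can be taken. */
static virDomainCheckpointPtr
create_checkpoint(virDomainPtr dom, const char *checkpointXml)
{
    /* checkpointXml follows the RFC's proposed checkpoint schema (not
     * shown here); the VALIDATE flag name is hypothetical. */
    virDomainCheckpointPtr chk =
        virDomainCheckpointCreateXML(dom, checkpointXml,
                                     VIR_DOMAIN_CHECKPOINT_CREATE_VALIDATE);

    if (!chk)
        fprintf(stderr, "failed to create checkpoint\n");
    return chk;
}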


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [libvirt] [RFC v3] external (pull) backup API

2018-06-08 Thread Eric Blake

On 05/17/2018 05:43 PM, Eric Blake wrote:

Here's my updated counterproposal for a backup API.




/**
  * virDomainBackupBegin:



  *
  * There are two fundamental backup approaches.  The first, called a
  * push model, instructs the hypervisor to copy the state of the guest
  * disk to the designated storage destination (which may be on the
  * local file system or a network device); in this mode, the
  * hypervisor writes the content of the guest disk to the destination,
  * then emits VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 when the backup is
  * either complete or failed (the backup image is invalid if the job
  * is ended prior to the event being emitted).


Better is VIR_DOMAIN_EVENT_ID_JOB_COMPLETED (a BLOCK_JOB event can only 
report status for one disk, while this is intended to report on multiple 
disks handled in a single transaction).  I'm a bit depressed at our 
technical debt in this area: virDomainGetJobStats() and 
virDomainAbortJob() don't take a job id, but only operate on the most 
recently started job; but I did mention elsewhere in my plans:




I think that it should be possible to run multiple backup operations
in parallel in the long run.  But in the interest of getting a proof
of concept implementation out quickly, it's easier to state that for
the initial implementation, libvirt supports at most one backup
operation at a time (to do another backup, you have to wait for the
current one to complete, or else abort and abandon the current
one). As there is only one backup job running at a time, the existing
virDomainGetJobInfo()/virDomainGetJobStats() will be able to report
statistics about the job (insofar as such statistics are available).
But in preparation for the future, when libvirt does add parallel job
support, starting a backup job will return a job id; and presumably
we'd add a new virDomainGetJobStatsByID() for grabbing statistics of
an arbitrary (rather than the most-recently-started) job.

Since live migration also acts as a job visible through
virDomainGetJobStats(), I'm going to treat an active backup job and
live migration as mutually exclusive.  This is particularly true when
we have a pull model backup ongoing: if qemu on the source is acting
as an NBD server, you can't migrate away from that qemu and tell the
NBD client to reconnect to the NBD server on the migration
destination.  So, to perform a migration, you have to cancel any
pending backup operations.  Conversely, if a migration job is
underway, it will not be possible to start a new backup job until
migration completes.  However, we DO need to modify migration to
ensure that any persistent bitmaps are migrated. 


Yes, this means that virDomainBackupEnd() (which takes a job id) and 
virDomainAbortJob() (which does not, but until we support parallel 
backup jobs or a mix of backup and migration at once, that does not 
matter) can initially both do the work of aborting a backup job.
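
A sketch of the two abort paths, for illustration only (the helper name is
hypothetical; VIR_DOMAIN_BACKUP_END_ABORT is the flag documented for
virDomainBackupEnd() elsewhere in this thread):

#include <libvirt/libvirt.h>

/* Sketch only: abort a running backup job. */
static int
abort_backup(virDomainPtr dom, int job_id)
{
    /* Proposed API: names the job explicitly; the ABORT flag acknowledges
     * that an incomplete push-model backup is being abandoned. */
    return virDomainBackupEnd(dom, job_id, VIR_DOMAIN_BACKUP_END_ABORT);

    /* Equivalent for now, while only one job can run at a time, because
     * the existing call aborts the most recently started job:
     *
     *     return virDomainAbortJob(dom);
     */
}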


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [libvirt] [RFC v3] external (pull) backup API

2018-06-08 Thread Eric Blake

On 05/17/2018 05:43 PM, Eric Blake wrote:

Here's my updated counterproposal for a backup API.




/**
  * virDomainBackupEnd:
  * @domain: a domain object
  * @id: the id of an active backup job previously started with
  *  virDomainBackupBegin()
  * @flags: bitwise-OR of supported virDomainBackupEndFlags
  *
  * Conclude a point-in-time backup job @id on the given domain.
  *
  * If the backup job uses the push model, but the event marking that
  * all data has been copied has not yet been emitted, then the command
  * fails unless @flags includes VIR_DOMAIN_BACKUP_END_ABORT.  If the
  * event has been issued, or if the backup uses the pull model, the
  * flag has no effect.
  *
  * Returns 0 on success and -1 on failure.
  */
int virDomainBackupEnd(virDomainPtr domain, int id, unsigned int flags);


For this API, I'm considering a tri-state return: 1 if the backup job 
completed successfully (in the push model, the backup destination file 
is usable); 0 if the backup job was aborted (only possible if 
VIR_DOMAIN_BACKUP_END_ABORT was passed; the backup destination file is 
untrustworthy); and -1 on failure.
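
If that tri-state return is adopted, a caller might distinguish the outcomes
like this (sketch only; the helper name is hypothetical):

#include <stdio.h>
#include <libvirt/libvirt.h>

/* Sketch only: conclude backup job @job_id and report whether the
 * destination image can be trusted.  @flags may include
 * VIR_DOMAIN_BACKUP_END_ABORT when abandoning an incomplete push-model
 * backup is acceptable. */
static int
end_backup(virDomainPtr dom, int job_id, unsigned int flags)
{
    int rc = virDomainBackupEnd(dom, job_id, flags);

    if (rc > 0)
        printf("backup job %d complete; destination image is usable\n",
               job_id);
    else if (rc == 0)
        printf("backup job %d aborted; destination image is untrustworthy\n",
               job_id);
    else
        fprintf(stderr, "ending backup job %d failed\n", job_id);
    return rc;
}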


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [libvirt] [RFC v3] external (pull) backup API

2018-05-28 Thread Peter Krempa
On Fri, May 25, 2018 at 10:26:12 -0500, Eric Blake wrote:
> On 05/17/2018 05:43 PM, Eric Blake wrote:
> > Here's my updated counterproposal for a backup API.
> > 
> > In comparison to v2 posted by Nikolay:
> > https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
> > - changed terminology a bit: Nikolay's "BlockSnapshot" is now called a
> > "Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"
> > - flesh out more API descriptions
> > - better documentation of proposed XML, for both checkpoints and backup
> > 
> > Barring any major issues turned up during review, I've already started
> > to code this into libvirt with a goal of getting an implementation ready
> > for review this month.
> > 
> 
> > // Many additional functions copying heavily from virDomainSnapshot*:
> > 
> > virDomainCheckpointList(virDomainPtr domain,
> >                         virDomainCheckpointPtr **checkpoints,
> >                         unsigned int flags);
> > 
> 
> > 
> > int
> > virDomainCheckpointListChildren(virDomainCheckpointPtr checkpoint,
> >                                 virDomainCheckpointPtr **children,
> >                                 unsigned int flags);
> > 
> > Notably, none of the older racy list functions, like
> > virDomainSnapshotNum, virDomainSnapshotNumChildren, or
> > virDomainSnapshotListChildrenNames; also, for now, there is no revert
> > support like virDomainSnapshotRevert.
> 
> I'm finding it easier to understand if I name these:
> 
> virDomainListCheckpoints() (find checkpoints relative to a domain)
> virDomainCheckpointListChildren() (find children relative to a checkpoint)

If you are going to name them "checkpoints" here, we should first
rename "snapshots with memory" in our docs, since we currently refer to
those as checkpoints. We refer to disk-only snapshots as snapshots and
wanted to emphasize the difference.

> 
> The counterpart Snapshot API used virDomainListAllSnapshots(); the term
> 'All' was present because it was added after the initial racy
> virDomainSnapshotNum(), but as we are avoiding the racy API here we can skip
> it from the beginning.
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org



Re: [libvirt] [RFC v3] external (pull) backup API

2018-05-25 Thread Eric Blake

On 05/17/2018 05:43 PM, Eric Blake wrote:

Here's my updated counterproposal for a backup API.

In comparison to v2 posted by Nikolay: 
https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
- changed terminology a bit: Nikolay's "BlockSnapshot" is now called a 
"Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"

- flesh out more API descriptions
- better documentation of proposed XML, for both checkpoints and backup

Barring any major issues turned up during review, I've already started 
to code this into libvirt with a goal of getting an implementation ready 
for review this month.





// Many additional functions copying heavily from virDomainSnapshot*:

virDomainCheckpointList(virDomainPtr domain,
                        virDomainCheckpointPtr **checkpoints,
                        unsigned int flags);





int
virDomainCheckpointListChildren(virDomainCheckpointPtr checkpoint,
                                virDomainCheckpointPtr **children,
                                unsigned int flags);

Notably, none of the older racy list functions, like
virDomainSnapshotNum, virDomainSnapshotNumChildren, or
virDomainSnapshotListChildrenNames; also, for now, there is no revert
support like virDomainSnapshotRevert.


I'm finding it easier to understand if I name these:

virDomainListCheckpoints() (find checkpoints relative to a domain)
virDomainCheckpointListChildren() (find children relative to a checkpoint)

The counterpart Snapshot API used virDomainListAllSnapshots(); the term 
'All' was present because it was added after the initial racy 
virDomainSnapshotNum(), but as we are avoiding the racy API here we can 
skip it from the beginning.
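
For illustration, a sketch of enumerating checkpoints with the proposed
naming.  It assumes the call mirrors virDomainListAllSnapshots() -- returning
the number of checkpoints and allocating the array -- and that
virDomainCheckpointGetName() and virDomainCheckpointFree() exist by analogy
with the snapshot API; none of those details are spelled out in this excerpt.

#include <stdio.h>
#include <stdlib.h>
#include <libvirt/libvirt.h>

/* Sketch only: print the name of every checkpoint of @dom. */
static void
dump_checkpoints(virDomainPtr dom)
{
    virDomainCheckpointPtr *checkpoints = NULL;
    int n = virDomainListCheckpoints(dom, &checkpoints, 0);

    for (int i = 0; i < n; i++) {
        printf("checkpoint: %s\n",
               virDomainCheckpointGetName(checkpoints[i]));
        virDomainCheckpointFree(checkpoints[i]);
    }
    free(checkpoints);
}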


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [libvirt] [RFC v3] external (pull) backup API

2018-05-22 Thread Vladimir Sementsov-Ogievskiy

22.05.2018 01:03, Eric Blake wrote:

On 05/21/2018 10:52 AM, Vladimir Sementsov-Ogievskiy wrote:

18.05.2018 01:43, Eric Blake wrote:

Here's my updated counterproposal for a backup API.



[...]



Representing things on a timeline, when a guest is first created,
there is no dirty bitmap; later, the checkpoint "check1" is created,
which in turn creates "bitmap1" in the qcow2 image for all changes
past that point; when a second checkpoint "check2" is created, a qemu
transaction is used to create and enable the new "bitmap2" bitmap at
the same time as disabling "bitmap1" bitmap.  (Actually, it's probably
easier to name the bitmap in the qcow2 file with the same name as the
Checkpoint object being tracked in libvirt, but for discussion
purposes, it's less confusing if I use separate names for now.)

creation ... check1 ... check2 ... active
    no bitmap   bitmap1    bitmap2

When a user wants to create a backup, they select which point in time
the backup starts from; the default value NULL represents a full
backup (all content since disk creation to the point in time of the
backup call, no bitmap is needed, use sync=full for push model or
sync=none for the pull model); any other value represents the name of
a checkpoint to use as an incremental backup (all content from the
checkpoint to the point in time of the backup call; libvirt forms a
temporary bitmap as needed, then uses sync=incremental for push model
or sync=none plus exporting the bitmap for the pull model). For
example, requesting an incremental backup from "check2" can just reuse
"bitmap2", but requesting an incremental backup from "check1" requires
the computation of the bitmap containing the union of "bitmap1" and
"bitmap2".


I have a bit of criticism on this part, specifically on the ability to 
create a backup not from the last checkpoint but from any checkpoint in 
the past. For this ability we are implementing the whole checkpoint API, 
and we are going to store several bitmaps in Qemu (and possibly going to 
implement checkpoints in Qemu in the future). But personally, I don't 
know of any real and adequate use case for this ability.


I have heard about the following cases:
1. Incremental restore: we want to roll back to some point in time 
(some element in the incremental backup chain), and don't want to copy 
all the data, only what changed.
- This is not a real case, because the information about dirtiness is 
already in the backup chain: we just need to find the allocated areas 
and copy them, plus the areas corresponding to dirty bits in the active 
dirty bitmap in Qemu.


If you do a pull mode backup (where the dirty bitmaps were exported 
over NBD), then yes, you can assume that the third-party app reading 
the backup data also saved the dirty bitmap in whatever form it likes, 
so that it only ever has to pull data from the most recent checkpoint 
and can reconstruct the union of changes from an earlier checkpoint 
offline without qemu help.  But for a push mode backup (where qemu 
does the pushing), there is no way to expose the dirty bitmap of what 
the backup contains, unless you backup to something like a qcow2 image 
and track which clusters in the backup image were allocated as a 
result of the backup operation.  So having a way in the libvirt API to 
grab an incremental backup from earlier than the most recent 
checkpoint may not be needed by everyone, but I don't see a problem in 
implementing it either.


If we have a chain of incrementals from a push backup, it should be 
possible to analyze their block status, so the format should be something 
like qcow2. Otherwise we can't create a chain of backing files. And if we 
back up incrementals to the same file, we can't restore to any previous 
point except the last one anyway.






2. Several backup solutions backing up the same VM
- OK, if we implement checkpoints, then instead of maintaining several 
active dirty bitmaps we can have only one active bitmap with the others 
disabled, which gives a performance gain and the possibility of saving 
RAM (if we unload disabled bitmaps from RAM to qcow2). But what are the 
real cases? What is the real benefit? I doubt that somebody will use 
more than 2 - 3 different backup providers on the same VM, so is it 
worth implementing such a big feature for this? It would of course be 
worth doing if we had 100 independent backup providers.
Note: the word "independent" is important here. For example, it may be 
two external backup tools managed by different subsystems or different 
people, or something like this. If we are just doing a weekly + daily 
backup, we can actually synchronize them, so that the weekly backup is 
a merge of the last 7 daily backups; then the weekly backup does not 
need its own active dirty bitmap, or even its own backup operation.


I'm not sure if this is a complaint that libvirt should allow more 
than one active bitmap at a time, vs. having exactly one active bitmap 
at a time and then reconstructing bitmaps over larger sequences of 
time as needed.  But does that change the API that libvirt should 
expose to end users, or can it just be an implementation detail?

Re: [libvirt] [RFC v3] external (pull) backup API

2018-05-21 Thread Eric Blake

On 05/21/2018 10:52 AM, Vladimir Sementsov-Ogievskiy wrote:

18.05.2018 01:43, Eric Blake wrote:

Here's my updated counterproposal for a backup API.



[...]



Representing things on a timeline, when a guest is first created,
there is no dirty bitmap; later, the checkpoint "check1" is created,
which in turn creates "bitmap1" in the qcow2 image for all changes
past that point; when a second checkpoint "check2" is created, a qemu
transaction is used to create and enable the new "bitmap2" bitmap at
the same time as disabling "bitmap1" bitmap.  (Actually, it's probably
easier to name the bitmap in the qcow2 file with the same name as the
Checkpoint object being tracked in libvirt, but for discussion
purposes, it's less confusing if I use separate names for now.)

creation ... check1 ... check2 ... active
    no bitmap   bitmap1    bitmap2

When a user wants to create a backup, they select which point in time
the backup starts from; the default value NULL represents a full
backup (all content since disk creation to the point in time of the
backup call, no bitmap is needed, use sync=full for push model or
sync=none for the pull model); any other value represents the name of
a checkpoint to use as an incremental backup (all content from the
checkpoint to the point in time of the backup call; libvirt forms a
temporary bitmap as needed, then uses sync=incremental for push model
or sync=none plus exporting the bitmap for the pull model).  For
example, requesting an incremental backup from "check2" can just reuse
"bitmap2", but requesting an incremental backup from "check1" requires
the computation of the bitmap containing the union of "bitmap1" and
"bitmap2".


I have a bit of criticism on this part, specifically on the ability to 
create a backup not from the last checkpoint but from any checkpoint in 
the past. For this ability we are implementing the whole checkpoint API, 
and we are going to store several bitmaps in Qemu (and possibly going to 
implement checkpoints in Qemu in the future). But personally, I don't 
know of any real and adequate use case for this ability.


I have heard about the following cases:
1. Incremental restore: we want to roll back to some point in time 
(some element in the incremental backup chain), and don't want to copy 
all the data, only what changed.
- This is not a real case, because the information about dirtiness is 
already in the backup chain: we just need to find the allocated areas 
and copy them, plus the areas corresponding to dirty bits in the active 
dirty bitmap in Qemu.


If you do a pull mode backup (where the dirty bitmaps were exported over 
NBD), then yes, you can assume that the third-party app reading the 
backup data also saved the dirty bitmap in whatever form it likes, so 
that it only ever has to pull data from the most recent checkpoint and 
can reconstruct the union of changes from an earlier checkpoint offline 
without qemu help.  But for a push mode backup (where qemu does the 
pushing), there is no way to expose the dirty bitmap of what the backup 
contains, unless you back up to something like a qcow2 image and track 
which clusters in the backup image were allocated as a result of the 
backup operation.  So having a way in the libvirt API to grab an 
incremental backup from earlier than the most recent checkpoint may not 
be needed by everyone, but I don't see a problem in implementing it either.




2. Several backup solutions backing up the same VM
- OK, if we implement checkpoints, then instead of maintaining several 
active dirty bitmaps we can have only one active bitmap with the others 
disabled, which gives a performance gain and the possibility of saving 
RAM (if we unload disabled bitmaps from RAM to qcow2). But what are the 
real cases? What is the real benefit? I doubt that somebody will use 
more than 2 - 3 different backup providers on the same VM, so is it 
worth implementing such a big feature for this? It would of course be 
worth doing if we had 100 independent backup providers.
Note: the word "independent" is important here. For example, it may be 
two external backup tools managed by different subsystems or different 
people, or something like this. If we are just doing a weekly + daily 
backup, we can actually synchronize them, so that the weekly backup is 
a merge of the last 7 daily backups; then the weekly backup does not 
need its own active dirty bitmap, or even its own backup operation.


I'm not sure if this is a complaint that libvirt should allow more than 
one active bitmap at a time, vs. having exactly one active bitmap at a 
time and then reconstructing bitmaps over larger sequences of time as 
needed.  But does that change the API that libvirt should expose to end 
users, or can it just be an implementation detail?




3. Some of the backups in an incremental backup chain are lost, and we 
want to recreate part of the chain as a new backup, instead of just 
dropping the whole chain and creating a full backup.

In this case, I can say the following:
disabled bitmaps (~ all checkpoints except the last one) are constant 
metadata, related to the backup chain, not to the VM.

Re: [libvirt] [RFC v3] external (pull) backup API

2018-05-21 Thread Eric Blake

On 05/18/2018 02:56 AM, Daniel P. Berrangé wrote:

On Thu, May 17, 2018 at 05:43:37PM -0500, Eric Blake wrote:

Here's my updated counterproposal for a backup API.

In comparison to v2 posted by Nikolay:
https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
- changed terminology a bit: Nikolay's "BlockSnapshot" is now called a
"Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"
- flesh out more API descriptions
- better documentation of proposed XML, for both checkpoints and backup

Barring any major issues turned up during review, I've already started to
code this into libvirt with a goal of getting an implementation ready for
review this month.


I think the key thing missing from the docs is some kind of explanation
about the difference between a backup, a checkpoint, and a snapshot.
I'll admit I've not read the mail in detail, but at a high level it is
not immediately obvious what the difference is & thus which APIs I would
want to be using for a given scenario.


Indeed, and that's a fair complaint.  Here's a first draft that I'll 
have to polish into a formal html document that both the snapshot and 
checkpoint/backup pages refer to (or maybe I'll merge the snapshot and 
checkpoint descriptions into a single html page, although I'm not quite 
sure what to name the page then).


One of the features made possible with virtual machines is live
migration, or transferring all state related to the guest from one
host to another, with minimal interruption to the guest's activity.  A
clever observer will then note that if all state is available for live
migration, there is nothing stopping a user from saving that state at
a given point of time, to be able to later rewind guest execution back
to the state it previously had.  There are several different libvirt
APIs associated with capturing the state of a guest, such that the
captured state can later be used to rewind that guest to the
conditions it was in earlier.  But since there are multiple APIs, it
is best to understand the tradeoffs and differences between them, in
order to choose the best API for a given task.

Timing: Capturing state can be a lengthy process, so while the
captured state ideally represents an atomic point in time
corresponding to something the guest was actually executing, some
interfaces require up-front preparation (the state captured is not
complete until the API ends, which may be some time after the command
was first started), while other interfaces track the state when the
command was first issued even if it takes some time to finish
capturing the state.  While it is possible to freeze guest I/O around
either point in time (so that the captured state is fully consistent,
rather than just crash-consistent), knowing whether the state is
captured at the start or end of the command may determine which
approach to use.  A related concept is the amount of downtime the
guest will experience during the capture, particularly since freezing
guest I/O has time constraints.

Amount of state: For an offline guest, only the contents of the guest
disks need to be captured; restoring that state is merely a fresh
boot with the disks restored to that state.  But for an online guest,
there is a choice between storing the guest's memory (all that is
needed during live migration where the storage is shared between
source and destination), the guest's disk state (all that is needed if
there are no pending guest I/O transactions that would be lost without
the corresponding memory state), or both together.  Unless guest I/O
is quiesced prior to capturing state, then reverting to captured disk
state of a live guest without the corresponding memory state is
comparable to booting a machine that previously lost power without a
clean shutdown; but for a guest that uses appropriate journaling
methods, this crash-consistent state may be sufficient to avoid the
additional storage and time needed to capture memory state.

Quantity of files: When capturing state, some approaches store all
state within the same file (internal), while others expand a chain of
related files that must be used together (external), meaning more files
for a management application to track.  There are also differences
depending on whether the state is captured in the same file in use by
a running guest, or whether the state is captured to a distinct file
without impacting the files used to run the guest.

Third-party integration: When capturing state, particularly for a
running guest, there are tradeoffs to how much of the process must be done
directly by the hypervisor, and how much can be off-loaded to
third-party software.  Since capturing state is not instantaneous, it
is essential that any third-party integration see consistent data even
if the running guest continues to modify that data after the point in
time of the capture.

Full vs. partial: When capturing state, it is useful to minimize the
amount of state that must be captured in relation to a previous
capture

Re: [libvirt] [RFC v3] external (pull) backup API

2018-05-21 Thread Vladimir Sementsov-Ogievskiy

18.05.2018 01:43, Eric Blake wrote:

Here's my updated counterproposal for a backup API.



[...]



Representing things on a timeline, when a guest is first created,
there is no dirty bitmap; later, the checkpoint "check1" is created,
which in turn creates "bitmap1" in the qcow2 image for all changes
past that point; when a second checkpoint "check2" is created, a qemu
transaction is used to create and enable the new "bitmap2" bitmap at
the same time as disabling "bitmap1" bitmap.  (Actually, it's probably
easier to name the bitmap in the qcow2 file with the same name as the
Checkpoint object being tracked in libvirt, but for discussion
purposes, it's less confusing if I use separate names for now.)

creation ... check1 ... check2 ... active
    no bitmap   bitmap1    bitmap2

When a user wants to create a backup, they select which point in time
the backup starts from; the default value NULL represents a full
backup (all content since disk creation to the point in time of the
backup call, no bitmap is needed, use sync=full for push model or
sync=none for the pull model); any other value represents the name of
a checkpoint to use as an incremental backup (all content from the
checkpoint to the point in time of the backup call; libvirt forms a
temporary bitmap as needed, then uses sync=incremental for push model
or sync=none plus exporting the bitmap for the pull model).  For
example, requesting an incremental backup from "check2" can just reuse
"bitmap2", but requesting an incremental backup from "check1" requires
the computation of the bitmap containing the union of "bitmap1" and
"bitmap2".


I have a bit of criticism on this part, specifically on the ability to 
create a backup not from the last checkpoint but from any checkpoint in 
the past. For this ability we are implementing the whole checkpoint API, 
and we are going to store several bitmaps in Qemu (and possibly going to 
implement checkpoints in Qemu in the future). But personally, I don't 
know of any real and adequate use case for this ability.


I have heard about the following cases:
1. Incremental restore: we want to roll back to some point in time 
(some element in the incremental backup chain), and don't want to copy 
all the data, only what changed.
- This is not a real case, because the information about dirtiness is 
already in the backup chain: we just need to find the allocated areas 
and copy them, plus the areas corresponding to dirty bits in the active 
dirty bitmap in Qemu.


2. Several backup solutions backing up the same VM
- OK, if we implement checkpoints, then instead of maintaining several 
active dirty bitmaps we can have only one active bitmap with the others 
disabled, which gives a performance gain and the possibility of saving 
RAM (if we unload disabled bitmaps from RAM to qcow2). But what are the 
real cases? What is the real benefit? I doubt that somebody will use 
more than 2 - 3 different backup providers on the same VM, so is it 
worth implementing such a big feature for this? It would of course be 
worth doing if we had 100 independent backup providers.
Note: the word "independent" is important here. For example, it may be 
two external backup tools managed by different subsystems or different 
people, or something like this. If we are just doing a weekly + daily 
backup, we can actually synchronize them, so that the weekly backup is 
a merge of the last 7 daily backups; then the weekly backup does not 
need its own active dirty bitmap, or even its own backup operation.


3. Some of the backups in an incremental backup chain are lost, and we 
want to recreate part of the chain as a new backup, instead of just 
dropping the whole chain and creating a full backup.

In this case, I can say the following:
disabled bitmaps (~ all checkpoints except the last one) are constant 
metadata, related to the backup chain, not to the VM. They should be 
stored as constant data: maybe on the same server as the backup chain, 
maybe on another one, maybe in some database, but not in the VM. The VM 
is a dynamic structure, and I don't see any reason to store (almost) 
unrelated constant metadata in it. Also, storing this constant 
backup-related metadata separately from the VM will allow its 
consistency to be checked with checksums or something like that. 
Finally, I'm not a specialist in storing constant data, but I think 
the VM is not the best place.


Note: Hmm, does someone have real examples of such use cases? Why would 
backups be lost, and is that a frequent case? (I heard an assumption 
that it may be a tool checking backups in the background, for example by 
creating a VM on top of the backup and checking that it can at least 
start. But I'm not sure that we must drop a backup if that check fails; 
maybe it's enough to merge it up.)


3.1 About external backup: we have already exported this metadata to 
the third-party backup tool. So this tool should store the information 
for future use, instead of exporting it from Qemu again.


To summarize:
1. I doubt that the discussed ability is really needed.
2. If it is needed, I doubt that it

Re: [libvirt] [RFC v3] external (pull) backup API

2018-05-18 Thread Daniel P . Berrangé
On Thu, May 17, 2018 at 05:43:37PM -0500, Eric Blake wrote:
> Here's my updated counterproposal for a backup API.
> 
> In comparison to v2 posted by Nikolay:
> https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
> - changed terminology a bit: Nikolay's "BlockSnapshot" is now called a
> "Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"
> - flesh out more API descriptions
> - better documentation of proposed XML, for both checkpoints and backup
> 
> Barring any major issues turned up during review, I've already started to
> code this into libvirt with a goal of getting an implementation ready for
> review this month.

I think the key thing missing from the docs is some kind of explanation
about the difference between a backup, a checkpoint, and a snapshot.
I'll admit I've not read the mail in detail, but at a high level it is
not immediately obvious what the difference is & thus which APIs I would
want to be using for a given scenario.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



[libvirt] [RFC v3] external (pull) backup API

2018-05-17 Thread Eric Blake

Here's my updated counterproposal for a backup API.

In comparison to v2 posted by Nikolay: 
https://www.redhat.com/archives/libvir-list/2018-April/msg00115.html
- changed terminology a bit: Nikolay's "BlockSnapshot" is now called a 
"Checkpoint", and "BlockExportStart/Stop" is now "BackupBegin/End"

- flesh out more API descriptions
- better documentation of proposed XML, for both checkpoints and backup

Barring any major issues turned up during review, I've already started 
to code this into libvirt with a goal of getting an implementation ready 
for review this month.



Each domain will gain the ability to track a tree of Checkpoint
objects (we've previously mentioned the term "system checkpoint" in
the snapshot XML as the combination of disk and RAM state; so
I'll use the term "disk checkpoint" in prose as needed, to make it
obvious that the checkpoints described here do not include RAM state).
I will use the virDomainSnapshot API as a guide, meaning that we will
track a tree of checkpoints where each checkpoint can have 0 or 1
parent checkpoints, in part because I plan to reuse a lot of the
snapshot code as a starting point for implementing checkpoint
tracking.

Qemu does NOT track a relationship between internal snapshots, so
libvirt has to manage the backing tree all by itself; by the same
argument, if qemu does not add a parent relationship to dirty bitmaps,
libvirt can probably manage everything itself by copying how it
manages parent relationships between internal snapshots.  However, I
think it will be far easier for libvirt to exploit qemu dirty bitmaps
if qemu DOES add bitmap tracking; particularly if qemu adds ways to
easily compose a temporary bitmap that is the union of one bitmap plus
a fixed number of its parents.

Design-wise, libvirt will manage things so that there is only one
enabled dirty-bitmap per qcow2 image at a time, when no backup
operation is in effect.  There is a notion of a current (or most
recent) checkpoint; when a new checkpoint is created, that becomes the
current one and the former checkpoint becomes the parent of the new
one.  If there is no current checkpoint, then there is no active dirty
bitmap managed by libvirt.

Representing things on a timeline, when a guest is first created,
there is no dirty bitmap; later, the checkpoint "check1" is created,
which in turn creates "bitmap1" in the qcow2 image for all changes
past that point; when a second checkpoint "check2" is created, a qemu
transaction is used to create and enable the new "bitmap2" bitmap at
the same time as disabling "bitmap1" bitmap.  (Actually, it's probably
easier to name the bitmap in the qcow2 file with the same name as the
Checkpoint object being tracked in libvirt, but for discussion
purposes, it's less confusing if I use separate names for now.)

creation ... check1 ... check2 ... active
   no bitmap    bitmap1    bitmap2

When a user wants to create a backup, they select which point in time
the backup starts from; the default value NULL represents a full
backup (all content since disk creation to the point in time of the
backup call, no bitmap is needed, use sync=full for push model or
sync=none for the pull model); any other value represents the name of
a checkpoint to use as an incremental backup (all content from the
checkpoint to the point in time of the backup call; libvirt forms a
temporary bitmap as needed, then uses sync=incremental for push model
or sync=none plus exporting the bitmap for the pull model).  For
example, requesting an incremental backup from "check2" can just reuse
"bitmap2", but requesting an incremental backup from "check1" requires
the computation of the bitmap containing the union of "bitmap1" and
"bitmap2".

Libvirt will always create a new bitmap when starting a backup
operation, whether or not the user requests that a checkpoint be
created.  Most users that want incremental backup sequences will
create a new checkpoint every time they do a backup; the new bitmap
that libvirt creates is then associated with that new checkpoint, and
even after the backup operation completes, the new bitmap remains in
the qcow2 file.  But it is also possible to request a backup without a
new checkpoint (it merely means that it is not possible to create a
subsequent incremental backup from the backup just started); in that
case, libvirt will have to take care of merging the new bitmap back
into the previous one at the end of the backup operation.

I think that it should be possible to run multiple backup operations
in parallel in the long run.  But in the interest of getting a proof
of concept implementation out quickly, it's easier to state that for
the initial implementation, libvirt supports at most one backup
operation at a time (to do another backup, you have to wait for the
current one to complete, or else abort and abandon the current
one). As there is only one backup job running at a time, the existing
virDomainGetJobInfo()/virDomainGetJobStats() will be able to report
statistics about the job (insofar as such statistics are available).
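
For illustration, a sketch of polling that single job with the existing API
(whether and how a backup job fills the data fields of virDomainJobInfo is
not specified in this excerpt, so the field choice below is illustrative and
the helper name is hypothetical):

#include <stdio.h>
#include <unistd.h>
#include <libvirt/libvirt.h>

/* Sketch only: poll the progress of the one active job (here, a backup)
 * until libvirt reports that no job is running. */
static void
wait_for_backup(virDomainPtr dom)
{
    virDomainJobInfo info;

    while (virDomainGetJobInfo(dom, &info) == 0 &&
           info.type != VIR_DOMAIN_JOB_NONE) {
        printf("backup progress: %llu of %llu bytes\n",
               info.dataProcessed, info.dataTotal);
        sleep(1);
    }
}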