Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-09-03 Thread Stefan Hajnoczi
On Mon, Sep 02, 2013 at 02:57:23PM +0200, Benoît Canet wrote:
 
 I don't see the point of using hashes.
 Using hashes means that at least one extra read will be done on the target to
 compute the candidate target hash.
 It's bad for a cloud provider, where I/O count is a huge cost.
 
 Another structure to replace a bitmap (smaller in the typical case) would be
 a block table as described in the Hystor paper:
 www.cse.ohio-state.edu/~fchen/paper/papers/ics11.pdf

This is similar to syncing image formats that use a revision number for
each cluster instead of a hash.

The problem with counters is overflow.  In the case of Hystor it is not
necessary to preserve exact counts.  A dirty bitmap must mark a block
dirty if it has been modified; otherwise there is a risk of data loss.

A bit more than just counters is necessary to implement a persistent
dirty bitmap, but maybe it's possible with some additional state.
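
To make the overflow hazard concrete, here is a minimal sketch of such
additional state - a per-cluster revision counter that saturates instead
of wrapping, so an overflowed counter degrades into a permanently-dirty
mark rather than silently losing a modification.  None of these names
exist in QEMU; this is purely illustrative:

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t *rev;          /* one revision number per cluster */
    uint64_t nb_clusters;
} RevTable;

static void rev_mark_written(RevTable *t, uint64_t cluster)
{
    if (t->rev[cluster] != UINT32_MAX) {
        t->rev[cluster]++;  /* saturate instead of wrapping back to 0 */
    }
}

/* Dirty if the revision moved past what the last sync recorded.  A
 * saturated counter is reported dirty forever - conservative, like a
 * dirty bitmap, so no modification can be lost. */
static bool rev_is_dirty(const RevTable *t, uint64_t cluster,
                         uint32_t last_synced_rev)
{
    return t->rev[cluster] == UINT32_MAX ||
           t->rev[cluster] != last_synced_rev;
}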

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-09-02 Thread Benoît Canet

I don't see the point of using hashes.
Using hashes means that at least one extra read will be done on the target to
compute the candidate target hash.
It's bad for a cloud provider, where I/O count is a huge cost.

Another structure to replace a bitmap (smaller in the typical case) would be
a block table as described in the Hystor paper:
www.cse.ohio-state.edu/~fchen/paper/papers/ics11.pdf

Best regards

Benoît



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 08:11:42AM +, Dietmar Maurer wrote:
   I also consider it safer, because you make sure the data exists (using 
   hash keys
  like SHA1).
  
   I am unsure how you can check if a dirty bitmap contains errors, or is 
   out of
  date?
  
   Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is 
   always
  related to a single image.
   (consider the case where the user removes the latest backup from the backup target).
  
  One disadvantage of Merkle trees is that the client becomes stateful - the 
  client
  needs to store its own Merkle tree and this requires fancier client-side 
  code.
 
 What 'client' are you talking about here?

A backup application, for example.  Previously it could simply use
api.getDirtyBlocks() -> [Sector] and it would translate into a single
QMP API call.

Now a Merkle tree needs to be stored on the client side and synced with
the server.  The client-side library becomes more complex.

 But sure, the code gets more complex, and needs a considerable amount of RAM
 to store the hash keys.
  
  It is also more expensive to update hashes than a dirty bitmap.  Not 
  because you
  need to hash data but because a small write (e.g. 1 sector) requires that 
  you
  read the surrounding sectors to recompute a hash for the cluster.  
  Therefore you
  can expect worse guest I/O performance than with a dirty bitmap.
 
 There is no need to update any hash - you only need to do that on backup -
 in fact, all those things can be done by the backup driver.

The problem is that if you leave hash calculation until backup time then
you need to read in the entire disk image (100s of GB) from disk.  That
is slow and drains I/O resources.

Maybe the best approach is to maintain a dirty bitmap while the guest is
running, which is fairly cheap.  Then you can use the dirty bitmap to
only hash modified clusters when building the Merkle tree - this avoids
reading the entire disk image.
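
As a rough sketch of that combination (nothing below is QEMU code; a toy
FNV-1a hash stands in for SHA1 and a memory buffer stands in for the
disk), only clusters marked in the dirty bitmap are re-hashed when the
Merkle leaves are refreshed:

#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SIZE 4096

static uint64_t fnv1a(const uint8_t *p, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h = (h ^ p[i]) * 0x100000001b3ULL;
    }
    return h;
}

static int bit_test(const uint8_t *map, uint64_t i)
{
    return (map[i / 8] >> (i % 8)) & 1;
}

/* leaf[i] holds the hash of cluster i; clean clusters keep their old
 * leaf hash and are never read. */
void refresh_leaves(const uint8_t *image, const uint8_t *dirty,
                    uint64_t nb_clusters, uint64_t *leaf)
{
    for (uint64_t i = 0; i < nb_clusters; i++) {
        if (bit_test(dirty, i)) {
            leaf[i] = fnv1a(image + i * CLUSTER_SIZE, CLUSTER_SIZE);
            /* interior nodes above leaf i must be recomputed as well */
        }
    }
}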

  I still think it's a cool idea.  Making it work well will require a lot 
  more effort than
  a dirty bitmap.
 
 How do you re-generate a dirty bitmap after a server crash?

The dirty bitmap is invalid after a crash.  A full backup is required;
all clusters are considered dirty.

The simplest way to implement this is to mark the persistent bitmap
invalid upon the first guest write.  When QEMU is terminated cleanly,
flush all dirty bitmap updates to disk and then mark the file valid
again.  If QEMU finds the file is invalid on startup, start from
scratch.
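
A minimal sketch of that protocol (the one-byte validity flag at the
start of the bitmap file is an invented layout, and a real version
would fdatasync() where this only fflush()es):

#include <stdbool.h>
#include <stdio.h>

enum { BITMAP_INVALID = 0, BITMAP_VALID = 1 };

static bool write_flag(FILE *f, unsigned char flag)
{
    return fseek(f, 0, SEEK_SET) == 0 &&
           fwrite(&flag, 1, 1, f) == 1 &&
           fflush(f) == 0;
}

/* On startup: anything but VALID means the bitmap is stale and the
 * next backup must be a full one. */
bool bitmap_file_is_valid(FILE *f)
{
    unsigned char flag = BITMAP_INVALID;
    if (fseek(f, 0, SEEK_SET) != 0 || fread(&flag, 1, 1, f) != 1) {
        return false;
    }
    return flag == BITMAP_VALID;
}

/* Before the first guest write after startup. */
bool bitmap_mark_in_use(FILE *f) { return write_flag(f, BITMAP_INVALID); }

/* On clean shutdown, after flushing all dirty bits to the file. */
bool bitmap_mark_clean(FILE *f)  { return write_flag(f, BITMAP_VALID); }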

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-24 Thread Dietmar Maurer
 Maybe the best approach is to maintain a dirty bitmap while the guest is
 running, which is fairly cheap.  Then you can use the dirty bitmap to only 
 hash
 modified clusters when building the Merkle tree - this avoids reading the
 entire disk image.

Yes, this is a good optimization.

   I still think it's a cool idea.  Making it work well will require a
   lot more effort than a dirty bitmap.
 
  How do you re-generate a dirty bitmap after a server crash?
 
 The dirty bitmap is invalid after a crash.  A full backup is required;
 all clusters are considered dirty.
 
 The simplest way to implement this is to mark the persistent bitmap invalid
 upon the first guest write.  When QEMU is terminated cleanly, flush all dirty
 bitmap updates to disk and then mark the file valid
 again.  If QEMU finds the file is invalid on startup, start from scratch.

Or could you compare the hash keys in that case?

Although I guess computing all those SHA1 checksums needs a considerable amount 
of CPU time.




Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-23 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 03:34:18PM +, Dietmar Maurer wrote:
  That sounds like more work than a persistent dirty bitmap.  The advantage 
  is that
  while dirty bitmaps are consumed by a single user, the Merkle tree can be 
  used
  to sync up any number of replicas.
 
 I also consider it safer, because you make sure the data exists (using hash 
 keys like SHA1).
 
 I am unsure how you can check if a dirty bitmap contains errors, or is out of 
 date?
 
 Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is 
 always related to a single image.
 (consider the case where the user removes the latest backup from the backup target).

One disadvantage of Merkle trees is that the client becomes stateful -
the client needs to store its own Merkle tree and this requires fancier
client-side code.

It is also more expensive to update hashes than a dirty bitmap.  Not
because you need to hash data but because a small write (e.g. 1 sector)
requires that you read the surrounding sectors to recompute a hash for
the cluster.  Therefore you can expect worse guest I/O performance than
with a dirty bitmap.
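
A back-of-the-envelope illustration of that read amplification (the
64 KB cluster size is an assumption, not something from the patches):

#include <stdint.h>
#include <stdio.h>

#define CLUSTER_SIZE 65536ULL

/* Bytes that must be read back so every cluster touched by a guest
 * write of [offset, offset + len) can be re-hashed. */
static uint64_t rehash_read_bytes(uint64_t offset, uint64_t len)
{
    uint64_t first = offset / CLUSTER_SIZE;
    uint64_t last = (offset + len - 1) / CLUSTER_SIZE;
    return (last - first + 1) * CLUSTER_SIZE - len;
}

int main(void)
{
    /* a single 512-byte sector write forces a 65024-byte read-back;
     * a dirty bitmap would only flip one bit */
    printf("%llu\n", (unsigned long long)rehash_read_bytes(4096, 512));
    return 0;
}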

I still think it's a cool idea.  Making it work well will require a lot
more effort than a dirty bitmap.

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-23 Thread Dietmar Maurer
  I also consider it safer, because you make sure the data exists (using hash 
  keys
 like SHA1).
 
  I am unsure how you can check if a dirty bitmap contains errors, or is out 
  of
 date?
 
  Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is 
  always
 related to a single image.
  (consider the case where the user removes the latest backup from the backup target).
 
 One disadvantage of Merkle trees is that the client becomes stateful - the 
 client
 needs to store its own Merkle tree and this requires fancier client-side code.

What 'client' are you talking about here?

But sure, the code gets more complex, and needs a considerable amount of RAM
to store the hash keys.
 
 It is also more expensive to update hashes than a dirty bitmap.  Not because 
 you
 need to hash data but because a small write (e.g. 1 sector) requires that you
 read the surrounding sectors to recompute a hash for the cluster.  Therefore 
 you
 can expect worse guest I/O performance than with a dirty bitmap.

There is no need to update any hash - you only need to do that on backup -
in fact, all those things can be done by the backup driver.
 
 I still think it's a cool idea.  Making it work well will require a lot more 
 effort than
 a dirty bitmap.

How do you re-generate a dirty bitmap after a server crash?




Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-22 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 10:58:47AM +, Dietmar Maurer wrote:
   True, but that would happen only in case the host crashes.  Even for
   a QEMU crash the changes would be safe, I think.  They would be
   written back when the persistent dirty bitmap's mmap() area is
   unmapped, during process exit.
  
   I'd err on the side of caution, mark the persistent dirty bitmap while
   QEMU is running.  Discard the file if there was a power failure.
  
  Agreed.  Though this is something that management must do manually, isn't 
  it?
  QEMU cannot distinguish a SIGKILL from a power failure, while management
  can afford treating SIGKILL as a power failure.
  
   It really depends what the dirty bitmap users are doing.  It could be
   okay to have a tiny chance of missing a modification but it might not.
 
 I just want to mention that there is another way to do incremental backups. 
 Instead
 of using a dirty bitmap, you can compare the content, usually using a digest 
 (SHA1) on clusters.

Reading gigabytes of data from disk is expensive though.  I guess they
keep a Merkle tree so it's easy to find out which parts of the image
must be transferred without re-reading the entire image.

That sounds like more work than a persistent dirty bitmap.  The
advantage is that while dirty bitmaps are consumed by a single user, the
Merkle tree can be used to sync up any number of replicas.
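
In sketch form (the flat array layout with children at 2i+1/2i+2 and a
power-of-two leaf count are assumptions made for brevity), syncing two
replicas only descends into subtrees whose hashes differ, so matching
regions are never re-read:

#include <stdint.h>
#include <stdio.h>

/* a and b are the two replicas' trees; node 0 is the root and the
 * nb_leaves leaf hashes occupy the last nb_leaves array slots. */
static void diff_subtree(const uint64_t *a, const uint64_t *b,
                         uint64_t node, uint64_t nb_leaves)
{
    if (a[node] == b[node]) {
        return;                 /* identical subtree: no I/O at all */
    }
    if (node >= nb_leaves - 1) {
        printf("transfer cluster %llu\n",
               (unsigned long long)(node - (nb_leaves - 1)));
        return;
    }
    diff_subtree(a, b, 2 * node + 1, nb_leaves);
    diff_subtree(a, b, 2 * node + 2, nb_leaves);
}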

 That way you can also implement async replication to a remote site (like MS 
 do).

Sounds like rsync.

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-22 Thread Dietmar Maurer
  That way you can also implement async replication to a remote site (like MS
 do).
 
 Sounds like rsync.

Yes, but we need 'snapshots' and something more optimized (rsync compares
whole files).
I think this can be implemented using the backup job with a specialized backup 
driver.





Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-22 Thread Dietmar Maurer
 That sounds like more work than a persistent dirty bitmap.  The advantage is 
 that
 while dirty bitmaps are consumed by a single user, the Merkle tree can be used
 to sync up any number of replicas.

I also consider it safer, because you make sure the data exists (using hash 
keys like SHA1).

I am unsure how you can check if a dirty bitmap contains errors, or is out of 
date?

Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is always 
related to a single image.
(consider the case where the user removes the latest backup from the backup target).






Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-21 Thread Stefan Hajnoczi
On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
 On 20/05/2013 08:24, Stefan Hajnoczi wrote:
   You only need to fdatasync() before every guest flush, no?
  No, you need to set the dirty bit before issuing the write on the
  host.  Otherwise the image data may be modified without setting the
  appropriate dirty bit.  That would allow data modifications to go
  undetected!
 
 But data modifications can go undetected until the guest flush returns,
 can't they?

You are thinking about it from the guest perspective - if a flush has
not completed yet then there is no guarantee that the write has reached
disk.

But from a host perspective the dirty bitmap should be conservative so
that the backup application can always restore a bit-for-bit identical
copy of the disk image.  It would be weird if writes can sneak in
unnoticed.

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-21 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 11:25:01AM +0800, Wenchao Xia wrote:
 On 2013-5-17 17:14, Stefan Hajnoczi wrote:
 On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote:
 On 2013-5-16 15:47, Stefan Hajnoczi wrote:
 On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
After checking the code, I found it possible to add delta data backup
 support as well, if an additional dirty bitmap was added.
 
 I've been thinking about this.  Incremental backups need to know which
 blocks have changed, but keeping a persistent dirty bitmap is expensive
 and unnecessary.
 
Yes, it would likely become another block layer, so I hope we do not do that.
 
 Not at all, persistent dirty bitmaps need to be part of the block layer
 since they need to support any image type - qcow2, Gluster, raw LVM,
 etc.
 
 I don't consider block jobs to be qemu device layer.  It sounds like
 you think the code should be in bdrv_co_do_writev()?
 
I feel the different solutions are trending toward fragility,
 and COW is a key feature that the block layer provides, so I wonder if it
 can be moved under the block layer later
 
 The generic block layer includes more than just block.c.  It also
 includes block jobs and the qcow2 metadata cache that Dong Xu has
 extracted recently, for example.  Therefore you need to be more specific
 about what and why.
 
 This copy-on-write backup approach is available as a block job which
 runs on top of any BlockDriverState.  What concrete change are you
 proposing?
 
   Since it is hard to hide it behind BlockDriverState now, I suggest adding
 some documentation in qemu about the three snapshot types: qcow2 internal,
 backing chain, and drive-backup, which are all qemu software based snapshot
 implementations, so users can tell the differences more easily.
 
   In the long term, I hope to form a library that exposes those in a unified
 format, perhaps calling qmp_transaction internally, and makes it
 easier to offload if possible, so I hope for an abstract-driver structure.

Okay, just keep in mind they have different behavior.  That means these
snapshot types solve different problems and may be inappropriate for
some use cases.

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-21 Thread Paolo Bonzini
On 21/05/2013 09:31, Stefan Hajnoczi wrote:
 On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
 On 20/05/2013 08:24, Stefan Hajnoczi wrote:
 You only need to fdatasync() before every guest flush, no?
 No, you need to set the dirty bit before issuing the write on the
 host.  Otherwise the image data may be modified without setting the
 appropriate dirty bit.  That would allow data modifications to go
 undetected!

 But data modifications can go undetected until the guest flush returns,
 can't they?
 
 You are thinking about it from the guest perspective - if a flush has
 not completed yet then there is no guarantee that the write has reached
 disk.
 
 But from a host perspective the dirty bitmap should be conservative so
 that the backup application can always restore a bit-for-bit identical
 copy of the disk image.  It would be weird if writes can sneak in
 unnoticed.

True, but that would happen only in case the host crashes.  Even for a
QEMU crash the changes would be safe, I think.  They would be written
back when the persistent dirty bitmap's mmap() area is unmapped, during
process exit.
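
For reference, a sketch of what such an mmap()-backed bitmap could look
like (the layout and names are invented for illustration).  The
writeback-on-unmap behaviour falls out of MAP_SHARED: a QEMU crash does
not lose the dirtied pages because the kernel's page cache survives,
but a host crash or power failure can:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *bitmap_map(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return NULL;
    }
    uint8_t *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the fd */
    return map == MAP_FAILED ? NULL : map;
}

static void bitmap_set(uint8_t *map, uint64_t cluster)
{
    map[cluster / 8] |= 1u << (cluster % 8);  /* no syscall per write */
}

static void bitmap_unmap(uint8_t *map, size_t len)
{
    msync(map, len, MS_SYNC);   /* make the flush explicit on clean exit */
    munmap(map, len);
}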

Paolo



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-21 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 10:30:22AM +0200, Paolo Bonzini wrote:
 On 21/05/2013 09:31, Stefan Hajnoczi wrote:
  On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
  On 20/05/2013 08:24, Stefan Hajnoczi wrote:
  You only need to fdatasync() before every guest flush, no?
  No, you need to set the dirty bit before issuing the write on the
  host.  Otherwise the image data may be modified without setting the
  appropriate dirty bit.  That would allow data modifications to go
  undetected!
 
  But data modifications can go undetected until the guest flush returns,
  can't they?
  
  You are thinking about it from the guest perspective - if a flush has
  not completed yet then there is no guarantee that the write has reached
  disk.
  
  But from a host perspective the dirty bitmap should be conservative so
  that the backup application can always restore a bit-for-bit identical
  copy of the disk image.  It would be weird if writes can sneak in
  unnoticed.
 
 True, but that would happen only in case the host crashes.  Even for a
 QEMU crash the changes would be safe, I think.  They would be written
 back when the persistent dirty bitmap's mmap() area is unmapped, during
 process exit.

I'd err on the side of caution, mark the persistent dirty bitmap while
QEMU is running.  Discard the file if there was a power failure.

It really depends what the dirty bitmap users are doing.  It could be
okay to have a tiny chance of missing a modification but it might not.

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-21 Thread Paolo Bonzini
On 21/05/2013 12:34, Stefan Hajnoczi wrote:
 On Tue, May 21, 2013 at 10:30:22AM +0200, Paolo Bonzini wrote:
 On 21/05/2013 09:31, Stefan Hajnoczi wrote:
 On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
 On 20/05/2013 08:24, Stefan Hajnoczi wrote:
 You only need to fdatasync() before every guest flush, no?
 No, you need to set the dirty bit before issuing the write on the
 host.  Otherwise the image data may be modified without setting the
 appropriate dirty bit.  That would allow data modifications to go
 undetected!

 But data modifications can go undetected until the guest flush returns,
 can't they?

 You are thinking about it from the guest perspective - if a flush has
 not completed yet then there is no guarantee that the write has reached
 disk.

 But from a host perspective the dirty bitmap should be conservative so
 that the backup application can always restore a bit-for-bit identical
 copy of the disk image.  It would be weird if writes can sneak in
 unnoticed.

 True, but that would happen only in case the host crashes.  Even for a
 QEMU crash the changes would be safe, I think.  They would be written
 back when the persistent dirty bitmap's mmap() area is unmapped, during
 process exit.
 
 I'd err on the side of caution, mark the persistent dirty bitmap while
 QEMU is running.  Discard the file if there was a power failure.

Agreed.  Though this is something that management must do manually,
isn't it?  QEMU cannot distinguish a SIGKILL from a power failure, while
management can afford treating SIGKILL as a power failure.

 It really depends what the dirty bitmap users are doing.  It could be
 okay to have a tiny chance of missing a modification but it might not.

Paolo



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-21 Thread Dietmar Maurer
  True, but that would happen only in case the host crashes.  Even for
  a QEMU crash the changes would be safe, I think.  They would be
  written back when the persistent dirty bitmap's mmap() area is
  unmapped, during process exit.
 
  I'd err on the side of caution, mark the persistent dirty bitmap while
  QEMU is running.  Discard the file if there was a power failure.
 
 Agreed.  Though this is something that management must do manually, isn't it?
 QEMU cannot distinguish a SIGKILL from a power failure, while management
 can afford treating SIGKILL as a power failure.
 
  It really depends what the dirty bitmap users are doing.  It could be
  okay to have a tiny chance of missing a modification but it might not.

I just want to mention that there is another way to do incremental backups. 
Instead
of using a dirty bitmap, you can compare the content, usually using a digest 
(SHA1) on clusters.
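
In sketch form (the flat digest tables are an invented layout), the
comparison itself is trivial; the expensive part, as Stefan points out
in his reply, is that filling the current-digest table means reading
the whole image:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DIGEST_LEN 20   /* SHA1 */

/* prev: per-cluster digests recorded by the previous backup;
 * cur:  freshly computed digests of the current image.
 * Returns how many clusters must be copied to the backup target. */
static uint64_t clusters_to_copy(const uint8_t (*prev)[DIGEST_LEN],
                                 const uint8_t (*cur)[DIGEST_LEN],
                                 uint64_t nb_clusters)
{
    uint64_t n = 0;
    for (uint64_t i = 0; i < nb_clusters; i++) {
        if (memcmp(prev[i], cur[i], DIGEST_LEN) != 0) {
            printf("copy cluster %llu\n", (unsigned long long)i);
            n++;
        }
    }
    return n;
}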

That way you can also implement async replication to a remote site (like MS do).





Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-20 Thread Stefan Hajnoczi
On Fri, May 17, 2013 at 12:17 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 16/05/2013 09:47, Stefan Hajnoczi wrote:
 On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
   After checking the code, I found it possible to add delta data backup
 support as well, if an additional dirty bitmap was added.

 I've been thinking about this.  Incremental backups need to know which
 blocks have changed, but keeping a persistent dirty bitmap is expensive
 and unnecessary.

 Backup applications need to support the full backup case anyway for
 their first run.  Therefore we can keep a best-effort dirty bitmap which
 is persisted only when the guest is terminated cleanly.  If the QEMU
 process crashes then the on-disk dirty bitmap will be invalid and the
 backup application needs to do a full backup next time.

 The advantage of this approach is that we don't need to fdatasync(2)
 before every guest write operation.

 You only need to fdatasync() before every guest flush, no?

No, you need to set the dirty bit before issuing the write on the
host.  Otherwise the image data may be modified without setting the
appropriate dirty bit.  That would allow data modifications to go
undetected!
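
In sketch form (illustrative names only, not QEMU's actual write path),
the ordering that must hold on the host side:

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *bits;
} DirtyBitmap;

static void bitmap_set(DirtyBitmap *bm, uint64_t cluster)
{
    bm->bits[cluster / 8] |= 1u << (cluster % 8);
}

typedef int (*write_fn)(void *opaque, uint64_t cluster,
                        const void *buf, size_t len);

/* Mark first, write second: the bitmap may claim a clean cluster is
 * dirty (harmless) but can never miss a modified one (data loss). */
int guest_write(DirtyBitmap *bm, uint64_t cluster, const void *buf,
                size_t len, write_fn do_write, void *opaque)
{
    bitmap_set(bm, cluster);
    return do_write(opaque, cluster, buf, len);
}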

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-20 Thread Paolo Bonzini
On 20/05/2013 08:24, Stefan Hajnoczi wrote:
  You only need to fdatasync() before every guest flush, no?
 No, you need to set the dirty bit before issuing the write on the
 host.  Otherwise the image data may be modified without setting the
 appropriate dirty bit.  That would allow data modifications to go
 undetected!

But data modifications can go undetected until the guest flush returns,
can't they?

Paolo



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-20 Thread Wenchao Xia

On 2013-5-17 17:14, Stefan Hajnoczi wrote:

On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote:

On 2013-5-16 15:47, Stefan Hajnoczi wrote:

On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:

   After checking the code, I found it possible to add delta data backup
support as well, if an additional dirty bitmap was added.


I've been thinking about this.  Incremental backups need to know which
blocks have changed, but keeping a persistent dirty bitmap is expensive
and unnecessary.


   Yes, it would likely become another block layer, so I hope we do not do that.


Not at all, persistent dirty bitmaps need to be part of the block layer
since they need to support any image type - qcow2, Gluster, raw LVM,
etc.


I don't consider block jobs to be qemu device layer.  It sounds like
you think the code should be in bdrv_co_do_writev()?


   I feel the different solutions are trending toward fragility,
and COW is a key feature that the block layer provides, so I wonder if it
can be moved under the block layer later


The generic block layer includes more than just block.c.  It also
includes block jobs and the qcow2 metadata cache that Dong Xu has
extracted recently, for example.  Therefore you need to be more specific
about what and why.

This copy-on-write backup approach is available as a block job which
runs on top of any BlockDriverState.  What concrete change are you
proposing?


  Since it is hard to hide it behind BlockDriverState now, I suggest adding
some documentation in qemu about the three snapshot types: qcow2 internal,
backing chain, and drive-backup, which are all qemu software based snapshot
implementations, so users can tell the differences more easily.

  In the long term, I hope to form a library that exposes those in a unified
format, perhaps calling qmp_transaction internally, and makes it
easier to offload if possible, so I hope for an abstract-driver structure.
--
Best Regards

Wenchao Xia




Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-17 Thread Wenchao Xia

On 2013-5-16 15:47, Stefan Hajnoczi wrote:

On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:

   After checking the code, I found it possible to add delta data backup
support as well, if an additional dirty bitmap was added.


I've been thinking about this.  Incremental backups need to know which
blocks have changed, but keeping a persistent dirty bitmap is expensive
and unnecessary.


  Yes, it would likely become another block layer, so I hope we do not do that.


Backup applications need to support the full backup case anyway for
their first run.  Therefore we can keep a best-effort dirty bitmap which
is persisted only when the guest is terminated cleanly.  If the QEMU
process crashes then the on-disk dirty bitmap will be invalid and the
backup application needs to do a full backup next time.

The advantage of this approach is that we don't need to fdatasync(2)
before every guest write operation.


Compared with the current solution, I think it is doing COW at the qemu
device level:

        qemu device
             |
     general block layer
             |
    virtual format layer
             |
        -----------
        |         |
      qcow2     vmdk

   This will make things complicated when more work comes; a better
place for block COW is under the general block layer. Maybe later we
can adjust block for it.


I don't consider block jobs to be qemu device layer.  It sounds like
you think the code should be in bdrv_co_do_writev()?


  I feel the different solutions are trending toward fragility,
and COW is a key feature that the block layer provides, so I wonder if it
can be moved under the block layer later, leaving an abstract API for
it. Some other operations, such as commit and stream, could also be
hidden under block.

 qemu general        testcase        other user
      |                  |               |
      ----------------------------------------
                         |
   core block abstract layer (COW, zero R/W, image dup/backup)
                         |
           -----------------------------
           |                           |
   qemu's implementation           3rd party
           |                           |
   -------------------       -------------------------------
   |       |       |                      |
 qcow2   vmdk    lvm2       Enterprise storage integration

  It is not directly related to this series, but I feel some effort
should be spent on this when time allows, before things become complicated.


The drive-backup operation doesn't really affect the source
BlockDriverState; it just needs to intercept writes.  Therefore it seems
cleaner for the code to live separately (plus we reuse the code for the
block job loop which copies out data while the guest is running).
Otherwise we would squash all of the blockjob code into block.c and it
would be an even bigger mess than it is today :-).




--
Best Regards

Wenchao Xia




Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-17 Thread Stefan Hajnoczi
On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote:
 On 2013-5-16 15:47, Stefan Hajnoczi wrote:
 On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
After checking the code, I found it possible to add delta data backup
 support as well, if an additional dirty bitmap was added.
 
 I've been thinking about this.  Incremental backups need to know which
 blocks have changed, but keeping a persistent dirty bitmap is expensive
 and unnecessary.
 
   Yes, it would likely become another block layer, so I hope we do not do that.

Not at all, persistent dirty bitmaps need to be part of the block layer
since they need to support any image type - qcow2, Gluster, raw LVM,
etc.

 I don't consider block jobs to be qemu device layer.  It sounds like
 you think the code should be in bdrv_co_do_writev()?
 
   I feel the different solutions are trending toward fragility,
 and COW is a key feature that the block layer provides, so I wonder if it
 can be moved under the block layer later

The generic block layer includes more than just block.c.  It also
includes block jobs and the qcow2 metadata cache that Dong Xu has
extracted recently, for example.  Therefore you need to be more specific
about what and why.

This copy-on-write backup approach is available as a block job which
runs on top of any BlockDriverState.  What concrete change are you
proposing?



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-17 Thread Paolo Bonzini
On 16/05/2013 09:47, Stefan Hajnoczi wrote:
 On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
   After checking the code, I found it possible to add delta data backup
 support as well, if an additional dirty bitmap was added.
 
 I've been thinking about this.  Incremental backups need to know which
 blocks have changed, but keeping a persistent dirty bitmap is expensive
 and unnecessary.
 
 Backup applications need to support the full backup case anyway for
 their first run.  Therefore we can keep a best-effort dirty bitmap which
 is persisted only when the guest is terminated cleanly.  If the QEMU
 process crashes then the on-disk dirty bitmap will be invalid and the
 backup application needs to do a full backup next time.
 
 The advantage of this approach is that we don't need to fdatasync(2)
 before every guest write operation.

You only need to fdatasync() before every guest flush, no?

Paolo

 Compared with the current solution, I think it is doing COW at the qemu
 device level:

         qemu device
              |
      general block layer
              |
     virtual format layer
              |
         -----------
         |         |
       qcow2     vmdk

   This will make things complicated when more work comes; a better
 place for block COW is under the general block layer. Maybe later we
 can adjust block for it.
 
 I don't consider block jobs to be qemu device layer.  It sounds like
 you think the code should be in bdrv_co_do_writev()?
 
 The drive-backup operation doesn't really affect the source
 BlockDriverState; it just needs to intercept writes.  Therefore it seems
 cleaner for the code to live separately (plus we reuse the code for the
 block job loop which copies out data while the guest is running).
 Otherwise we would squash all of the blockjob code into block.c and it
 would be an even bigger mess than it is today :-).
 
 




Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-16 Thread Wenchao Xia
On 2013-5-15 22:34, Stefan Hajnoczi wrote:
 Note: These patches apply to my block-next tree.  You can also grab the code
 from git here:
 git://github.com/stefanha/qemu.git block-backup-core
 
 This series adds a new QMP command, drive-backup, which takes a point-in-time
 snapshot of a block device.  The snapshot is copied out to a target block
 device.  A simple example is:
 
drive-backup device=virtio0 format=qcow2 target=backup-20130401.qcow2
 
 The original drive-backup blockjob was written by Dietmar Maurer
 diet...@proxmox.com.  He is currently busy but I feel the feature is worth
 pushing into QEMU since there has been interest.  This is my version of his
 patch, plus the QMP command and qemu-iotests test case.
 
 QMP 'transaction' support is included since v3.  It adds support for atomic
 snapshots of multiple block devices.  I also added an 'abort' transaction to
 allow testing of the .abort()/.cleanup() code path.  Thanks to Wenchao for
 making qmp_transaction() extensible.
 
 How is this different from block-stream and drive-mirror?
 -----------------------------------------------------------
 Both block-stream and drive-mirror do not provide immediate point-in-time
 snapshots.  Instead they copy data into a new file and then switch to it.  In
 other words, the point at which the snapshot is taken cannot be controlled
 directly.
 
 drive-backup intercepts guest writes and saves data into the target block
 device before it is overwritten.  The target block device can be a raw image
 file, backing files are not used to implement this feature.
 
 How can drive-backup be used?
 -----------------------------
 The simplest use-case is to copy a point-in-time snapshot to a local file.
 
 More advanced users may wish to make the target an NBD URL.  The NBD server
 listening on the other side can process the backup writes any way it wishes.  
 I
 previously posted an RFC series with a backup server that streamed Dietmar's
 VMA backup archive format.
 
 What's next for drive-backup?
 -----------------------------
 1. Sync modes like drive-mirror (top, full, none).  This makes it possible to
 preserve the backing file chain.
 
 v3:
   * Rename to drive-backup for consistency with drive-mirror [kwolf]
   * Add QMP transaction support [kwolf]
   * Introduce bdrv_add_before_write_cb() to hook writes
   * Mention 'query-block-jobs' lists job of type 'backup' [eblake]
   * Rename rwlock to flush_rwlock [kwolf]
   * Fix space in block/backup.c comment [kwolf]
 
 v2:
   * s/block_backup/block-backup/ in commit message [eblake]
   * Avoid funny spacing in QMP docs [eblake]
   * Document query-block-jobs and block-job-cancel usage [eblake]

  After checking the code, I found it possible to add delta data backup
support as well, if an additional dirty bitmap was added. Compared with
the current solution, I think it is doing COW at the qemu device level:

        qemu device
             |
     general block layer
             |
    virtual format layer
             |
        -----------
        |         |
      qcow2     vmdk

  This will make things complicated when more work comes; a better
place for block COW is under the general block layer. Maybe later we
can adjust block for it.


-- 
Best Regards

Wenchao Xia




Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-16 Thread Stefan Hajnoczi
On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
   After checking the code, I found it possible to add delta data backup
 support as well, if an additional dirty bitmap was added.

I've been thinking about this.  Incremental backups need to know which
blocks have changed, but keeping a persistent dirty bitmap is expensive
and unnecessary.

Backup applications need to support the full backup case anyway for
their first run.  Therefore we can keep a best-effort dirty bitmap which
is persisted only when the guest is terminated cleanly.  If the QEMU
process crashes then the on-disk dirty bitmap will be invalid and the
backup application needs to do a full backup next time.

The advantage of this approach is that we don't need to fdatasync(2)
before every guest write operation.

 Compared with the current solution, I think it is doing COW at the qemu
 device level:
 
         qemu device
              |
      general block layer
              |
     virtual format layer
              |
         -----------
         |         |
       qcow2     vmdk
 
   This will make things complicated when more work comes; a better
 place for block COW is under the general block layer. Maybe later we
 can adjust block for it.

I don't consider block jobs to be qemu device layer.  It sounds like
you think the code should be in bdrv_co_do_writev()?

The drive-backup operation doesn't really affect the source
BlockDriverState; it just needs to intercept writes.  Therefore it seems
cleaner for the code to live separately (plus we reuse the code for the
block job loop which copies out data while the guest is running).
Otherwise we would squash all of the blockjob code into block.c and it
would be an even bigger mess than it is today :-).



[Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-15 Thread Stefan Hajnoczi
Note: These patches apply to my block-next tree.  You can also grab the code
from git here:
git://github.com/stefanha/qemu.git block-backup-core

This series adds a new QMP command, drive-backup, which takes a point-in-time
snapshot of a block device.  The snapshot is copied out to a target block
device.  A simple example is:

  drive-backup device=virtio0 format=qcow2 target=backup-20130401.qcow2
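
On the QMP wire this corresponds to roughly the following (a sketch
inferred from the shorthand above; the schema in the patches is
authoritative):

  { "execute": "drive-backup",
    "arguments": { "device": "virtio0",
                   "format": "qcow2",
                   "target": "backup-20130401.qcow2" } }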

The original drive-backup blockjob was written by Dietmar Maurer
diet...@proxmox.com.  He is currently busy but I feel the feature is worth
pushing into QEMU since there has been interest.  This is my version of his
patch, plus the QMP command and qemu-iotests test case.

QMP 'transaction' support is included since v3.  It adds support for atomic
snapshots of multiple block devices.  I also added an 'abort' transaction to
allow testing of the .abort()/.cleanup() code path.  Thanks to Wenchao for
making qmp_transaction() extensible.
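
For example, atomically starting backups of two drives might look
roughly like this (an untested sketch of the transaction syntax, with
hypothetical device names):

  { "execute": "transaction",
    "arguments": { "actions": [
      { "type": "drive-backup",
        "data": { "device": "virtio0", "target": "backup-virtio0.qcow2" } },
      { "type": "drive-backup",
        "data": { "device": "virtio1", "target": "backup-virtio1.qcow2" } } ] } }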

How is this different from block-stream and drive-mirror?
-----------------------------------------------------------
Both block-stream and drive-mirror do not provide immediate point-in-time
snapshots.  Instead they copy data into a new file and then switch to it.  In
other words, the point at which the snapshot is taken cannot be controlled
directly.

drive-backup intercepts guest writes and saves data into the target block
device before it is overwritten.  The target block device can be a raw image
file, backing files are not used to implement this feature.
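
The copy-before-write idea reduces to something like the following
sketch, where in-memory buffers stand in for the real block job,
BlockDriverState and write-interception machinery:

#include <stdint.h>
#include <string.h>

#define CLUSTER_SIZE 4096

typedef struct {
    uint8_t *source;    /* live image (memory stands in for the disk) */
    uint8_t *target;    /* point-in-time copy being built */
    uint8_t *copied;    /* one bit per cluster: already saved? */
} BackupJob;

static int cluster_copied(const BackupJob *job, uint64_t c)
{
    return (job->copied[c / 8] >> (c % 8)) & 1;
}

/* The before-write hook: preserve snapshot-time contents exactly once. */
static void backup_before_write(BackupJob *job, uint64_t cluster)
{
    if (!cluster_copied(job, cluster)) {
        memcpy(job->target + cluster * CLUSTER_SIZE,
               job->source + cluster * CLUSTER_SIZE, CLUSTER_SIZE);
        job->copied[cluster / 8] |= 1u << (cluster % 8);
    }
}

/* Guest writes are intercepted; new data lands only after the copy. */
void backup_guest_write(BackupJob *job, uint64_t cluster,
                        const uint8_t *data)
{
    backup_before_write(job, cluster);
    memcpy(job->source + cluster * CLUSTER_SIZE, data, CLUSTER_SIZE);
}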

How can drive-backup be used?
-----------------------------
The simplest use-case is to copy a point-in-time snapshot to a local file.

More advanced users may wish to make the target an NBD URL.  The NBD server
listening on the other side can process the backup writes any way it wishes.  I
previously posted an RFC series with a backup server that streamed Dietmar's
VMA backup archive format.

What's next for drive-backup?
-----------------------------
1. Sync modes like drive-mirror (top, full, none).  This makes it possible to
   preserve the backing file chain.

v3:
 * Rename to drive-backup for consistency with drive-mirror [kwolf]
 * Add QMP transaction support [kwolf]
 * Introduce bdrv_add_before_write_cb() to hook writes
 * Mention 'query-block-jobs' lists job of type 'backup' [eblake]
 * Rename rwlock to flush_rwlock [kwolf]
 * Fix space in block/backup.c comment [kwolf]

v2:
 * s/block_backup/block-backup/ in commit message [eblake]
 * Avoid funny spacing in QMP docs [eblake]
 * Document query-block-jobs and block-job-cancel usage [eblake]

Dietmar Maurer (1):
  block: add basic backup support to block driver

Stefan Hajnoczi (7):
  block: add bdrv_add_before_write_cb()
  block: add drive-backup QMP command
  qemu-iotests: add 055 drive-backup test case
  blockdev: rename BlkTransactionStates to singular
  blockdev: add DriveBackup transaction
  blockdev: add Abort transaction
  qemu-iotests: test 'drive-backup' transaction in 055

 block.c|  37 +
 block/Makefile.objs|   1 +
 block/backup.c | 282 
 blockdev.c | 264 +++---
 include/block/block_int.h  |  48 +++
 qapi-schema.json   |  65 -
 qmp-commands.hx|   6 +
 tests/qemu-iotests/055 | 348 +
 tests/qemu-iotests/055.out |   5 +
 tests/qemu-iotests/group   |   1 +
 10 files changed, 1004 insertions(+), 53 deletions(-)
 create mode 100644 block/backup.c
 create mode 100755 tests/qemu-iotests/055
 create mode 100644 tests/qemu-iotests/055.out

-- 
1.8.1.4