Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-04-02 Thread Wen Congyang
On 03/26/2015 02:31 PM, Fam Zheng wrote:
 On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  docs/block-replication.txt | 147 
 +
  1 file changed, 147 insertions(+)
  create mode 100644 docs/block-replication.txt

 diff --git a/docs/block-replication.txt b/docs/block-replication.txt
 new file mode 100644
 index 000..874ed8e
 --- /dev/null
 +++ b/docs/block-replication.txt
 @@ -0,0 +1,147 @@
 +Block replication
 +
 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
 +
 +This work is licensed under the terms of the GNU GPL, version 2 or later.
 +See the COPYING file in the top-level directory.
 +
 +The block replication is used for continuous checkpoints. It is designed
 +for COLO that Secondary VM is running. It can also be applied for FT/HA
 +scene that Secondary VM is not running.
 +
 +This document gives an overview of block replication's design.
 +
 +== Background ==
 +High availability solutions such as micro checkpoint and COLO will do
 +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
 +identical right after a VM checkpoint, but becomes different as the VM
 +executes till the next checkpoint. To support disk contents checkpoint,
 +the modified disk contents in the Secondary VM must be buffered, and are
 +only dropped at next checkpoint time. To reduce the network transportation
 +effort at the time of checkpoint, the disk modification operations of
 +Primary disk are asynchronously forwarded to the Secondary node.
 +
 +== Workflow ==
 +The following is the image of block replication workflow:
 +
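 +        +----------------------+            +------------------------+
 +        |Primary Write Requests|            |Secondary Write Requests|
 +        +----------------------+            +------------------------+
 +                  |                                       |
 +                  |                                      (4)
 +                  |                                       V
 +                  |                              /-------------\
 +                  |      Copy and Forward        |             |
 +                  |---------(1)----------+       | Disk Buffer |
 +                  |                      |       |             |
 +                  |                     (3)      \-------------/
 +                  |                 speculative      ^
 +                  |                write through    (2)
 +                  |                      |           |
 +                  V                      V           |
 +           +--------------+           +----------------+
 +           | Primary Disk |           | Secondary Disk |
 +           +--------------+           +----------------+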
 +
 +1) Primary write requests will be copied and forwarded to Secondary
 +   QEMU.
 +2) Before Primary write requests are written to Secondary disk, the
 +   original sector content will be read from Secondary disk and
 +   buffered in the Disk buffer, but it will not overwrite the existing
 +   sector content in the Disk buffer.
 
 Could you elaborate a bit about the existing sector content here? IIUC, it
 could be from either Secondary Write Requests or previous COW of Primary
 Write Requests. Is that right?
 
 +3) Primary write requests will be written to Secondary disk.
 +4) Secondary write requests will be buffered in the Disk buffer and it
 +   will overwrite the existing sector content in the buffer.
 +
 +== Architecture ==
 +We are going to implement COLO block replication from many basic
 +blocks that are already in QEMU.
 +
 +         virtio-blk       ||
 +             ^            ||                            .----------
 +             |            ||                            | Secondary
 +        1 Quorum          ||                            '----------
 +         /      \         ||
 +        /        \        ||
 +   Primary      2 NBD  ------->  2 NBD
 +     disk       client    ||     server                                         virtio-blk
 +                          ||        ^                                                ^
 +--------.                 ||        |                                                |
 +Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
 +--------'                 ||        |          backing        ^       backing
 +                          ||        |                         |
 +                          ||        |                         |
 +                          ||        '-------------------------'
 + 

Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-04-02 Thread Fam Zheng
On Fri, 04/03 10:35, Wen Congyang wrote:
 On 03/26/2015 02:31 PM, Fam Zheng wrote:
  On Wed, 03/25 17:36, Wen Congyang wrote:
  Signed-off-by: Wen Congyang we...@cn.fujitsu.com
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
  Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
  Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
  Signed-off-by: Gonglei arei.gong...@huawei.com
  ---
   docs/block-replication.txt | 147 
  +
   1 file changed, 147 insertions(+)
   create mode 100644 docs/block-replication.txt
 
  diff --git a/docs/block-replication.txt b/docs/block-replication.txt
  new file mode 100644
  index 000..874ed8e
  --- /dev/null
  +++ b/docs/block-replication.txt
  @@ -0,0 +1,147 @@
  +Block replication
  +
  +Copyright Fujitsu, Corp. 2015
  +Copyright (c) 2015 Intel Corporation
  +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
  +
  +This work is licensed under the terms of the GNU GPL, version 2 or later.
  +See the COPYING file in the top-level directory.
  +
  +The block replication is used for continuous checkpoints. It is designed
  +for COLO that Secondary VM is running. It can also be applied for FT/HA
  +scene that Secondary VM is not running.
  +
  +This document gives an overview of block replication's design.
  +
  +== Background ==
  +High availability solutions such as micro checkpoint and COLO will do
  +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
  +identical right after a VM checkpoint, but becomes different as the VM
  +executes till the next checkpoint. To support disk contents checkpoint,
  +the modified disk contents in the Secondary VM must be buffered, and are
  +only dropped at next checkpoint time. To reduce the network transportation
  +effort at the time of checkpoint, the disk modification operations of
  +Primary disk are asynchronously forwarded to the Secondary node.
  +
  +== Workflow ==
  +The following is the image of block replication workflow:
  +
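  +        +----------------------+            +------------------------+
  +        |Primary Write Requests|            |Secondary Write Requests|
  +        +----------------------+            +------------------------+
  +                  |                                       |
  +                  |                                      (4)
  +                  |                                       V
  +                  |                              /-------------\
  +                  |      Copy and Forward        |             |
  +                  |---------(1)----------+       | Disk Buffer |
  +                  |                      |       |             |
  +                  |                     (3)      \-------------/
  +                  |                 speculative      ^
  +                  |                write through    (2)
  +                  |                      |           |
  +                  V                      V           |
  +           +--------------+           +----------------+
  +           | Primary Disk |           | Secondary Disk |
  +           +--------------+           +----------------+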
  +
  +1) Primary write requests will be copied and forwarded to Secondary
  +   QEMU.
  +2) Before Primary write requests are written to Secondary disk, the
  +   original sector content will be read from Secondary disk and
  +   buffered in the Disk buffer, but it will not overwrite the existing
  +   sector content in the Disk buffer.
  
  Could you elaborate a bit about the existing sector content here? IIUC, it
  could be from either Secondary Write Requests or previous COW of Primary
  Write Requests. Is that right?
  
  +3) Primary write requests will be written to Secondary disk.
  +4) Secondary write requests will be buffered in the Disk buffer and it
  +   will overwrite the existing sector content in the buffer.
  +
  +== Architecture ==
  +We are going to implement COLO block replication from many basic
  +blocks that are already in QEMU.
  +
  +         virtio-blk       ||
  +             ^            ||                            .----------
  +             |            ||                            | Secondary
  +        1 Quorum          ||                            '----------
  +         /      \         ||
  +        /        \        ||
  +   Primary      2 NBD  ------->  2 NBD
  +     disk       client    ||     server                                         virtio-blk
  +                          ||        ^                                                ^
  +--------.                 ||        |                                                |
  +Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
  +--------'                 ||        |          backing        ^       backing
  +                          ||        |

Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Wen Congyang
On 03/25/2015 11:38 PM, Eric Blake wrote:
 On 03/25/2015 03:36 AM, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  docs/block-replication.txt | 147 
 +
  1 file changed, 147 insertions(+)
  create mode 100644 docs/block-replication.txt

 
 Grammar review only (I'll leave the technical review to others)
 
 diff --git a/docs/block-replication.txt b/docs/block-replication.txt
 new file mode 100644
 index 000..874ed8e
 --- /dev/null
 +++ b/docs/block-replication.txt
 @@ -0,0 +1,147 @@
 +Block replication
 +
 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
 
 Space after comma in English writing.

Yes, but I am not sure I can change it. HUAWEI always writes it this way.
You can find it in bootdevice.c

Thanks
Wen Congyang

 
 +
 +This work is licensed under the terms of the GNU GPL, version 2 or later.
 +See the COPYING file in the top-level directory.
 +
 +The block replication is used for continuous checkpoints. It is designed
 
 Sounds better as either of:
 The block replication feature is...
 Block replication is...
 
 +for COLO that Secondary VM is running. It can also be applied for FT/HA
 
 Please define COLO and FT/HA on first use (okay to abbreviate elsewhere
 in the document, but the first use should not assume the acronym is
 well-known)
 
 s/for COLO that/for COLO (COarse-grained LOck-stepping), where the/
 
 +scene that Secondary VM is not running.
 
 s/for FT/HA scene that/for the FT/HA (Fault Tolerance/High Availability)
 scenario, where the/
 
 +
 +This document gives an overview of block replication's design.
 +
 +== Background ==
 +High availability solutions such as micro checkpoint and COLO will do
 +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
 
 s/checkpoint/checkpoints/
 
 +identical right after a VM checkpoint, but becomes different as the VM
 ...
 +
 +4) The hidden-disk is created automatically. It buffers the original content
 +that is modified by the primary VM. It should also be an empty disk, and
 +the dirver supports bdrv_make_empty().
 
 s/dirver/driver/
 
 +
 +== New block driver interface ==
 +We add three block driver interfaces to control block replication:
 +a. bdrv_start_replication()
 +   Start block replication, called in migration/checkpoint thread.
 +   We must call bdrv_start_replication() in secondary QEMU before
 +   calling bdrv_start_replication() in primary QEMU.
 +b. bdrv_do_checkpoint()
 +   This interface is called after all VM state is transfered to
 
 s/transfered/transferred/
 
 +   Secondary QEMU. The Disk buffer will be dropped in this interface.
 +   The caller must hold the I/O mutex lock if it is in migration/checkpoint
 +   thread.
 +c. bdrv_stop_replication()
 +   It is called when failover. We will flush the Disk buffer into
 
 s/when/on/
 
 +   Secondary Disk and stop block replication. The vm should be stopped
 +   before calling it. The caller must hold the I/O mutex lock if it is
 +   in migration/checkpoint thread.
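
The ordering and buffer rules described for the three interfaces above can be sketched as a small toy model. This is only an illustration in Python, not QEMU code: the classes `Secondary` and `Primary` and their methods are hypothetical stand-ins for bdrv_start_replication(), bdrv_do_checkpoint() and bdrv_stop_replication(), and the I/O mutex and the "VM must be stopped" requirement are left out.

```python
class Secondary:
    """Hypothetical model of the secondary side: a disk plus the Disk buffer."""

    def __init__(self, disk):
        self.disk = dict(disk)    # Secondary Disk: sector -> content
        self.buffer = {}          # Disk buffer: sector -> content
        self.replicating = False

    def start_replication(self):
        self.replicating = True

    def do_checkpoint(self):
        # Called after all VM state has been transferred:
        # the Disk buffer is dropped.
        self.buffer.clear()

    def stop_replication(self):
        # Failover: flush the Disk buffer into the Secondary Disk, then stop.
        self.disk.update(self.buffer)
        self.buffer.clear()
        self.replicating = False


class Primary:
    """Hypothetical model of the primary side."""

    def __init__(self, secondary):
        self.secondary = secondary
        self.replicating = False

    def start_replication(self):
        # Replication must already be started on the secondary.
        if not self.secondary.replicating:
            raise RuntimeError("start replication on the secondary first")
        self.replicating = True


secondary = Secondary({0: "A"})
primary = Primary(secondary)
secondary.start_replication()
primary.start_replication()

secondary.buffer[0] = "S1"        # content buffered since the last checkpoint
secondary.do_checkpoint()
print(secondary.disk, secondary.buffer)   # {0: 'A'} {}

secondary.buffer[0] = "S2"
secondary.stop_replication()
print(secondary.disk, secondary.buffer)   # {0: 'S2'} {}
```

The point of the sketch is the contrast between the two calls: a checkpoint throws the buffered content away, while failover writes it into the Secondary Disk.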
 +
 +== Usage ==
 +Primary:
 +  -drive if=xxx,driver=quorum,read-pattern=fifo,\
 + children.0.file.filename=1.raw,\
 + children.0.driver=raw,\
 + children.1.file.driver=nbd+colo,\
 + children.1.file.host=xxx,\
 + children.1.file.port=xxx,\
 + children.1.file.export=xxx,\
 + children.1.driver=raw,\
 + children.1.ignore-errors=on
 
 This command line looks like multiple arguments because of the leading
 whitespace on succeeding lines.  I don't know if there is any better way
 to format it, though, to make it obvious that it is all a single
 argument to -drive.
 
 +  Note:
 +  1. NBD Client should not be the first child of quorum.
 +  2. There should be only one NBD Client.
 +  3. host is the secondary physical machine's hostname or IP
 +  4. Each disk must have its own export name.
 
 Maybe a note 5 to call out the formatting aspect of the command line?
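
One way to make the "single argument" point concrete is to build the option string programmatically. The sketch below is an illustration only: `drive_arg` and `PRIMARY_DRIVE` are hypothetical names, the keys and the `xxx` placeholders are copied from the Primary example above, and the doubling of literal commas follows QEMU's usual option syntax.

```python
def drive_arg(options):
    """Join (key, value) pairs into one comma-separated -drive argument."""
    parts = []
    for key, value in options:
        # ',' separates options, so a literal comma inside a value
        # is written as ',,'.
        parts.append(f"{key}={str(value).replace(',', ',,')}")
    return ",".join(parts)


# Keys and placeholder values taken from the Primary example.
PRIMARY_DRIVE = [
    ("if", "xxx"),
    ("driver", "quorum"),
    ("read-pattern", "fifo"),
    ("children.0.file.filename", "1.raw"),
    ("children.0.driver", "raw"),
    ("children.1.file.driver", "nbd+colo"),
    ("children.1.file.host", "xxx"),
    ("children.1.file.port", "xxx"),
    ("children.1.file.export", "xxx"),
    ("children.1.driver", "raw"),
    ("children.1.ignore-errors", "on"),
]

# Exactly two argv entries: the flag and one option string without whitespace.
argv = ["-drive", drive_arg(PRIMARY_DRIVE)]
print(argv[1])
```

In the documentation itself, the backslash-newline continuations only work as one argument if the continuation lines carry no leading whitespace once pasted into a shell, which is the formatting aspect a note 5 could call out.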
 
 +
 +Secondary:
 +  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
 +  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
 + backing_reference.drive_id=nbd_target1,\
 + backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
 + backing_reference.hidden-disk.driver=qcow2,\
 + backing_reference.hidden-disk.allow-write-backing-file=on
 +  Then run qmp command:
 +nbd_server_start host:port
 +  Note:
 +  1. The export name for the same disk must be the same in primary
 + and secondary QEMU command line
 +  2. The qmp command nbd_server_start must be run before running the
 + qmp command migrate on primary QEMU
 +  3. Don't use nbd_server_start's 

Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Gonglei
On 2015/3/26 20:30, Eric Blake wrote:
 On 03/26/2015 04:28 AM, Gonglei wrote:
 
 Grammar review only (I'll leave the technical review to others)

 
 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.

 Space after comma in English writing.
 Yes, but I am not sure I can change it. HUAWEI always writes it this way.
 
 Copyright lines are one thing that I am reluctant to change if it is not
 my own line (some companies have policies on how their lines must look,
 and I'm not in a position to argue the policy as I am not a lawyer).
 
 You can find it in bootdevice.c
 
 Such existing precedent is a strong argument for NOT changing it, at
 least for a patch author who is not the copyright owner.
 

 Good catch, Eric is right. I will change this style everywhere in QEMU for 2.4.
 
 So, sounds like such a change would be a separate patch (probably
 through the -trivial tree), and cover all files in the repo in one go,
 without affecting this series.  Such a patch by a copyright owner would
 have no problem being accepted, if it is wanted.
 
Okay, will do, if it is not too late for rc2. :)

Thanks,
-Gonglei




Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Eric Blake
On 03/26/2015 04:28 AM, Gonglei wrote:

 Grammar review only (I'll leave the technical review to others)


 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.

 Space after comma in English writing.
 Yes, but I am not sure I can change it. HUAWEI always writes it this way.

Copyright lines are one thing that I am reluctant to change if it is not
my own line (some companies have policies on how their lines must look,
and I'm not in a position to argue the policy as I am not a lawyer).

 You can find it in bootdevice.c

Such existing precedent is a strong argument for NOT changing it, at
least for a patch author who is not the copyright owner.

 
 Good catch, Eric is right. I will change this style everywhere in QEMU for 2.4.

So, sounds like such a change would be a separate patch (probably
through the -trivial tree), and cover all files in the repo in one go,
without affecting this series.  Such a patch by a copyright owner would
have no problem being accepted, if it is wanted.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Fam Zheng
On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  docs/block-replication.txt | 147 
 +
  1 file changed, 147 insertions(+)
  create mode 100644 docs/block-replication.txt
 
 diff --git a/docs/block-replication.txt b/docs/block-replication.txt
 new file mode 100644
 index 000..874ed8e
 --- /dev/null
 +++ b/docs/block-replication.txt
 @@ -0,0 +1,147 @@
 +Block replication
 +
 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
 +
 +This work is licensed under the terms of the GNU GPL, version 2 or later.
 +See the COPYING file in the top-level directory.
 +
 +The block replication is used for continuous checkpoints. It is designed
 +for COLO that Secondary VM is running. It can also be applied for FT/HA
 +scene that Secondary VM is not running.
 +
 +This document gives an overview of block replication's design.
 +
 +== Background ==
 +High availability solutions such as micro checkpoint and COLO will do
 +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
 +identical right after a VM checkpoint, but becomes different as the VM
 +executes till the next checkpoint. To support disk contents checkpoint,
 +the modified disk contents in the Secondary VM must be buffered, and are
 +only dropped at next checkpoint time. To reduce the network transportation
 +effort at the time of checkpoint, the disk modification operations of
 +Primary disk are asynchronously forwarded to the Secondary node.
 +
 +== Workflow ==
 +The following is the image of block replication workflow:
 +
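 +        +----------------------+            +------------------------+
 +        |Primary Write Requests|            |Secondary Write Requests|
 +        +----------------------+            +------------------------+
 +                  |                                       |
 +                  |                                      (4)
 +                  |                                       V
 +                  |                              /-------------\
 +                  |      Copy and Forward        |             |
 +                  |---------(1)----------+       | Disk Buffer |
 +                  |                      |       |             |
 +                  |                     (3)      \-------------/
 +                  |                 speculative      ^
 +                  |                write through    (2)
 +                  |                      |           |
 +                  V                      V           |
 +           +--------------+           +----------------+
 +           | Primary Disk |           | Secondary Disk |
 +           +--------------+           +----------------+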
 +
 +1) Primary write requests will be copied and forwarded to Secondary
 +   QEMU.
 +2) Before Primary write requests are written to Secondary disk, the
 +   original sector content will be read from Secondary disk and
 +   buffered in the Disk buffer, but it will not overwrite the existing
 +   sector content in the Disk buffer.

Could you elaborate a bit about the existing sector content here? IIUC, it
could be from either Secondary Write Requests or previous COW of Primary
Write Requests. Is that right?

 +3) Primary write requests will be written to Secondary disk.
 +4) Secondary write requests will be buffered in the Disk buffer and it
 +   will overwrite the existing sector content in the buffer.
 +
 +== Architecture ==
 +We are going to implement COLO block replication from many basic
 +blocks that are already in QEMU.
 +
 +         virtio-blk       ||
 +             ^            ||                            .----------
 +             |            ||                            | Secondary
 +        1 Quorum          ||                            '----------
 +         /      \         ||
 +        /        \        ||
 +   Primary      2 NBD  ------->  2 NBD
 +     disk       client    ||     server                                         virtio-blk
 +                          ||        ^                                                ^
 +--------.                 ||        |                                                |
 +Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
 +--------'                 ||        |          backing        ^       backing
 +                          ||        |                         |
 +                          ||        |                         |
 +                          ||        '-------------------------'
 +                          ||           drive-backup

Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Wen Congyang
On 03/26/2015 02:31 PM, Fam Zheng wrote:
 On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  docs/block-replication.txt | 147 
 +
  1 file changed, 147 insertions(+)
  create mode 100644 docs/block-replication.txt

 diff --git a/docs/block-replication.txt b/docs/block-replication.txt
 new file mode 100644
 index 000..874ed8e
 --- /dev/null
 +++ b/docs/block-replication.txt
 @@ -0,0 +1,147 @@
 +Block replication
 +
 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
 +
 +This work is licensed under the terms of the GNU GPL, version 2 or later.
 +See the COPYING file in the top-level directory.
 +
 +The block replication is used for continuous checkpoints. It is designed
 +for COLO that Secondary VM is running. It can also be applied for FT/HA
 +scene that Secondary VM is not running.
 +
 +This document gives an overview of block replication's design.
 +
 +== Background ==
 +High availability solutions such as micro checkpoint and COLO will do
 +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
 +identical right after a VM checkpoint, but becomes different as the VM
 +executes till the next checkpoint. To support disk contents checkpoint,
 +the modified disk contents in the Secondary VM must be buffered, and are
 +only dropped at next checkpoint time. To reduce the network transportation
 +effort at the time of checkpoint, the disk modification operations of
 +Primary disk are asynchronously forwarded to the Secondary node.
 +
 +== Workflow ==
 +The following is the image of block replication workflow:
 +
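 +        +----------------------+            +------------------------+
 +        |Primary Write Requests|            |Secondary Write Requests|
 +        +----------------------+            +------------------------+
 +                  |                                       |
 +                  |                                      (4)
 +                  |                                       V
 +                  |                              /-------------\
 +                  |      Copy and Forward        |             |
 +                  |---------(1)----------+       | Disk Buffer |
 +                  |                      |       |             |
 +                  |                     (3)      \-------------/
 +                  |                 speculative      ^
 +                  |                write through    (2)
 +                  |                      |           |
 +                  V                      V           |
 +           +--------------+           +----------------+
 +           | Primary Disk |           | Secondary Disk |
 +           +--------------+           +----------------+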
 +
 +1) Primary write requests will be copied and forwarded to Secondary
 +   QEMU.
 +2) Before Primary write requests are written to Secondary disk, the
 +   original sector content will be read from Secondary disk and
 +   buffered in the Disk buffer, but it will not overwrite the existing
 +   sector content in the Disk buffer.
 
 Could you elaborate a bit about the existing sector content here? IIUC, it
 could be from either Secondary Write Requests or previous COW of Primary
 Write Requests. Is that right?

Yes.
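
To make that concrete, here is a minimal sketch of how the existing content in the Disk buffer can come from either source. It is a toy Python model, not QEMU code: `SecondaryNode` and its method names are made up for illustration, sectors are dictionary keys, and the read path (buffer first, then Secondary disk) is my assumption about how the Secondary VM sees its disk.

```python
class SecondaryNode:
    """Toy model of the Secondary side: Secondary disk plus Disk buffer."""

    def __init__(self, disk):
        self.disk = dict(disk)   # Secondary disk: sector -> content
        self.buffer = {}         # Disk buffer: sector -> content

    def primary_write(self, sector, data):
        # Step 2: save the original content, but never overwrite
        # what the Disk buffer already holds for this sector.
        if sector not in self.buffer:
            self.buffer[sector] = self.disk.get(sector)
        # Step 3: write the forwarded Primary request to the Secondary disk.
        self.disk[sector] = data

    def secondary_write(self, sector, data):
        # Step 4: Secondary writes go to the Disk buffer and do overwrite.
        self.buffer[sector] = data

    def secondary_read(self, sector):
        # Assumption: the Secondary VM reads buffered content if present.
        if sector in self.buffer:
            return self.buffer[sector]
        return self.disk.get(sector)


node = SecondaryNode({0: "A", 1: "B"})
node.primary_write(0, "P1")     # buffers original "A" (COW)
node.primary_write(0, "P2")     # "A" from the previous COW is kept
node.secondary_write(1, "S1")   # buffers the Secondary's own write
node.primary_write(1, "P3")     # "S1" from the Secondary write is kept
print(node.buffer)              # {0: 'A', 1: 'S1'}
print(node.disk)                # {0: 'P2', 1: 'P3'}
```

Sector 0 shows the "previous COW of Primary Write Requests" case and sector 1 shows the "Secondary Write Requests" case; in both, step 2 leaves the buffered content alone.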

 
 +3) Primary write requests will be written to Secondary disk.
 +4) Secondary write requests will be buffered in the Disk buffer and it
 +   will overwrite the existing sector content in the buffer.
 +
 +== Architecture ==
 +We are going to implement COLO block replication from many basic
 +blocks that are already in QEMU.
 +
 +         virtio-blk       ||
 +             ^            ||                            .----------
 +             |            ||                            | Secondary
 +        1 Quorum          ||                            '----------
 +         /      \         ||
 +        /        \        ||
 +   Primary      2 NBD  ------->  2 NBD
 +     disk       client    ||     server                                         virtio-blk
 +                          ||        ^                                                ^
 +--------.                 ||        |                                                |
 +Primary |                 ||  Secondary disk <--------- hidden-disk 4 <--------- active-disk 3
 +--------'                 ||        |          backing        ^       backing
 +                          ||        |                         |
 +                          ||        |                         |
 +                          ||        '-------------------------'