Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
On Fri, 04/03 10:35, Wen Congyang wrote:
On 03/26/2015 02:31 PM, Fam Zheng wrote:
On Wed, 03/25 17:36, Wen Congyang wrote:

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 docs/block-replication.txt | 147 +
 1 file changed, 147 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..874ed8e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,147 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+The block replication is used for continuous checkpoints. It is designed
+for COLO that Secondary VM is running. It can also be applied for FT/HA
+scene that Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoint. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
[Workflow diagram; its ASCII layout was lost in the archive. Labels: Primary Write Requests, Secondary Write Requests, (1) Copy and Forward, (2), (3), (4), speculative write through, Disk Buffer, Primary Disk, Secondary Disk]
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content in the Disk buffer.

Could you elaborate a bit about the existing sector content here? IIUC, it
could be from either Secondary Write Requests or previous COW of Primary
Write Requests. Is that right?

+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement COLO block replication from many basic
+blocks that are already in QEMU.
+
[Architecture diagram; its ASCII layout was lost in the archive. Labels: virtio-blk, 1 Quorum, Primary disk, 2 NBD client, 2 NBD server, Secondary disk, hidden-disk 4, active-disk 3, backing]
Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
On 03/25/2015 11:38 PM, Eric Blake wrote:
On 03/25/2015 03:36 AM, Wen Congyang wrote:

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 docs/block-replication.txt | 147 +
 1 file changed, 147 insertions(+)
 create mode 100644 docs/block-replication.txt

Grammar review only (I'll leave the technical review to others)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..874ed8e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,147 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.

Space after comma in English writing.

Yes, but I am not sure I can change it. HUAWEI always use this way.
You can find it in bootdevice.c

Thanks
Wen Congyang

+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+The block replication is used for continuous checkpoints. It is designed

Sounds better as either of:
The block replication feature is...
Block replication is...

+for COLO that Secondary VM is running. It can also be applied for FT/HA

Please define COLO and FT/HA on first use (okay to abbreviate elsewhere in
the document, but the first use should not assume the acronym is well-known)

s/for COLO that/for COLO (COarse-grained LOck-stepping), where the/

+scene that Secondary VM is not running.

s/for FT/HA scene that/for the FT/HA (Fault-tolerance/High Availability) scenario, where the/

+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoint. The VM state of Primary VM and Secondary VM is

s/checkpoint/checkpoints/

+identical right after a VM checkpoint, but becomes different as the VM

...

+
+4) The hidden-disk is created automatically. It buffers the original content
+that is modified by the primary VM. It should also be an empty disk, and
+the dirver supports bdrv_make_empty().

s/dirver/driver/

+
+== New block driver interface ==
+We add three block driver interfaces to control block replication:
+a. bdrv_start_replication()
+   Start block replication, called in migration/checkpoint thread.
+   We must call bdrv_start_replication() in secondary QEMU before
+   calling bdrv_start_replication() in primary QEMU.
+b. bdrv_do_checkpoint()
+   This interface is called after all VM state is transfered to

s/transfered/transferred/

+   Secondary QEMU. The Disk buffer will be dropped in this interface.
+   The caller must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+c. bdrv_stop_replication()
+   It is called when failover. We will flush the Disk buffer into

s/when/on/

+   Secondary Disk and stop block replication. The vm should be stopped
+   before calling it. The caller must hold the I/O mutex lock if it is
+   in migration/checkpoint thread.
+
+== Usage ==
+Primary:
+  -drive if=xxx,driver=quorum,read-pattern=fifo,\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw,\
+         children.1.file.driver=nbd+colo,\
+         children.1.file.host=xxx,\
+         children.1.file.port=xxx,\
+         children.1.file.export=xxx,\
+         children.1.driver=raw,\
+         children.1.ignore-errors=on

This command line looks like multiple arguments because of the leading
whitespace on succeeding lines. I don't know if there is any better way to
format it, though, to make it obvious that it is all a single argument to
-drive.

+  Note:
+  1. NBD Client should not be the first child of quorum.
+  2. There should be only one NBD Client.
+  3. host is the secondary physical machine's hostname or IP
+  4. Each disk must have its own export name.

Maybe a note 5 to call out the formatting aspect of the command line?

+
+Secondary:
+  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
+  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
+         backing_reference.drive_id=nbd_target1,\
+         backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
+         backing_reference.hidden-disk.driver=qcow2,\
+         backing_reference.hidden-disk.allow-write-backing-file=on
+  Then run qmp command:
+    nbd_server_start host:port
+  Note:
+  1. The export name for the same disk must be the same in primary
+     and secondary QEMU command line
+  2. The qmp command nbd_server_start must be run before running the
+     qmp command migrate on primary QEMU
+  3. Don't use nbd_server_start's
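
The ordering rules attached to the three interfaces in the quoted patch can be checked with a small model. The following is only a sketch in Python, not QEMU code: the class ReplicationModel, its fields, and the sample buffer content are hypothetical stand-ins, and the methods merely mirror the names bdrv_start_replication(), bdrv_do_checkpoint() and bdrv_stop_replication(). It assumes the rules are exactly those stated in the patch text (secondary starts before primary, checkpoint drops the Disk buffer, stop flushes the buffer and requires a stopped VM) and ignores locking.

```python
class ReplicationModel:
    """Hypothetical stand-in for the call-order rules; not QEMU code."""

    def __init__(self):
        self.started = set()      # sides on which replication was started
        self.vm_running = True    # whether the VM is still running
        self.disk_buffer = {"sector 7": "buffered data"}  # made-up content
        self.secondary_disk = {}

    def start_replication(self, side):
        # Rule from the patch text: secondary QEMU must start before primary.
        if side == "primary" and "secondary" not in self.started:
            raise RuntimeError("start replication on the secondary first")
        self.started.add(side)

    def do_checkpoint(self):
        # Called after all VM state reached the Secondary: drop the buffer.
        self.disk_buffer.clear()

    def stop_replication(self):
        # Failover path: the VM should be stopped before this is called.
        if self.vm_running:
            raise RuntimeError("stop the VM before stopping replication")
        self.secondary_disk.update(self.disk_buffer)  # flush the buffer
        self.disk_buffer.clear()
        self.started.clear()


model = ReplicationModel()
try:
    model.start_replication("primary")   # wrong order, rejected
    wrong_order_rejected = False
except RuntimeError:
    wrong_order_rejected = True

model.start_replication("secondary")
model.start_replication("primary")
model.vm_running = False                  # failover: VM stopped first
model.stop_replication()
```

After the last call the model's Secondary disk holds the content that was in the buffer, which is the flush-on-failover behaviour the patch describes for bdrv_stop_replication().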
Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
On 2015/3/26 20:30, Eric Blake wrote:
On 03/26/2015 04:28 AM, Gonglei wrote:

Grammar review only (I'll leave the technical review to others)

+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.

Space after comma in English writing.

Yes, but I am not sure I can change it. HUAWEI always use this way.

Copyright lines are one thing that I am reluctant to change if it is not my
own line (some companies have policies on how their lines must look, and I'm
not in a position to argue the policy as I am not a lawyer).

You can find it in bootdevice.c

Such existing precedent is a strong argument for NOT changing it, at least
for a patch author that is not the copyright owner.

Good catch, Eric is right. I will change all of this writing way in Qemu at
2.4.

So, sounds like such a change would be a separate patch (probably through
the -trivial tree), and cover all files in the repo in one go, without
affecting this series. Such a patch by a copyright owner would have no
problem being accepted, if it is wanted.

Okay, will do, if it is not too late for rc2. :)

Thanks,
-Gonglei
Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
On 03/26/2015 04:28 AM, Gonglei wrote:

Grammar review only (I'll leave the technical review to others)

+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.

Space after comma in English writing.

Yes, but I am not sure I can change it. HUAWEI always use this way.

Copyright lines are one thing that I am reluctant to change if it is not my
own line (some companies have policies on how their lines must look, and I'm
not in a position to argue the policy as I am not a lawyer).

You can find it in bootdevice.c

Such existing precedent is a strong argument for NOT changing it, at least
for a patch author that is not the copyright owner.

Good catch, Eric is right. I will change all of this writing way in Qemu at
2.4.

So, sounds like such a change would be a separate patch (probably through
the -trivial tree), and cover all files in the repo in one go, without
affecting this series. Such a patch by a copyright owner would have no
problem being accepted, if it is wanted.

--
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
On Wed, 03/25 17:36, Wen Congyang wrote:

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 docs/block-replication.txt | 147 +
 1 file changed, 147 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..874ed8e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,147 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+The block replication is used for continuous checkpoints. It is designed
+for COLO that Secondary VM is running. It can also be applied for FT/HA
+scene that Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoint. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
[Workflow diagram; its ASCII layout was lost in the archive. Labels: Primary Write Requests, Secondary Write Requests, (1) Copy and Forward, (2), (3), (4), speculative write through, Disk Buffer, Primary Disk, Secondary Disk]
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content in the Disk buffer.

Could you elaborate a bit about the existing sector content here? IIUC, it
could be from either Secondary Write Requests or previous COW of Primary
Write Requests. Is that right?

+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement COLO block replication from many basic
+blocks that are already in QEMU.
+
[Architecture diagram; its ASCII layout was lost in the archive. Labels: virtio-blk, 1 Quorum, Primary disk, 2 NBD client, 2 NBD server, Secondary disk, hidden-disk 4, active-disk 3, backing, drive-backup]
Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description
On 03/26/2015 02:31 PM, Fam Zheng wrote:
On Wed, 03/25 17:36, Wen Congyang wrote:

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 docs/block-replication.txt | 147 +
 1 file changed, 147 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..874ed8e
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,147 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+The block replication is used for continuous checkpoints. It is designed
+for COLO that Secondary VM is running. It can also be applied for FT/HA
+scene that Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoint. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
[Workflow diagram; its ASCII layout was lost in the archive. Labels: Primary Write Requests, Secondary Write Requests, (1) Copy and Forward, (2), (3), (4), speculative write through, Disk Buffer, Primary Disk, Secondary Disk]
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content in the Disk buffer.

Could you elaborate a bit about the existing sector content here? IIUC, it
could be from either Secondary Write Requests or previous COW of Primary
Write Requests. Is that right?

Yes.

+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement COLO block replication from many basic
+blocks that are already in QEMU.
+
[Architecture diagram; its ASCII layout was lost in the archive. Labels: virtio-blk, 1 Quorum, Primary disk, 2 NBD client, 2 NBD server, Secondary disk, hidden-disk 4, active-disk 3, backing]
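
The buffering rules in workflow steps 1) to 4), including the point confirmed in this reply that existing buffer content may come from either a Secondary write or an earlier copy-on-write, can be illustrated with a toy model. This is a minimal sketch in Python and not the QEMU implementation: the class SecondaryDiskModel and its method names are hypothetical, sectors are plain dictionary keys, and the read path (buffer first, then disk) is an assumption rather than something stated in the quoted steps.

```python
class SecondaryDiskModel:
    """Toy per-sector model of the Secondary disk and its Disk buffer."""

    def __init__(self, sectors):
        self.secondary_disk = dict(sectors)  # sector number -> content
        self.disk_buffer = {}                # sector number -> buffered content

    def primary_write(self, sector, data):
        # Step 2: save the original content first, unless the buffer
        # already holds this sector (from a Secondary write or an
        # earlier copy-on-write).
        if sector not in self.disk_buffer:
            self.disk_buffer[sector] = self.secondary_disk.get(sector)
        # Step 3: apply the forwarded Primary write to the Secondary disk.
        self.secondary_disk[sector] = data

    def secondary_write(self, sector, data):
        # Step 4: Secondary writes go to the buffer and overwrite what is there.
        self.disk_buffer[sector] = data

    def secondary_read(self, sector):
        # Assumption: the Secondary VM reads the buffer before the disk.
        if sector in self.disk_buffer:
            return self.disk_buffer[sector]
        return self.secondary_disk.get(sector)

    def checkpoint(self):
        # The buffered content is dropped at checkpoint time.
        self.disk_buffer.clear()


disk = SecondaryDiskModel({0: "old"})
disk.primary_write(0, "p1")       # buffers "old", disk now holds "p1"
disk.primary_write(0, "p2")       # buffer keeps "old", disk now holds "p2"
view_before = disk.secondary_read(0)
disk.secondary_write(0, "s1")     # buffer now holds "s1"
disk.primary_write(0, "p3")       # buffer keeps "s1", disk now holds "p3"
view_after = disk.secondary_read(0)
disk.checkpoint()
view_checkpointed = disk.secondary_read(0)
```

In this model the Secondary's view of sector 0 stays at its own content until the checkpoint, after which it sees the latest forwarded Primary write.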