Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Wen Congyang
On 03/25/2015 11:38 PM, Eric Blake wrote:
 On 03/25/2015 03:36 AM, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  docs/block-replication.txt | 147 
 +
  1 file changed, 147 insertions(+)
  create mode 100644 docs/block-replication.txt

 
 Grammar review only (I'll leave the technical review to others)
 
 diff --git a/docs/block-replication.txt b/docs/block-replication.txt
 new file mode 100644
 index 000..874ed8e
 --- /dev/null
 +++ b/docs/block-replication.txt
 @@ -0,0 +1,147 @@
 +Block replication
 +
 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
 
 Space after comma in English writing.

Yes, but I am not sure I can change it. HUAWEI always use this way.
You can find it in bootdevice.c

Thanks
Wen Congyang

 
 +
 +This work is licensed under the terms of the GNU GPL, version 2 or later.
 +See the COPYING file in the top-level directory.
 +
 +The block replication is used for continuous checkpoints. It is designed
 
 Sounds better as either of:
 The block replication feature is...
 Block replication is...
 
 +for COLO that Secondary VM is running. It can also be applied for FT/HA
 
 Please define COLO and FT/HA on first use (okay to abbreviate elsewhere
 in the document, but the first use should not assume the acronym is
 well-known)
 
 s/for COLO that/for COLO (COurse-grain LOck-stepping), where the/
 
 +scene that Secondary VM is not running.
 
 s/for FT/HA scene that/for the FT/HA (Fault-tolerance/High Assurance)
 scenario, where the/
 
 +
 +This document gives an overview of block replication's design.
 +
 +== Background ==
 +High availability solutions such as micro checkpoint and COLO will do
 +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
 
 s/checkpoint/checkpoints/
 
 +identical right after a VM checkpoint, but becomes different as the VM
 ...
 +
 +4) The hidden-disk is created automatically. It buffers the original content
 +that is modified by the primary VM. It should also be an empty disk, and
 +the dirver supports bdrv_make_empty().
 
 s/dirver/driver/
 
 +
 +== New block driver interface ==
 +We add three block driver interfaces to control block replication:
 +a. bdrv_start_replication()
 +   Start block replication, called in migration/checkpoint thread.
 +   We must call bdrv_start_replication() in secondary QEMU before
 +   calling bdrv_start_replication() in primary QEMU.
 +b. bdrv_do_checkpoint()
 +   This interface is called after all VM state is transfered to
 
 s/transfered/transferred/
 
 +   Secondary QEMU. The Disk buffer will be dropped in this interface.
 +   The caller must hold the I/O mutex lock if it is in migration/checkpoint
 +   thread.
 +c. bdrv_stop_replication()
 +   It is called when failover. We will flush the Disk buffer into
 
 s/when/on/
 
 +   Secondary Disk and stop block replication. The vm should be stopped
 +   before calling it. The caller must hold the I/O mutex lock if it is
 +   in migration/checkpoint thread.
 +
 +== Usage ==
 +Primary:
 +  -drive if=xxx,driver=quorum,read-pattern=fifo,\
 + children.0.file.filename=1.raw,\
 + children.0.driver=raw,\
 + children.1.file.driver=nbd+colo,\
 + children.1.file.host=xxx,\
 + children.1.file.port=xxx,\
 + children.1.file.export=xxx,\
 + children.1.driver=raw,\
 + children.1.ignore-errors=on
 
 This command line looks like multiple arguments because of the leading
 whitespace on succeeding lines.  I don't know if there is any better way
 to format it, though, to make it obvious that it is all a single
 argument to -drive.
 
 +  Note:
 +  1. NBD Client should not be the first child of quorum.
 +  2. There should be only one NBD Client.
 +  3. host is the secondary physical machine's hostname or IP
 +  4. Each disk must have its own export name.
 
 Maybe a note 5 to call out the formatting aspect of the command line?
 
 +
 +Secondary:
 +  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
 +  -drive if=xxx,driver=qcow2+colo,file=active_disk.qcow2,export=xxx,\
 + backing_reference.drive_id=nbd_target1,\
 + backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
 + backing_reference.hidden-disk.driver=qcow2,\
 + backing_reference.hidden-disk.allow-write-backing-file=on
 +  Then run qmp command:
 +nbd_server_start host:port
 +  Note:
 +  1. The export name for the same disk must be the same in primary
 + and secondary QEMU command line
 +  2. The qmp command nbd_server_start must be run before running the
 + qmp command migrate on primary QEMU
 +  3. Don't use nbd_server_start's 

Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Gonglei
On 2015/3/26 20:30, Eric Blake wrote:
 On 03/26/2015 04:28 AM, Gonglei wrote:
 
 Grammar review only (I'll leave the technical review to others)

 
 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.

 Space after comma in English writing.
 Yes, but I am not sure I can change it. HUAWEI always use this way.
 
 Copyright lines are one thing that I am reluctant to change if it is not
 my own line (some companies have policies on how their lines must look,
 and I'm not in a position to argue the policy as I am not a lawyer).
 
 You can find it in bootdevice.c
 
 Such existing precedence is a strong argument to NOT changing it, at
 least for a patch author that is not the copyright owner.
 

 Good catch, Eric is right. I will change all of this writing way in Qemu at 
 2.4.
 
 So, sounds like such a change would be a separate patch (probably
 through the -trivial tree), and cover all files in the repo in one go,
 without affecting this series.  Such a patch by a copyright owner would
 have no problem being accepted, if it is wanted.
 
Okay, will do, if it is not too later for rc2. :)

Thanks,
-Gonglei




Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Eric Blake
On 03/26/2015 04:28 AM, Gonglei wrote:

 Grammar review only (I'll leave the technical review to others)


 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.

 Space after comma in English writing.
 Yes, but I am not sure I can change it. HUAWEI always use this way.

Copyright lines are one thing that I am reluctant to change if it is not
my own line (some companies have policies on how their lines must look,
and I'm not in a position to argue the policy as I am not a lawyer).

 You can find it in bootdevice.c

Such existing precedence is a strong argument to NOT changing it, at
least for a patch author that is not the copyright owner.

 
 Good catch, Eric is right. I will change all of this writing way in Qemu at 
 2.4.

So, sounds like such a change would be a separate patch (probably
through the -trivial tree), and cover all files in the repo in one go,
without affecting this series.  Such a patch by a copyright owner would
have no problem being accepted, if it is wanted.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication

2015-03-26 Thread Wen Congyang
On 03/27/2015 09:06 AM, Fam Zheng wrote:
 On Thu, 03/26 15:32, Wen Congyang wrote:
 On 03/26/2015 03:21 PM, Fam Zheng wrote:
 On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  block/nbd.c | 49 +
  1 file changed, 49 insertions(+)

 diff --git a/block/nbd.c b/block/nbd.c
 index 3faf865..753b322 100644
 --- a/block/nbd.c
 +++ b/block/nbd.c
 @@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
  bs-full_open_options = opts;
  }
  
 +static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
 +  Error **errp)
 +{
 +BDRVNBDState *s = bs-opaque;
 +
 +/*
 + * TODO: support COLO_SECONDARY_MODE if we allow secondary
 + * QEMU becoming primary QEMU.
 + */
 +if (mode != COLO_MODE_PRIMARY) {
 +error_set(errp, QERR_INVALID_PARAMETER, mode);

 Please use error_setg. (Please grep the whole series :)

 Why? QERR_INVALID_PARAMETER includes ERROR_CLASS_GENERIC_ERROR.
 
 Because error classes are deprecated. See also commit 5b347c5410.

I see. Will fix it in the next version.

Thanks
Wen Congyang

 
 Fam
 .
 




Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Fam Zheng
On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  docs/block-replication.txt | 147 
 +
  1 file changed, 147 insertions(+)
  create mode 100644 docs/block-replication.txt
 
 diff --git a/docs/block-replication.txt b/docs/block-replication.txt
 new file mode 100644
 index 000..874ed8e
 --- /dev/null
 +++ b/docs/block-replication.txt
 @@ -0,0 +1,147 @@
 +Block replication
 +
 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
 +
 +This work is licensed under the terms of the GNU GPL, version 2 or later.
 +See the COPYING file in the top-level directory.
 +
 +The block replication is used for continuous checkpoints. It is designed
 +for COLO that Secondary VM is running. It can also be applied for FT/HA
 +scene that Secondary VM is not running.
 +
 +This document gives an overview of block replication's design.
 +
 +== Background ==
 +High availability solutions such as micro checkpoint and COLO will do
 +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
 +identical right after a VM checkpoint, but becomes different as the VM
 +executes till the next checkpoint. To support disk contents checkpoint,
 +the modified disk contents in the Secondary VM must be buffered, and are
 +only dropped at next checkpoint time. To reduce the network transportation
 +effort at the time of checkpoint, the disk modification operations of
 +Primary disk are asynchronously forwarded to the Secondary node.
 +
 +== Workflow ==
 +The following is the image of block replication workflow:
 +
 ++--+++
 +|Primary Write Requests||Secondary Write Requests|
 ++--+++
 +  |   |
 +  |  (4)
 +  |   V
 +  |  /-\
 +  |  Copy and Forward| |
 +  |-(1)--+   | Disk Buffer |
 +  |  |   | |
 +  | (3)  \-/
 +  | speculative  ^
 +  |write through(2)
 +  |  |   |
 +  V  V   |
 +   +--+   ++
 +   | Primary Disk |   | Secondary Disk |
 +   +--+   ++
 +
 +1) Primary write requests will be copied and forwarded to Secondary
 +   QEMU.
 +2) Before Primary write requests are written to Secondary disk, the
 +   original sector content will be read from Secondary disk and
 +   buffered in the Disk buffer, but it will not overwrite the existing
 +   sector content in the Disk buffer.

Could you elaborate a bit about the existing sector content here? IIUC, it
could be from either Secondary Write Requests or previous COW of Primary
Write Requests. Is that right?

 +3) Primary write requests will be written to Secondary disk.
 +4) Secondary write requests will be buffered in the Disk buffer and it
 +   will overwrite the existing sector content in the buffer.
 +
 +== Architecture ==
 +We are going to implement COLO block replication from many basic
 +blocks that are already in QEMU.
 +
 + virtio-blk   ||
 + ^||.--
 + |||| Secondary
 +1 Quorum  ||'--
 + /  \ ||
 +/\||
 +   Primary  2 NBD  ---  2 NBD
 + disk   client|| server  
virtio-blk
 +  ||^
 ^
 +. |||
 |
 +Primary | ||  Secondary disk - hidden-disk 4 
 - active-disk 3
 +' |||  backing^   backing
 +  ||| |
 +  ||| |
 +  ||'-'
 +  ||   drive-backup 

Re: [Qemu-block] [RFC PATCH COLO v2 12/13] skip nbd_target when starting block replication

2015-03-26 Thread Wen Congyang
On 03/26/2015 03:03 PM, Fam Zheng wrote:
 On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  block.c | 18 ++
  1 file changed, 18 insertions(+)

 diff --git a/block.c b/block.c
 index bd7fa9c..3af5ad4 100644
 --- a/block.c
 +++ b/block.c
 @@ -6368,6 +6368,12 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
  void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error 
 **errp)
  {
  BlockDriver *drv = bs-drv;
 +Error *local_err = NULL;
 +
 +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, 
 local_err)) {
 +error_free(local_err);
 
 local_err is set but not used, just pass NULL. Same below.

Yes, will fix it in the next version.

Thanks
Wen Congyang

 
 +return;
 +}
  
  if (drv  drv-bdrv_start_replication) {
  drv-bdrv_start_replication(bs, mode, errp);
 @@ -6381,6 +6387,12 @@ void bdrv_start_replication(BlockDriverState *bs, 
 COLOMode mode, Error **errp)
  void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
  {
  BlockDriver *drv = bs-drv;
 +Error *local_err = NULL;
 +
 +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, 
 local_err)) {
 +error_free(local_err);
 +return;
 +}
  
  if (drv  drv-bdrv_do_checkpoint) {
  drv-bdrv_do_checkpoint(bs, errp);
 @@ -6394,6 +6406,12 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error 
 **errp)
  void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
  {
  BlockDriver *drv = bs-drv;
 +Error *local_err = NULL;
 +
 +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, 
 local_err)) {
 +error_free(local_err);
 +return;
 +}
  
  if (drv  drv-bdrv_stop_replication) {
  drv-bdrv_stop_replication(bs, errp);
 -- 
 2.1.0

 .
 




Re: [Qemu-block] [RFC PATCH COLO v2 01/13] docs: block replication's description

2015-03-26 Thread Wen Congyang
On 03/26/2015 02:31 PM, Fam Zheng wrote:
 On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  docs/block-replication.txt | 147 
 +
  1 file changed, 147 insertions(+)
  create mode 100644 docs/block-replication.txt

 diff --git a/docs/block-replication.txt b/docs/block-replication.txt
 new file mode 100644
 index 000..874ed8e
 --- /dev/null
 +++ b/docs/block-replication.txt
 @@ -0,0 +1,147 @@
 +Block replication
 +
 +Copyright Fujitsu, Corp. 2015
 +Copyright (c) 2015 Intel Corporation
 +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
 +
 +This work is licensed under the terms of the GNU GPL, version 2 or later.
 +See the COPYING file in the top-level directory.
 +
 +The block replication is used for continuous checkpoints. It is designed
 +for COLO that Secondary VM is running. It can also be applied for FT/HA
 +scene that Secondary VM is not running.
 +
 +This document gives an overview of block replication's design.
 +
 +== Background ==
 +High availability solutions such as micro checkpoint and COLO will do
 +consecutive checkpoint. The VM state of Primary VM and Secondary VM is
 +identical right after a VM checkpoint, but becomes different as the VM
 +executes till the next checkpoint. To support disk contents checkpoint,
 +the modified disk contents in the Secondary VM must be buffered, and are
 +only dropped at next checkpoint time. To reduce the network transportation
 +effort at the time of checkpoint, the disk modification operations of
 +Primary disk are asynchronously forwarded to the Secondary node.
 +
 +== Workflow ==
 +The following is the image of block replication workflow:
 +
 ++--+++
 +|Primary Write Requests||Secondary Write Requests|
 ++--+++
 +  |   |
 +  |  (4)
 +  |   V
 +  |  /-\
 +  |  Copy and Forward| |
 +  |-(1)--+   | Disk Buffer |
 +  |  |   | |
 +  | (3)  \-/
 +  | speculative  ^
 +  |write through(2)
 +  |  |   |
 +  V  V   |
 +   +--+   ++
 +   | Primary Disk |   | Secondary Disk |
 +   +--+   ++
 +
 +1) Primary write requests will be copied and forwarded to Secondary
 +   QEMU.
 +2) Before Primary write requests are written to Secondary disk, the
 +   original sector content will be read from Secondary disk and
 +   buffered in the Disk buffer, but it will not overwrite the existing
 +   sector content in the Disk buffer.
 
 Could you elaborate a bit about the existing sector content here? IIUC, it
 could be from either Secondary Write Requests or previous COW of Primary
 Write Requests. Is that right?

Yes.

 
 +3) Primary write requests will be written to Secondary disk.
 +4) Secondary write requests will be buffered in the Disk buffer and it
 +   will overwrite the existing sector content in the buffer.
 +
 +== Architecture ==
 +We are going to implement COLO block replication from many basic
 +blocks that are already in QEMU.
 +
 + virtio-blk   ||
 + ^||.--
 + |||| Secondary
 +1 Quorum  ||'--
 + /  \ ||
 +/\||
 +   Primary  2 NBD  ---  2 NBD
 + disk   client|| server 
 virtio-blk
 +  ||^   
  ^
 +. |||   
  |
 +Primary | ||  Secondary disk - hidden-disk 4 
 - active-disk 3
 +' |||  backing^   
 backing
 +  ||| |
 +  ||| |
 +  ||'-'

Re: [Qemu-block] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS

2015-03-26 Thread Wen Congyang
On 03/26/2015 03:18 PM, Fam Zheng wrote:
 On Thu, 03/26 15:14, Wen Congyang wrote:
 On 03/26/2015 03:07 PM, Fam Zheng wrote:
 On Wed, 03/25 17:36, Wen Congyang wrote:
 When opening BDS, we need to create backup jobs for
 image-fleecing.

 How does this commit message relate to the Makefile change?

 Hmm, we need to use backup API in block.c, and block.o will
 be used by qemu-img which doesn't use common-obj.
 
 I see. How about adding the referenced functions to stubs/?

Good idea. I will try it.

Thanks
Wen Congyang

 
 Fam
 

 Thanks
 Wen Congyang



 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 Cc: Jeff Cody jc...@redhat.com
 ---
  block/Makefile.objs | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/block/Makefile.objs b/block/Makefile.objs
 index db2933e..0950973 100644
 --- a/block/Makefile.objs
 +++ b/block/Makefile.objs
 @@ -21,10 +21,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
  block-obj-$(CONFIG_LIBSSH2) += ssh.o
  block-obj-y += accounting.o
  block-obj-y += write-threshold.o
 +block-obj-y += backup.o
  
  common-obj-y += stream.o
  common-obj-y += commit.o
 -common-obj-y += backup.o
  
  iscsi.o-cflags := $(LIBISCSI_CFLAGS)
  iscsi.o-libs   := $(LIBISCSI_LIBS)
 -- 
 2.1.0

 .


 .
 




Re: [Qemu-block] [RFC PATCH COLO v2 04/13] Add new block driver interfaces to control block replication

2015-03-26 Thread Wen Congyang
On 03/26/2015 03:12 PM, Fam Zheng wrote:
 On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 Cc: Luiz Capitulino lcapitul...@redhat.com
 Cc: Michael Roth mdr...@linux.vnet.ibm.com
 ---
  block.c   | 39 +++
  include/block/block.h |  4 
  include/block/block_int.h | 11 +++
  qapi/block.json   | 16 
  4 files changed, 70 insertions(+)

 diff --git a/block.c b/block.c
 index 0fe97de..0ff5cf8 100644
 --- a/block.c
 +++ b/block.c
 @@ -6196,3 +6196,42 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
  {
  return bs-stats;
  }
 +
 +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error 
 **errp)
 +{
 +BlockDriver *drv = bs-drv;
 +
 +if (drv  drv-bdrv_start_replication) {
 +drv-bdrv_start_replication(bs, mode, errp);
 +} else if (bs-file) {
 +bdrv_start_replication(bs-file, mode, errp);
 +} else {
 +error_set(errp, QERR_UNSUPPORTED);
 
 I think we should use error_setg in new code? (The same to following ones)

Hmm, do you mean that don't use QERR_UNSUPPORTED here?

 
 +}
 +}
 +
 +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
 +{
 +BlockDriver *drv = bs-drv;
 +
 +if (drv  drv-bdrv_do_checkpoint) {
 +drv-bdrv_do_checkpoint(bs, errp);
 +} else if (bs-file) {
 +bdrv_do_checkpoint(bs-file, errp);
 +} else {
 +error_set(errp, QERR_UNSUPPORTED);
 +}
 +}
 +
 +void bdrv_stop_replication(BlockDriverState *bs, Error **errp)
 +{
 +BlockDriver *drv = bs-drv;
 +
 +if (drv  drv-bdrv_stop_replication) {
 +drv-bdrv_stop_replication(bs, errp);
 +} else if (bs-file) {
 +bdrv_stop_replication(bs-file, errp);
 +} else {
 +error_set(errp, QERR_UNSUPPORTED);
 +}
 +}
 diff --git a/include/block/block.h b/include/block/block.h
 index 4c57d63..68f3b1a 100644
 --- a/include/block/block.h
 +++ b/include/block/block.h
 @@ -569,4 +569,8 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
  
  BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
  
 +void bdrv_start_replication(BlockDriverState *bs, COLOMode mode, Error 
 **errp);
 +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
 +void bdrv_stop_replication(BlockDriverState *bs, Error **errp);
 +
  #endif
 diff --git a/include/block/block_int.h b/include/block/block_int.h
 index dccb092..08dd8ba 100644
 --- a/include/block/block_int.h
 +++ b/include/block/block_int.h
 @@ -290,6 +290,17 @@ struct BlockDriver {
   */
  int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
  
 +
 +void (*bdrv_start_replication)(BlockDriverState *bs, COLOMode mode,
 +   Error **errp);
 
 Need some documentation, but I have a generic question:
 
 Why is a single interface with modes better than different functions for each
 mode (bdrv_start_replication_{primary,secondary}? Asking because the behavior
 is very different between them, and I don't see much sharing -- you implement
 primary operation in quorum, and secondary in qcow2+colo.

No special reason.

 
 +/* Drop Disk buffer when doing checkpoint. */
 +void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
 +/*
 + * After failover, we should flush Disk buffer into secondary disk
 + * and stop block replication.
 + */
 +void (*bdrv_stop_replication)(BlockDriverState *bs, Error **errp);
 +
  QLIST_ENTRY(BlockDriver) list;
  };
  
 diff --git a/qapi/block.json b/qapi/block.json
 index e313465..e640566 100644
 --- a/qapi/block.json
 +++ b/qapi/block.json
 @@ -40,6 +40,22 @@
'data': ['auto', 'none', 'lba', 'large', 'rechs']}
  
  ##
 +# @COLOMode
 +#
 +# An enumeration of COLO mode.
 +#
 +# @unprotected: COLO is not started or after failover
 +#
 +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
 +#
 +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
 +#
 +# Since: 2.4
 +##
 +{ 'enum' : 'COLOMode',
 +  'data' : ['unprotected', 'primary', 'secondary']}
 
 If split bdrv_start_replication, do we still need an enum? I can't find the
 usage in QMP interface, is it in some other series?

I will check it.

Thanks
Wen Congyang

 
 Fam
 
 +
 +##
  # @BlockdevSnapshotInternal
  #
  # @device: the name of the device to generate the snapshot from
 -- 
 2.1.0

 .
 




Re: [Qemu-block] [RFC PATCH COLO v2 08/13] Allow creating backup jobs when opening BDS

2015-03-26 Thread Fam Zheng
On Thu, 03/26 15:14, Wen Congyang wrote:
 On 03/26/2015 03:07 PM, Fam Zheng wrote:
  On Wed, 03/25 17:36, Wen Congyang wrote:
  When opening BDS, we need to create backup jobs for
  image-fleecing.
  
  How does this commit message relate to the Makefile change?
 
 Hmm, we need to use backup API in block.c, and block.o will
 be used by qemu-img which doesn't use common-obj.

I see. How about adding the referenced functions to stubs/?

Fam

 
 Thanks
 Wen Congyang
 
  
 
  Signed-off-by: Wen Congyang we...@cn.fujitsu.com
  Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
  Signed-off-by: Gonglei arei.gong...@huawei.com
  Cc: Jeff Cody jc...@redhat.com
  ---
   block/Makefile.objs | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/block/Makefile.objs b/block/Makefile.objs
  index db2933e..0950973 100644
  --- a/block/Makefile.objs
  +++ b/block/Makefile.objs
  @@ -21,10 +21,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
   block-obj-$(CONFIG_LIBSSH2) += ssh.o
   block-obj-y += accounting.o
   block-obj-y += write-threshold.o
  +block-obj-y += backup.o
   
   common-obj-y += stream.o
   common-obj-y += commit.o
  -common-obj-y += backup.o
   
   iscsi.o-cflags := $(LIBISCSI_CFLAGS)
   iscsi.o-libs   := $(LIBISCSI_LIBS)
  -- 
  2.1.0
 
  .
  
 



Re: [Qemu-block] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication

2015-03-26 Thread Fam Zheng
On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  block/nbd.c | 49 +
  1 file changed, 49 insertions(+)
 
 diff --git a/block/nbd.c b/block/nbd.c
 index 3faf865..753b322 100644
 --- a/block/nbd.c
 +++ b/block/nbd.c
 @@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
  bs-full_open_options = opts;
  }
  
 +static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
 +  Error **errp)
 +{
 +BDRVNBDState *s = bs-opaque;
 +
 +/*
 + * TODO: support COLO_SECONDARY_MODE if we allow secondary
 + * QEMU becoming primary QEMU.
 + */
 +if (mode != COLO_MODE_PRIMARY) {
 +error_set(errp, QERR_INVALID_PARAMETER, mode);

Please use error_setg. (Please grep the whole series :)

Fam



Re: [Qemu-block] [RFC PATCH COLO v2 06/13] NBD client: implement block driver interfaces for block replication

2015-03-26 Thread Wen Congyang
On 03/26/2015 03:21 PM, Fam Zheng wrote:
 On Wed, 03/25 17:36, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  block/nbd.c | 49 +
  1 file changed, 49 insertions(+)

 diff --git a/block/nbd.c b/block/nbd.c
 index 3faf865..753b322 100644
 --- a/block/nbd.c
 +++ b/block/nbd.c
 @@ -458,6 +458,52 @@ static void nbd_refresh_filename(BlockDriverState *bs)
  bs-full_open_options = opts;
  }
  
 +static void nbd_start_replication(BlockDriverState *bs, COLOMode mode,
 +  Error **errp)
 +{
 +BDRVNBDState *s = bs-opaque;
 +
 +/*
 + * TODO: support COLO_SECONDARY_MODE if we allow secondary
 + * QEMU becoming primary QEMU.
 + */
 +if (mode != COLO_MODE_PRIMARY) {
 +error_set(errp, QERR_INVALID_PARAMETER, mode);
 
 Please use error_setg. (Please grep the whole series :)

Why? QERR_INVALID_PARAMETER includes ERROR_CLASS_GENERIC_ERROR.

Thanks
Wen Congyang

 
 Fam
 .
 




Re: [Qemu-block] [RFC PATCH COLO v2 09/13] block: Parse backing_reference option to reference existing BDS

2015-03-26 Thread Fam Zheng
On Wed, 03/25 17:36, Wen Congyang wrote:
 Usage:
 -drive file=xxx,id=Y, \
 -drive 
 file=,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.*
 
 It will create such backing chain:
{virtio-blk dev 'Y'}  
 {virtio-blk dev 'X'}
  |
   |
  |
   |
  v
   v
 
 [base] - [mid] - ( Y )  - (hidden target) 
 --- ( X )
 
  v  ^
  v  ^
  v  ^
  v  ^
   drive-backup sync=none 
 
 X's backing file is hidden-disk, and hidden-disk's backing file is Y.
 Disk Y may be opened or reopened in read-write mode, so A block backup
 job is automatically created: source is Y and target is hidden-disk.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
  block.c| 145 
 -
  include/block/block.h  |   1 +
  include/block/block_int.h  |   1 +
  tests/qemu-iotests/051 |  13 
  tests/qemu-iotests/051.out |  13 
  5 files changed, 170 insertions(+), 3 deletions(-)
 
 diff --git a/block.c b/block.c
 index b4d629e..bd7fa9c 100644
 --- a/block.c
 +++ b/block.c
 @@ -1351,6 +1351,113 @@ free_exit:
  return ret;
  }
  
 +static void backing_reference_completed(void *opaque, int ret)
 +{
 +BlockDriverState *hidden_disk = opaque;
 +
 +assert(!hidden_disk-backing_reference);
 +}
 +
 +static int bdrv_open_backing_reference_file(BlockDriverState *bs,
 +QDict *options, Error **errp)
 +{
 +const char *backing_name;
 +QDict *hidden_disk_options = NULL;
 +BlockDriverState *backing_hd, *hidden_disk;
 +BlockBackend *backing_blk;
 +Error *local_err = NULL;
 +int ret = 0;
 +
 +backing_name = qdict_get_try_str(options, drive_id);
 +if (!backing_name) {
 +error_setg(errp, Backing reference needs option drive_id);
 +ret = -EINVAL;
 +goto free_exit;
 +}
 +qdict_del(options, drive_id);
 +
 +qdict_extract_subqdict(options, hidden_disk_options, hidden-disk.);
 +if (!qdict_size(hidden_disk_options)) {
 +error_setg(errp, Backing reference needs option hidden-disk.*);
 +ret = -EINVAL;
 +goto free_exit;
 +}
 +
 +if (qdict_size(options)) {
 +const QDictEntry *entry = qdict_first(options);
 +error_setg(errp, Backing reference used by '%s' doesn't support 
 +   the option '%s', bdrv_get_device_name(bs), entry-key);
 +ret = -EINVAL;
 +goto free_exit;
 +}
 +
 +backing_blk = blk_by_name(backing_name);
 +if (!backing_blk) {
 +error_set(errp, QERR_DEVICE_NOT_FOUND, backing_name);
 +ret = -ENOENT;
 +goto free_exit;
 +}
 +
 +backing_hd = blk_bs(backing_blk);
 +/* Backing reference itself? */
 +if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
 +error_setg(errp, Backing reference itself);
 +ret = -EINVAL;
 +goto free_exit;
 +}
 +
 +if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE,
 +   errp)) {
 +ret = -EBUSY;
 +goto free_exit;
 +}
 +
 +/* hidden-disk is bs's backing file */
 +ret = bdrv_open_backing_file(bs, hidden_disk_options, errp);
 +hidden_disk_options = NULL;
 +if (ret  0) {
 +goto free_exit;
 +}
 +
 +hidden_disk = bs-backing_hd;
 +if (!hidden_disk-drv || !hidden_disk-drv-supports_backing) {
 +ret = -EINVAL;
 +error_setg(errp, Hidden disk's driver doesn't support backing 
 files);
 +goto free_exit;
 +}
 +
 +bdrv_set_backing_hd(hidden_disk, backing_hd);
 +bdrv_ref(backing_hd);
 +
 +/*
 + * backing hd may be opened or reopened in read-write mode, so we
 + * should backup backing hd to hidden disk
 + */
 +bdrv_op_unblock(hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
 +bs-backing_blocker);
 +bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
 +hidden_disk-backing_blocker);
 +
 +bdrv_ref(hidden_disk);
 +backup_start(backing_hd, hidden_disk, 0, MIRROR_SYNC_MODE_NONE,
 + BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
 + backing_reference_completed, hidden_disk, local_err);

We need to be careful, the backing_hd may not be safe to access, depending on
which AioContext it is running on. At 

[Qemu-block] Enable debugging in a running vServer

2015-03-26 Thread Peter Lieven

Hi Block people,

we recently observed some strange I/O stalls on some vServers. I suspect a bug 
in the target and
already added some debugging output to libiscsi that could have helped to track 
the issue.

However, to enable this debugging I would need to call

iscsi_set_log_level

during runtime from the hmp or qmp.

I wonder what would be the best way to do this and/or if it would be 
interesting to have a generic
way to tell a block backend to enter debugging whereas I would leave it up to 
the backend driver
what exactly that means?

Other option would be to set an enviroment variable during runtime. But as far 
as I know thats
not possible.

Any thoughts, thank you,
Peter