Re: [Qemu-block] [Qemu-devel] [Patch v12 00/10] Block replication for continuous checkpoints

2015-12-01 Thread Hailiang Zhang

On 2015/12/1 18:40, Dr. David Alan Gilbert wrote:

* Wen Congyang (we...@cn.fujitsu.com) wrote:

Block replication is a very important feature which is used for
continuous checkpoints (for example: COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt
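
For readers without the docs handy, the secondary side roughly stacks the
replication driver on top of an active-disk/hidden-disk qcow2 chain, along
these lines (a sketch only: the option spellings are reconstructed from the
replication driver's option list and docs/block-replication.txt, the backing
reference to the secondary disk is omitted, and details may differ in this
version, so please follow the doc for the exact command line):

  -drive if=xxx,driver=replication,mode=secondary,top-id=top-disk1,\
         file.driver=qcow2,\
         file.file.filename=active_disk.qcow2,\
         file.backing.driver=qcow2,\
         file.backing.file.filename=hidden_disk.qcow2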

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04949.html
2. http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg06043.html

You can get the patch here:
https://github.com/coloft/qemu/tree/wency/block-replication-v12

You can get the patch with framework here:
https://github.com/coloft/qemu/tree/wency/colo_framework_v11.2


Neither of these links works for me, and I see that only messages 0..7 in the
series hit the list.



Hi Dave,

You can refer to https://github.com/coloft/qemu/tree/colo-v2.2-periodic-mode;
the block replication part in that branch is also the newest version.

Congyang has deleted the confusing branch, and we will pay more attention to
this in the next version.

Thanks,
Hailiang





TODO:
1. Continuous block replication. It will be started after basic functions
are accepted.

Change Log:
V12:
1. Rebase to the newest codes
2. Use backing reference to replace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting block replication if it is not
opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's 
suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
if there are too many I/O requests.
V4:
1. Introduce a new driver, replication, to avoid touching nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target use the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu (use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (10):
   unblock backup operations in backing file
   Store parent BDS in BdrvChild
   Backup: clear all bitmap when doing block checkpoint
   Allow creating backup jobs when opening BDS
   docs: block replication's description
   Add new block driver interfaces to control block replication
   quorum: implement block driver interfaces for block replication
   Implement new driver for block replication
   support replication driver in blockdev-add
   Add a new API to start/stop replication, do checkpoint to all BDSes

  block.c| 145 
  block/Makefile.objs|   3 +-
  block/backup.c |  14 ++
  block/quorum.c |  78 +++
  block/replication.c| 549 +
  blockjob.c |  11 +
  docs/block-replication.txt | 227 +++
  include/block/block.h  |   9 +
  include/block/block_int.h  |  15 ++
  include/block/blockjob.h   |  12 +
  qapi/block-core.json   |  34 ++-
  11 files changed, 1093 insertions(+), 4 deletions(-)
  create mode 100644 block/replication.c
  create mode 100644 docs/block-replication.txt

--
2.5.0




--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK









Re: [Qemu-block] [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error

2015-12-22 Thread Hailiang Zhang

On 2015/12/19 18:02, Markus Armbruster wrote:

Copying qemu-block because this seems related to generalising block jobs
to background jobs.



Er, this event is just used to help users know what happened to a VM with COLO FT
on. If users get this event, they can check further what went wrong and
decide which side should take over the work.


zhanghailiang  writes:


If some errors happen during VM's COLO FT stage, it's important to notify the 
users
of this event. Together with 'colo_lost_heartbeat', users can intervene in 
COLO's
failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users that we exited COLO mode.

Cc: Markus Armbruster 
Cc: Michael Roth 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
v11:
- Fix several typos found by Eric

Signed-off-by: zhanghailiang 
---
  docs/qmp-events.txt | 17 +
  migration/colo.c| 11 +++
  qapi-schema.json| 16 
  qapi/event.json | 17 +
  4 files changed, 61 insertions(+)

diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index d2f1ce4..19f68fc 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -184,6 +184,23 @@ Example:
  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
  event.

+COLO_EXIT
+---------
+
+Emitted when VM finishes COLO mode due to some errors happening or
+at the request of users.


How would the event's recipient distinguish between "due to error" and
"at the user's request"?



If they get this event with 'reason' set to 'request', it is 'at the user's
request'. Otherwise it is 'due to error' (the value of 'reason' will be
'error', and we have an optional error message which may help to figure out
what happened).
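
For example, the error case would look something like this (values are only
illustrative):

{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
 "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "error",
 "error": "Connection reset by peer" } }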


+
+Data:
+
+ - "mode": COLO mode, primary or secondary side (json-string)
+ - "reason":  the exit reason, internal error or external request. 
(json-string)
+ - "error": error message (json-string, operation)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+


Pardon my ignorance again...  Does "VM finishes COLO mode" mean there is
some kind of COLO background job, and it just finished for whatever
reason?



As I said above.


If yes, this COLO job could be an instance of the general background job
concept we're trying to grow from the existing block job concept.

I'm not asking you to rebase your work onto the background job
infrastructure, not least for the simple reason that it doesn't exist,
yet.  But I think it would be fruitful to compare your COLO job
management QMP interface with the one we have for block jobs.  Not only
may that avoid unnecessary inconsistency, it could also help shape the
general background job interface.



Interesting, I'm not quite familiar with this block background job
infrastructure. If we consider COLO FT as a background job, we can certainly
use it. I will have a look at it.


Quick overview of the block job QMP interface:

* Commands to create a job: block-commit, block-stream, drive-mirror,
   drive-backup.

* Get information on jobs: query-block-jobs

* Pause a job: block-job-pause

* Resume a job: block-job-resume

* Cancel a job: block-job-cancel

* Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED

* Block job error event: BLOCK_JOB_ERROR

* Block job synchronous completion: event BLOCK_JOB_READY and command
   block-job-complete
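
For example, a management application drives these with commands like
(device name purely illustrative):

{ "execute": "query-block-jobs" }
{ "execute": "block-job-pause", "arguments": { "device": "drive0" } }
{ "execute": "block-job-cancel", "arguments": { "device": "drive0" } }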


  DEVICE_DELETED
  --

diff --git a/migration/colo.c b/migration/colo.c
index d1dd4e1..d06c14f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,6 +18,7 @@
  #include "qemu/error-report.h"
  #include "qemu/sockets.h"
  #include "migration/failover.h"
+#include "qapi-event.h"

  /* colo buffer */
  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
@@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
  out:
  if (ret < 0) {
  error_report("%s: %s", __func__, strerror(-ret));
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
+  true, strerror(-ret), NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
+  false, NULL, NULL);
  }

  qsb_free(buffer);
@@ -516,6 +522,11 @@ out:
  if (ret < 0) {
  error_report("colo incoming thread will exit, detect error: %s",
   strerror(-ret));
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
+  true, strerror(-ret), NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY, 
COLO_EXIT_REASON_REQUEST,
+  false, NULL, NULL);
  }

  if (fb) {
diff --git a/qapi-schema.json b/qapi-schema.json
index feb7d53..f6ecb88 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -778

Re: [Qemu-block] [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error

2015-12-22 Thread Hailiang Zhang

On 2015/12/22 5:14, John Snow wrote:



On 12/19/2015 05:02 AM, Markus Armbruster wrote:

Copying qemu-block because this seems related to generalising block jobs
to background jobs.

zhanghailiang  writes:


If some errors happen during VM's COLO FT stage, it's important to notify the 
users
of this event. Together with 'colo_lost_heartbeat', users can intervene in 
COLO's
failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users that we exited COLO mode.

Cc: Markus Armbruster 
Cc: Michael Roth 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
v11:
- Fix several typos found by Eric

Signed-off-by: zhanghailiang 
---
  docs/qmp-events.txt | 17 +
  migration/colo.c| 11 +++
  qapi-schema.json| 16 
  qapi/event.json | 17 +
  4 files changed, 61 insertions(+)

diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index d2f1ce4..19f68fc 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -184,6 +184,23 @@ Example:
  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
  event.

+COLO_EXIT
+-
+
+Emitted when VM finishes COLO mode due to some errors happening or
+at the request of users.


How would the event's recipient distinguish between "due to error" and
"at the user's request"?


+
+Data:
+
+ - "mode": COLO mode, primary or secondary side (json-string)
+ - "reason":  the exit reason, internal error or external request. 
(json-string)
+ - "error": error message (json-string, operation)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+


Pardon my ignorance again...  Does "VM finishes COLO mode" means have
some kind of COLO background job, and it just finished for whatever
reason?

If yes, this COLO job could be an instance of the general background job
concept we're trying to grow from the existing block job concept.

I'm not asking you to rebase your work onto the background job
infrastructure, not least for the simple reason that it doesn't exist,
yet.  But I think it would be fruitful to compare your COLO job
management QMP interface with the one we have for block jobs.  Not only
may that avoid unnecessary inconsistency, it could also help shape the
general background job interface.



Yes. The "background job" concept doesn't exist in a formal way outside
of the block layer yet, but we're looking to expand it as we re-tool the
block jobs themselves.

It may be the case that the COLO commands and events need to go in as
they are now, but later we can bring them back into the generalized job
infrastructure.



Agreed. ;)


Quick overview of the block job QMP interface:

* Commands to create a job: block-commit, block-stream, drive-mirror,
   drive-backup.

* Get information on jobs: query-block-jobs

* Pause a job: block-job-pause

* Resume a job: block-job-resume

* Cancel a job: block-job-cancel

* Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED

* Block job error event: BLOCK_JOB_ERROR

* Block job synchronous completion: event BLOCK_JOB_READY and command
   block-job-complete



The block-agnostic version of these commands would likely be:

query-jobs
job-pause
job-resume
job-cancel
job-complete

Events: JOB_COMPLETED, JOB_CANCELLED, JOB_ERROR, JOB_READY.


It looks like COLO_EXIT would be an instance of JOB_COMPLETED, and if it
occurred due to an error, we'd also see JOB_ERROR emitted.
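
Purely as an illustration of that hypothetical interface (none of these
commands exist yet), a COLO job might then be driven with something like:

{ "execute": "query-jobs" }
{ "execute": "job-cancel", "arguments": { "id": "colo0" } }

with a JOB_COMPLETED (and, on error, JOB_ERROR) event emitted where COLO_EXIT
is emitted today.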



Yes, if we use this job framework for COLO, COLO_EXIT will work like that.


  DEVICE_DELETED
  --

diff --git a/migration/colo.c b/migration/colo.c
index d1dd4e1..d06c14f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,6 +18,7 @@
  #include "qemu/error-report.h"
  #include "qemu/sockets.h"
  #include "migration/failover.h"
+#include "qapi-event.h"

  /* colo buffer */
  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
@@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
  out:
  if (ret < 0) {
  error_report("%s: %s", __func__, strerror(-ret));
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
+  true, strerror(-ret), NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
+  false, NULL, NULL);
  }

  qsb_free(buffer);
@@ -516,6 +522,11 @@ out:
  if (ret < 0) {
  error_report("colo incoming thread will exit, detect error: %s",
   strerror(-ret));
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
+  true, strerror(-ret), NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY, 
COLO_EXIT_REASON_REQUEST,
+  false, NULL, NULL);
  }

  if (fb) {
diff --git

Re: [Qemu-block] [PATCH v14 0/8] Block replication for continuous checkpoints

2016-01-19 Thread Hailiang Zhang

ping...

There has been no feedback for a long time. We hope the COLO prototype can be
merged in QEMU 2.6, and it depends on this series, so please help us.

Thanks.
zhanghailiang

On 2016/1/14 9:12, Changlong Xie wrote:

It seems I missed some people in the CC list; adding them.

Thanks
 -Xie

On 01/13/2016 05:18 PM, Changlong Xie wrote:

Block replication is a very important feature which is used for
continuous checkpoints (for example: COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v14

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v13

TODO:
1. Continuous block replication. It will be started after basic functions
are accepted.

Change Log:
V14:
1. Implement auto complete active commit
2. Implement active commit block job for replication.c
3. Address the comments from Stefan, add replication-specific API and data
structure, also remove old block layer APIs
V13:
1. Rebase to the newest codes
2. Remove redundant macros and semicolons in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest codes
2. Use backing reference to replace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting block replication if it is not
opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's 
suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
if there are too many I/O requests.
V4:
1. Introduce a new driver, replication, to avoid touching nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target use the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu (use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (8):
   unblock backup operations in backing file
   Store parent BDS in BdrvChild
   Backup: clear all bitmap when doing block checkpoint
   Allow creating backup jobs when opening BDS
   docs: block replication's description
   auto complete active commit
   Implement new driver for block replication
   support replication driver in blockdev-add

  block.c  |  19 ++
  block/Makefile.objs  |   3 +-
  block/backup.c   |  14 +
  block/mirror.c   |  13 +-
  block/replication-comm.c |  66 +
  block/replication.c  | 590 +++
  blockdev.c   |   2 +-
  blockjob.c   |  11 +
  docs/block-replication.txt   | 229 +++
  include/block/block_int.h|   4 +-
  include/block/blockjob.h |  12 +
  include/block/replication-comm.h |  50 
  qapi/block-core.json |  33 ++-
  qemu-img.c   |   2 +-
  14 files changed, 1038 insertions(+), 10 deletions(-)
  create mode 100644 block/replication-comm.c
  create mode 100644 block/replication.c
  create mode 100644 docs/block-replication.txt
  create mode 100644 include/block/replication-comm.h












Re: [Qemu-block] [Qemu-devel] [PATCH RFC 7/7] nbd/replication: implement .bdrv_get_info() for nbd and replication driver

2016-10-23 Thread Hailiang Zhang

Hi,

On 2016/10/20 23:34, Eric Blake wrote:

On 10/20/2016 08:57 AM, zhanghailiang wrote:

Without this callback, there will be an error report on the primary side:
"qemu-system-x86_64: Couldn't determine the cluster size of the target image,
which has no backing file: Operation not supported
Aborting, since this may create an unusable destination image"

The nbd driver doesn't have a cluster size, so here we return
a fake value for it.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
---
  block/nbd.c | 12 
  block/replication.c |  6 ++
  2 files changed, 18 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 6bc06d6..96d7023 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -40,6 +40,8 @@

  #define EN_OPTSTR ":exportname="

+#define NBD_FAKE_CLUSTER_SIZE 512


Why 512?  NBD allows byte-addressable operations (even if it is more
efficient on aligned I/O); and I've been working hard to convert things
to the point that NBD does not enforce alignment on other layers.
Wouldn't 1 be better?



Yes, that makes no difference for the block replication driver. :)
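
I.e. the callback would simply become something like (a sketch only, not the
posted patch):

static int nbd_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
{
    /* NBD is byte-addressable, so don't pretend there is a real cluster */
    bdi->cluster_size = 1;
    return 0;
}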


+static int nbd_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+bdi->cluster_size  = NBD_FAKE_CLUSTER_SIZE;


I also have patches written (but waiting for NBD write zeroes support to
be reviewed first) that add support for the experimental NBD block info,
that lets a server advertise actual sizes to the client rather than
having to guess.  Here's the last time I posted a preview of it:

https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg03567.html

It would be nice to use that instead of just faking things.



That's great, that's what we really want. Here it is just a
temporary solution; I'll drop this patch after your nbd patch has been merged.

Thanks,
Hailiang




Re: [Qemu-block] [PATCH 1/1] migration: fix compiler warning on uninitialized variable

2016-10-31 Thread Hailiang Zhang

On 2016/11/1 5:50, Jeff Cody wrote:

Some older GCC versions (e.g. 4.4.7) report a warning on an
uninitialized variable for 'request', even though all possible code
paths that reference 'request' will be initialized. To appease
these versions, initialize the variable to 0.



Thanks for reporting this. I think this patch can go through the trivial
branch.

Cc: Michael Tokarev 
Cc: qemu-triv...@nongnu.org


Reported-by: Mark Cave-Ayland 
Signed-off-by: Jeff Cody 


Reviewed-by: zhanghailiang 


---
  migration/colo.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/colo.c b/migration/colo.c
index e7224b8..93c85c5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -439,7 +439,7 @@ void *colo_process_incoming_thread(void *opaque)
  }

  while (mis->state == MIGRATION_STATUS_COLO) {
-int request;
+int request = 0;

  colo_wait_handle_message(mis->from_src_file, &request, &local_err);
  if (local_err) {






Re: [Qemu-block] [PATCH RFC 0/7] COLO block replication supports shared disk case

2016-11-22 Thread Hailiang Zhang

On 2016/11/22 18:33, Stefan Hajnoczi wrote:

On Thu, Oct 20, 2016 at 09:57:33PM +0800, zhanghailiang wrote:

COLO block replication doesn't support the shared disk case;
here we try to implement it.

Just as in the non-shared disk block replication scenario,
we are going to implement block replication from many basic
blocks that are already in QEMU.
The architecture is:

   [Architecture diagram: the Primary virtio-blk writes directly to the Shared
    Disk (3); a replication node (1) on the Primary runs a drive-backup
    sync=none job that forwards the original content through an NBD client to
    the Secondary's NBD server (2), which writes it into the hidden disk; the
    hidden disk uses the Shared Disk as its backing file, the active disk (4)
    is backed by the hidden disk, and the Secondary virtio-blk sits on a
    replication node (5) on top of the active disk.]
1) Primary writes will read the original data and forward it to the Secondary
QEMU.
2) The hidden disk buffers the original content that is modified by the
primary VM. It should also be an empty disk, and its driver must support
bdrv_make_empty() and a backing file.
3) Primary write requests will be written to the Shared disk.
4) Secondary write requests will be buffered in the active disk and will
overwrite the existing sector content in the buffer.
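
To make steps 1)-3) a bit more concrete, here is a minimal sketch of the
primary-side write path they imply. This is not the code of this series (the
series reuses the backup block job with sync=none, whose before-write notifier
does the copy-on-write and forwarding); the helper below is hypothetical:

static int coroutine_fn primary_cow_then_write(BlockDriverState *shared_bs,
                                               BlockDriverState *nbd_target_bs,
                                               int64_t sector_num,
                                               int nb_sectors,
                                               QEMUIOVector *qiov)
{
    size_t bytes = nb_sectors * BDRV_SECTOR_SIZE;
    void *old_buf = qemu_blockalign(shared_bs, bytes);
    QEMUIOVector old_qiov;
    int ret;

    qemu_iovec_init(&old_qiov, 1);
    qemu_iovec_add(&old_qiov, old_buf, bytes);

    /* 1) read the content that the guest write is about to overwrite */
    ret = bdrv_co_readv(shared_bs, sector_num, nb_sectors, &old_qiov);
    if (ret == 0) {
        /* 2) forward it to the Secondary; its NBD server writes it into
         *    the hidden disk, whose backing file is the shared disk */
        ret = bdrv_co_writev(nbd_target_bs, sector_num, nb_sectors, &old_qiov);
    }
    if (ret == 0) {
        /* 3) only now let the new data reach the shared disk */
        ret = bdrv_co_writev(shared_bs, sector_num, nb_sectors, qiov);
    }

    qemu_iovec_destroy(&old_qiov);
    qemu_vfree(old_buf);
    return ret;
}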


This design looks good.  I have not reviewed the patches in detail but
will review the next revision.



Thanks very much, I'll update it with the recent upstream. :)


Stefan






Re: [Qemu-block] [PATCH RFC 1/7] docs/block-replication: Add description for shared-disk case

2016-11-27 Thread Hailiang Zhang


On 2016/10/25 17:03, Changlong Xie wrote:

On 10/20/2016 09:57 PM, zhanghailiang wrote:

Introduce the scenario of shared-disk block replication
and how to use it.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
   docs/block-replication.txt | 131 
+++--
   1 file changed, 127 insertions(+), 4 deletions(-)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..97fcfc1 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network 
transportation
   effort during a vmstate checkpoint, the disk modification operations of
   the Primary disk are asynchronously forwarded to the Secondary node.

-== Workflow ==
+== Non-shared disk workflow ==
   The following is the image of block replication workflow:

   +--+++
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
   4) Secondary write requests will be buffered in the Disk buffer and it
  will overwrite the existing sector content in the buffer.

-== Architecture ==
+== None-shared disk architecture ==


s/None-shared/Non-shared/g




   We are going to implement block replication from many basic
   blocks that are already in QEMU.

@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative 
write-through
   of the NBD server into the secondary disk. So before block replication,
   the primary disk and secondary disk should contain the same data.

+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
+   [Workflow diagram: (1) the original sector content is read from the Shared
+    Disk, (2) forwarded and written through into the Disk Buffer on the
+    Secondary, (3) the Primary write request is then written to the Shared
+    Disk, and (4) Secondary write requests are buffered in the Disk Buffer,
+    which uses the Shared Disk as its backing file.]
+
+1) Primary writes will read original data and forward it to Secondary
+   QEMU.
+2) Before Primary write requests are written to Shared disk, the
+   original sector content will be read from Shared disk and
+   forwarded and buffered in the Disk buffer on the secondary site,
+   but it will not overwrite the existing


extra spaces at the end of line




+   sector content(it could be from either "Secondary Write Requests" or


Need a space before "(" for better style.




+   previous COW of "Primary Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Shared disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+   [Shared-disk mode architecture diagram, same as the one in the cover
+    letter of this series.]

Re: [Qemu-block] [PATCH RFC 1/7] docs/block-replication: Add description for shared-disk case

2016-11-27 Thread Hailiang Zhang

On 2016/11/28 14:00, Changlong Xie wrote:

On 11/28/2016 01:13 PM, Hailiang Zhang wrote:


On 2016/10/25 17:03, Changlong Xie wrote:

On 10/20/2016 09:57 PM, zhanghailiang wrote:

Introduce the scenario of shared-disk block replication
and how to use it.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
docs/block-replication.txt | 131
+++--
1 file changed, 127 insertions(+), 4 deletions(-)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..97fcfc1 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the
network transportation
effort during a vmstate checkpoint, the disk modification
operations of
the Primary disk are asynchronously forwarded to the Secondary node.

-== Workflow ==
+== Non-shared disk workflow ==
The following is the image of block replication workflow:

+--+
++
@@ -57,7 +57,7 @@ The following is the image of block replication
workflow:
4) Secondary write requests will be buffered in the Disk
buffer and it
   will overwrite the existing sector content in the buffer.

-== Architecture ==
+== None-shared disk architecture ==


s/None-shared/Non-shared/g




We are going to implement block replication from many basic
blocks that are already in QEMU.

@@ -106,6 +106,74 @@ any state that would otherwise be lost by the
speculative write-through
of the NBD server into the secondary disk. So before block
replication,
the primary disk and secondary disk should contain the same data.

+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
+   [Shared-disk mode workflow diagram, as quoted earlier in this thread.]
+
+1) Primary writes will read original data and forward it to
Secondary
+   QEMU.
+2) Before Primary write requests are written to Shared disk, the
+   original sector content will be read from Shared disk and
+   forwarded and buffered in the Disk buffer on the secondary site,
+   but it will not overwrite the existing


extra spaces at the end of line




+   sector content(it could be from either "Secondary Write
Requests" or


Need a space before "(" for better style.




+   previous COW of "Primary Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Shared disk.
+4) Secondary write requests will be buffered in the Disk buffer
and it
+   will overwrite the existing sector content in the buffer.
+
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+   [Shared-disk mode architecture diagram, as quoted earlier in this thread.]

Re: [Qemu-block] [PATCH RFC 2/7] block-backend: Introduce blk_root() helper

2016-12-04 Thread Hailiang Zhang

On 2016/10/25 17:58, Changlong Xie wrote:

I know you need blk->root in the next patch, but we strongly don't
recommend your current solution. Please refer to Kevin's commit cf2ab8fc:

1409 /* XXX Ugly way to get blk->root, but that's a feature, not a
bug. This
1410  * hack makes it obvious that vhdx_write_header() bypasses the
BlockBackend
1411  * here, which it really shouldn't be doing. */
1412 child = QLIST_FIRST(&bs->parents);
1413 assert(!QLIST_NEXT(child, next_parent));

Then you can drop this commit.



OK, got it, I'll drop this patch in the next version, thanks.
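
I.e., in replication_open() the primary disk's BdrvChild can be taken directly
from the BDS's parents list, along the same lines (a rough, untested sketch):

    BlockBackend *blk = blk_by_name(s->shared_disk_id);
    BdrvChild *child;

    /* Same trick as vhdx_write_header(): the shared disk's BDS has the
     * BlockBackend as its only parent, so its first parent is blk->root. */
    child = QLIST_FIRST(&blk_bs(blk)->parents);
    assert(!QLIST_NEXT(child, next_parent));
    s->primary_disk = child;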


On 10/20/2016 09:57 PM, zhanghailiang wrote:

With this helper function, we can get the BdrvChild struct
from BlockBackend

Signed-off-by: zhanghailiang 
---
   block/block-backend.c  | 5 +
   include/sysemu/block-backend.h | 1 +
   2 files changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 1a724a8..66387f0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -389,6 +389,11 @@ BlockDriverState *blk_bs(BlockBackend *blk)
   return blk->root ? blk->root->bs : NULL;
   }

+BdrvChild *blk_root(BlockBackend *blk)
+{
+return blk->root;
+}
+
   static BlockBackend *bdrv_first_blk(BlockDriverState *bs)
   {
   BdrvChild *child;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index b07159b..867f9f5 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -99,6 +99,7 @@ void blk_remove_bs(BlockBackend *blk);
   void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs);
   bool bdrv_has_blk(BlockDriverState *bs);
   bool bdrv_is_root_node(BlockDriverState *bs);
+BdrvChild *blk_root(BlockBackend *blk);

   void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
   void blk_iostatus_enable(BlockBackend *blk);











Re: [Qemu-block] [PATCH RFC 3/7] replication: add shared-disk and shared-disk-id options

2016-12-04 Thread Hailiang Zhang

On 2016/10/25 18:01, Changlong Xie wrote:

On 10/20/2016 09:57 PM, zhanghailiang wrote:

We use these two options to identify which disk is
shared

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
   block/replication.c | 33 +
   1 file changed, 33 insertions(+)

diff --git a/block/replication.c b/block/replication.c
index 3bd1cf1..2a2fdb2 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@
   typedef struct BDRVReplicationState {
   ReplicationMode mode;
   int replication_state;
+bool is_shared_disk;
+char *shared_disk_id;
   BdrvChild *active_disk;
   BdrvChild *hidden_disk;
   BdrvChild *secondary_disk;
+BdrvChild *primary_disk;
   char *top_id;
   ReplicationState *rs;
   Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool 
failover,

   #define REPLICATION_MODE"mode"
   #define REPLICATION_TOP_ID  "top-id"
+#define REPLICATION_SHARED_DISK "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
   static QemuOptsList replication_runtime_opts = {
   .name = "replication",
   .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
   .name = REPLICATION_TOP_ID,
   .type = QEMU_OPT_STRING,
   },
+{
+.name = REPLICATION_SHARED_DISK_ID,
+.type = QEMU_OPT_STRING,
+},
+{
+.name = REPLICATION_SHARED_DISK,
+.type = QEMU_OPT_BOOL,
+},
   { /* end of list */ }
   },
   };
@@ -85,6 +99,8 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
   QemuOpts *opts = NULL;
   const char *mode;
   const char *top_id;
+const char *shared_disk_id;
+BlockBackend *blk;

   ret = -EINVAL;
   opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, 
&error_abort);
@@ -114,6 +130,22 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
  "The option mode's value should be primary or secondary");
   goto fail;
   }
+s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+false);
+if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+if (!shared_disk_id) {
+error_setg(&local_err, "Missing shared disk blk");
+goto fail;
+}
+s->shared_disk_id = g_strdup(shared_disk_id);
+blk = blk_by_name(s->shared_disk_id);
+if (!blk) {
+error_setg(&local_err, "There is no %s block", s->shared_disk_id);


g_free(s->shared_disk_id);



Will fix it in the next version, thanks.
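
I.e. roughly (a sketch of the fix, not the actual respin):

        s->shared_disk_id = g_strdup(shared_disk_id);
        blk = blk_by_name(s->shared_disk_id);
        if (!blk) {
            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
            /* free the copy we just made so this error path doesn't leak it */
            g_free(s->shared_disk_id);
            s->shared_disk_id = NULL;
            goto fail;
        }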


+goto fail;
+}
+s->primary_disk = blk_root(blk);
+}

   s->rs = replication_new(bs, &replication_ops);

@@ -130,6 +162,7 @@ static void replication_close(BlockDriverState *bs)
   {
   BDRVReplicationState *s = bs->opaque;

+g_free(s->shared_disk_id);
   if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
   replication_stop(s->rs, false, NULL);
   }











Re: [Qemu-block] [PATCH RFC 4/7] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint()

2016-12-04 Thread Hailiang Zhang

On 2016/10/26 9:40, Changlong Xie wrote:

On 10/20/2016 09:57 PM, zhanghailiang wrote:

The helper backup_do_checkpoint() will be used for primary related
codes. Here we split it out from secondary_do_checkpoint().

Besides, it is unnecessary to call backup_do_checkpoint() in the
replication start path or in the normal replication stop path.


This patch is unnecessary. We *really* need to clean
backup_job->done_bitmap in the replication_start/stop path.



After we support internal jobs ('BLOCK_JOB_INTERNAL'), do we still need
to call backup_do_checkpoint() there?
IMHO, we don't have to clean 'done_bitmap', because its initial value is
zero, and we don't have to call it in the stop path either, since the
backup job will be cleaned up.



We only need to call it when doing a real checkpoint.

Signed-off-by: zhanghailiang 
---
   block/replication.c | 36 +++-
   1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 2a2fdb2..d687ffc 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -320,20 +320,8 @@ static bool 
replication_recurse_is_first_non_filter(BlockDriverState *bs,

   static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
   {
-Error *local_err = NULL;
   int ret;

-if (!s->secondary_disk->bs->job) {
-error_setg(errp, "Backup job was cancelled unexpectedly");
-return;
-}
-
-backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
   ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
   if (ret < 0) {
   error_setg(errp, "Cannot make active disk empty");
@@ -539,6 +527,8 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
   aio_context_release(aio_context);
   return;
   }
+
+secondary_do_checkpoint(s, errp);
   break;
   default:
   aio_context_release(aio_context);
@@ -547,10 +537,6 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,

   s->replication_state = BLOCK_REPLICATION_RUNNING;

-if (s->mode == REPLICATION_MODE_SECONDARY) {
-secondary_do_checkpoint(s, errp);
-}
-
   s->error = 0;
   aio_context_release(aio_context);
   }
@@ -560,13 +546,29 @@ static void replication_do_checkpoint(ReplicationState 
*rs, Error **errp)
   BlockDriverState *bs = rs->opaque;
   BDRVReplicationState *s;
   AioContext *aio_context;
+Error *local_err = NULL;

   aio_context = bdrv_get_aio_context(bs);
   aio_context_acquire(aio_context);
   s = bs->opaque;

-if (s->mode == REPLICATION_MODE_SECONDARY) {
+switch (s->mode) {
+case REPLICATION_MODE_PRIMARY:
+break;
+case REPLICATION_MODE_SECONDARY:
+if (!s->secondary_disk->bs->job) {
+error_setg(errp, "Backup job was cancelled unexpectedly");
+break;
+}
+backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+break;
+}
   secondary_do_checkpoint(s, errp);
+break;
+default:
+abort();
   }
   aio_context_release(aio_context);
   }











Re: [Qemu-block] [Qemu-devel] [PATCH] migration: re-active images when migration fails to complete

2016-12-07 Thread Hailiang Zhang

Hi,

On 2016/12/6 23:24, Dr. David Alan Gilbert wrote:

* Kevin Wolf (kw...@redhat.com) wrote:

Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben:

commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case in
which migration aborted QEMU because it didn't regain control
of the images when some errors happened.

Actually, we have another case in that error path that aborts QEMU
for the same reason:
 migration_thread()
 migration_completion()
bdrv_inactivate_all() > inactivate images
qemu_savevm_state_complete_precopy()
socket_writev_buffer() > error because destination fails
  qemu_fflush() ---> set error on migration stream
qemu_mutex_unlock_iothread() --> unlock
 qmp_migrate_cancel() -> user cancelled migration
 migrate_set_state() --> set migrate CANCELLING


Important to note here: qmp_migrate_cancel() is executed by a concurrent
thread, it doesn't depend on any code paths in migration_completion().


 migration_completion() -> go on to fail_invalidate
 if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch
 migration_thread() ---> break migration loop
   vm_start() -> restart guest with inactive
 images
We failed to regain control of the images because we only regain it
while the migration state is "active", but here users cancelled the migration
when they found that some errors happened (for example, the libvirtd daemon was
shut down on the destination unexpectedly).

Signed-off-by: zhanghailiang 
---
  migration/migration.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..0c1ee6d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1752,7 +1752,8 @@ fail_invalidate:
  /* If not doing postcopy, vm_start() will be called: let's regain
   * control on images.
   */
-if (s->state == MIGRATION_STATUS_ACTIVE) {


This if condition tries to check whether we ran the code path that
called bdrv_inactivate_all(), so that we only try to reactivate images
if we really inactivated them first.

The problem with it is that it ignores a possible concurrent
modification of s->state.


+if (s->state == MIGRATION_STATUS_ACTIVE ||
+s->state == MIGRATION_STATUS_CANCELLING) {


This adds another state that we could end up with with a concurrent
modification, so that even in this case we undo the inactivation.

However, it is no longer limited to the cases where we inactivated the
image. It also applies to other code paths (like the postcopy one) where
we didn't inactivate images.

What saves the patch is that bdrv_invalidate_cache() is a no-op for
block devices that aren't inactivated, so calling it more often than
necessary is okay.

But then, if we're going to rely on this, it would be much better to
just remove the if altogether. I can't say whether there are any other
possible values of s->state that we should consider, and by removing the
if we would be guaranteed to catch all of them.

If we don't want to rely on it, just keep a local bool that remembers
whether we inactivated images and check that here.


  Error *local_err = NULL;

  bdrv_invalidate_cache_all(&local_err);


So in summary, this is a horrible patch because it checks the wrong
thing, and I can't really say if it covers everything it needs to
cover, but arguably it happens to correctly fix the outcome of a
previously failing case.

Normally I would reject such a patch and require a clean solution, but
then we're on the day of -rc3, so if you can't send v2 right away, we
might not have the time for it.

Tough call...
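
A rough sketch of that last suggestion, with a local flag in
migration_completion() (placement and names are approximate):

    /* instead of checking s->state again later: */
    bool inactivated = false;

    if (!migrate_postcopy_ram() && bdrv_inactivate_all() == 0) {
        inactivated = true;
    }

    /* ... and at fail_invalidate: reactivate only if we really inactivated
     * the images above, no matter what s->state has become meanwhile. */
    if (inactivated) {
        Error *local_err = NULL;

        bdrv_invalidate_cache_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
        }
    }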


Hmm, this case is messy; I created this function having split it out
of the main loop a couple of years back but it did get more messy
with more s->state checks; as far as I can tell it's always
done the transition to COMPLETED at the end well after the locked
section, so there's always been that chance that cancellation sneaks
in just before or just after the locked section.

Some of the bad cases that can happen:
a) A cancel sneaks in after the ACTIVE check but before or after
  the locked section;  should we reactivate the disks? Well that
  depends on whether the destination actually got the full migration
  stream - we don't know!
 If the destination actually starts running we must not reactivate
 the disk on the source even if the CPU is stopped.



Yes, we don't have a mechanism to know exactly whether or not the VM on the
destination was received intact.


b) If bdrv_inactivate_all() fails for one device, but the others
   are fine, we go down the fail: label and don't reactivate, so
   the source dies even though it might have been mostly OK.




We can

Re: [Qemu-block] [Qemu-devel] [PATCH] migration: re-active images when migration fails to complete

2016-12-21 Thread Hailiang Zhang

On 2016/12/9 4:02, Dr. David Alan Gilbert wrote:

* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:

Hi,

On 2016/12/6 23:24, Dr. David Alan Gilbert wrote:

* Kevin Wolf (kw...@redhat.com) wrote:

Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben:

commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case
which migration aborted QEMU because it didn't regain the control
of images while some errors happened.

Actually, we have another case in that error path to abort QEMU
because of the same reason:
  migration_thread()
  migration_completion()
 bdrv_inactivate_all() > inactivate images
 qemu_savevm_state_complete_precopy()
 socket_writev_buffer() > error because destination 
fails
   qemu_fflush() ---> set error on migration stream
 qemu_mutex_unlock_iothread() --> unlock
  qmp_migrate_cancel() -> user cancelled migration
  migrate_set_state() --> set migrate CANCELLING


Important to note here: qmp_migrate_cancel() is executed by a concurrent
thread, it doesn't depend on any code paths in migration_completion().


  migration_completion() -> go on to fail_invalidate
  if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch
  migration_thread() ---> break migration loop
vm_start() -> restart guest with inactive
  images
We failed to regain the control of images because we only regain it
while the migration state is "active", but here users cancelled the migration
when they found some errors happened (for example, libvirtd daemon is shutdown
in destination unexpectedly).

Signed-off-by: zhanghailiang 
---
   migration/migration.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..0c1ee6d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1752,7 +1752,8 @@ fail_invalidate:
   /* If not doing postcopy, vm_start() will be called: let's regain
* control on images.
*/
-if (s->state == MIGRATION_STATUS_ACTIVE) {


This if condition tries to check whether we ran the code path that
called bdrv_inactivate_all(), so that we only try to reactivate images
it if we really inactivated them first.

The problem with it is that it ignores a possible concurrent
modification of s->state.


+if (s->state == MIGRATION_STATUS_ACTIVE ||
+s->state == MIGRATION_STATUS_CANCELLING) {


This adds another state that we could end up with with a concurrent
modification, so that even in this case we undo the inactivation.

However, it is no longer limited to the cases where we inactivated the
image. It also applies to other code paths (like the postcopy one) where
we didn't inactivate images.

What saves the patch is that bdrv_invalidate_cache() is a no-op for
block devices that aren't inactivated, so calling it more often than
necessary is okay.

But then, if we're going to rely on this, it would be much better to
just remove the if altogether. I can't say whether there are any other
possible values of s->state that we should consider, and by removing the
if we would be guaranteed to catch all of them.

If we don't want to rely on it, just keep a local bool that remembers
whether we inactivated images and check that here.


   Error *local_err = NULL;

   bdrv_invalidate_cache_all(&local_err);


So in summary, this is a horrible patch because it checks the wrong
thing, and for I can't really say if it covers everything it needs to
cover, but arguably it happens to correctly fix the outcome of a
previously failing case.

Normally I would reject such a patch and require a clean solution, but
then we're on the day of -rc3, so if you can't send v2 right away, we
might not have the time for it.

Tough call...


Hmm, this case is messy; I created this function having split it out
of the main loop a couple of years back but it did get more messy
with more s->state checks; as far as I can tell it's always
done the transition to COMPLETED at the end well after the locked
section, so there's always been that chance that cancellation sneaks
in just before or just after the locked section.

Some of the bad cases that can happen:
 a) A cancel sneaks in after the ACTIVE check but before or after
   the locked section;  should we reactivate the disks? Well that
   depends on whether the destination actually got the full migration
   stream - we don't know!
  If the destination actually starts running we must not reactivate
  the disk on the source even if the CPU is stopped.



Yes, we didn't have a mechanism to

Re: [Qemu-block] [Qemu-devel] [PATCH] migration: re-active images when migration fails to complete

2016-12-27 Thread Hailiang Zhang

On 2016/12/22 10:56, Hailiang Zhang wrote:

On 2016/12/9 4:02, Dr. David Alan Gilbert wrote:

* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:

Hi,

On 2016/12/6 23:24, Dr. David Alan Gilbert wrote:

* Kevin Wolf (kw...@redhat.com) wrote:

Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben:

commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case
which migration aborted QEMU because it didn't regain the control
of images while some errors happened.

Actually, we have another case in that error path to abort QEMU
because of the same reason:
   migration_thread()
   migration_completion()
  bdrv_inactivate_all() > inactivate images
  qemu_savevm_state_complete_precopy()
  socket_writev_buffer() > error because destination 
fails
qemu_fflush() ---> set error on migration stream
  qemu_mutex_unlock_iothread() --> unlock
   qmp_migrate_cancel() -> user cancelled migration
   migrate_set_state() --> set migrate CANCELLING


Important to note here: qmp_migrate_cancel() is executed by a concurrent
thread, it doesn't depend on any code paths in migration_completion().


   migration_completion() -> go on to fail_invalidate
   if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch
   migration_thread() ---> break migration loop
 vm_start() -> restart guest with inactive
   images
We failed to regain the control of images because we only regain it
while the migration state is "active", but here users cancelled the migration
when they found some errors happened (for example, libvirtd daemon is shutdown
in destination unexpectedly).

Signed-off-by: zhanghailiang 
---
migration/migration.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..0c1ee6d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1752,7 +1752,8 @@ fail_invalidate:
/* If not doing postcopy, vm_start() will be called: let's regain
 * control on images.
 */
-if (s->state == MIGRATION_STATUS_ACTIVE) {


This if condition tries to check whether we ran the code path that
called bdrv_inactivate_all(), so that we only try to reactivate images
it if we really inactivated them first.

The problem with it is that it ignores a possible concurrent
modification of s->state.


+if (s->state == MIGRATION_STATUS_ACTIVE ||
+s->state == MIGRATION_STATUS_CANCELLING) {


This adds another state that we could end up with with a concurrent
modification, so that even in this case we undo the inactivation.

However, it is no longer limited to the cases where we inactivated the
image. It also applies to other code paths (like the postcopy one) where
we didn't inactivate images.

What saves the patch is that bdrv_invalidate_cache() is a no-op for
block devices that aren't inactivated, so calling it more often than
necessary is okay.

But then, if we're going to rely on this, it would be much better to
just remove the if altogether. I can't say whether there are any other
possible values of s->state that we should consider, and by removing the
if we would be guaranteed to catch all of them.

If we don't want to rely on it, just keep a local bool that remembers
whether we inactivated images and check that here.


Error *local_err = NULL;

bdrv_invalidate_cache_all(&local_err);


So in summary, this is a horrible patch because it checks the wrong
thing, and for I can't really say if it covers everything it needs to
cover, but arguably it happens to correctly fix the outcome of a
previously failing case.

Normally I would reject such a patch and require a clean solution, but
then we're on the day of -rc3, so if you can't send v2 right away, we
might not have the time for it.

Tough call...


Hmm, this case is messy; I created this function having split it out
of the main loop a couple of years back but it did get more messy
with more s->state checks; as far as I can tell it's always
done the transition to COMPLETED at the end well after the locked
section, so there's always been that chance that cancellation sneaks
in just before or just after the locked section.

Some of the bad cases that can happen:
  a) A cancel sneaks in after the ACTIVE check but before or after
the locked section;  should we reactivate the disks? Well that
depends on whether the destination actually got the full migration
stream - we don't know!
   If the destination actually starts running we must not reactivate
   the disk on the sour

Re: [Qemu-block] [Qemu-devel] [PATCH] migration: re-active images when migration fails to complete

2017-01-10 Thread Hailiang Zhang

ping...?

Any comments? Or should I send a formal patch?

On 2016/12/22 10:56, Hailiang Zhang wrote:

On 2016/12/9 4:02, Dr. David Alan Gilbert wrote:

* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:

Hi,

On 2016/12/6 23:24, Dr. David Alan Gilbert wrote:

* Kevin Wolf (kw...@redhat.com) wrote:

Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben:

commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case
which migration aborted QEMU because it didn't regain the control
of images while some errors happened.

Actually, we have another case in that error path to abort QEMU
because of the same reason:
   migration_thread()
   migration_completion()
  bdrv_inactivate_all() > inactivate images
  qemu_savevm_state_complete_precopy()
  socket_writev_buffer() > error because destination 
fails
qemu_fflush() ---> set error on migration stream
  qemu_mutex_unlock_iothread() --> unlock
   qmp_migrate_cancel() -> user cancelled migration
   migrate_set_state() --> set migrate CANCELLING


Important to note here: qmp_migrate_cancel() is executed by a concurrent
thread, it doesn't depend on any code paths in migration_completion().


   migration_completion() -> go on to fail_invalidate
   if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch
   migration_thread() ---> break migration loop
 vm_start() -> restart guest with inactive
   images
We failed to regain the control of images because we only regain it
while the migration state is "active", but here users cancelled the migration
when they found some errors happened (for example, libvirtd daemon is shutdown
in destination unexpectedly).

Signed-off-by: zhanghailiang 
---
migration/migration.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..0c1ee6d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1752,7 +1752,8 @@ fail_invalidate:
/* If not doing postcopy, vm_start() will be called: let's regain
 * control on images.
 */
-if (s->state == MIGRATION_STATUS_ACTIVE) {


This if condition tries to check whether we ran the code path that
called bdrv_inactivate_all(), so that we only try to reactivate images
it if we really inactivated them first.

The problem with it is that it ignores a possible concurrent
modification of s->state.


+if (s->state == MIGRATION_STATUS_ACTIVE ||
+s->state == MIGRATION_STATUS_CANCELLING) {


This adds another state that we could end up with with a concurrent
modification, so that even in this case we undo the inactivation.

However, it is no longer limited to the cases where we inactivated the
image. It also applies to other code paths (like the postcopy one) where
we didn't inactivate images.

What saves the patch is that bdrv_invalidate_cache() is a no-op for
block devices that aren't inactivated, so calling it more often than
necessary is okay.

But then, if we're going to rely on this, it would be much better to
just remove the if altogether. I can't say whether there are any other
possible values of s->state that we should consider, and by removing the
if we would be guaranteed to catch all of them.

If we don't want to rely on it, just keep a local bool that remembers
whether we inactivated images and check that here.


Error *local_err = NULL;

bdrv_invalidate_cache_all(&local_err);


So in summary, this is a horrible patch because it checks the wrong
thing, and for I can't really say if it covers everything it needs to
cover, but arguably it happens to correctly fix the outcome of a
previously failing case.

Normally I would reject such a patch and require a clean solution, but
then we're on the day of -rc3, so if you can't send v2 right away, we
might not have the time for it.

Tough call...


Hmm, this case is messy; I created this function having split it out
of the main loop a couple of years back but it did get more messy
with more s->state checks; as far as I can tell it's always
done the transition to COMPLETED at the end well after the locked
section, so there's always been that chance that cancellation sneaks
in just before or just after the locked section.

Some of the bad cases that can happen:
  a) A cancel sneaks in after the ACTIVE check but before or after
the locked section;  should we reactivate the disks? Well that
depends on whether the destination actually got the full migration
stream - we don't know!
   If the destination actually starts 

Re: [Qemu-block] [PATCH RFC v2 5/6] replication: Implement block replication for shared disk case

2017-01-17 Thread Hailiang Zhang

Hi Stefan,

On 2017/1/17 21:19, Stefan Hajnoczi wrote:

On Mon, Dec 05, 2016 at 04:35:03PM +0800, zhanghailiang wrote:

@@ -663,8 +695,12 @@ static void replication_stop(ReplicationState *rs, bool 
failover, Error **errp)

  switch (s->mode) {
  case REPLICATION_MODE_PRIMARY:
-s->replication_state = BLOCK_REPLICATION_DONE;
-s->error = 0;
+if (s->is_shared_disk && s->primary_disk->bs->job) {
+block_job_cancel(s->primary_disk->bs->job);


Should this be block_job_cancel_sync()?



No, here it is different from the secondary side, which needs to wait
until the backup job has been cancelled before it resumes running (or there
will be an error, https://patchwork.kernel.org/patch/9128841/).

For the primary VM, as you can see from the design scenario in patch 1,
it accesses the shared disk directly; the backup job, whose source side
is just the shared disk, does not influence the primary VM's running.
So IMHO it is safe to call block_job_cancel() here.

Thanks,
Hailiang



+} else {
+s->replication_state = BLOCK_REPLICATION_DONE;
+s->error = 0;
+}
  break;
  case REPLICATION_MODE_SECONDARY:
  /*
--
1.8.3.1







Re: [Qemu-block] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option

2017-01-17 Thread Hailiang Zhang

On 2016/12/20 20:42, Changlong Xie wrote:

On 12/05/2016 04:35 PM, zhanghailiang wrote:

Some code logic is only needed in the non-shared disk case; here
we adjust this code to prepare for the shared disk scenario.

Signed-off-by: zhanghailiang 
---
   block/replication.c | 47 ---
   1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 35e9ab3..6574cc2 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -531,21 +531,28 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
   aio_context_release(aio_context);
   return;
   }
-bdrv_op_block_all(top_bs, s->blocker);
-bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);

-job = backup_job_create(NULL, s->secondary_disk->bs, 
s->hidden_disk->bs,
-0, MIRROR_SYNC_MODE_NONE, NULL, false,
+/*
+ * Only in the case of non-shared disk,
+ * the backup job is in the secondary side
+ */
+if (!s->is_shared_disk) {
+bdrv_op_block_all(top_bs, s->blocker);
+bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+job = backup_job_create(NULL, s->secondary_disk->bs,
+s->hidden_disk->bs, 0,
+MIRROR_SYNC_MODE_NONE, NULL, false,
   BLOCKDEV_ON_ERROR_REPORT,
   BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
   backup_job_completed, bs, NULL, &local_err);


Coding style here.



Will fix it in next version, thanks.


-if (local_err) {
-error_propagate(errp, local_err);
-backup_job_cleanup(bs);
-aio_context_release(aio_context);
-return;
+if (local_err) {
+error_propagate(errp, local_err);
+backup_job_cleanup(bs);
+aio_context_release(aio_context);
+return;
+}
+block_job_start(job);
   }
-block_job_start(job);

   secondary_do_checkpoint(s, errp);
   break;
@@ -575,14 +582,16 @@ static void replication_do_checkpoint(ReplicationState 
*rs, Error **errp)
   case REPLICATION_MODE_PRIMARY:
   break;
   case REPLICATION_MODE_SECONDARY:
-if (!s->secondary_disk->bs->job) {
-error_setg(errp, "Backup job was cancelled unexpectedly");
-break;
-}
-backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-break;
+if (!s->is_shared_disk) {
+if (!s->secondary_disk->bs->job) {
+error_setg(errp, "Backup job was cancelled unexpectedly");
+break;
+}
+backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+break;
+}
   }
   secondary_do_checkpoint(s, errp);
   break;
@@ -663,7 +672,7 @@ static void replication_stop(ReplicationState *rs, bool 
failover, Error **errp)
* before the BDS is closed, because we will access hidden
* disk, secondary disk in backup_job_completed().
*/
-if (s->secondary_disk->bs->job) {
+if (!s->is_shared_disk && s->secondary_disk->bs->job) {
   block_job_cancel_sync(s->secondary_disk->bs->job);
   }






.






Re: [Qemu-block] [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options

2017-01-17 Thread Hailiang Zhang

On 2016/12/6 0:22, Eric Blake wrote:

On 12/05/2016 02:35 AM, zhanghailiang wrote:

We use these two options to identify which disk is
shared

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
v2:
- add these two options for BlockdevOptionsReplication to support
   qmp blockdev-add command.
- fix a memory leak found by Changlong
---



+++ b/qapi/block-core.json
@@ -2232,12 +2232,19 @@
  #  node who owns the replication node chain. Must not be given in
  #  primary mode.
  #
+# @shared-disk-id: #optional The id of shared disk while in replication mode.
+#
+# @shared-disk: #optional To indicate whether or not a disk is shared by
+#   primary VM and secondary VM.


Both of these need a '(since 2.9)' designation since they've missed 2.8,
and could probably also use documentation on the default value assumed
when the parameter is omitted.



OK, I will add it in the next version, thanks.


+#
  # Since: 2.8
  ##
  { 'struct': 'BlockdevOptionsReplication',
'base': 'BlockdevOptionsGenericFormat',
'data': { 'mode': 'ReplicationMode',
-'*top-id': 'str' } }
+'*top-id': 'str',
+'*shared-disk-id': 'str',
+'*shared-disk': 'bool' } }

  ##
  # @NFSTransport








Re: [Qemu-block] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options

2017-01-17 Thread Hailiang Zhang

On 2017/1/17 19:25, Stefan Hajnoczi wrote:

On Mon, Dec 05, 2016 at 04:35:00PM +0800, zhanghailiang wrote:

@@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
  QemuOpts *opts = NULL;
  const char *mode;
  const char *top_id;
+const char *shared_disk_id;
+BlockBackend *blk;
+BlockDriverState *tmp_bs;

  ret = -EINVAL;
  opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
@@ -119,6 +136,25 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
 "The option mode's value should be primary or secondary");
  goto fail;
  }
+s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+  false);
+if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+if (!shared_disk_id) {
+error_setg(&local_err, "Missing shared disk blk");
+goto fail;
+}
+s->shared_disk_id = g_strdup(shared_disk_id);
+blk = blk_by_name(s->shared_disk_id);
+if (!blk) {
+g_free(s->shared_disk_id);
+error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+goto fail;


Please move the g_free() to the fail label to prevent future code
changes from introducing a memory leak.



OK, I will fix it in next version, thanks very much.




Re: [Qemu-block] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case

2017-01-18 Thread Hailiang Zhang

On 2017/1/13 21:41, Stefan Hajnoczi wrote:

On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:

+Issue qmp command:
+  { 'execute': 'blockdev-add',
+'arguments': {
+'driver': 'replication',
+'node-name': 'rep',
+'mode': 'primary',
+'shared-disk-id': 'primary_disk0',
+'shared-disk': true,
+'file': {
+'driver': 'nbd',
+'export': 'hidden_disk0',
+'server': {
+'type': 'inet',
+'data': {
+'host': 'xxx.xxx.xxx.xxx',
+'port': 'yyy'
+}
+}


block/nbd.c does not have good error handling and recovery in case there is
a network issue.  There are no reconnection attempts or timeouts that
deal with a temporary loss of network connectivity.

This is a general problem with block/nbd.c and not something to solve in
this patch series.  I'm just mentioning it because it may affect COLO
replication.

I'm sure these limitations in block/nbd.c can be fixed but it will take
some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
code could also benefit from generic network connection recovery.



Hmm, good suggestion, but IMHO COLO is a little different from
other scenarios here: even if the reconnection method has been implemented,
it still needs a mechanism to distinguish a temporary loss of network
connectivity from a real break in the network connection.

I did a simple test: just ifconfig down the network card that is used
by block replication. It seems that NBD in QEMU doesn't have the ability to
detect that the connection has been broken; there were no error reports
and COLO just got stuck in vm_stop() where it called aio_poll().

Thanks,
Hailiang




Reviewed-by: Stefan Hajnoczi 






Re: [Qemu-block] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case

2017-01-19 Thread Hailiang Zhang

On 2017/1/20 0:41, Stefan Hajnoczi wrote:

On Thu, Jan 19, 2017 at 10:50:19AM +0800, Hailiang Zhang wrote:

On 2017/1/13 21:41, Stefan Hajnoczi wrote:

On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:

+Issue qmp command:
+  { 'execute': 'blockdev-add',
+'arguments': {
+'driver': 'replication',
+'node-name': 'rep',
+'mode': 'primary',
+'shared-disk-id': 'primary_disk0',
+'shared-disk': true,
+'file': {
+'driver': 'nbd',
+'export': 'hidden_disk0',
+'server': {
+'type': 'inet',
+'data': {
+'host': 'xxx.xxx.xxx.xxx',
+'port': 'yyy'
+}
+}


block/nbd.c does not have good error handling and recovery in case there is
a network issue.  There are no reconnection attempts or timeouts that
deal with a temporary loss of network connectivity.

This is a general problem with block/nbd.c and not something to solve in
this patch series.  I'm just mentioning it because it may affect COLO
replication.

I'm sure these limitations in block/nbd.c can be fixed but it will take
some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
code could also benefit from generic network connection recovery.



Hmm, good suggestion, but IMHO COLO is a little different from
other scenarios here: even if the reconnection method has been implemented,
it still needs a mechanism to distinguish a temporary loss of network
connectivity from a real break in the network connection.

I did a simple test: just ifconfig down the network card that is used
by block replication. It seems that NBD in QEMU doesn't have the ability to
detect that the connection has been broken; there were no error reports
and COLO just got stuck in vm_stop() where it called aio_poll().


Yes, this is the vm_stop() problem again.  There is no reliable way to
cancel I/O requests so instead QEMU waits...forever.  A solution is
needed so COLO doesn't hang on network failure.



Yes, COLO needs to detect this situation and cancel the requests in a proper
way.


I'm not sure how to solve the problem.  The secondary still has the last
successful checkpoint so it could resume instead of waiting for the
current checkpoint to commit.

There may still be NBD I/O in flight, so it would need to drain it or
fence storage to prevent interference once the secondary VM is running.



Agreed, we need to think about this carefully. We'll work on these reliability
improvements in the future, after COLO's basic functions are completed.
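
As an aside, a minimal sketch of the "drain it before the secondary takes
over" idea mentioned above could look like the following (the helper name is
hypothetical, and whether draining alone is enough, as opposed to fencing the
shared storage, is exactly the open question here):

    /* Hypothetical failover preparation on the secondary side */
    static void colo_secondary_prepare_failover(Error **errp)
    {
        /* Wait for all in-flight requests (including NBD ones) to complete */
        bdrv_drain_all();

        /* Surface any pending replication error before the secondary VM takes over */
        replication_get_error_all(errp);
    }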

Thanks,
Hailiang


Stefan






Re: [Qemu-block] [PATCH] migration: re-active images while migration been canceled after inactive them

2017-01-23 Thread Hailiang Zhang

On 2017/1/23 21:40, Dr. David Alan Gilbert wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case
in which migration aborted QEMU because it didn't regain control
of the images when some errors happened.

Actually, there are another two cases that can trigger the same error report:
" bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed",

Case 1, code path:
migration_thread()
    migration_completion()
        bdrv_inactivate_all() -------------------> inactivate images
        qemu_savevm_state_complete_precopy()
            socket_writev_buffer() --------------> error because destination fails
                qemu_fflush() -------------------> set error on migration stream
    -> qmp_migrate_cancel() ---------------------> user cancelled migration concurrently
        -> migrate_set_state() ------------------> set migrate CANCELLING
    migration_completion() ----------------------> go on to fail_invalidate
        if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch

Case 2, code path:
migration_thread()
    migration_completion()
        bdrv_inactivate_all() -------------------> inactivate images
    migration_completion() finished
    -> qmp_migrate_cancel() ---------------------> user cancelled migration concurrently
    qemu_mutex_lock_iothread();
    qemu_bh_schedule(s->cleanup_bh);

As we can see from the above, qmp_migrate_cancel can slip in whenever
migration_thread does not hold the global lock. If this happens after
bdrv_inactivate_all() has been called, the above error report will appear.

To prevent this, we can call bdrv_invalidate_cache_all() in qmp_migrate_cancel()
directly if we find the images have become inactive.

Signed-off-by: zhanghailiang 
---
Hi,

I sent another patch before to fix this problem, but it didn't cover
all the scenarios, and there have been some discussions about this problem.
For more detail, please refer to
https://lists.gnu.org/archive/html/qemu-block/2016-12/msg3.html
---
  include/migration/migration.h |  3 +++
  migration/migration.c | 13 +
  2 files changed, 16 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index c309d23..2d5b724 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -177,6 +177,9 @@ struct MigrationState
  /* Flag set once the migration thread is running (and needs joining) */
  bool migration_thread_running;

+/* Flag set once the migration thread called bdrv_inactivate_all */
+bool block_inactive;
+
  /* Queue of outstanding page requests from the destination */
  QemuMutex src_page_req_mutex;
  QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) 
src_page_requests;
diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..9defb3e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1006,6 +1006,16 @@ static void migrate_fd_cancel(MigrationState *s)
  if (s->state == MIGRATION_STATUS_CANCELLING && f) {
  qemu_file_shutdown(f);
  }
+if (s->state == MIGRATION_STATUS_CANCELLING && s->block_inactive) {
+Error *local_err = NULL;
+
+bdrv_invalidate_cache_all(&local_err);
+if (local_err) {
+error_report_err(local_err);
+} else {
+s->block_inactive = false;
+}
+}
  }

  void add_migration_state_change_notifier(Notifier *notify)
@@ -1705,6 +1715,7 @@ static void migration_completion(MigrationState *s, int 
current_active_state,
  if (ret >= 0) {
  qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
  qemu_savevm_state_complete_precopy(s->to_dst_file, false);
+s->block_inactive = true;
  }
  }
  qemu_mutex_unlock_iothread();
@@ -1758,6 +1769,8 @@ fail_invalidate:
  bdrv_invalidate_cache_all(&local_err);
  if (local_err) {
  error_report_err(local_err);
+} else {
+s->block_inactive = false;
  }


I think the fe904 commit also added the problem that this bdrv_invalidate_cache_all
is done outside the big lock (Stefan and Kevin tell me bdrv_* calls generally
need the lock).



Ha, you are right, I hadn't noticed that all this time.
So should I just add the big lock there? Is that enough?
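
(For reference, a minimal sketch of what "adding the big lock there" could
look like, assuming qemu_mutex_lock_iothread() is the lock meant here:)

    fail_invalidate:
    {
        Error *local_err = NULL;

        /* Take the big QEMU lock around the bdrv_* call */
        qemu_mutex_lock_iothread();
        bdrv_invalidate_cache_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
        } else {
            s->block_inactive = false;
        }
        qemu_mutex_unlock_iothread();
    }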

Thanks,
Hailiang


Dave


  }

--
1.8.3.1



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

.






[Qemu-block] [Block Replication] Question about supporting COLO in libvirt

2017-02-06 Thread Hailiang Zhang

Hi,
I'm trying to implement support for COLO in libvirt,
but I found an annoying problem: libvirt does not
support the command line option argument syntax we use
for block replication in QEMU.

That is, libvirt does not support the below syntax for block:
-drive driver=qcow2,file.filename=test:a.qcow2
-drive file=test.qcow2,\
backing.file.filename=/dev/fdset/1

It seems to be a new syntax that libvirt does not support,
though it has existed in QEMU for some time.
I found some introduction to it at
http://www.linux-kvm.org/images/3/34/Kvm-forum-2013-block-dev-configuration.pdf

The command line we use for COLO is just like the above syntax.
For example, for the shared disk in COLO, it is:
 -drive 
if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
backing.driver=raw,backing.file.filename=1.raw \
 -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
file.driver=qcow2,top-id=active-disk0,\
file.file.filename=/mnt/ramfs/active_disk.img,\
file.backing=hidden_disk0,shared-disk=on

For the non-shared disk in COLO, it is quite similar to the shared-disk case:
  -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
  -drive if=xxx,id=topxxx,driver=replication,mode=secondary,top-id=topxxx\
 file.file.filename=active_disk.qcow2,\
 file.driver=qcow2,\
 file.backing.file.filename=hidden_disk.qcow2,\
 file.backing.driver=qcow2,\
 file.backing.backing=colo1

So there seem to be two ways to solve this problem.

One is to support this new option argument syntax in libvirt,
but I'm not sure how difficult it would be to implement,
and I don't know where to start either.

Another way is to change these command line options in QEMU completely;
I mean hide the descriptions of the 'active_disk' and 'hidden_disk' disks and
create/add them dynamically with QMP commands when users want to make the VM go
into the COLO state. That's just like supporting live image clone in QEMU.

Any ideas ?


Thanks,
Hailiang




Re: [Qemu-block] [libvirt] [Block Replication] Question about supporting COLO in libvirt

2017-02-08 Thread Hailiang Zhang

On 2017/2/6 20:39, Daniel P. Berrange wrote:

On Mon, Feb 06, 2017 at 08:34:28PM +0800, Hailiang Zhang wrote:

Hi,
I'm trying to implement support for COLO in libvirt,
but I found an annoying problem: libvirt does not
support the command line option argument syntax we use
for block replication in QEMU.

That is, libvirt does not support the below syntax for block:
-drive driver=qcow2,file.filename=test:a.qcow2
-drive file=test.qcow2,\
backing.file.filename=/dev/fdset/1

It seems to be a new syntax that libvirt does not support,
though it has existed in QEMU for some time.
I found some introduction to it at
http://www.linux-kvm.org/images/3/34/Kvm-forum-2013-block-dev-configuration.pdf

The command line we use for COLO is just like the above syntax,
For example, for the shared disk in COLO, it is:
  -drive 
if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
 backing.driver=raw,backing.file.filename=1.raw \
  -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
 file.driver=qcow2,top-id=active-disk0,\
 file.file.filename=/mnt/ramfs/active_disk.img,\
 file.backing=hidden_disk0,shared-disk=on

For the non-shared disk in COLO, it is quite similar to the shared-disk case:
   -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
   -drive if=xxx,id=topxxx,driver=replication,mode=secondary,top-id=topxxx\
  file.file.filename=active_disk.qcow2,\
  file.driver=qcow2,\
  file.backing.file.filename=hidden_disk.qcow2,\
  file.backing.driver=qcow2,\
  file.backing.backing=colo1

So there seem to be two ways to solve this problem.

One is to support this new option argument syntax in libvirt,
but I'm not sure how difficult it would be to implement,
and I don't know where to start either.


Libvirt has to start supporting this new syntax. It is required for
many different use cases beyond just colo. For example, to be able
to explicitly give qemu details about a backing file format to
remove probing, or to be able to set LUKS passwords on backing files,
and more besides.



Thanks for your quick reply. Do you or other developers in libvirt
have a plan to implement it?

Thanks,
Hailiang



Regards,
Daniel






Re: [Qemu-block] [PATCH v15 7/9] Introduce new APIs to do replication operation

2016-02-14 Thread Hailiang Zhang

On 2016/2/5 12:18, Changlong Xie wrote:

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
  Makefile.objs|  1 +
  qapi/block-core.json | 13 
  replication.c| 94 
  replication.h| 53 +
  4 files changed, 161 insertions(+)
  create mode 100644 replication.c
  create mode 100644 replication.h

diff --git a/Makefile.objs b/Makefile.objs
index 06b95c7..a8c74b7 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
  block-obj-$(CONFIG_WIN32) += aio-win32.o
  block-obj-y += block/
  block-obj-y += qemu-io-cmds.o
+block-obj-y += replication.o

  block-obj-m = block/

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7e9e8fe..12362b8 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2002,6 +2002,19 @@
  '*read-pattern': 'QuorumReadPattern' } }

  ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.6
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
  # @BlockdevOptions
  #
  # Options for creating a block device.
diff --git a/replication.c b/replication.c
new file mode 100644
index 000..e8ac2f0
--- /dev/null
+++ b/replication.c
@@ -0,0 +1,94 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "replication.h"
+
+static QLIST_HEAD(, ReplicationState) replication_states;
+
+ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
+{
+ReplicationState *rs;
+
+rs = g_new0(ReplicationState, 1);
+rs->opaque = opaque;
+rs->ops = ops;
+QLIST_INSERT_HEAD(&replication_states, rs, node);
+
+return rs;
+}
+
+void replication_remove(ReplicationState *rs)
+{
+QLIST_REMOVE(rs, node);
+g_free(rs);
+}
+
+/*
+ * The caller of the function *MUST* make sure vm stopped
+ */
+void replication_start_all(ReplicationMode mode, Error **errp)
+{


Is this public API only used for block?
If yes, I'd like it to have a 'block_' prefix.


+ReplicationState *rs, *next;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->start) {
+rs->ops->start(rs, mode, errp);
+}
+if (*errp != NULL) {


This is incorrect: you are missing a check for errp being NULL.
If errp is NULL, dereferencing *errp will access memory at address 0x0.
The same applies to other places in this patch.


+return;
+}
+}
+}
+
+void replication_do_checkpoint_all(Error **errp)
+{
+ReplicationState *rs, *next;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->checkpoint) {
+rs->ops->checkpoint(rs, errp);
+}
+if (*errp != NULL) {
+return;



+}
+}
+}
+
+void replication_get_error_all(Error **errp)
+{
+ReplicationState *rs, *next;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->get_error) {
+rs->ops->get_error(rs, errp);
+}
+if (*errp != NULL) {



+return;
+}
+}
+}
+
+void replication_stop_all(bool failover, Error **errp)
+{
+ReplicationState *rs, *next;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->stop) {
+rs->ops->stop(rs, failover, errp);
+}
+if (*errp != NULL) {
+return;
+}
+}
+}
diff --git a/replication.h b/replication.h
new file mode 100644
index 000..faea649
--- /dev/null
+++ b/replication.h
@@ -0,0 +1,53 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REPLICATION_H
+#define REPLICATION_H
+
+#include "sysemu/sysemu.h"
+
+typedef struct ReplicationOps ReplicationOps;
+typedef struct ReplicationState ReplicationState;
+typedef void (*Start)(ReplicationState *rs, ReplicationMode mode, Error 
**errp);
+typedef void (*Stop)(ReplicationState *rs, bool failover, Error **errp);
+typedef void (*Checkpoint)(ReplicationState *rs, Error **errp);
+typedef void (*GetError)(ReplicationState *rs, Error **errp);
+
+struct ReplicationState {
+void *opaque;
+Replicatio
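
For context, the NULL-safe pattern that the errp review comment above asks for
typically uses a local Error object plus error_propagate(), roughly like this
(a sketch only, not the posted patch):

    void replication_start_all(ReplicationMode mode, Error **errp)
    {
        ReplicationState *rs, *next;
        Error *local_err = NULL;

        QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
            if (rs->ops && rs->ops->start) {
                rs->ops->start(rs, mode, &local_err);
            }
            if (local_err) {
                /* error_propagate() is safe even when the caller passed errp == NULL */
                error_propagate(errp, local_err);
                return;
            }
        }
    }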

Re: [Qemu-block] [PATCH v15 0/9] Block replication for continuous checkpoints

2016-02-17 Thread Hailiang Zhang

ping...

The COLO prototype is based on this series, but it seems that this
series didn't get enough review and feedback, and we will miss
the train for the QEMU 2.6 release :(
Since COLO is still a prototype, some problems could be fixed during
later development, and we hope the COLO prototype can be merged as quickly
as possible. Could you please speed up the review process?

Thanks,
Hailiang

On 2016/2/5 12:17, Changlong Xie wrote:

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html

Patch status:
1. Acked patches: none
2. Reviewed patches: patch 4
3. Updated patches: patch 3, 4, 5, 7, 8
Note: patch 7 is a new patch that introduces genernal start/stop/checkpoint/
get_error
APIs.

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v15

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v14

TODO:
1. Continuous block replication. It will be started after basic functions
are accepted.

Changs Log:
V15:
1. Rebase to the newest codes
2. Fix typos and coding style addresed Eric's comments
3. Address Stefan's comments
1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
2) Update the message and description for [PATCH 4/9]
3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
5) Use BdrvChild instead of holding on to BlockDriverState * pointers
4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771
5. Introduce replication_get_error_all to check replication status
6. Remove useless discard interface
V14:
1. Implement auto complete active commit
2. Implement active commit block job for replication.c
3. Address the comments from Stefan, add replication-specific API and data
structure, also remove old block layer APIs
V13:
1. Rebase to the newest codes
2. Remove redundant marcos and semicolon in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest codes
2. Use backing reference to replcace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting blcok replication if it is not
opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing refrence option according to Stefan Hajnoczi's 
suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touch nbd and qcow2.
V3:
1: use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Changlong Xie (1):
   Introduce new APIs to do replication operation

Wen Congyang (8):
   unblock backup operations in backing file
   Store parent BDS in BdrvChild
   Backup: clear all bitmap when doing block checkpoint
   Link backup into block core
   docs: block replication's description
   auto complete active commit
   Implement new driver for block replication
   support replication driver in blockdev-add

  Makefile.objs  |   1 +
  block.c|  19 ++
  block/Makefile.objs|   3 +-
  block/backup.c |  15 ++
  block/mirror.c |  13 +-
  block/replication.c| 594 +
  blockdev.c |   2 +-
  docs/block-replication.txt | 238 ++
  include/block/block_int.h  |   6 +-
  qapi/block-core.json   |  33 ++-
  qemu-img.c |   2 +-
  replication.c  |  94 +++
  replication.h  |  53 
  13 files changed, 1063 insertions(+), 10 deletions(-)
  create mode 100644 block/replication.c
  create mode 100644 docs/block-replication.txt
  create mode 100644 replicati

Re: [Qemu-block] [PATCH v15 7/9] Introduce new APIs to do replication operation

2016-02-19 Thread Hailiang Zhang

Hi,

On 2016/2/15 9:13, Wen Congyang wrote:

On 02/15/2016 08:57 AM, Hailiang Zhang wrote:

On 2016/2/5 12:18, Changlong Xie wrote:

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
   Makefile.objs|  1 +
   qapi/block-core.json | 13 
   replication.c| 94 

   replication.h| 53 +
   4 files changed, 161 insertions(+)
   create mode 100644 replication.c
   create mode 100644 replication.h

diff --git a/Makefile.objs b/Makefile.objs
index 06b95c7..a8c74b7 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
   block-obj-$(CONFIG_WIN32) += aio-win32.o
   block-obj-y += block/
   block-obj-y += qemu-io-cmds.o
+block-obj-y += replication.o

   block-obj-m = block/

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7e9e8fe..12362b8 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2002,6 +2002,19 @@
   '*read-pattern': 'QuorumReadPattern' } }

   ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.6
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
   # @BlockdevOptions
   #
   # Options for creating a block device.
diff --git a/replication.c b/replication.c
new file mode 100644
index 000..e8ac2f0
--- /dev/null
+++ b/replication.c
@@ -0,0 +1,94 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "replication.h"
+
+static QLIST_HEAD(, ReplicationState) replication_states;
+
+ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
+{
+ReplicationState *rs;
+
+rs = g_new0(ReplicationState, 1);
+rs->opaque = opaque;
+rs->ops = ops;
+QLIST_INSERT_HEAD(&replication_states, rs, node);
+
+return rs;
+}
+
+void replication_remove(ReplicationState *rs)
+{
+QLIST_REMOVE(rs, node);
+g_free(rs);
+}
+
+/*
+ * The caller of the function *MUST* make sure vm stopped
+ */
+void replication_start_all(ReplicationMode mode, Error **errp)
+{


Is this public API only used for block?
If yes, I'd like it to have a 'block_' prefix.


No, we hope it can be used for nic too.



OK, I see why you designed these APIs; I like the idea of
using the callback/notifier to notify block/nic of COLO's status.

But let's do something more graceful.
For COLO, we can consider it to have four states:
prepare / start checkpoint (with VM stopped) / finish checkpoint (with VM resumed) / failover.
So here we need four operation functions corresponding to these four states.
I'd like these public APIs to have a 'colo_' prefix.

What about colo_replication_prepare()/colo_replication_start_checkpoint()/
colo_replication_finish_checkpoint()/colo_replication_failover() ?

Besides, it's better to rename replication.c and move it to the 'migration'
folder, because it seems to be used only by COLO; maybe colo_ops.c, or another
choice is to move all this code into colo-comm.c (if we do this, we need to
merge colo-comm.c first as a prerequisite).

Thanks,
Hailiang
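
A rough sketch of the interface proposed above, using the names from this
message (the signatures are guesses, mirroring the existing
replication_*_all() functions):

    /* Hypothetical prototypes; only the names come from the proposal above */
    void colo_replication_prepare(ReplicationMode mode, Error **errp);
    void colo_replication_start_checkpoint(Error **errp);
    void colo_replication_finish_checkpoint(Error **errp);
    void colo_replication_failover(Error **errp);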






+ReplicationState *rs, *next;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->start) {
+rs->ops->start(rs, mode, errp);
+}
+if (*errp != NULL) {


This is incorrect: you are missing a check for errp being NULL.
If errp is NULL, dereferencing *errp will access memory at address 0x0.
The same applies to other places in this patch.


+return;
+}
+}
+}
+
+void replication_do_checkpoint_all(Error **errp)
+{
+ReplicationState *rs, *next;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->checkpoint) {
+rs->ops->checkpoint(rs, errp);
+}
+if (*errp != NULL) {
+return;



+}
+}
+}
+
+void replication_get_error_all(Error **errp)
+{
+ReplicationState *rs, *next;
+
+QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+if (rs->ops && rs->ops->get_error) {
+rs->ops->get_error(rs, errp);
+}
+if (*errp != NULL) {



+return;
+}
+}
+}
+
+void replication_stop_all(bool failover, Error **err

Re: [Qemu-block] [Qemu-devel] [PATCH v18 0/8] Block replication for continuous checkpoints

2016-05-05 Thread Hailiang Zhang

ping^2 ...

Sorry for the noise, but this series is the prerequisite of COLO
and it got no feedback for almost three weeks ...

On 2016/4/15 16:10, Changlong Xie wrote:

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
http://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02093.html

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v18

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v17

TODO:
1. Continuous block replication. It will be started after basic functions
are accepted.

Changs Log:
V18:
p6: add local_err in all replication callbacks to prevent "errp == NULL"
p7: add missing qemu_iovec_destroy(xxx)
V17:
1. Rebase to the lastest codes
p2: refactor backup_do_checkpoint addressed comments from Jeff Cody
p4: fix bugs in "drive_add buddy xxx" hmp commands
p6: add "since: 2.7"
p7: fix bug in replication_close(), add missing "qapi/error.h", add 
test-replication
p8: add "since: 2.7"
V16:
1. Rebase to the newest codes
2. Address comments from Stefan & hailiang
p3: we don't need this patch now
p4: add "top-id" parameters for secondary
p6: fix NULL pointer in replication callbacks, remove unnecessary typedefs,
add doc comments that explain the semantics of Replication
p7: Refactor AioContext for thread-safe, remove unnecessary get_top_bs()
*Note*: I'm working on replication testcase now, will send out in V17
V15:
1. Rebase to the newest codes
2. Fix typos and coding style addresed Eric's comments
3. Address Stefan's comments
1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
2) Update the message and description for [PATCH 4/9]
3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
5) Use BdrvChild instead of holding on to BlockDriverState * pointers
4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771
5. Introduce replication_get_error_all to check replication status
6. Remove useless discard interface
V14:
1. Implement auto complete active commit
2. Implement active commit block job for replication.c
3. Address the comments from Stefan, add replication-specific API and data
structure, also remove old block layer APIs
V13:
1. Rebase to the newest codes
2. Remove redundant marcos and semicolon in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest codes
2. Use backing reference to replcace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting blcok replication if it is not
opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing refrence option according to Stefan Hajnoczi's 
suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touch nbd and qcow2.
V3:
1: use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Changlong Xie (1):
   Introduce new APIs to do replication operation

Wen Congyang (7):
   unblock backup operations in backing file
   Backup: clear all bitmap when doing block checkpoint
   Link backup into block core
   docs: block replication's description
   auto complete active commit
   Implement new driver for block replication
   support replication driver in blockdev-add

  Makefile.objs  |   1 +
  block.c|  17 ++
  block/Makefile.objs|   3 +-
  block/backup.c |  17 ++
  block/mirror.c |  13 +-
  block/replication.c| 625 +
  blockdev.c |   2 +-
  docs/block-replication.txt | 239 +
  include/block/block_int.h  |   5 

Re: [Qemu-block] [PATCH v3 0/6] COLO block replication supports shared disk case

2017-02-23 Thread Hailiang Zhang

ping ... ?

On 2017/1/20 11:47, zhanghailiang wrote:

COLO block replication doesn't support the shared disk case.
Here we try to implement it; this is the third version.

Last posted series patches:
https://lists.gnu.org/archive/html/qemu-block/2016-12/msg00039.html
You can refer to the above link if you want to test it.

I have uploaded the new version to github:
https://github.com/coloft/qemu/tree/colo-developing-with-shared-disk-2016-1-20

Please review and any commits are welcomed.

Cc: Juan Quintela 
Cc: Amit Shah 
Cc: Dr. David Alan Gilbert (git) 
Cc: eddie.d...@intel.com

v3:
- Fix some comments from Stefan and Eric

v2:
- Drop the patch which add a blk_root() helper
- Fix some comments from Changlong

zhanghailiang (6):
   docs/block-replication: Add description for shared-disk case
   replication: add shared-disk and shared-disk-id options
   replication: Split out backup_do_checkpoint() from
 secondary_do_checkpoint()
   replication: fix code logic with the new shared_disk option
   replication: Implement block replication for shared disk case
   nbd/replication: implement .bdrv_get_info() for nbd and replication
 driver

  block/nbd.c|  12 
  block/replication.c| 156 +++--
  docs/block-replication.txt | 139 ++--
  qapi/block-core.json   |  10 ++-
  4 files changed, 279 insertions(+), 38 deletions(-)






Re: [Qemu-block] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case

2017-02-28 Thread Hailiang Zhang

On 2017/2/28 0:46, Stefan Hajnoczi wrote:

On Fri, Jan 20, 2017 at 11:47:55AM +0800, zhanghailiang wrote:

+Secondary:
+ -drive 
if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+file.driver=qcow2,top-id=active-disk0,\
+file.file.filename=/mnt/ramfs/active_disk.img,\
+file.backing=hidden_disk0,shared-disk=on


I don't see where this patch series adds a -drive shared-disk=on option.
Did I miss it?



In patch 2, we add this option.

Thanks.




Re: [Qemu-block] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options

2017-02-28 Thread Hailiang Zhang

On 2017/2/28 1:10, Stefan Hajnoczi wrote:

On Fri, Jan 20, 2017 at 11:47:56AM +0800, zhanghailiang wrote:

@@ -119,12 +136,31 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
 "The option mode's value should be primary or secondary");
  goto fail;
  }
+s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+  false);
+if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+if (!shared_disk_id) {
+error_setg(&local_err, "Missing shared disk blk option");
+goto fail;
+}
+s->shared_disk_id = g_strdup(shared_disk_id);
+blk = blk_by_name(s->shared_disk_id);
+if (!blk) {
+error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+goto fail;
+}
+/* We can't access root member of BlockBackend directly */
+tmp_bs = blk_bs(blk);
+s->primary_disk = QLIST_FIRST(&tmp_bs->parents);


Why is this necessary?

We already have the BlockBackend for the primary disk.  I'm not sure why
the BdrvChild is needed.



Er, the helper backup_job_create() needs the BlockDriverState for the primary disk;
besides, we want to make it the same as 'secondary_disk' in struct
BDRVReplicationState.

Thanks.
Hailiang


Stefan






Re: [Qemu-block] [PATCH v3 5/6] replication: Implement block replication for shared disk case

2017-03-07 Thread Hailiang Zhang

Hi Stefan,

Sorry for the delayed reply.

On 2017/2/28 1:37, Stefan Hajnoczi wrote:

On Fri, Jan 20, 2017 at 11:47:59AM +0800, zhanghailiang wrote:

Just as the scenario of non-shared disk block replication,
we are going to implement block replication from many basic
blocks that are already in QEMU.
The architecture is:

[Architecture diagram: the Primary guest's virtio-blk writes go to the shared
disk, while a drive-backup sync=none job forwards the original data through the
primary replication node and NBD client to the Secondary's NBD server; the NBD
server buffers it in the hidden disk, which is the backing file of the active
disk, and the Secondary VM's virtio-blk sits on top of the replication driver.
The numbered steps are explained below.]

 1) Primary writes will read original data and forward it to Secondary
QEMU.
 2) The hidden-disk is created automatically. It buffers the original 
content
that is modified by the primary VM. It should also be an empty disk, and
the driver supports bdrv_make_empty() and backing file.
 3) Primary write requests will be written to Shared disk.
 4) Secondary write requests will be buffered in the active disk and it
will overwrite the existing sector content in the buffer.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 


Are there any restrictions on the shared disk?  For example the -drive
cache= mode must be 'none'.  If the cache mode isn't 'none' the
secondary host might have old data in the host page cache.  The


While doing a checkpoint, we will call vm_stop(), in which bdrv_flush_all()
will be called; is that enough?


Secondary QEMU would have an inconsistent view of the shared disk.

Are image file formats like qcow2 supported for the shared disk?  Extra


In the above scenario, there is no limitation on the format of the shared disk.


steps are required to achieve consistency, see bdrv_invalidate_cache().



Hmm, in that case, we should call bdrv_invalidate_cache_all() during the checkpoint.


Thanks,
Hailiang


Stefan






Re: [Qemu-block] [Qemu-devel] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options

2017-04-16 Thread Hailiang Zhang

On 2017/4/12 22:28, Eric Blake wrote:

On 04/12/2017 09:05 AM, zhanghailiang wrote:

We use these two options to identify which disk is
shared

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
+++ b/qapi/block-core.json
@@ -2661,12 +2661,20 @@
  #  node who owns the replication node chain. Must not be given in
  #  primary mode.
  #
+# @shared-disk-id: Id of shared disk while is replication mode, if @shared-disk
+#  is true, this option is required (Since: 2.10)
+#
+# @shared-disk: To indicate whether or not a disk is shared by primary VM
+#   and secondary VM. (The default is false) (Since: 2.10)
+#
  # Since: 2.9
  ##
  { 'struct': 'BlockdevOptionsReplication',
'base': 'BlockdevOptionsGenericFormat',
'data': { 'mode': 'ReplicationMode',
-'*top-id': 'str' } }
+'*top-id': 'str',
+'*shared-disk-id': 'str',
+'*shared-disk': 'bool' } }

Do we need a separate bool and string? Or is it sufficient to say that
if shared-disk is omitted, we are not sharing, and that if a shared-disk
string is present, then we are sharing and it names the id of the shared
disk.


Er, yes, we need both of them; the command line of the secondary side is like:

 -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
file.driver=qcow2,top-id=active-disk0,\
file.file.filename=/mnt/ramfs/active_disk.img,\
file.backing=hidden_disk0,shared-disk=on
We only need the bool shared-disk to indicate whether the disk is shared or not,
but for the primary side, we need the blockdev-add command to tell the primary
which disk is shared.
  { 'execute': 'blockdev-add',
'arguments': {
'driver': 'replication',
'node-name': 'rep',
'mode': 'primary',
'shared-disk-id': 'primary_disk0',
'shared-disk': true,
'file': {
'driver': 'nbd',
'export': 'hidden_disk0',
'server': {
'type': 'inet',
'data': {
'host': 'xxx.xxx.xxx.xxx',
'port': 'yyy'
}
}
}
 }
  }






Re: [Qemu-block] [Qemu-devel] [PULL 2/8] replication: clarify permissions

2017-04-17 Thread Hailiang Zhang

On 2017/4/18 9:23, Eric Blake wrote:

On 03/17/2017 08:15 AM, Kevin Wolf wrote:

From: Changlong Xie 

Even if hidden_disk, secondary_disk are backing files, they all need
write permissions in replication scenario. Otherwise we will encouter
below exceptions on secondary side during adding nbd server:

{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk', 'writable': 
true } }
{"error": {"class": "GenericError", "desc": "Conflicts with use by 
hidden-qcow2-driver as 'backing', which does not allow 'write' on sec-qcow2-driver-for-nbd"}}

CC: Zhang Hailiang 
CC: Zhang Chen 
CC: Wen Congyang 

This address for Wen Congyang is different than the one listed in
MAINTAINERS for replication (M: Wen Congyang ),
and different still from addresses my mailer has harvested from other
posts (wencongy...@gmail.com).  The MAINTAINERS entry is now resulting
in 'undelivered mail' bounce messages, can you please submit an update
to MAINTAINERS with your new preferred address? [or gently correct me if
I'm confusing two people with the same name?]



No, it's the same person; he just left his job at Fujitsu. The entry in the
MAINTAINERS file needs to be updated.

Cc: Changlong Xie 
Hi Changlong, would you please send a patch to update it ?

Hailiang





Re: [Qemu-block] [Qemu-devel] [QEMU-2.8] Source QEMU crashes with: "bdrv_co_pwritev: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed"

2017-04-22 Thread Hailiang Zhang

Hi,

I think the below patch can fix your problem.
[PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH
https://patchwork.kernel.org/patch/9591885/

Actually, we encountered the same problem in our test; we fixed it with the
following patch:

 From 0e4d6d706afd9909b5fd71536b45c58af60892f8 Mon Sep 17 00:00:00 2001
 From: zhanghailiang
 Date: Tue, 21 Mar 2017 09:44:36 +0800
 Subject: [PATCH] migration: Re-activate blocks whenever migration been
  cancelled

 In commit 1d2acc3162d9c7772510c973f446353fbdd1f9a8, we tried to fix the bug
 'bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed'
 which occurred in the migration cancelling process.

 But it seems that we didn't cover all the cases; we caught a case in our test
 that slipped through the old fixup: if libvirtd cancels the migration
 process for a VM that is shutting down, it will send the 'system_reset' command
 first and then the 'cont' command; after the VM resumes running, it will
 trigger the above error report, because we didn't regain control of the
 blocks for the VM.

 Signed-off-by: zhanghailiang
 Signed-off-by: Hongyang Yang
 ---
  block.c   | 12 +++-
  include/block/block.h |  1 +
  include/migration/migration.h |  3 ---
  migration/migration.c |  7 +--
  qmp.c |  4 +---
  5 files changed, 14 insertions(+), 13 deletions(-)

 diff --git a/block.c b/block.c
 index 6e906ec..c2555b0 100644
 --- a/block.c
 +++ b/block.c
 @@ -3875,6 +3875,13 @@ void bdrv_invalidate_cache(BlockDriverState *bs, 
Error **errp)
  }
  }

 +static bool is_inactivated;
 +
 +bool bdrv_is_inactivated(void)
 +{
 +return is_inactivated;
 +}
 +
  void bdrv_invalidate_cache_all(Error **errp)
  {
  BlockDriverState *bs;
 @@ -3892,6 +3899,7 @@ void bdrv_invalidate_cache_all(Error **errp)
  return;
  }
  }
 +is_inactivated = false;
  }

  static int bdrv_inactivate_recurse(BlockDriverState *bs,
 @@ -3948,7 +3956,9 @@ out:
  for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
  aio_context_release(bdrv_get_aio_context(bs));
  }
 -
 +if (!ret) {
 +is_inactivated = true;
 +}
  return ret;
  }

 diff --git a/include/block/block.h b/include/block/block.h
 index 5149260..f77b57f 100644
 --- a/include/block/block.h
 +++ b/include/block/block.h
 @@ -365,6 +365,7 @@ int bdrv_co_ioctl(BlockDriverState *bs, int req, void 
*buf);
  void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
  void bdrv_invalidate_cache_all(Error **errp);
  int bdrv_inactivate_all(void);
 +bool bdrv_is_inactivated(void);

  /* Ensure contents are flushed to disk.  */
  int bdrv_flush(BlockDriverState *bs);
 diff --git a/include/migration/migration.h b/include/migration/migration.h
 index 5720c88..a9a2071 100644
 --- a/include/migration/migration.h
 +++ b/include/migration/migration.h
 @@ -183,9 +183,6 @@ struct MigrationState
  /* Flag set once the migration thread is running (and needs joining) 
*/
  bool migration_thread_running;

 -/* Flag set once the migration thread called bdrv_inactivate_all */
 -bool block_inactive;
 -
  /* Queue of outstanding page requests from the destination */
  QemuMutex src_page_req_mutex;
  QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) 
src_page_requests;
 diff --git a/migration/migration.c b/migration/migration.c
 index 54060f7..7c3d593 100644
 --- a/migration/migration.c
 +++ b/migration/migration.c
 @@ -1015,14 +1015,12 @@ static void migrate_fd_cancel(MigrationState *s)
  if (s->state == MIGRATION_STATUS_CANCELLING && f) {
  qemu_file_shutdown(f);
  }
 -if (s->state == MIGRATION_STATUS_CANCELLING && s->block_inactive) {
 +if (bdrv_is_inactivated()) {
  Error *local_err = NULL;

  bdrv_invalidate_cache_all(&local_err);
  if (local_err) {
  error_report_err(local_err);
 -} else {
 -s->block_inactive = false;
  }
  }
  }
 @@ -1824,7 +1822,6 @@ static void migration_completion(MigrationState *s, 
int current_active_state,
  if (ret >= 0) {
  qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
  qemu_savevm_state_complete_precopy(s->to_dst_file, false);
 -s->block_inactive = true;
  }
  }
  qemu_mutex_unlock_iothread();
 @@ -1879,8 +1876,6 @@ fail_invalidate:
  bdrv_invalidate_cache_all(&local_err);
  if (local_err) {
  error_report_err(local_err);
   

Re: [Qemu-block] [Qemu-devel] [QEMU-2.8] Source QEMU crashes with: "bdrv_co_pwritev: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed"

2017-04-25 Thread Hailiang Zhang

On 2017/4/24 15:59, Kashyap Chamarthy wrote:

On Sat, Apr 22, 2017 at 05:23:49PM +0800, Hailiang Zhang wrote:

Hi,

Hi Hailiang,


I think the bellow patch can fix your problme.
[PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH
https://patchwork.kernel.org/patch/9591885/

Hmm, the above patch ("qmp-cont: invalidate on RUN_STATE_PRELAUNCH") is
not merged in Git, as it's stalled on design discussion between Kevin
Wolf and Vladimir.

And the below patch, from you, seems to be not submitted upstream (2.8
stable tree, perhaps).  Do you intend to do so?


Er, since this patch does the same thing as the above patch, I'm not sure if
I should send this patch ...


Actually, we encountered the same problem in our test; we fixed it with the
following patch:

  From 0e4d6d706afd9909b5fd71536b45c58af60892f8 Mon Sep 17 00:00:00 2001
  From: zhanghailiang
  Date: Tue, 21 Mar 2017 09:44:36 +0800
  Subject: [PATCH] migration: Re-activate blocks whenever migration been
   cancelled

  In commit 1d2acc3162d9c7772510c973f446353fbdd1f9a8, we tried to fix the bug
  'bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed'
  which occurred in the migration cancelling process.

  But it seems that we didn't cover all the cases; we caught a case in our test
  that slipped through the old fixup: if libvirtd cancels the migration
  process for a VM that is shutting down, it will send the 'system_reset' command
  first and then the 'cont' command; after the VM resumes running, it will
  trigger the above error report, because we didn't regain control of the
  blocks for the VM.

  Signed-off-by: zhanghailiang
  Signed-off-by: Hongyang Yang
  ---
   block.c   | 12 +++-
   include/block/block.h |  1 +
   include/migration/migration.h |  3 ---
   migration/migration.c |  7 +--
   qmp.c |  4 +---
   5 files changed, 14 insertions(+), 13 deletions(-)

[...]







Re: [Qemu-block] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options

2017-05-11 Thread Hailiang Zhang

On 2017/4/18 13:59, Xie Changlong wrote:


On 04/12/2017 10:05 PM, zhanghailiang wrote:

We use these two options to identify which disk is
shared

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
   block/replication.c  | 43 +--
   qapi/block-core.json | 10 +-
   2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index bf3c395..418b81b 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@
   typedef struct BDRVReplicationState {
   ReplicationMode mode;
   int replication_state;
+bool is_shared_disk;
+char *shared_disk_id;
   BdrvChild *active_disk;
   BdrvChild *hidden_disk;
   BdrvChild *secondary_disk;
+BdrvChild *primary_disk;
   char *top_id;
   ReplicationState *rs;
   Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool 
failover,
   
   #define REPLICATION_MODE"mode"

   #define REPLICATION_TOP_ID  "top-id"
+#define REPLICATION_SHARED_DISK "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
   static QemuOptsList replication_runtime_opts = {
   .name = "replication",
   .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
   .name = REPLICATION_TOP_ID,
   .type = QEMU_OPT_STRING,
   },
+{
+.name = REPLICATION_SHARED_DISK_ID,
+.type = QEMU_OPT_STRING,
+},
+{
+.name = REPLICATION_SHARED_DISK,
+.type = QEMU_OPT_BOOL,
+},
   { /* end of list */ }
   },
   };
@@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
   QemuOpts *opts = NULL;
   const char *mode;
   const char *top_id;
+const char *shared_disk_id;
+BlockBackend *blk;
+BlockDriverState *tmp_bs;
   
   bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,

  false, errp);
@@ -125,12 +142,33 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
  "The option mode's value should be primary or secondary");
   goto fail;
   }
+s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+

What if the secondary side is supplied with 'REPLICATION_SHARED_DISK_ID'?
Please refer to f4f2539bc to perfect the logic.


Hmm, we should not configure it for the secondary side; I'll fix it in the next version.



  false);
+if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+if (!shared_disk_id) {
+error_setg(&local_err, "Missing shared disk blk option");
+goto fail;
+}
+s->shared_disk_id = g_strdup(shared_disk_id);
+blk = blk_by_name(s->shared_disk_id);
+if (!blk) {
+error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+goto fail;
+}
+/* We have a BlockBackend for the primary disk but use BdrvChild for
+ * consistency - active_disk, secondary_disk, etc are also BdrvChild.
+ */
+tmp_bs = blk_bs(blk);
+s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
+}
   
   s->rs = replication_new(bs, &replication_ops);
   
-ret = 0;

-
+qemu_opts_del(opts);
+return 0;
   fail:
+g_free(s->shared_disk_id);
   qemu_opts_del(opts);
   error_propagate(errp, local_err);
   
@@ -141,6 +179,7 @@ static void replication_close(BlockDriverState *bs)

   {
   BDRVReplicationState *s = bs->opaque;
   
+g_free(s->shared_disk_id);

   if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
   replication_stop(s->rs, false, NULL);
   }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 033457c..361c932 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2661,12 +2661,20 @@
   #  node who owns the replication node chain. Must not be given in
   #  primary mode.
   #
+# @shared-disk-id: Id of shared disk while is replication mode, if @shared-disk
+#  is true, this option is required (Since: 2.10)
+#

Further explanation:

@shared-disk-id must be given, and only be given, when @shared-disk is enabled
on the Primary side.


OK.

+# @shared-disk: Whether or not the disk is shared between the primary VM
+#   and the secondary VM. (The default is false.) (Since: 2.10)
+#

Further explanation:

@shared-disk must either be given on both sides or omitted on both sides.


OK, will fix it, thanks.


   # Since: 2.9
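
For illustration, with these two options a primary-side replication node could
be added roughly as follows (the node names "rep0", "nbd_client0" and the drive
id "primary_disk0" are made up for the example and are not taken from the
patch; "nbd_client0" stands for a previously added child node):

    { "execute": "blockdev-add",
      "arguments": {
          "driver": "replication",
          "node-name": "rep0",
          "mode": "primary",
          "shared-disk": true,
          "shared-disk-id": "primary_disk0",
          "file": "nbd_client0" } }

Here "shared-disk-id" names the BlockBackend of the shared disk (it is looked
up with blk_by_name() in replication_open()), and "shared-disk" has to be set
consistently on both sides, as discussed above.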
  

Re: [Qemu-block] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options

2017-05-11 Thread Hailiang Zhang

On 2017/5/12 3:08, Stefan Hajnoczi wrote:

On Wed, Apr 12, 2017 at 10:05:17PM +0800, zhanghailiang wrote:

We use these two options to identify which disk is shared

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
  block/replication.c  | 43 +--
  qapi/block-core.json | 10 +-
  2 files changed, 50 insertions(+), 3 deletions(-)

Aside from the ongoing discussion about this patch...

Reviewed-by: Stefan Hajnoczi 


Thanks, I'll fix the related problems found by Changlong.




Re: [Qemu-block] [Qemu-devel] [PATCH v4 5/6] replication: Implement block replication for shared disk case

2017-05-12 Thread Hailiang Zhang

On 2017/5/12 3:15, Stefan Hajnoczi wrote:

On Wed, Apr 12, 2017 at 10:05:20PM +0800, zhanghailiang wrote:

@@ -612,6 +644,16 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
  error_propagate(errp, local_err);
  break;
  }
+} else {
+/*
+ * For shared disk, we need to force SVM to re-read metadata
+ * that is loaded in memory, or it will become inconsistent.
+ */
+bdrv_invalidate_cache(s->secondary_disk->bs, &local_err);

I'm not sure this call has any effect:

 if (!(bs->open_flags & BDRV_O_INACTIVE)) {
 return;
 }

Is BDRV_O_INACTIVE set?


No, you are right, it does not take any effect. So should we set this flag for
secondary_disk? Is it enough to set this flag only, or should we call
bdrv_inactivate_recurse()? To be honest, I'm not quite familiar with this part.

Thanks,
Hailiang
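
For reference, that guard sits at the top of bdrv_invalidate_cache() in
block.c, so on a node that was never inactivated the function returns before
the format driver gets a chance to re-read anything. A rough, simplified view
of the relevant path (error handling and recursion over children omitted):

    void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
    {
        /* BDRV_O_INACTIVE is normally only set after the node has been
         * inactivated, e.g. via bdrv_inactivate_all() on a migration
         * source.  A secondary disk opened the usual way stays active,
         * so the call returns here and no metadata is re-read. */
        if (!(bs->open_flags & BDRV_O_INACTIVE)) {
            return;
        }

        bs->open_flags &= ~BDRV_O_INACTIVE;

        if (bs->drv->bdrv_invalidate_cache) {
            /* The format driver reloads its in-memory metadata; qcow2,
             * for example, re-opens the image here. */
            bs->drv->bdrv_invalidate_cache(bs, errp);
        }
    }
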





Re: [Qemu-block] [PATCH v4 0/6] COLO block replication supports shared disk case

2017-05-12 Thread Hailiang Zhang

On 2017/5/12 3:17, Stefan Hajnoczi wrote:

On Wed, Apr 12, 2017 at 10:05:15PM +0800, zhanghailiang wrote:

COLO block replication doesn't support the shared disk case.
Here we try to implement it; this is the 4th version.

Please review; any comments are welcome.

Cc: Dr. David Alan Gilbert (git) 
Cc: eddie.d...@intel.com

Sorry for the delay.  Feel free to ping me if I don't review within a
few days when you post a patch.


That is OK. :) I was doing other things these days, and it is not quite
urgent... thanks.


v4:
- Add proper comment for primary_disk in patch 2 (Stefan)
- Call bdrv_invalidate_cache() while do checkpoint for shared disk in patch 5

v3:
- Fix some comments from Stefan and Eric

v2:
- Drop the patch which add a blk_root() helper
- Fix some comments from Changlong

zhanghailiang (6):
   docs/block-replication: Add description for shared-disk case
   replication: add shared-disk and shared-disk-id options
   replication: Split out backup_do_checkpoint() from
 secondary_do_checkpoint()
   replication: fix code logic with the new shared_disk option
   replication: Implement block replication for shared disk case
   nbd/replication: implement .bdrv_get_info() for nbd and replication
 driver

  block/nbd.c|  12 +++
  block/replication.c| 198 ++---
  docs/block-replication.txt | 139 ++-
  qapi/block-core.json   |  10 ++-
  4 files changed, 306 insertions(+), 53 deletions(-)

--
1.8.3.1