Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case

2017-01-19 Thread Hailiang Zhang

On 2017/1/20 0:41, Stefan Hajnoczi wrote:

On Thu, Jan 19, 2017 at 10:50:19AM +0800, Hailiang Zhang wrote:

On 2017/1/13 21:41, Stefan Hajnoczi wrote:

On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:

+Issue qmp command:
+  { 'execute': 'blockdev-add',
+    'arguments': {
+        'driver': 'replication',
+        'node-name': 'rep',
+        'mode': 'primary',
+        'shared-disk-id': 'primary_disk0',
+        'shared-disk': true,
+        'file': {
+            'driver': 'nbd',
+            'export': 'hidden_disk0',
+            'server': {
+                'type': 'inet',
+                'data': {
+                    'host': 'xxx.xxx.xxx.xxx',
+                    'port': 'yyy'
+                }
+            }


block/nbd.c does not have good error handling and recovery in case there
is a network issue.  There are no reconnection attempts or timeouts that
deal with a temporary loss of network connectivity.

This is a general problem with block/nbd.c and not something to solve in
this patch series.  I'm just mentioning it because it may affect COLO
replication.

I'm sure these limitations in block/nbd.c can be fixed but it will take
some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
code could also benefit from generic network connection recovery.
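For illustration only (not from this series, and not existing QEMU code):
a minimal sketch of what such generic "reconnect with a bounded timeout"
logic could look like at the plain POSIX socket level.  The helper name
connect_with_retry() and the back-off policy are assumptions made up for
this sketch.

/* Illustrative sketch only: retry a TCP connection with exponential
 * back-off until a total deadline expires.  Plain POSIX sockets, not
 * QEMU code; connect_with_retry() is a made-up name. */
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static int connect_with_retry(const char *host, const char *port,
                              int total_timeout_sec)
{
    struct addrinfo hints = { .ai_socktype = SOCK_STREAM };
    time_t deadline = time(NULL) + total_timeout_sec;
    unsigned delay = 1;                    /* seconds between attempts */

    while (time(NULL) < deadline) {
        struct addrinfo *res;

        if (getaddrinfo(host, port, &hints, &res) == 0) {
            int fd = socket(res->ai_family, res->ai_socktype,
                            res->ai_protocol);

            if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) == 0) {
                freeaddrinfo(res);
                return fd;                 /* connected (or reconnected) */
            }
            if (fd >= 0) {
                close(fd);
            }
            freeaddrinfo(res);
        }
        fprintf(stderr, "connect to %s:%s failed (%s), retrying in %us\n",
                host, port, strerror(errno), delay);
        sleep(delay);
        if (delay < 16) {
            delay *= 2;                    /* exponential back-off, capped */
        }
    }
    return -1;                             /* give up once the deadline passes */
}

A real implementation inside the block layer would of course have to retry
asynchronously rather than sleep, but the idea is the same: bound the
recovery attempt so the caller eventually gets an error instead of waiting
indefinitely.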



Hmm, good suggestion, but IMHO COLO is a little different from other
scenarios here: even if a reconnection method were implemented, it would
still need a mechanism to distinguish a temporary loss of network
connectivity from a connection that is really broken.

I did a simple test: I simply brought the network card used by block
replication down with ifconfig.  It seems that NBD in QEMU has no way to
detect that the connection has been broken; there were no error reports,
and COLO just got stuck in vm_stop() where it called aio_poll().
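As an aside, one conventional way to make a silently dead peer visible at
the socket level is TCP keepalive, sketched below with plain POSIX calls;
the thresholds are arbitrary example values, and this is not what
block/nbd.c does today.

/* Illustrative sketch only: ask the kernel to probe an idle connection so
 * that a peer which silently disappears (e.g. its NIC is taken down)
 * eventually produces an error instead of the reads blocking forever.
 * The threshold values are arbitrary example numbers. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int enable_keepalive(int fd)
{
    int on = 1;
    int idle = 5;      /* seconds of idle time before the first probe */
    int interval = 5;  /* seconds between probes */
    int count = 3;     /* unanswered probes before the connection is dropped */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }
    /* The fine-grained knobs below are Linux-specific. */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
    return 0;
}

Keepalive only covers an idle connection; a write that is already stalled
would need something like TCP_USER_TIMEOUT as well, and neither replaces
the higher-level detection discussed here.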


Yes, this is the vm_stop() problem again.  There is no reliable way to
cancel I/O requests so instead QEMU waits...forever.  A solution is
needed so COLO doesn't hang on network failure.
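For illustration of the failure mode (plain poll() rather than QEMU's
aio_poll(), and purely a sketch): the difference between an unbounded wait
and a wait with a deadline that would let COLO cancel requests or fail
over.

/* Illustrative sketch only: waiting for an I/O completion event with and
 * without a deadline.  With an unbounded wait, a peer that silently
 * disappears means the wait never finishes -- the hang described above. */
#include <poll.h>
#include <stdbool.h>

/* Unbounded: blocks until the fd becomes readable -- possibly forever. */
static void wait_forever(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    while (poll(&pfd, 1, -1) <= 0) {
        /* retried on EINTR; if the peer is silently gone, no event ever
         * arrives and the call just keeps blocking */
    }
}

/* Bounded: give up after timeout_ms so the caller can cancel requests,
 * resume from the last good checkpoint, or fence the storage. */
static bool wait_with_deadline(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    return ret > 0 && (pfd.revents & POLLIN);
}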



Yes, COLO needs to detect this situation and cancel the requests in a proper
way.


I'm not sure how to solve the problem.  The secondary still has the last
successful checkpoint so it could resume instead of waiting for the
current checkpoint to commit.

There may still be NBD I/O in flight, so it would need to be drained, or
the storage fenced, to prevent interference once the secondary VM is
running.



Agreed, we need to think about this carefully.  We will work on these
reliability improvements later, after COLO's basic functionality is
complete.

Thanks,
Hailiang


Stefan






Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case

2017-01-19 Thread Stefan Hajnoczi
On Thu, Jan 19, 2017 at 10:50:19AM +0800, Hailiang Zhang wrote:
> On 2017/1/13 21:41, Stefan Hajnoczi wrote:
> > On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:
> > > +Issue qmp command:
> > > +  { 'execute': 'blockdev-add',
> > > +    'arguments': {
> > > +        'driver': 'replication',
> > > +        'node-name': 'rep',
> > > +        'mode': 'primary',
> > > +        'shared-disk-id': 'primary_disk0',
> > > +        'shared-disk': true,
> > > +        'file': {
> > > +            'driver': 'nbd',
> > > +            'export': 'hidden_disk0',
> > > +            'server': {
> > > +                'type': 'inet',
> > > +                'data': {
> > > +                    'host': 'xxx.xxx.xxx.xxx',
> > > +                    'port': 'yyy'
> > > +                }
> > > +            }
> > 
> > block/nbd.c does not have good error handling and recovery in case there
> > is a network issue.  There are no reconnection attempts or timeouts that
> > deal with a temporary loss of network connectivity.
> > 
> > This is a general problem with block/nbd.c and not something to solve in
> > this patch series.  I'm just mentioning it because it may affect COLO
> > replication.
> > 
> > I'm sure these limitations in block/nbd.c can be fixed but it will take
> > some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
> > code could also benefit from generic network connection recovery.
> > 
> 
> Hmm, good suggestion, but IMHO COLO is a little different from other
> scenarios here: even if a reconnection method were implemented, it would
> still need a mechanism to distinguish a temporary loss of network
> connectivity from a connection that is really broken.
> 
> I did a simple test: I simply brought the network card used by block
> replication down with ifconfig.  It seems that NBD in QEMU has no way to
> detect that the connection has been broken; there were no error reports,
> and COLO just got stuck in vm_stop() where it called aio_poll().

Yes, this is the vm_stop() problem again.  There is no reliable way to
cancel I/O requests so instead QEMU waits...forever.  A solution is
needed so COLO doesn't hang on network failure.

I'm not sure how to solve the problem.  The secondary still has the last
successful checkpoint so it could resume instead of waiting for the
current checkpoint to commit.

There may still be NBD I/O in flight, so it would need to be drained, or
the storage fenced, to prevent interference once the secondary VM is
running.

Stefan




Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case

2017-01-18 Thread Hailiang Zhang

On 2017/1/13 21:41, Stefan Hajnoczi wrote:

On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:

+Issue qmp command:
+  { 'execute': 'blockdev-add',
+    'arguments': {
+        'driver': 'replication',
+        'node-name': 'rep',
+        'mode': 'primary',
+        'shared-disk-id': 'primary_disk0',
+        'shared-disk': true,
+        'file': {
+            'driver': 'nbd',
+            'export': 'hidden_disk0',
+            'server': {
+                'type': 'inet',
+                'data': {
+                    'host': 'xxx.xxx.xxx.xxx',
+                    'port': 'yyy'
+                }
+            }


block/nbd.c does not have good error handling and recovery in case there
is a network issue.  There are no reconnection attempts or timeouts that
deal with a temporary loss of network connectivity.

This is a general problem with block/nbd.c and not something to solve in
this patch series.  I'm just mentioning it because it may affect COLO
replication.

I'm sure these limitations in block/nbd.c can be fixed but it will take
some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
code could also benefit from generic network connection recovery.



Hmm, good suggestion, but IMHO COLO is a little different from other
scenarios here: even if a reconnection method were implemented, it would
still need a mechanism to distinguish a temporary loss of network
connectivity from a connection that is really broken.

I did a simple test: I simply brought the network card used by block
replication down with ifconfig.  It seems that NBD in QEMU has no way to
detect that the connection has been broken; there were no error reports,
and COLO just got stuck in vm_stop() where it called aio_poll().

Thanks,
Hailiang




Reviewed-by: Stefan Hajnoczi 






Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case

2017-01-13 Thread Stefan Hajnoczi
On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:
> +Issue qmp command:
> +  { 'execute': 'blockdev-add',
> +    'arguments': {
> +        'driver': 'replication',
> +        'node-name': 'rep',
> +        'mode': 'primary',
> +        'shared-disk-id': 'primary_disk0',
> +        'shared-disk': true,
> +        'file': {
> +            'driver': 'nbd',
> +            'export': 'hidden_disk0',
> +            'server': {
> +                'type': 'inet',
> +                'data': {
> +                    'host': 'xxx.xxx.xxx.xxx',
> +                    'port': 'yyy'
> +                }
> +            }

block/nbd.c does not have good error handling and recovery in case there
is a network issue.  There are no reconnection attempts or timeouts that
deal with a temporary loss of network connectivity.

This is a general problem with block/nbd.c and not something to solve in
this patch series.  I'm just mentioning it because it may affect COLO
replication.

I'm sure these limitations in block/nbd.c can be fixed but it will take
some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
code could also benefit from generic network connection recovery.

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case

2016-12-20 Thread Changlong Xie

On 12/05/2016 04:34 PM, zhanghailiang wrote:

Introduce the scenario of shared-disk block replication
and how to use it.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
v2:
- fix some problems found by Changlong
---
  docs/block-replication.txt | 139 +++--
  1 file changed, 135 insertions(+), 4 deletions(-)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..fbfe005 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
  effort during a vmstate checkpoint, the disk modification operations of
  the Primary disk are asynchronously forwarded to the Secondary node.

-== Workflow ==
+== Non-shared disk workflow ==
  The following is the image of block replication workflow:

        +----------------------+            +------------------------+
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
  4) Secondary write requests will be buffered in the Disk buffer and it
 will overwrite the existing sector content in the buffer.

-== Architecture ==
+== Non-shared disk architecture ==
  We are going to implement block replication from many basic
  blocks that are already in QEMU.

@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
  of the NBD server into the secondary disk. So before block replication,
  the primary disk and secondary disk should contain the same data.

+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
+         +----------------------+            +------------------------+
+         |Primary Write Requests|            |Secondary Write Requests|
+         +----------------------+            +------------------------+
+                    |                                    |
+                    |                                   (4)
+                    |                                    V
+                    |                             /-------------\
+                    | (2)Forward and write through|             |
+                    | +-------------------------> | Disk Buffer |
+                    | |                           |             |
+                    | |                           \-------------/
+                    | |(1)read                           |
+                    | |                                  |
+            (3)write| |                                  | backing file
+                    V |                                  |
+            +---------------+                            |
+            |  Shared Disk  | <--------------------------+
+            +---------------+
+
+1) Primary writes will read original data and forward it to Secondary
+   QEMU.
+2) Before Primary write requests are written to Shared disk, the
+   original sector content will be read from Shared disk and
+   forwarded and buffered in the Disk buffer on the secondary site,
+   but it will not overwrite the existing sector content (it could be
+   from either "Secondary Write Requests" or previous COW of "Primary
+   Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Shared disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
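(Illustrative aside, not part of the patch: a minimal sketch of the
copy-before-write ordering described in steps 1)-3) above, with POSIX
pread()/pwrite() standing in for the block layer and a made-up
forward_to_secondary() stub standing in for the NBD forwarding path.)

/* Illustrative sketch only: the order of operations for a Primary write
 * in shared-disk mode.  Plain POSIX calls, not QEMU code; the
 * forward_to_secondary() stub stands in for the NBD connection. */
#include <sys/types.h>
#include <unistd.h>

/* Stub: in the real design the original content travels over NBD and the
 * Secondary buffers it only if the sector is not already in its buffer. */
static int forward_to_secondary(const void *buf, size_t len, off_t offset)
{
    (void)buf; (void)len; (void)offset;
    return 0;
}

static int primary_cow_write(int shared_fd, const void *new_data,
                             size_t len, off_t offset)
{
    char old[4096];

    if (len > sizeof(old)) {
        return -1;                     /* keep the sketch simple */
    }
    /* 1) read the original sector content from the Shared disk */
    if (pread(shared_fd, old, len, offset) != (ssize_t)len) {
        return -1;
    }
    /* 2) forward the original content so the Secondary can buffer it */
    if (forward_to_secondary(old, len, offset) < 0) {
        return -1;
    }
    /* 3) only then write the new data to the Shared disk */
    if (pwrite(shared_fd, new_data, len, offset) != (ssize_t)len) {
        return -1;
    }
    return 0;
}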
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+ virtio-blk ||   .--------
+ /  ||   | Secondary
+/   ||   '--------
+   /|| virtio-blk
+  / ||  |
+  | ||   replication(5)
+  |NBD  >   NBD   (2)   |
+  |  client ||server ---> hidden disk <-- active disk(4)
+  | ^   ||  |
+  |  replication(1) ||  |
+  | |   ||  |
+  |   +-'   ||  |
+ (3)  |drive-backup sync=none   ||  |
+. |   +-+   ||  |
+Primary | | |   ||   backing|
+' | |   ||  |
+  V |   |
+   +--