[Qemu-devel] about qemu-img info on block dev

2017-11-06 Thread wang.guang55
hello


if we create a qcow2 file on a block dev.


we can,t get the right disk size by qemu-img info。






[root@host-120-79 qemu]# ./qemu-img create -f qcow2 /dev/zs/lvol0 1G

Formatting '/dev/zs/lvol0', fmt=qcow2 size=1073741824 cluster_size=65536 
lazy_refcounts=off refcount_bits=16

[root@host-120-79 qemu]# ./qemu-img info /dev/zs/lvol0 

image: /dev/zs/lvol0

file format: qcow2

virtual size: 1.0G (1073741824 bytes)

disk size: 0

cluster_size: 65536

Format specific information:

compat: 1.1

lazy refcounts: false

refcount bits: 16

corrupt: false






Then we get disk size from L1 table and L2 table in 
qcow2_get_allocated_file_size.


May be we can submit the patch to qemu master??

[Qemu-devel] 答复: [PATCH v2 05/18] COLO: Handle shutdown command for VMin COLO state

2017-05-03 Thread wang.guang55
hi

The patch may cause qemu_system_reset_request call  qemu_notify_event() twice.

old:



void qemu_system_reset_request(void)

{

if (no_reboot) {

shutdown_requested = 1

} else {

reset_requested = 1

}

cpu_stop_current()

qemu_notify_event()

}


new:

qemu_system_reset_request ->qemu_system_shutdown_request->qemu_notify_event()


And qemu_system_reset_request  may call   replay_shutdown_request() ,old 
version dose not call replay_shutdown_request here.





















原始邮件



发件人: <zhang.zhanghaili...@huawei.com>
收件人: <qemu-devel@nongnu.org> <gilb...@redhat.com>
抄送人: <lizhij...@cn.fujitsu.com> <xiecl.f...@cn.fujitsu.com> 
<zhang.zhanghaili...@huawei.com> <zhangchen.f...@cn.fujitsu.com> 
<quint...@redhat.com> <pbonz...@redhat.com>
日 期 :2017年04月22日 16:55
主 题 :[Qemu-devel] [PATCH v2 05/18] COLO: Handle shutdown command for VMin COLO 
state





If VM is in COLO FT state, we need to do some extra works before
starting normal shutdown process.

Secondary VM will ignore the shutdown command if users issue it directly
to Secondary VM. COLO will capture shutdown command and after
shutdown request from user.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 include/migration/colo.h |  1 +
 include/sysemu/sysemu.h  |  3 +++
 migration/colo.c | 46 +-
 qapi-schema.json |  4 +++-
 vl.c | 19 ---
 5 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 2bbff9e..aadd040 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -37,4 +37,5 @@ COLOMode get_colo_mode(void)
 void colo_do_failover(MigrationState *s)
 
 void colo_checkpoint_notify(void *opaque)
+bool colo_handle_shutdown(void)
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 16175f7..8054f53 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -49,6 +49,8 @@ typedef enum WakeupReason {
 QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason
 
+extern int colo_shutdown_requested
+
 void qemu_system_reset_request(void)
 void qemu_system_suspend_request(void)
 void qemu_register_suspend_notifier(Notifier *notifier)
@@ -56,6 +58,7 @@ void qemu_system_wakeup_request(WakeupReason reason)
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled)
 void qemu_register_wakeup_notifier(Notifier *notifier)
 void qemu_system_shutdown_request(void)
+void qemu_system_shutdown_request_core(void)
 void qemu_system_powerdown_request(void)
 void qemu_register_powerdown_notifier(Notifier *notifier)
 void qemu_system_debug_request(void)
diff --git a/migration/colo.c b/migration/colo.c
index a3344ce..c4fc865 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -384,6 +384,21 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out
 }
 
+if (colo_shutdown_requested) {
+colo_send_message(s->to_dst_file, COLO_MESSAGE_GUEST_SHUTDOWN,
+  _err)
+if (local_err) {
+error_free(local_err)
+/* Go on the shutdown process and throw the error message */
+error_report("Failed to send shutdown message to SVM")
+}
+qemu_fflush(s->to_dst_file)
+colo_shutdown_requested = 0
+qemu_system_shutdown_request_core()
+/* Fix me: Just let the colo thread exit ? */
+qemu_thread_exit(0)
+}
+
 ret = 0
 
 qemu_mutex_lock_iothread()
@@ -449,7 +464,9 @@ static void colo_process_checkpoint(MigrationState *s)
 goto out
 }
 
-qemu_sem_wait(>colo_checkpoint_sem)
+if (!colo_shutdown_requested) {
+qemu_sem_wait(>colo_checkpoint_sem)
+}
 
 ret = colo_do_checkpoint_transaction(s, bioc, fb)
 if (ret < 0) {
@@ -534,6 +551,16 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 case COLO_MESSAGE_CHECKPOINT_REQUEST:
 *checkpoint_request = 1
 break
+case COLO_MESSAGE_GUEST_SHUTDOWN:
+qemu_mutex_lock_iothread()
+vm_stop_force_state(RUN_STATE_COLO)
+qemu_system_shutdown_request_core()
+qemu_mutex_unlock_iothread()
+/*
+ * The main thread will be exit and terminate the whole
+ * process, do need some cleanup ?
+ */
+qemu_thread_exit(0)
 default:
 *checkpoint_request = 0
 error_setg(errp, "Got unknown COLO message: %d", msg)
@@ -696,3 +723,20 @@ out:
 
 return NULL
 }
+
+bool colo_handle_shutdown(void)
+{
+/*
+ * If VM is in COLO-FT mode, we need do some significant work before
+ * respond to the shutdown request. Besides, Secondary VM will ignore
+ * the shutdown request from users.
+ */
+if 

[Qemu-devel] 答复: Re: 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang

2017-03-22 Thread wang.guang55
hi:

yes.it is better.

And should we delete 




#ifdef WIN32

QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL)

#endif




in qio_channel_socket_accept?

qio_channel_socket_new already have it.












原始邮件



发件人: <zhang.zhanghaili...@huawei.com>
收件人:王广10165992
抄送人: <xuqu...@huawei.com> <dgilb...@redhat.com> <zhangchen.f...@cn.fujitsu.com> 
<qemu-devel@nongnu.org>
日 期 :2017年03月22日 15:03
主 题 :Re: [Qemu-devel]  答复: Re:  答复: Re: 答复: Re: [BUG]COLO failover hang





Hi,

On 2017/3/22 9:42, wang.guan...@zte.com.cn wrote:
> diff --git a/migration/socket.c b/migration/socket.c
>
>
> index 13966f1..d65a0ea 100644
>
>
> --- a/migration/socket.c
>
>
> +++ b/migration/socket.c
>
>
> @@ -147,8 +147,9 @@ static gboolean 
socket_accept_incoming_migration(QIOChannel *ioc,
>
>
>   }
>
>
>
>
>
>   trace_migration_socket_incoming_accepted()
>
>
>
>
>
>   qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
>
>
> +qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
>
>
>   migration_channel_process_incoming(migrate_get_current(),
>
>
>  QIO_CHANNEL(sioc))
>
>
>   object_unref(OBJECT(sioc))
>
>
>
>
> Is this patch ok?
>

Yes, i think this works, but a better way maybe to call 
qio_channel_set_feature()
in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the 
socket accept fd,
Or fix it by this:

diff --git a/io/channel-socket.c b/io/channel-socket.c
index f546c68..ce6894c 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
Error **errp)
  {
  QIOChannelSocket *cioc
-
-cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET))
-cioc->fd = -1
+
+cioc = qio_channel_socket_new()
  cioc->remoteAddrLen = sizeof(ioc->remoteAddr)
  cioc->localAddrLen = sizeof(ioc->localAddr)


Thanks,
Hailiang

> I have test it . The test could not hang any more.
>
>
>
>
>
>
>
>
>
>
>
>
> 原始邮件
>
>
>
> 发件人: <zhang.zhanghaili...@huawei.com>
> 收件人: <dgilb...@redhat.com> <berra...@redhat.com>
> 抄送人: <xuqu...@huawei.com> <qemu-devel@nongnu.org> 
<zhangchen.f...@cn.fujitsu..com>王广10165992
> 日 期 :2017年03月22日 09:11
> 主 题 :Re: [Qemu-devel]  答复: Re:  答复: Re: [BUG]COLO failover hang
>
>
>
>
>
> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
> > * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> >> Hi,
> >>
> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
> >>
> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
> >> case COLO thread/incoming thread is stuck in read/write() while do 
failover,
> >> but it didn't take effect, because all the fd used by COLO (also migration)
> >> has been wrapped by qio channel, and it will not call the shutdown API if
> >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN).
> >>
> >> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
> >>
> >> I doubted migration cancel has the same problem, it may be stuck in write()
> >> if we tried to cancel migration.
> >>
> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
Error **errp)
> >> {
> >>  qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
> >>  migration_channel_connect(s, ioc, NULL)
> >>  ... ...
> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN) above,
> >> and the
> >> migrate_fd_cancel()
> >> {
> >>   ... ...
> >>  if (s->state == MIGRATION_STATUS_CANCELLING && f) {
> >>  qemu_file_shutdown(f)  --> This will not take effect. No ?
> >>  }
> >> }
> >
> > (cc'd in Daniel Berrange).
> > I see that we call qio_channel_set_feature(ioc, 
QIO_CHANNEL_FEATURE_SHUTDOWN) at the
> > top of qio_channel_socket_new  so I think that's safe isn't it?
> >
>
> Hmm, you are right, this problem is only exist for the migration incoming fd, 
thanks.
>
> > Dave
> >
> >> Thanks,
> >> Hailiang
> >>
> >> On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote:
> >>> Thank you。
> >>>
> >>> I have test aready。
> >>>
> >>> When the Primary Node panic,the Secondary Node qemu hang at the same 
place。
> >>>
> >>> Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node 
qemu will not produce the problem,but Primary Node panic can。
> >>>
> >>> I think due to the feature of channel does not support 
QIO_CHANNEL_FEATURE_SHUTDOWN.
> >>>
> >>>
> >>> when failover,channel_shutdown could not shut down the channel.
> >>>
> >>>
> >>> so the colo_process_incoming_thread will hang at recvmsg.
> >>>
> >>>
> >>> I test a patch:
> >>>
> >>>
> >>> diff --git a/migration/socket.c b/migration/socket.c
> >>>
> >>>
> >>> index 13966f1..d65a0ea 100644
> >>>
> >>>
> >>> --- a/migration/socket.c
> >>>
> >>>
> >>> +++ b/migration/socket.c
> >>>
> >>>
> >>> @@ -147,8 +147,9 @@ static gboolean 
socket_accept_incoming_migration(QIOChannel 

[Qemu-devel] 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang

2017-03-21 Thread wang.guang55
diff --git a/migration/socket.c b/migration/socket.c


index 13966f1..d65a0ea 100644


--- a/migration/socket.c


+++ b/migration/socket.c


@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
*ioc,


 }


 


 trace_migration_socket_incoming_accepted()





 qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")


+qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)


 migration_channel_process_incoming(migrate_get_current(),


QIO_CHANNEL(sioc))


 object_unref(OBJECT(sioc))




Is this patch ok? 

I have test it . The test could not hang any more.












原始邮件



发件人: <zhang.zhanghaili...@huawei.com>
收件人: <dgilb...@redhat.com> <berra...@redhat.com>
抄送人: <xuqu...@huawei.com> <qemu-devel@nongnu.org> 
<zhangchen.f...@cn.fujitsu.com>王广10165992
日 期 :2017年03月22日 09:11
主 题 :Re: [Qemu-devel]  答复: Re:  答复: Re: [BUG]COLO failover hang





On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
>> Hi,
>>
>> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
>>
>> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
>> case COLO thread/incoming thread is stuck in read/write() while do failover,
>> but it didn't take effect, because all the fd used by COLO (also migration)
>> has been wrapped by qio channel, and it will not call the shutdown API if
>> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN).
>>
>> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
>>
>> I doubted migration cancel has the same problem, it may be stuck in write()
>> if we tried to cancel migration.
>>
>> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
Error **errp)
>> {
>>  qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
>>  migration_channel_connect(s, ioc, NULL)
>>  ... ...
>> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN) above,
>> and the
>> migrate_fd_cancel()
>> {
>>   ... ...
>>  if (s->state == MIGRATION_STATUS_CANCELLING && f) {
>>  qemu_file_shutdown(f)  --> This will not take effect. No ?
>>  }
>> }
>
> (cc'd in Daniel Berrange).
> I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) 
at the
> top of qio_channel_socket_new  so I think that's safe isn't it?
>

Hmm, you are right, this problem is only exist for the migration incoming fd, 
thanks.

> Dave
>
>> Thanks,
>> Hailiang
>>
>> On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote:
>>> Thank you。
>>>
>>> I have test aready。
>>>
>>> When the Primary Node panic,the Secondary Node qemu hang at the same place。
>>>
>>> Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node 
qemu will not produce the problem,but Primary Node panic can。
>>>
>>> I think due to the feature of channel does not support 
QIO_CHANNEL_FEATURE_SHUTDOWN.
>>>
>>>
>>> when failover,channel_shutdown could not shut down the channel.
>>>
>>>
>>> so the colo_process_incoming_thread will hang at recvmsg.
>>>
>>>
>>> I test a patch:
>>>
>>>
>>> diff --git a/migration/socket.c b/migration/socket.c
>>>
>>>
>>> index 13966f1..d65a0ea 100644
>>>
>>>
>>> --- a/migration/socket.c
>>>
>>>
>>> +++ b/migration/socket.c
>>>
>>>
>>> @@ -147,8 +147,9 @@ static gboolean 
socket_accept_incoming_migration(QIOChannel *ioc,
>>>
>>>
>>>}
>>>
>>>
>>>
>>>
>>>
>>>trace_migration_socket_incoming_accepted()
>>>
>>>
>>>
>>>
>>>
>>>qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
>>>
>>>
>>> +qio_channel_set_feature(QIO_CHANNEL(sioc), 
QIO_CHANNEL_FEATURE_SHUTDOWN)
>>>
>>>
>>>migration_channel_process_incoming(migrate_get_current(),
>>>
>>>
>>>   QIO_CHANNEL(sioc))
>>>
>>>
>>>object_unref(OBJECT(sioc))
>>>
>>>
>>>
>>>
>>> My test will not hang any more.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 原始邮件
>>>
>>>
>>>
>>> 发件人: <zhangchen.f...@cn.fujitsu..com>
>>> 收件人:王广10165992 <zhang.zhanghaili...@huawei.com>
>>> 抄送人: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>
>>> 日 期 :2017年03月21日 15:58
>>> 主 题 :Re: [Qemu-devel]  答复: Re:  [BUG]COLO failover hang
>>>
>>>
>>>
>>>
>>>
>>> Hi,Wang.
>>>
>>> You can test this branch:
>>>
>>> 
https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
>>>
>>> and please follow wiki ensure your own configuration correctly.
>>>
>>> http://wiki.qemu-project.org/Features/COLO
>>>
>>>
>>> Thanks
>>>
>>> Zhang Chen
>>>
>>>
>>> On 03/21/2017 03:27 PM, wang.guan...@zte.com.cn wrote:
>>> >
>>> > hi.
>>> >
>>> > I test the git qemu master have the same problem.
>>> >
>>> > (gdb) bt
>>> >
>>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
>>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
>>> >
>>> > 

[Qemu-devel] 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang

2017-03-21 Thread wang.guang55
amost like wiki,but panic in Primary Node.




setp:

1 

Primary Node.

x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio 
-vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb 
-usbdevice tablet\

  -drive 
if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,

   
children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=qcow2
 -S \

  -netdev 
tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \

  -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \

  -netdev 
tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \

  -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66 \

  -chardev socket,id=mirror0,host=9.61.1.8,port=9003,server,nowait -chardev 
socket,id=compare1,host=9.61.1.8,port=9004,server,nowait \

  -chardev socket,id=compare0,host=9.61.1.8,port=9001,server,nowait -chardev 
socket,id=compare0-0,host=9.61.1.8,port=9001 \

  -chardev socket,id=compare_out,host=9.61.1.8,port=9005,server,nowait \

  -chardev socket,id=compare_out0,host=9.61.1.8,port=9005 \

  -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \

  -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out 
-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \

  -object 
colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0

2 Second node:

x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 
-name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb 
-usbdevice tablet\

  -drive 
if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=qcow2,node-name=node0
 \

  -drive 
if=virtio,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-id=active-disk0,file.file.filename=/mnt/ramfstest/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfstest/hidden_disk.img,file.backing.backing=colo-disk0
  \

   -netdev 
tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \

  -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \

  -netdev 
tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \

  -device e1000,netdev=hn0,mac=52:a4:00:12:78:66 -chardev 
socket,id=red0,host=9.61.1.8,port=9003 \

  -chardev socket,id=red1,host=9.61.1.8,port=9004 \

  -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \

  -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \

  -object filter-rewriter,id=rew0,netdev=hn0,queue=all -incoming tcp:0:

3  Secondary node:

{'execute':'qmp_capabilities'}

{ 'execute': 'nbd-server-start',

  'arguments': {'addr': {'type': 'inet', 'data': {'host': '9.61.1.7', 'port': 
'8889'} } }

}

{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': 
true } }

4:Primary Node:

{'execute':'qmp_capabilities'}


{ 'execute': 'human-monitor-command',

  'arguments': {'command-line': 'drive_add -n buddy 
driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0'}}

{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 
'node0' } }

{ 'execute': 'migrate-set-capabilities',

  'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } 
] } }

{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:9.61.1.7:' } }




then can see two runing VMs, whenever you make changes to PVM, SVM will be 
synced.  




5:Primary Node:

echo c > /proc/sysrq-trigger




6:Secondary node:

{ 'execute': 'nbd-server-stop' }

{ "execute": "x-colo-lost-heartbeat" }




then can see the Secondary node hang at recvmsg recvmsg .












原始邮件



发件人: <zhang.zhanghaili...@huawei.com>
收件人:王广10165992 <zhangchen.f...@cn.fujitsu.com>
抄送人: <xuqu...@huawei.com> <qemu-devel@nongnu.org>
日 期 :2017年03月21日 16:27
主 题 :Re: [Qemu-devel]  答复: Re:  答复: Re: [BUG]COLO failover hang





Hi,

On 2017/3/21 16:10, wang.guan...@zte.com.cn wrote:
> Thank you。
>
> I have test aready。
>
> When the Primary Node panic,the Secondary Node qemu hang at the same place。
>
> Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu 
will not produce the problem,but Primary Node panic can。
>
> I think due to the feature of channel does not support 
QIO_CHANNEL_FEATURE_SHUTDOWN.
>
>

Yes, you are right, when we do failover for primary/secondary VM, we will 
shutdown the related
fd in case it is stuck in the read/write fd.

It seems that you didn't follow the above introduction exactly to do the test. 
Could you
share your test procedures ? Especially the commands used in the test.

Thanks,
Hailiang

> when failover,channel_shutdown could not shut down the channel.
>
>
> so the colo_process_incoming_thread will hang at recvmsg.
>
>
> I test a patch:
>
>
> diff --git a/migration/socket.c b/migration/socket.c
>
>
> index 13966f1..d65a0ea 100644

[Qemu-devel] 答复: Re: 答复: Re: [BUG]COLO failover hang

2017-03-21 Thread wang.guang55
Thank you。

I have test aready。

When the Primary Node panic,the Secondary Node qemu hang at the same place。

Incorrding http://wiki.qemu-project.org/Features/COLO ,kill Primary Node qemu 
will not produce the problem,but Primary Node panic can。

I think due to the feature of channel does not support 
QIO_CHANNEL_FEATURE_SHUTDOWN.


when failover,channel_shutdown could not shut down the channel.


so the colo_process_incoming_thread will hang at recvmsg.


I test a patch:


diff --git a/migration/socket.c b/migration/socket.c


index 13966f1..d65a0ea 100644


--- a/migration/socket.c


+++ b/migration/socket.c


@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
*ioc,


 }


 


 trace_migration_socket_incoming_accepted()





 qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")


+qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)


 migration_channel_process_incoming(migrate_get_current(),


QIO_CHANNEL(sioc))


 object_unref(OBJECT(sioc))




My test will not hang any more.

















原始邮件



发件人: <zhangchen.f...@cn.fujitsu.com>
收件人:王广10165992 <zhang.zhanghaili...@huawei.com>
抄送人: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>
日 期 :2017年03月21日 15:58
主 题 :Re: [Qemu-devel]  答复: Re:  [BUG]COLO failover hang





Hi,Wang.

You can test this branch:

https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk

and please follow wiki ensure your own configuration correctly.

http://wiki.qemu-project.org/Features/COLO


Thanks

Zhang Chen


On 03/21/2017 03:27 PM, wang.guan...@zte.com.cn wrote:
>
> hi.
>
> I test the git qemu master have the same problem.
>
> (gdb) bt
>
> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, 
> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
>
> #1  0x7f658e4aa0c2 in qio_channel_read 
> (ioc=ioc@entry=0x7f65911b4e50, buf=buf@entry=0x7f65907cb838 "", 
> buflen=buflen@entry=32768, errp=errp@entry=0x0) at io/channel.c:114
>
> #2  0x7f658e3ea990 in channel_get_buffer (opaque=<optimized out>, 
> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at 
> migration/qemu-file-channel.c:78
>
> #3  0x7f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at 
> migration/qemu-file.c:295
>
> #4  0x7f658e3ea2e1 in qemu_peek_byte (f=f@entry=0x7f65907cb800, 
> offset=offset@entry=0) at migration/qemu-file.c:555
>
> #5  0x7f658e3ea34b in qemu_get_byte (f=f@entry=0x7f65907cb800) at 
> migration/qemu-file.c:568
>
> #6  0x7f658e3ea552 in qemu_get_be32 (f=f@entry=0x7f65907cb800) at 
> migration/qemu-file.c:648
>
> #7  0x7f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, 
> errp=errp@entry=0x7f64ef3fd9b0) at migration/colo.c:244
>
> #8  0x7f658e3e681e in colo_receive_check_message (f=<optimized 
> out>, expect_msg=expect_msg@entry=COLO_MESSAGE_VMSTATE_SEND, 
> errp=errp@entry=0x7f64ef3fda08)
>
> at migration/colo.c:264
>
> #9  0x7f658e3e740e in colo_process_incoming_thread 
> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
>
> #10 0x7f658be09df3 in start_thread () from /lib64/libpthread.so.0
>
> #11 0x7f65881983ed in clone () from /lib64/libc.so.6
>
> (gdb) p ioc->name
>
> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
>
> (gdb) p ioc->featuresDo not support QIO_CHANNEL_FEATURE_SHUTDOWN
>
> $3 = 0
>
>
> (gdb) bt
>
> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, 
> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
>
> #1  0x7fdcc6966350 in g_main_dispatch (context=<optimized out>) at 
> gmain.c:3054
>
> #2  g_main_context_dispatch (context=<optimized out>, 
> context@entry=0x7fdccce9f590) at gmain.c:3630
>
> #3  0x7fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
>
> #4  os_host_main_loop_wait (timeout=<optimized out>) at 
> util/main-loop.c:258
>
> #5  main_loop_wait (nonblocking=nonblocking@entry=0) at 
> util/main-loop.c:506
>
> #6  0x7fdccb526187 in main_loop () at vl.c:1898
>
> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized 
> out>) at vl.c:4709
>
> (gdb) p ioc->features
>
> $1 = 6
>
> (gdb) p ioc->name
>
> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
>
>
> May be socket_accept_incoming_migration should 
> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
>
>
> thank you.
>
>
>
>
>
> 原始邮件
> *发件人:*<zhangchen.f...@cn.fujitsu.com>
> *收件人:*王广10165992<qemu-devel@nongnu.org>
> *抄送人:*<zhangchen.f...@cn.fujitsu.com><zhang.zhanghaili...@huawei.com>
> *日 期 :*2017年03月16日 14:46
> *主 题 :**Re: [Qemu-devel] COLO failover hang*
>
>
>
>
> On 03/15/2017 05:06 PM, wangguang wrote:
> >   am testing QEMU COLO feature described here [QEMU
> > Wiki](http://wiki.qemu-project.org/Features/COLO).
> >
> > When the Primary Node panic,the Secondary Node qemu hang.
> > hang at recvmsg in qio_channel_socket_readv.
> > And  I run  { 

[Qemu-devel] 答复: Re: [BUG]COLO failover hang

2017-03-21 Thread wang.guang55
hi.


I test the git qemu master have the same problem.


(gdb) bt


#0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, 
fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461


#1  0x7f658e4aa0c2 in qio_channel_read (ioc=ioc@entry=0x7f65911b4e50, 
buf=buf@entry=0x7f65907cb838 "", buflen=buflen@entry=32768, 
errp=errp@entry=0x0) at io/channel.c:114


#2  0x7f658e3ea990 in channel_get_buffer (opaque=<optimized out>, 
buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at 
migration/qemu-file-channel.c:78


#3  0x7f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at 
migration/qemu-file.c:295


#4  0x7f658e3ea2e1 in qemu_peek_byte (f=f@entry=0x7f65907cb800, 
offset=offset@entry=0) at migration/qemu-file.c:555


#5  0x7f658e3ea34b in qemu_get_byte (f=f@entry=0x7f65907cb800) at 
migration/qemu-file.c:568


#6  0x7f658e3ea552 in qemu_get_be32 (f=f@entry=0x7f65907cb800) at 
migration/qemu-file.c:648


#7  0x7f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, 
errp=errp@entry=0x7f64ef3fd9b0) at migration/colo.c:244


#8  0x7f658e3e681e in colo_receive_check_message (f=<optimized out>, 
expect_msg=expect_msg@entry=COLO_MESSAGE_VMSTATE_SEND, 
errp=errp@entry=0x7f64ef3fda08)


at migration/colo.c:264


#9  0x7f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 
<mis_current.31286>) at migration/colo.c:577


#10 0x7f658be09df3 in start_thread () from /lib64/libpthread.so.0


#11 0x7f65881983ed in clone () from /lib64/libc.so.6


(gdb) p ioc->name


$2 = 0x7f658ff7d5c0 "migration-socket-incoming"


(gdb) p ioc->features  Do not support QIO_CHANNEL_FEATURE_SHUTDOWN


$3 = 0





(gdb) bt


#0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, 
opaque=0x7fdcceeafa90) at migration/socket.c:137


#1  0x7fdcc6966350 in g_main_dispatch (context=<optimized out>) at 
gmain.c:3054


#2  g_main_context_dispatch (context=<optimized out>, 
context@entry=0x7fdccce9f590) at gmain.c:3630


#3  0x7fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213


#4  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258


#5  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:506


#6  0x7fdccb526187 in main_loop () at vl.c:1898


#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
vl.c:4709


(gdb) p ioc->features


$1 = 6


(gdb) p ioc->name


$2 = 0x7fdcce1b1ab0 "migration-socket-listener"





May be socket_accept_incoming_migration should call 
qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??





thank you.















原始邮件



发件人: <zhangchen.f...@cn.fujitsu.com>
收件人:王广10165992 <qemu-devel@nongnu.org>
抄送人: <zhangchen.f...@cn.fujitsu.com> <zhang.zhanghaili...@huawei.com>
日 期 :2017年03月16日 14:46
主 题 :Re: [Qemu-devel] COLO failover hang







On 03/15/2017 05:06 PM, wangguang wrote:
>   am testing QEMU COLO feature described here [QEMU
> Wiki](http://wiki.qemu-project.org/Features/COLO).
>
> When the Primary Node panic,the Secondary Node qemu hang.
> hang at recvmsg in qio_channel_socket_readv.
> And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
> "x-colo-lost-heartbeat" } in Secondary VM's
> monitor,the  Secondary Node qemu still hang at recvmsg .
>
> I found that the colo in qemu is not complete yet.
> Do the colo have any plan for development?

Yes, We are developing. You can see some of patch we pushing.

> Has anyone ever run it successfully? Any help is appreciated!

In our internal version can run it successfully,
The failover detail you can ask Zhanghailiang for help.
Next time if you have some question about COLO,
please cc me and zhanghailiang <zhang.zhanghaili...@huawei.com>.


Thanks
Zhang Chen


>
>
>
> centos7.2+qemu2.7.50
> (gdb) bt
> #0  0x7f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
> #1  0x7f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
> iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
> io/channel-socket.c:497
> #2  0x7f3e03329472 in qio_channel_read (ioc=ioc@entry=0x7f3e05110e40,
> buf=buf@entry=0x7f3e05910f38 "", buflen=buflen@entry=32768,
> errp=errp@entry=0x0) at io/channel.c:97
> #3  0x7f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
> buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
> migration/qemu-file-channel.c:78
> #4  0x7f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
> migration/qemu-file.c:257
> #5  0x7f3e03274a41 in qemu_peek_byte (f=f@entry=0x7f3e05910f00,
> offset=offset@entry=0) at migration/qemu-file.c:510
> #6  0x7f3e03274aab in qemu_get_byte (f=f@entry=0x7f3e05910f00) at
> migration/qemu-file.c:523
> #7  0x7f3e03274cb2 in qemu_get_be32 (f=f@entry=0x7f3e05910f00) at
> migration/qemu-file.c:603
> #8  0x7f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
> errp=errp@entry=0x7f3d62bfaa50) at migration/colo.c:215
> #9  0x7f3e0327250d in 

[Qemu-devel]   [PATCHV2] COLO: COLO-FT.txt err

2017-03-17 Thread wang.guang55
This is an error in COLO-FT.txt.


secondeary-disk0 should be secondary-disk0.






Signed-off-by: Guang Wang<wang.guan...@zte.com.cn>

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt

index e289be2..754efda 100644

--- a/docs/COLO-FT.txt

+++ b/docs/COLO-FT.txt

@@ -139,7 +139,7 @@ Secondary:

 { 'execute': 'nbd-server-start',

   'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 
'port': '8889'} } }

 }

-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondeary-disk0', 
'writable': true } }

+{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 
'writable': true } }

 

 Note:

   a. The qmp command nbd-server-start and nbd-server-add must be run















原始邮件



发件人:王广10165992
收件人: <zhang.zhanghaili...@huawei.com>
抄送人: <qemu-devel@nongnu.org> <zhangchen.f...@cn.fujitsu.com>
日 期 :2017年03月18日 13:38
主 题 :[Qemu-devel] [PATCH] COLO: COLO-FT.txt err






This is an error in COLO-FT.txt.

secondeary-disk0 should be secondary-disk0.




Signed-off-by: Guang Wang<wang.guan...@zte.com.cn>

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt

index e289be2..754efda 100644

--- a/docs/COLO-FT.txt

+++ b/docs/COLO-FT.txt

@@ -139,7 +139,7 @@ Secondary:

 { 'execute': 'nbd-server-start',

   'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 
'port': '8889'} } }

 }

-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondeary-disk0', 
'writable': true } }

+{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 
'writable': true } }

[Qemu-devel] [PATCH] COLO: COLO-FT.txt err

2017-03-17 Thread wang.guang55
This is an error in COLO-FT.txt.

secondeary-disk0 should be secondary-disk0.




Signed-off-by: Guang Wang<wang.guan...@zte.com.cn>

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt

index e289be2..754efda 100644

--- a/docs/COLO-FT.txt

+++ b/docs/COLO-FT.txt

@@ -139,7 +139,7 @@ Secondary:

 { 'execute': 'nbd-server-start',

   'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 
'port': '8889'} } }

 }

-{'execute': 'nbd-server-add', 'arguments': {'device': 'secondeary-disk0', 
'writable': true } }

+{'execute': 'nbd-server-add', 'arguments': {'device': 'secondary-disk0', 
'writable': true } }

[Qemu-devel] 答复: Re: COLO :ailover hang

2017-03-16 Thread wang.guang55
hi.


I test the git qemu master have the same problem.

(gdb) bt

#0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, 
fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461

#1  0x7f658e4aa0c2 in qio_channel_read (ioc=ioc@entry=0x7f65911b4e50, 
buf=buf@entry=0x7f65907cb838 "", buflen=buflen@entry=32768, 
errp=errp@entry=0x0) at io/channel.c:114

#2  0x7f658e3ea990 in channel_get_buffer (opaque=<optimized out>, 
buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at 
migration/qemu-file-channel.c:78

#3  0x7f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at 
migration/qemu-file.c:295

#4  0x7f658e3ea2e1 in qemu_peek_byte (f=f@entry=0x7f65907cb800, 
offset=offset@entry=0) at migration/qemu-file.c:555

#5  0x7f658e3ea34b in qemu_get_byte (f=f@entry=0x7f65907cb800) at 
migration/qemu-file.c:568

#6  0x7f658e3ea552 in qemu_get_be32 (f=f@entry=0x7f65907cb800) at 
migration/qemu-file.c:648

#7  0x7f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, 
errp=errp@entry=0x7f64ef3fd9b0) at migration/colo.c:244

#8  0x7f658e3e681e in colo_receive_check_message (f=<optimized out>, 
expect_msg=expect_msg@entry=COLO_MESSAGE_VMSTATE_SEND, 
errp=errp@entry=0x7f64ef3fda08)

at migration/colo.c:264

#9  0x7f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 
<mis_current.31286>) at migration/colo.c:577

#10 0x7f658be09df3 in start_thread () from /lib64/libpthread.so.0

#11 0x7f65881983ed in clone () from /lib64/libc.so.6

(gdb) p ioc->name

$2 = 0x7f658ff7d5c0 "migration-socket-incoming"

(gdb) p ioc->features  Do not support QIO_CHANNEL_FEATURE_SHUTDOWN

$3 = 0




(gdb) bt

#0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, 
opaque=0x7fdcceeafa90) at migration/socket.c:137

#1  0x7fdcc6966350 in g_main_dispatch (context=<optimized out>) at 
gmain.c:3054

#2  g_main_context_dispatch (context=<optimized out>, 
context@entry=0x7fdccce9f590) at gmain.c:3630

#3  0x7fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213

#4  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258

#5  main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:506

#6  0x7fdccb526187 in main_loop () at vl.c:1898

#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
vl.c:4709

(gdb) p ioc->features

$1 = 6

(gdb) p ioc->name

$2 = 0x7fdcce1b1ab0 "migration-socket-listener"




May be socket_accept_incoming_migration should call 
qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??




thank you.









原始邮件



发件人: <zhangchen.f...@cn.fujitsu.com>
收件人:王广10165992 <qemu-devel@nongnu.org>
抄送人: <zhangchen.f...@cn.fujitsu.com> <zhang.zhanghaili...@huawei.com>
日 期 :2017年03月16日 14:46
主 题 :Re: [Qemu-devel] COLO failover hang







On 03/15/2017 05:06 PM, wangguang wrote:
>   am testing QEMU COLO feature described here [QEMU
> Wiki](http://wiki.qemu-project.org/Features/COLO).
>
> When the Primary Node panic,the Secondary Node qemu hang.
> hang at recvmsg in qio_channel_socket_readv.
> And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
> "x-colo-lost-heartbeat" } in Secondary VM's
> monitor,the  Secondary Node qemu still hang at recvmsg .
>
> I found that the colo in qemu is not complete yet.
> Do the colo have any plan for development?

Yes, We are developing. You can see some of patch we pushing.

> Has anyone ever run it successfully? Any help is appreciated!

In our internal version can run it successfully,
The failover detail you can ask Zhanghailiang for help.
Next time if you have some question about COLO,
please cc me and zhanghailiang <zhang.zhanghaili...@huawei.com>.


Thanks
Zhang Chen


>
>
>
> centos7.2+qemu2.7.50
> (gdb) bt
> #0  0x7f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
> #1  0x7f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
> iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
> io/channel-socket.c:497
> #2  0x7f3e03329472 in qio_channel_read (ioc=ioc@entry=0x7f3e05110e40,
> buf=buf@entry=0x7f3e05910f38 "", buflen=buflen@entry=32768,
> errp=errp@entry=0x0) at io/channel.c:97
> #3  0x7f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
> buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
> migration/qemu-file-channel.c:78
> #4  0x7f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
> migration/qemu-file.c:257
> #5  0x7f3e03274a41 in qemu_peek_byte (f=f@entry=0x7f3e05910f00,
> offset=offset@entry=0) at migration/qemu-file.c:510
> #6  0x7f3e03274aab in qemu_get_byte (f=f@entry=0x7f3e05910f00) at
> migration/qemu-file.c:523
> #7  0x7f3e03274cb2 in qemu_get_be32 (f=f@entry=0x7f3e05910f00) at
> migration/qemu-file.c:603
> #8  0x7f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
> errp=errp@entry=0x7f3d62bfaa50) at migration/colo.c:215
> #9  0x7f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
> 

[Qemu-devel] colo doc problem

2017-03-16 Thread wang.guang55
hi


http://git.qemu-project.org/?p=qemu.gita=blobf=docs/COLO-FT.txth=e289be2f419578645475b36861e4303ff624cb0fhb=HEAD




docs/COLO-FT




 138 {'execute':'qmp_capabilities'}

 139 { 'execute': 'nbd-server-start',

 140   'arguments': {'addr': {'type': 'inet', 'data': {'host': 'xx.xx.xx.xx', 
'port': '8889'} } }

 141 }

 142 {'execute': 'nbd-server-add', 'arguments': {'device': 'secondeary-disk0', 
'writable': true } }

secondeary-disk0 should be secondary-disk0 .







thank you.















原始邮件



发件人: <zhang.zhanghaili...@huawei.com>
收件人:王广10165992
抄送人: <xuqu...@huawei.com>
日 期 :2017年03月16日 09:24
主 题 :Re: about colo development





Hi,

On 2017/3/15 15:01, wang.guan...@zte.com.cn wrote:
> hi:
> I am testing QEMU COLO feature described here [QEMU 
Wiki](http://wiki.qemu-project.org/Features/COLO).
>
> I found that colo in qemu have not been complete。

Yes, you are right, there are still lots of work need to be done to perfect it.

> Do the have any plan for developing colo?

Sure, I have posted "[PATCH 00/15] COLO: integrate colo frame with block 
replication and net compare"
series, and it is planned to be merged in next version of QEMU. Besides this, 
there are some other
series are on the road.

> How long the colo will be complete for use?

It all depends on the reviewing speed.

> please replay,thank you.
>
>
>
>
>
>
>
> When the Primary Node panic,the Secondary Node qemu hang.
> hang at recvmsg in qio_channel_socket_readv.
> And  I run  { 'execute': 'nbd-server-stop' } and { "execute": 
"x-colo-lost-heartbeat" } in Secondary VM's
> monitor,the  Secondary Node qemu still hang at recvmsg .
>
> Did the recvmsg should be interrupt after failover??
> Has anyone ever run it successfully? Any help is appreciated!
>

We will shutdown the related fd so that it will not be blocked in read/write.
 From your backtrace, it seems that, it didn't go the place call shutdown(fd).

Maybe you need to debug the process of 'x-colo-lost-heartbeat' to figure out 
what happens.

Thanks,
Hailiang

>
>
> centos7.2+qemu2.7.50
> (gdb) bt
> #0  0x7f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
> #1  0x7f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>, 
iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at 
io/channel-socket.c:497
> #2  0x7f3e03329472 in qio_channel_read (ioc=ioc@entry=0x7f3e05110e40, 
buf=buf@entry=0x7f3e05910f38 "", buflen=buflen@entry=32768, 
errp=errp@entry=0x0) at io/channel.c:97
> #3  0x7f3e032750e0 in channel_get_buffer (opaque=<optimized out>, 
buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at 
migration/qemu-file-channel.c:78
> #4  0x7f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at 
migration/qemu-file.c:257
> #5  0x7f3e03274a41 in qemu_peek_byte (f=f@entry=0x7f3e05910f00, 
offset=offset@entry=0) at migration/qemu-file.c:510
> #6  0x7f3e03274aab in qemu_get_byte (f=f@entry=0x7f3e05910f00) at 
migration/qemu-file.c:523
> #7  0x7f3e03274cb2 in qemu_get_be32 (f=f@entry=0x7f3e05910f00) at 
migration/qemu-file.c:603
> #8  0x7f3e03271735 in colo_receive_message (f=0x7f3e05910f00, 
errp=errp@entry=0x7f3d62bfaa50) at migration/colo.c:215
> #9  0x7f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, 
checkpoint_request=<synthetic pointer>, f=<optimized out>) at 
migration/colo.c:546
> #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at 
migration/colo.c:649
> #11 0x7f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
> #12 0x7f3dfc9c03ed in clone () from /lib64/libc.so.6
>
> _
> Sent from http://qemu.11.n7.nabble.com
>
>
> .
>