I dumped the network packets:

ARP:
0000000 0000 0000 0000 0000 0000 0000 be00 d175
0000010 fcf1 5452 1200 5634 0608 0100 0008 0406
0000020 0100 5452 1200 5634 790a 70db 0000 0000
0000030 0000 790a fedb
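A minimal sketch for decoding the ARP dump above and the IP dump that follows, assuming both are default od/hexdump output (16-bit words printed in little-endian host order) and that the Ethernet frame starts after a 12-byte prefix, as observed in this thread. All helper names are hypothetical. The last function interprets the prefix as a virtio-net header with mergeable RX buffers, which also happens to be 12 bytes long; that interpretation is only a guess and is not confirmed anywhere in this thread.

```python
import struct

ARP_DUMP = """
0000000 0000 0000 0000 0000 0000 0000 be00 d175
0000010 fcf1 5452 1200 5634 0608 0100 0008 0406
0000020 0100 5452 1200 5634 790a 70db 0000 0000
0000030 0000 790a fedb
"""

IP_DUMP = """
0000000 0001 0000 0000 0022 0010 0000 be00 d175
0000010 fcf1 5452 1200 5634 0008 0045 2800 aa1f
0000020 0040 0680 0b93 790a 71db 1275 1ded 2fc0
0000030 5000 9ed8 8837 071e 1b7a 1450 0000 3548
0000040 0000
"""

PREFIX_LEN = 12  # bytes seen before the Ethernet header in this thread


def od_words_to_bytes(dump: str) -> bytes:
    """Turn od-style 16-bit little-endian words back into raw bytes."""
    out = bytearray()
    for line in dump.strip().splitlines():
        fields = line.split()
        for word in fields[1:]:  # fields[0] is the offset column
            out += struct.pack("<H", int(word, 16))
    return bytes(out)


def mac(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def split_frame(raw: bytes):
    """Split off the 12-byte prefix and parse the Ethernet header after it."""
    prefix, frame = raw[:PREFIX_LEN], raw[PREFIX_LEN:]
    (ethertype,) = struct.unpack("!H", frame[12:14])  # network byte order
    return prefix, mac(frame[0:6]), mac(frame[6:12]), ethertype


def decode_prefix_as_vnet_hdr(prefix: bytes) -> dict:
    """ASSUMPTION: treat the prefix as a 12-byte virtio-net header
    (little-endian fields). This is a guess, not a confirmed diagnosis."""
    names = ("flags", "gso_type", "hdr_len", "gso_size",
             "csum_start", "csum_offset", "num_buffers")
    return dict(zip(names, struct.unpack("<BBHHHHH", prefix)))


if __name__ == "__main__":
    for name, dump in (("ARP", ARP_DUMP), ("IP", IP_DUMP)):
        prefix, dst, src, ethertype = split_frame(od_words_to_bytes(dump))
        print(name, "prefix:", prefix.hex(), "dst:", dst, "src:", src,
              "ethertype:", hex(ethertype))
        print(name, "prefix as vnet header:", decode_prefix_as_vnet_hdr(prefix))
```

Under these assumptions, the ARP frame decodes to EtherType 0x0806 with source MAC 52:54:00:12:34:56, and the prefix of the IP packet decodes to csum_start = 34 and csum_offset = 16, which would fit a TCP checksum field behind a 14-byte Ethernet header and a 20-byte IP header.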
IP:
0000000 0001 0000 0000 0022 0010 0000 be00 d175
0000010 fcf1 5452 1200 5634 0008 0045 2800 aa1f
0000020 0040 0680 0b93 790a 71db 1275 1ded 2fc0
0000030 5000 9ed8 8837 071e 1b7a 1450 0000 3548
0000040 0000

It may take some time to figure out where the 12 bytes before the Ethernet header come from. I am not very familiar with network protocols, but I will do my best.

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Zhang, Chen
Sent: March 13, 2019 13:53
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: [Qemu-discuss] Latest Qemu-COLO Problems

Sure. If it is convenient, you can try to debug in your environment and send a patch to the QEMU community. I will be very happy to review it.

Thanks
Zhang Chen

From: wenzt [mailto:[email protected]]
Sent: Wednesday, March 13, 2019 1:49 PM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: Latest Qemu-COLO Problems

Your answer makes sense to me. A different network environment may result in that status. I think more attention should be paid to the compatibility of COLO Proxy.

From: Zhang, Chen <[email protected]>
Sent: March 13, 2019 11:49
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems

From: wenzt [mailto:[email protected]]
Sent: Wednesday, March 6, 2019 6:28 PM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: Latest Qemu-COLO Problems

I have tested Proxy with this QMP command:

"{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }"

I got nothing except these logs on the PVM side:

[email protected]:colo_compare_main : secondary: unsupported packet in
[email protected]:colo_compare_main : secondary: unsupported packet in
[email protected]:colo_compare_main : secondary: unsupported packet in
[email protected]:colo_compare_main : primary: unsupported packet in
[email protected]:colo_compare_main : secondary: unsupported packet in

My guest OS is CentOS 7.5. I did nothing but boot the OS. After that, generating some network I/O still produces those logs. I did some debugging; maybe there is a parse error in parse_packet_early() that yields the wrong ETH_P_protocolName.

Hi Zhengtao,

I think your test environment has some network issue. Can you get an IP in the guest? What is the guest’s status without COLO? Or do you use Jiaoyuwang to test? Does the network switch do some work at the ETH level (like VLAN)?

On my side, the primary node proxy reports this:

[email protected]:colo_send_message Send 'checkpoint-request' message
[email protected]:colo_receive_message Receive 'checkpoint-reply' message
{"timestamp": {"seconds": 1552455102, "microseconds": 148903}, "event": "STOP"}
[email protected]:colo_vm_state_change Change 'run' => 'stop'
[email protected]:colo_send_message Send 'vmstate-send' message
[email protected]:colo_send_message Send 'vmstate-size' message
[email protected]:colo_receive_message Receive 'vmstate-received' message
[email protected]:colo_receive_message Receive 'vmstate-loaded' message
{"timestamp": {"seconds": 1552455102, "microseconds": 277064}, "event": "RESUME"}
[email protected]:colo_vm_state_change Change 'stop' => 'run'
[email protected]:colo_compare_main : compare udp
[email protected]:colo_compare_ip_info ppkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.248.2.5, spkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.248.2.5
[email protected]:colo_compare_main : packet same and release packet
[email protected]:colo_compare_main : compare udp
[email protected]:colo_compare_ip_info ppkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.239.27.228, spkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.239.27.228
[email protected]:colo_compare_main : packet same and release packet
[email protected]:colo_compare_main : compare udp
[email protected]:colo_compare_ip_info ppkt size = 81, ip_src = 10.239.161.136, ip_dst = 172.17.6.9, spkt size = 81, ip_src = 10.239.161.136, ip_dst = 172.17.6.9
[email protected]:colo_compare_main : packet same and release packet
[email protected]:colo_compare_main : compare udp
[email protected]:colo_compare_ip_info ppkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.248.2.5, spkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.248.2.5
[email protected]:colo_compare_main : packet same and release packet
[email protected]:colo_compare_main : compare icmp
[email protected]:colo_compare_ip_info ppkt size = 157, ip_src = 10.239.161.136, ip_dst = 172.17.6.9, spkt size = 157, ip_src = 10.239.161.136, ip_dst = 172.17.6.9
[email protected]:colo_compare_main : packet same and release packet

Thanks
Zhang Chen

Thanks,
Zhengtao

From: Zhang, Chen <[email protected]>
Sent: March 5, 2019 23:32
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems

From: wenzt [mailto:[email protected]]
Sent: Thursday, February 28, 2019 10:00 AM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: Latest Qemu-COLO Problems

This version: https://github.com/coloft/qemu/tree/colo-v4.1-periodic-mode

This is an old version from 3 years ago. Please drop it and use the QEMU upstream code.

Another question: what is the relationship between Proxy and Checkpoint? When the PVM and SVM send different network packets, the proxy sends a signal to the COLO-frame to do a checkpoint. Do they work together? I guess we should set a longer checkpoint interval, like 20s.

Yes, they work together. At the same time, we have a periodic checkpoint mechanism, like a timer. You can set the time too.

Does Proxy only work under network workload? In my test, it feels like Proxy is not working.

Yes, as the wiki says, colo-proxy compares the PVM and SVM packets to decide whether to do a checkpoint. You can enable the COLO debug info to see the proxy’s work on the primary node like this:

"{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }"

Thanks
Zhang Chen

From: Zhang, Chen <[email protected]>
Sent: February 28, 2019 9:34
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems

Which version? The COLO project has always said that the PVM and SVM execute in parallel.

Thanks
Zhang Chen

From: wenzt [mailto:[email protected]]
Sent: Thursday, February 28, 2019 9:21 AM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: Latest Qemu-COLO Problems

But in an earlier version, I noticed that the SVM was always in inmigration status, even while doing a checkpoint. No operation could be performed on the SVM.

Thanks,
Zhengtao

From: Zhang, Chen <[email protected]>
Sent: February 27, 2019 18:45
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems

From: wenzt [mailto:[email protected]]
Sent: Wednesday, February 27, 2019 6:04 PM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: Latest Qemu-COLO Problems

Thanks for the help!

I don’t know why we keep switching the SVM between Run and Stop. Why don’t we keep the SVM in inmigration status?

Because we need to do checkpoints to sync all state between the PVM and SVM. We can’t guarantee that their state will be the same after a while.

Thanks
Zhang Chen

Thanks,
Zhengtao

From: Zhang, Chen <[email protected]>
Sent: February 26, 2019 18:41
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems

By the way, please read the COLO wiki and use these commands to trigger failover on the secondary node:

{ 'execute': 'nbd-server-stop' }
{ "execute": "x-colo-lost-heartbeat" }

Thanks
Zhang Chen

From: Zhang, Chen
Sent: Tuesday, February 26, 2019 2:46 PM
To: 'wenzt' <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems

Sorry for the slow response. I have fixed this bug in this series:
https://lists.nongnu.org/archive/html/qemu-devel/2019-02/msg06920.html
Please test it.

Thanks
Zhang Chen

From: wenzt [mailto:[email protected]]
Sent: Friday, February 15, 2019 7:54 PM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Latest Qemu-COLO Problems

Hi Zhang,

I have tested COLO with qemu-3.1.0, following https://wiki.qemu.org/Features/COLO

I got these problems on the PVM:

{"timestamp": {"seconds": 1550230616, "microseconds": 644348}, "event": "STOP"}
{"timestamp": {"seconds": 1550230616, "microseconds": 719003}, "event": "RESUME"}
{"timestamp": {"seconds": 1550230616, "microseconds": 743554}, "event": "STOP"}
qemu-system-x86_64: Can't receive COLO message: Input/output error
qemu-system-x86_64: Can't receive COLO message: Input/output error
{"timestamp": {"seconds": 1550230618, "microseconds": 257209}, "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "error"}}

And on the SVM:

{"timestamp": {"seconds": 1550230616, "microseconds": 731544}, "event": "STOP"}
[email protected]:colo_vm_state_change Change 'run' => 'stop'
[email protected]:colo_send_message Send 'checkpoint-reply' message
[email protected]:colo_receive_message Receive 'vmstate-send' message
[email protected]:colo_flush_ram_cache_begin dirty_pages 18446744073708498780
[email protected]:colo_flush_ram_cache_end
[email protected]:colo_receive_message Receive 'vmstate-size' message
[email protected]:colo_send_message Send 'vmstate-received' message
{"timestamp": {"seconds": 1550230616, "microseconds": 837436}, "event": "RESUME"}
qemu-system-x86_64: block.c:5062: bdrv_detach_aio_context: Assertion `!bs->walking_aio_notifiers' failed.
Aborted (core dumped)
