RE: [PATCH v2] MAINTAINERS: Change my email address

2021-12-14 Thread zhanghailiang
> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Tuesday, December 14, 2021 6:18 PM
> To: Daniel P. Berrangé 
> Cc: zhanghailiang ; qemu-devel@nongnu.org;
> Gonglei ; Wen Congyang
> ; dgilb...@redhat.com; quint...@redhat.com
> Subject: Re: [PATCH v2] MAINTAINERS: Change my email address
> 
> On 12/14/21 10:22, Daniel P. Berrangé wrote:
> > On Tue, Dec 14, 2021 at 10:04:03AM +0100, Philippe Mathieu-Daudé wrote:
> >> On 12/14/21 08:54, Hailiang Zhang wrote:
> >>> The zhang.zhanghaili...@huawei.com email address has been stopped.
> >>> Change it to my new email address.
> >>>
> >>> Signed-off-by: Hailiang Zhang 
> >>> ---
> >>> hi Juan & Dave,
> >>>
> >>> Firstly, thank you for your working on maintaining the COLO framework.
> >>> I didn't have much time on it in the past days.
> >>>
> >>> I may have some time in the next days since my job has changed.
> >>>
> >>> Because of my old email being stopped, i can not use it to send this 
> >>> patch.
> >>> Please help me to merge this patch.
> >>
> >> Can we have an Ack-by from someone working at Huawei?
> >
> > Why do we need that ?
> 
> To avoid anyone impersonating Hailiang Zhang...
> 

Agreed. This check is necessary. I totally understand.

> But it doesn't have to be from the same company, as long as someone knowing
> him vouch the change. Anyhow I am not nacking this patch, I am trying to have
> a safer process.
> 
> > Subsystems are not owned by companies.
> >
> > If someone moves company and wants to carry on in their existing role
> > as maintainer that is fine and doesn't need approva from their old
> > company IMHO.
> 
> I agree, this is why it is better to send that kind of change from the
> to-be-stopped email address while it is still valid.
> 

I should have sent this patch before my old email address was deactivated, but I
didn't get permission to send email from my new company until now. ☹

Thanks.

> Thanks,
> 
> Phil.



RE: [PATCH v2] MAINTAINERS: Change my email address

2021-12-14 Thread zhanghailiang
Yes.

I'll tell Gonglei to help confirm this patch. 

Thanks.

-Original Message-
From: Philippe Mathieu-Daudé  
Sent: Tuesday, December 14, 2021 5:04 PM
To: zhanghailiang ; qemu-devel@nongnu.org; Gonglei 
; Wen Congyang 
Cc: dgilb...@redhat.com; quint...@redhat.com
Subject: Re: [PATCH v2] MAINTAINERS: Change my email address

On 12/14/21 08:54, Hailiang Zhang wrote:
> The zhang.zhanghaili...@huawei.com email address has been stopped. 
> Change it to my new email address.
> 
> Signed-off-by: Hailiang Zhang 
> ---
> hi Juan & Dave,
> 
> Firstly, thank you for your working on maintaining the COLO framework.
> I didn't have much time on it in the past days.
> 
> I may have some time in the next days since my job has changed.
> 
> Because of my old email being stopped, i can not use it to send this patch.
> Please help me to merge this patch.

Can we have an Ack-by from someone working at Huawei?

> Thanks,
> Hailiang
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS index 7543eb4d59..5d9c4243b4 
> 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2967,7 +2967,7 @@ F: include/qemu/yank.h
>  F: qapi/yank.json
>  
>  COLO Framework
> -M: zhanghailiang 
> +M: Hailiang Zhang 
>  S: Maintained
>  F: migration/colo*
>  F: include/migration/colo.h
> 



RE: [PATCH v3 00/18] Support Multifd for RDMA migration

2020-10-21 Thread Zhanghailiang
Hi zhengchuan,

> -Original Message-
> From: zhengchuan
> Sent: Saturday, October 17, 2020 12:26 PM
> To: quint...@redhat.com; dgilb...@redhat.com
> Cc: Zhanghailiang ; Chenzhendong (alex)
> ; Xiexiangyou ; wanghao
> (O) ; yubihong ;
> fengzhim...@huawei.com; qemu-devel@nongnu.org
> Subject: [PATCH v3 00/18] Support Multifd for RDMA migration
> 
> Now I continue to support multifd for RDMA migration based on my colleague
> zhiming's work:)
> 
> The previous RFC patches is listed below:
> v1:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg669455.html
> v2:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg679188.html
> 
> As descried in previous RFC, the RDMA bandwidth is not fully utilized for over
> 25Gigabit NIC because of single channel for RDMA migration.
> This patch series is going to support multifd for RDMA migration based on 
> multifd
> framework.
> 
> Comparsion is between origion and multifd RDMA migration is re-tested for v3.
> The VM specifications for migration are as follows:
> - VM use 4k page;
> - the number of VCPU is 4;
> - the total memory is 16Gigabit;
> - use 'mempress' tool to pressurize VM(mempress 8000 500);
> - use 25Gigabit network card to migrate;
> 
> For origin RDMA and MultiRDMA migration, the total migration times of VM are
> as follows:
> +-------------+------------------+--------------+
> |             | NOT rdma-pin-all | rdma-pin-all |
> +-------------+------------------+--------------+
> | origin RDMA |       26 s       |     29 s     |
> |  MultiRDMA  |       16 s       |     17 s     |
> +-------------+------------------+--------------+
> 
> Test the multifd RDMA migration like this:
> virsh migrate --live --multiFd --migrateuri

There is no '--multiFd' option in virsh; it seems this private option was added
for internal usage. It would be better to document a testing method that uses
QEMU commands directly (see the rough sketch below).


Thanks.
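For illustration only, a QEMU-level test flow might look roughly like the
following. It assumes the series reuses the existing 'multifd' capability and
'multifd-channels' parameter together with an 'rdma:' migration URI; the actual
knob names in this RFC may differ.

```
# Destination host: start QEMU waiting for an incoming RDMA migration
# (assumed URI form; adjust the address/port to your setup).
qemu-system-x86_64 ... -incoming rdma:0.0.0.0:4444

# Source host: enable multifd and start the RDMA migration from the HMP monitor.
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_parameter multifd-channels 4
(qemu) migrate -d rdma:192.168.1.100:4444
(qemu) info migrate
```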

> rdma://192.168.1.100 [VM] --listen-address 0.0.0.0
> qemu+tcp://192.168.1.100/system --verbose
> 
> v2 -> v3:
> create multifd ops for both tcp and rdma
> do not export rdma to avoid multifd code in mess
> fix build issue for non-rdma
> fix some codestyle and buggy code
> 
> Chuan Zheng (18):
>   migration/rdma: add the 'migrate_use_rdma_pin_all' function
>   migration/rdma: judge whether or not the RDMA is used for migration
>   migration/rdma: create multifd_setup_ops for Tx/Rx thread
>   migration/rdma: add multifd_setup_ops for rdma
>   migration/rdma: do not need sync main for rdma
>   migration/rdma: export MultiFDSendParams/MultiFDRecvParams
>   migration/rdma: add rdma field into multifd send/recv param
>   migration/rdma: export getQIOChannel to get QIOchannel in rdma
>   migration/rdma: add multifd_rdma_load_setup() to setup multifd rdma
>   migration/rdma: Create the multifd recv channels for RDMA
>   migration/rdma: record host_port for multifd RDMA
>   migration/rdma: Create the multifd send channels for RDMA
>   migration/rdma: Add the function for dynamic page registration
>   migration/rdma: register memory for multifd RDMA channels
>   migration/rdma: only register the memory for multifd channels
>   migration/rdma: add rdma_channel into Migrationstate field
>   migration/rdma: send data for both rdma-pin-all and NOT rdma-pin-all
> mode
>   migration/rdma: RDMA cleanup for multifd migration
> 
>  migration/migration.c |  24 +++
>  migration/migration.h |  11 ++
>  migration/multifd.c   |  97 +-
>  migration/multifd.h   |  24 +++
>  migration/qemu-file.c |   5 +
>  migration/qemu-file.h |   1 +
>  migration/rdma.c  | 503
> +-
>  7 files changed, 653 insertions(+), 12 deletions(-)
> 
> --
> 1.8.3.1




RE: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint

2020-08-15 Thread Zhanghailiang
> -Original Message-
> From: Derek Su [mailto:jwsu1...@gmail.com]
> Sent: Thursday, August 13, 2020 6:28 PM
> To: Lukas Straub 
> Cc: Derek Su ; qemu-devel@nongnu.org; Zhanghailiang
> ; chy...@qnap.com; quint...@redhat.com;
> dgilb...@redhat.com; ctch...@qnap.com
> Subject: Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo
> checkpoint
> 
> On Fri, Jul 31, 2020 at 3:52 PM Lukas Straub  wrote:
> >
> > On Sun, 21 Jun 2020 10:10:03 +0800
> > Derek Su  wrote:
> >
> > > This series is to reduce the guest's downtime during colo checkpoint
> > > by migrating dirty ram pages as many as possible before colo checkpoint.
> > >
> > > If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or ram
> > > pending size is lower than 'x-colo-migrate-ram-threshold', stop the
> > > ram migration and do colo checkpoint.
> > >
> > > Test environment:
> > > The both primary VM and secondary VM has 1GiB ram and 10GbE NIC for
> > > FT traffic.
> > > One fio buffer write job runs on the guest.
> > > The result shows the total primary VM downtime is decreased by ~40%.
> > >
> > > Please help to review it and suggestions are welcomed.
> > > Thanks.
> >
> > Hello Derek,
> > Sorry for the late reply.
> > I think this is not a good idea, because it unnecessarily introduces a delay
> between checkpoint request and the checkpoint itself and thus impairs network
> bound workloads due to increased network latency. Workloads that are
> independent from network don't cause many checkpoints anyway, so it doesn't
> help there either.
> >
> 

Hi Derek,

Actually, there is quite an interesting question we should think about:
what will happen if the VMs continue to run after a mismatched state has been
detected between the PVM and SVM?
According to the rules of COLO, we should stop the VMs immediately to sync the
state between the PVM and SVM. But here you let them continue to run for a
while, so more client network packets may arrive and dirty more memory pages.
Another side effect is that these new network packets will most likely never be
sent out, because their replies should differ once the state of the PVM and SVM
has diverged (see the rough sketch below).

So, IMHO, it makes no sense to let the VMs continue to run after they have been
detected to be in different states.
Besides, I don't think it is easy to construct this case in tests.
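To make the rule above concrete, here is a minimal, self-contained sketch of
the per-packet decision being discussed. All names in it are illustrative
placeholders; the real logic lives in QEMU's colo-compare and migration/colo.c
and is considerably more involved.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct Packet {
    const char *data;
    size_t len;
} Packet;

static bool packets_match(const Packet *p, const Packet *s)
{
    return p->len == s->len && memcmp(p->data, s->data, p->len) == 0;
}

static void release_to_client(const Packet *p)
{
    printf("send to client: %.*s\n", (int)p->len, p->data);
}

static void trigger_checkpoint(void)
{
    printf("checkpoint: stop both VMs, sync SVM from PVM, then resume\n");
}

/* Compare the primary's and secondary's copy of the same outbound packet. */
static void compare_outbound(const Packet *pvm_pkt, const Packet *svm_pkt)
{
    if (packets_match(pvm_pkt, svm_pkt)) {
        release_to_client(pvm_pkt);   /* identical output: safe to send */
        return;
    }
    /*
     * Divergence detected: checkpoint immediately instead of letting the
     * VMs keep running, which would only dirty more pages and produce
     * more replies that can never be released.
     */
    trigger_checkpoint();
    release_to_client(pvm_pkt);       /* primary's output is the truth */
}

int main(void)
{
    Packet pvm = { "HTTP/1.1 200 OK", 15 };
    Packet svm = { "HTTP/1.1 500 ER", 15 };

    compare_outbound(&pvm, &svm);
    return 0;
}
```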


Thanks,
Hailiang

> Hello, Lukas & Zhanghailiang
> 
> Thanks for your opinions.
> I went through my patch, and I feel a little confused and would like to dig 
> into it
> more.
> 
> In this patch, colo_migrate_ram_before_checkpoint() is before
> COLO_MESSAGE_CHECKPOINT_REQUEST, so the SVM and PVM should not enter
> the pause state.
> 
> In the meanwhile, the packets to PVM/SVM can still be compared and notify
> inconsistency if mismatched, right?
> Is it possible to introduce extra network latency?
> 
> In my test (randwrite to disk by fio with direct=0), the ping from another 
> client to
> the PVM  using generic colo and colo used this patch are below.
> The network latency does not increase as my expectation.
> 
> generic colo
> ```
> 64 bytes from 192.168.80.18: icmp_seq=87 ttl=64 time=28.109 ms
> 64 bytes from 192.168.80.18: icmp_seq=88 ttl=64 time=16.747 ms
> 64 bytes from 192.168.80.18: icmp_seq=89 ttl=64 time=2388.779 ms
> <checkpoint start
> 64 bytes from 192.168.80.18: icmp_seq=90 ttl=64 time=1385.792 ms
> 64 bytes from 192.168.80.18: icmp_seq=91 ttl=64 time=384.896 ms
> <checkpoint end
> 64 bytes from 192.168.80.18: icmp_seq=92 ttl=64 time=3.895 ms
> 64 bytes from 192.168.80.18: icmp_seq=93 ttl=64 time=1.020 ms
> 64 bytes from 192.168.80.18: icmp_seq=94 ttl=64 time=0.865 ms
> 64 bytes from 192.168.80.18: icmp_seq=95 ttl=64 time=0.854 ms
> 64 bytes from 192.168.80.18: icmp_seq=96 ttl=64 time=28.359 ms
> 64 bytes from 192.168.80.18: icmp_seq=97 ttl=64 time=12.309 ms
> 64 bytes from 192.168.80.18: icmp_seq=98 ttl=64 time=0.870 ms
> 64 bytes from 192.168.80.18: icmp_seq=99 ttl=64 time=2371.733 ms
> 64 bytes from 192.168.80.18: icmp_seq=100 ttl=64 time=1371.440 ms
> 64 bytes from 192.168.80.18: icmp_seq=101 ttl=64 time=366.414 ms
> 64 bytes from 192.168.80.18: icmp_seq=102 ttl=64 time=0.818 ms
> 64 bytes from 192.168.80.18: icmp_seq=103 ttl=64 time=0.997 ms ```
> 
> colo used this patch
> ```
> 64 bytes from 192.168.80.18: icmp_seq=72 ttl=64 time=1.417 ms
> 64 bytes from 192.168.80.18: icmp_seq=73 ttl=64 time=0.931 ms
> 64 bytes from 192.168.80.18: icmp_seq=74 ttl=64 time=0.876 ms
> 64 bytes from 192.168.80.18: icmp_seq=75 ttl=64 time=1184.034 ms
> <checkpoint start
> 64 bytes from 192.168

RE: [PATCH v0 0/4] background snapshot

2020-08-04 Thread Zhanghailiang
Hi David,

Thanks for cc'ing me; it is really exciting to know that the write-protect
feature has finally been merged.
Beyond live memory snapshots, I'm wondering whether we can use it to realize
real memory throttling in migration, since we can still come across dirty pages
that fail to converge with the current CPU-throttle method. We may use the
write-protect capability to slow down the guest's memory accesses, and thereby
slow down the dirtying of pages... I'll look into it (a rough sketch of the
idea follows below).

Besides, I'll follow this snapshot series and see whether I can do some work to
help polish this feature so that it can be accepted as quickly as possible. ;)
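Purely as an illustration of that idea (this is not QEMU code), a userfaultfd
write-protect based throttle on Linux >= 5.7 could look roughly like the sketch
below. The ioctl and flag names come from linux/userfaultfd.h; everything else
(function names, the fixed delay) is an assumption for the example, and error
handling plus the fault-reading thread are omitted.

```c
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Register a memory range for write-protect faults and arm the protection. */
static int uffd_wp_setup(void *area, size_t len)
{
    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
    };
    ioctl(ufd, UFFDIO_API, &api);

    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)area, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_WP,
    };
    ioctl(ufd, UFFDIO_REGISTER, &reg);

    struct uffdio_writeprotect wp = {
        .range = { .start = (uintptr_t)area, .len = len },
        .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
    };
    ioctl(ufd, UFFDIO_WRITEPROTECT, &wp);

    return ufd;
}

/*
 * On each write-protect fault: sleep a little (this is the "throttle"),
 * then clear the protection for the faulting page so the vCPU can
 * continue; the dirty rate is limited by the delay.
 */
static void uffd_wp_throttle_page(int ufd, uint64_t fault_addr,
                                  size_t page_size, unsigned delay_us)
{
    usleep(delay_us);

    struct uffdio_writeprotect unprotect = {
        .range = { .start = fault_addr & ~(uint64_t)(page_size - 1),
                   .len   = page_size },
        .mode  = 0,   /* 0 == remove write protection */
    };
    ioctl(ufd, UFFDIO_WRITEPROTECT, &unprotect);
}
```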


Thanks,
Hailiang

> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: Tuesday, July 28, 2020 1:00 AM
> To: Denis Plotnikov ; da...@redhat.com;
> Zhanghailiang 
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; quint...@redhat.com;
> ebl...@redhat.com; arm...@redhat.com; pet...@redhat.com;
> d...@openvz.org
> Subject: Re: [PATCH v0 0/4] background snapshot
> 
> * Denis Plotnikov (dplotni...@virtuozzo.com) wrote:
> > Currently where is no way to make a vm snapshot without pausing a vm
> > for the whole time until the snapshot is done. So, the problem is the
> > vm downtime on snapshoting. The downtime value depends on the
> vmstate
> > size, the major part of which is RAM and the disk performance which is
> > used for the snapshot saving.
> >
> > The series propose a way to reduce the vm snapshot downtime. This is
> > done by saving RAM, the major part of vmstate, in the background when
> > the vm is running.
> >
> > The background snapshot uses linux UFFD write-protected mode for
> > memory page access intercepting. UFFD write-protected mode was added
> to the linux v5.7.
> > If UFFD write-protected mode isn't available the background snapshot
> > rejects to run.
> 
> Hi Denis,
>   I see Peter has responded to most of your patches, but just anted to say
> thank you; but also to cc in a couple of other people; David Hildenbrand
> (who is interested in unusual memory stuff) and zhanghailiang who works on
> COLO which also does snapshotting and had long wanted to use WP.
> 
>   2/4 was a bit big for my liking; please try and do it in smaller chunks!
> 
> Dave
> 
> > How to use:
> > 1. enable background snapshot capability
> >virsh qemu-monitor-command vm --hmp migrate_set_capability
> > background-snapshot on
> >
> > 2. stop the vm
> >virsh qemu-monitor-command vm --hmp stop
> >
> > 3. Start the external migration to a file
> >virsh qemu-monitor-command cent78-bs --hmp migrate
> exec:'cat > ./vm_state'
> >
> > 4. Wait for the migration finish and check that the migration has completed
> state.
> >
> > Denis Plotnikov (4):
> >   bitops: add some atomic versions of bitmap operations
> >   migration: add background snapshot capability
> >   migration: add background snapshot
> >   background snapshot: add trace events for page fault processing
> >
> >  qapi/migration.json |   7 +-
> >  include/exec/ramblock.h |   8 +
> >  include/exec/ramlist.h  |   2 +
> >  include/qemu/bitops.h   |  25 ++
> >  migration/migration.h   |   1 +
> >  migration/ram.h |  19 +-
> >  migration/savevm.h  |   3 +
> >  migration/migration.c   | 142 +-
> >  migration/ram.c | 582
> ++--
> >  migration/savevm.c  |   1 -
> >  migration/trace-events  |   2 +
> >  11 files changed, 771 insertions(+), 21 deletions(-)
> >
> > --
> > 2.17.0
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




RE: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint

2020-07-31 Thread Zhanghailiang
Hi Lukas Straub & Derek,

Sorry for the late reply, too busy these days ;)

> -Original Message-
> From: Lukas Straub [mailto:lukasstra...@web.de]
> Sent: Friday, July 31, 2020 3:52 PM
> To: Derek Su 
> Cc: qemu-devel@nongnu.org; Zhanghailiang
> ; chy...@qnap.com;
> quint...@redhat.com; dgilb...@redhat.com; ctch...@qnap.com;
> jwsu1...@gmail.com
> Subject: Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo
> checkpoint
> 
> On Sun, 21 Jun 2020 10:10:03 +0800
> Derek Su  wrote:
> 
> > This series is to reduce the guest's downtime during colo checkpoint
> > by migrating dirty ram pages as many as possible before colo checkpoint.
> >
> > If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or
> ram
> > pending size is lower than 'x-colo-migrate-ram-threshold', stop the
> > ram migration and do colo checkpoint.
> >
> > Test environment:
> > The both primary VM and secondary VM has 1GiB ram and 10GbE NIC for
> FT
> > traffic.
> > One fio buffer write job runs on the guest.
> > The result shows the total primary VM downtime is decreased by ~40%.
> >
> > Please help to review it and suggestions are welcomed.
> > Thanks.
> 
> Hello Derek,
> Sorry for the late reply.
> I think this is not a good idea, because it unnecessarily introduces a delay
> between checkpoint request and the checkpoint itself and thus impairs
> network bound workloads due to increased network latency. Workloads that
> are independent from network don't cause many checkpoints anyway, so it
> doesn't help there either.
> 

Agreed. Though it seems to reduce the VM's downtime while doing a checkpoint,
it doesn't help to reduce network latency, because the network packets that
differ between the SVM and PVM are what caused this checkpoint request, and
they will be blocked until the checkpoint process finishes.


> Hailang did have a patch to migrate ram between checkpoints, which should
> help all workloads, but it wasn't merged back then. I think you can pick it up
> again, rebase and address David's and Eric's comments:
> https://lore.kernel.org/qemu-devel/20200217012049.22988-3-zhang.zhang
> haili...@huawei.com/T/#u
>  

The second one, which can help reduce the downtime, has not been merged.

> Hailang, are you ok with that?
> 

Yes. @Derek, please feel free to pick it up if you would like to ;)


Thanks,
Hailiang

> Regards,
> Lukas Straub



RE: [PATCH 08/22] qga: Plug unlikely memory leak in guest-set-memory-blocks

2020-06-23 Thread Zhanghailiang
Reviewed-by: zhanghailiang 

> -Original Message-
> From: Markus Armbruster [mailto:arm...@redhat.com]
> Sent: Monday, June 22, 2020 6:43 PM
> To: qemu-devel@nongnu.org
> Cc: Michael Roth ; Zhanghailiang
> 
> Subject: [PATCH 08/22] qga: Plug unlikely memory leak in
> guest-set-memory-blocks
> 
> transfer_memory_block() leaks an Error object when reading file
> /sys/devices/system/memory/memory/state fails with errno other
> than ENOENT, and @sys2memblk is false, i.e. when the state file exists but
> cannot be read (seems quite unlikely), and this is guest-set-memory-blocks,
> not guest-get-memory-blocks.
> 
> Plug the leak.
> 
> Fixes: bd240fca42d5f072fb758a71720d9de9990ac553
> Cc: Michael Roth 
> Cc: Hailiang Zhang 
> Signed-off-by: Markus Armbruster 
> ---
>  qga/commands-posix.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c index
> ae1348dc8f..cdbeb59dcc 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -2421,6 +2421,7 @@ static void
> transfer_memory_block(GuestMemoryBlock *mem_blk, bool sys2memblk,
>  if (sys2memblk) {
>  error_propagate(errp, local_err);
>  } else {
> +error_free(local_err);
>  result->response =
> 
> GUEST_MEMORY_BLOCK_RESPONSE_TYPE_OPERATION_FAILED;
>  }
> --
> 2.26.2




RE: Memory leak in transfer_memory_block()?

2020-06-18 Thread Zhanghailiang
> -Original Message-
> From: Markus Armbruster [mailto:arm...@redhat.com]
> Sent: Thursday, June 18, 2020 1:36 PM
> To: Zhanghailiang 
> Cc: qemu-devel@nongnu.org; Michael Roth 
> Subject: Memory leak in transfer_memory_block()?
> 
> We appear to leak an Error object when ga_read_sysfs_file() fails with
> errno != ENOENT unless caller passes true @sys2memblk:
> 
> static void transfer_memory_block(GuestMemoryBlock *mem_blk, bool
> sys2memblk,
>   GuestMemoryBlockResponse
> *result,
>   Error **errp)
> {
> [...]
> if (local_err) {
> 
> We have an Error object.
> 
> /* treat with sysfs file that not exist in old kernel */
> if (errno == ENOENT) {
> 
> Case 1: ENOENT; we free it.  Good.
> 
> error_free(local_err);
> if (sys2memblk) {
> mem_blk->online = true;
> mem_blk->can_offline = false;
> } else if (!mem_blk->online) {
> result->response =
> 
> GUEST_MEMORY_BLOCK_RESPONSE_TYPE_OPERATION_NOT_SUPPORTED;
> }
> } else {
> 
> Case 2: other than ENOENT
> 
> if (sys2memblk) {
> 
> Case 2a: sys2memblk; we pass it to the caller.  Good.
> 
> error_propagate(errp, local_err);
> } else {
> 
> Case 2b: !sys2memblk; ???
> 

Good catch! I think we should pass the error info back to the caller;
let's record this error for debugging when it happens.

> result->response =
> 
> GUEST_MEMORY_BLOCK_RESPONSE_TYPE_OPERATION_FAILED;
> }
> }
> goto out2;
> }
> [...]
> out2:
> g_free(status);
> close(dirfd);
> out1:
> if (!sys2memblk) {
> result->has_error_code = true;
> result->error_code = errno;
> }
> }
> 
> What is supposed to be done with @local_err in case 2b?




RE: [PATCH 1/2] migration/colo: fix typo in the COLO Framework module

2020-06-15 Thread Zhanghailiang
Reviewed-by: zhanghailiang 


> -Original Message-
> From: Like Xu [mailto:like...@linux.intel.com]
> Sent: Sunday, June 14, 2020 4:45 PM
> To: qemu-devel@nongnu.org
> Cc: Like Xu ; Zhanghailiang
> 
> Subject: [PATCH 1/2] migration/colo: fix typo in the COLO Framework
> module
> 
> Cc: Hailiang Zhang 
> Signed-off-by: Like Xu 
> ---
>  docs/COLO-FT.txt | 8 
>  migration/colo.c | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index
> c8e1740935..fdc0207cff 100644
> --- a/docs/COLO-FT.txt
> +++ b/docs/COLO-FT.txt
> @@ -10,7 +10,7 @@ See the COPYING file in the top-level directory.
>  This document gives an overview of COLO's design and how to use it.
> 
>  == Background ==
> -Virtual machine (VM) replication is a well known technique for providing
> +Virtual machine (VM) replication is a well-known technique for
> +providing
>  application-agnostic software-implemented hardware fault tolerance,  also
> known as "non-stop service".
> 
> @@ -103,7 +103,7 @@ Primary side.
> 
>  COLO Proxy:
>  Delivers packets to Primary and Secondary, and then compare the responses
> from -both side. Then decide whether to start a checkpoint according to
> some rules.
> +both sides. Then decide whether to start a checkpoint according to some
> rules.
>  Please refer to docs/colo-proxy.txt for more information.
> 
>  Note:
> @@ -146,12 +146,12 @@ in test procedure.
> 
>  == Test procedure ==
>  Note: Here we are running both instances on the same host for testing,
> -change the IP Addresses if you want to run it on two hosts. Initally
> +change the IP Addresses if you want to run it on two hosts. Initially
>  127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
> 
>  == Startup qemu ==
>  1. Primary:
> -Note: Initally, $imagefolder/primary.qcow2 needs to be copied to all hosts.
> +Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts.
>  You don't need to change any IP's here, because 0.0.0.0 listens on any
> interface. The chardev's with 127.0.0.1 IP's loopback to the local qemu
> instance.
> diff --git a/migration/colo.c b/migration/colo.c index
> ea7d1e9d4e..80788d46b5 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -632,7 +632,7 @@ out:
>  /*
>   * It is safe to unregister notifier after failover finished.
>   * Besides, colo_delay_timer and colo_checkpoint_sem can't be
> - * released befor unregister notifier, or there will be use-after-free
> + * released before unregister notifier, or there will be
> + use-after-free
>   * error.
>   */
>  colo_compare_unregister_notifier(_compare_notifier);
> --
> 2.21.3




RE: [PATCH 1/2] migration/colo: fix typo in the COLO Framework module

2020-06-14 Thread Zhanghailiang

I have checked this patch in the mail archive and it has no problem;
it seems that my email setup has some problem, as it didn't show the right
newlines in this patch.


> -Original Message-
> From: Like Xu [mailto:like...@linux.intel.com]
> Sent: Monday, June 15, 2020 10:24 AM
> To: Zhanghailiang ;
> qemu-devel@nongnu.org
> Subject: Re: [PATCH 1/2] migration/colo: fix typo in the COLO Framework
> module
> 
> On 2020/6/15 9:36, Zhanghailiang wrote:
> > Hi Like,
> >
> > Please check this patch, It seems that you didn't use git format-patch
> > command to generate this patch, It is in wrong format.
> 
> I rebase the patch on the top commit of
> 7d3660e79830a069f1848bb4fa1cdf8f666424fb,
> and hope it helps you.
> 
> >
> > Thanks,
> > Hailiang
> 
>  From 15c19be9be07598d4264a4a84b85d4efa79bff9d Mon Sep 17
> 00:00:00 2001
> From: Like Xu 
> Date: Mon, 15 Jun 2020 10:10:57 +0800
> Subject: [PATCH 1/2] migration/colo: fix typo in the COLO Framework
> module
> 
> Cc: Hailiang Zhang 
> Signed-off-by: Like Xu 
> ---
>   docs/COLO-FT.txt | 8 
>   migration/colo.c | 2 +-
>   2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index
> c8e1740935..fdc0207cff 100644
> --- a/docs/COLO-FT.txt
> +++ b/docs/COLO-FT.txt
> @@ -10,7 +10,7 @@ See the COPYING file in the top-level directory.
>   This document gives an overview of COLO's design and how to use it.
> 
>   == Background ==
> -Virtual machine (VM) replication is a well known technique for providing
> +Virtual machine (VM) replication is a well-known technique for
> +providing
>   application-agnostic software-implemented hardware fault tolerance,
>   also known as "non-stop service".
> 
> @@ -103,7 +103,7 @@ Primary side.
> 
>   COLO Proxy:
>   Delivers packets to Primary and Secondary, and then compare the
> responses from -both side. Then decide whether to start a checkpoint
> according to some rules.
> +both sides. Then decide whether to start a checkpoint according to some
> rules.
>   Please refer to docs/colo-proxy.txt for more information.
> 
>   Note:
> @@ -146,12 +146,12 @@ in test procedure.
> 
>   == Test procedure ==
>   Note: Here we are running both instances on the same host for testing,
> -change the IP Addresses if you want to run it on two hosts. Initally
> +change the IP Addresses if you want to run it on two hosts. Initially
>   127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
> 
>   == Startup qemu ==
>   1. Primary:
> -Note: Initally, $imagefolder/primary.qcow2 needs to be copied to all hosts.
> +Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts.
>   You don't need to change any IP's here, because 0.0.0.0 listens on any
>   interface. The chardev's with 127.0.0.1 IP's loopback to the local qemu
>   instance.
> diff --git a/migration/colo.c b/migration/colo.c index
> ea7d1e9d4e..80788d46b5 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -632,7 +632,7 @@ out:
>   /*
>* It is safe to unregister notifier after failover finished.
>* Besides, colo_delay_timer and colo_checkpoint_sem can't be
> - * released befor unregister notifier, or there will be use-after-free
> + * released before unregister notifier, or there will be
> + use-after-free
>* error.
>*/
>   colo_compare_unregister_notifier(_compare_notifier);
> --
> 2.21.3
> 



RE: [PATCH 1/2] migration/colo: fix typo in the COLO Framework module

2020-06-14 Thread Zhanghailiang
Hi Like,

Please check this patch. It seems that you didn't use the git format-patch
command to generate it, so it is in the wrong format.

Thanks,
Hailiang

> -Original Message-
> From: Like Xu [mailto:like...@linux.intel.com]
> Sent: Sunday, June 14, 2020 4:45 PM
> To: qemu-devel@nongnu.org
> Cc: Like Xu ; Zhanghailiang
> 
> Subject: [PATCH 1/2] migration/colo: fix typo in the COLO Framework
> module
> 
> Cc: Hailiang Zhang 
> Signed-off-by: Like Xu 
> ---
>  docs/COLO-FT.txt | 8 
>  migration/colo.c | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt index
> c8e1740935..fdc0207cff 100644
> --- a/docs/COLO-FT.txt
> +++ b/docs/COLO-FT.txt
> @@ -10,7 +10,7 @@ See the COPYING file in the top-level directory.
>  This document gives an overview of COLO's design and how to use it.
> 
>  == Background ==
> -Virtual machine (VM) replication is a well known technique for providing
> +Virtual machine (VM) replication is a well-known technique for
> +providing
>  application-agnostic software-implemented hardware fault tolerance,  also
> known as "non-stop service".
> 
> @@ -103,7 +103,7 @@ Primary side.
> 
>  COLO Proxy:
>  Delivers packets to Primary and Secondary, and then compare the responses
> from -both side. Then decide whether to start a checkpoint according to
> some rules.
> +both sides. Then decide whether to start a checkpoint according to some
> rules.
>  Please refer to docs/colo-proxy.txt for more information.
> 
>  Note:
> @@ -146,12 +146,12 @@ in test procedure.
> 
>  == Test procedure ==
>  Note: Here we are running both instances on the same host for testing,
> -change the IP Addresses if you want to run it on two hosts. Initally
> +change the IP Addresses if you want to run it on two hosts. Initially
>  127.0.0.1 is the Primary Host and 127.0.0.2 is the Secondary Host.
> 
>  == Startup qemu ==
>  1. Primary:
> -Note: Initally, $imagefolder/primary.qcow2 needs to be copied to all hosts.
> +Note: Initially, $imagefolder/primary.qcow2 needs to be copied to all hosts.
>  You don't need to change any IP's here, because 0.0.0.0 listens on any
> interface. The chardev's with 127.0.0.1 IP's loopback to the local qemu
> instance.
> diff --git a/migration/colo.c b/migration/colo.c index
> ea7d1e9d4e..80788d46b5 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -632,7 +632,7 @@ out:
>  /*
>   * It is safe to unregister notifier after failover finished.
>   * Besides, colo_delay_timer and colo_checkpoint_sem can't be
> - * released befor unregister notifier, or there will be use-after-free
> + * released before unregister notifier, or there will be
> + use-after-free
>   * error.
>   */
>  colo_compare_unregister_notifier(_compare_notifier);
> --
> 2.21.3




RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request into one.

2020-06-03 Thread Zhanghailiang
> -Original Message-
> From: Zhang, Chen [mailto:chen.zh...@intel.com]
> Sent: Wednesday, June 3, 2020 5:11 PM
> To: Zhanghailiang ; Dr . David Alan
> Gilbert ; Juan Quintela ;
> qemu-dev 
> Cc: Zhang Chen ; Jason Wang
> 
> Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> into one.
> 
> 
> 
> > -Original Message-
> > From: Zhanghailiang 
> > Sent: Tuesday, June 2, 2020 2:59 PM
> > To: Zhang, Chen ; Dr . David Alan Gilbert
> > ; Juan Quintela ; qemu-dev
> > 
> > Cc: Zhang Chen ; Jason Wang
> 
> > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint
> > request into one.
> >
> >
> >
> > > -Original Message-
> > > From: Zhang Chen [mailto:chen.zh...@intel.com]
> > > Sent: Friday, May 15, 2020 12:28 PM
> > > To: Dr . David Alan Gilbert ; Juan Quintela
> > > ; Zhanghailiang
> > ;
> > > qemu-dev 
> > > Cc: Zhang Chen ; Jason Wang
> > > ; Zhang Chen 
> > > Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> > > into one.
> > >
> > > From: Zhang Chen 
> > >
> > > When COLO guest occur issues, COLO-compare will catch lots of
> > > different network packet and trigger notification multi times, force
> > > periodic may happen at the same time. So this can be efficient merge
> > > checkpoint request within COLO_CHECKPOINT_INTERVAL.
> > >
> > > Signed-off-by: Zhang Chen 
> > > ---
> > >  migration/colo.c | 22 --
> > >  1 file changed, 16 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/migration/colo.c b/migration/colo.c index
> > > d5bced22cb..e6a7d8c6e2 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -47,6 +47,9 @@ static COLOMode last_colo_mode;
> > >
> > >  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> > >
> > > +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define
> > > +COLO_CHECKPOINT_INTERVAL 1000
> > > +
> > >  bool migration_in_colo_state(void)
> > >  {
> > >  MigrationState *s = migrate_get_current(); @@ -651,13 +654,20
> > > @@
> > > out:
> > >  void colo_checkpoint_notify(void *opaque)  {
> > >  MigrationState *s = opaque;
> > > -int64_t next_notify_time;
> > > +int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > >
> > > -qemu_sem_post(&s->colo_checkpoint_sem);
> > > -s->colo_checkpoint_time =
> qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > > -next_notify_time = s->colo_checkpoint_time +
> > > -s->parameters.x_checkpoint_delay;
> > > -timer_mod(s->colo_delay_timer, next_notify_time);
> > > +/*
> > > + * When COLO guest occur issues, COLO-compare will catch lots of
> > > + * different network packet and trigger notification multi times,
> > > + * force periodic may happen at the same time. So this can be
> > > + * efficient merge checkpoint request within
> > > COLO_CHECKPOINT_INTERVAL.
> > > + */
> > > +if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL)
> {
> > > +qemu_sem_post(&s->colo_checkpoint_sem);
> >
> > It is not right here, this notification should not be controlled by
> > the interval time, I got what happened here, when multiple checkpoint
> > requires come, this Colo_delay_time will be added every time and it
> > will be a big value which is not what we want.
> 
> Not just this, multi checkpoint will spend lots of resource to sync memory
> from PVM to SVM, It will make VM stop/start multi times, but for the results
> are same with one checkpoint.
> So in short time just need one checkpoint, because do checkpoint still need
> some time...
> 

Yes, this is because we use a semaphore here, so it gets incremented multiple
times. I think Lukas's patch 'migration/colo.c: Use event instead of semaphore'
has fixed this problem (a conceptual sketch follows below).
Did you try the QEMU upstream that has merged this patch?
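For anyone following along, the difference comes down to the counting behaviour
of the two primitives. The sketch below is conceptual only; it is written
against QEMU's qemu/thread.h but it is not the actual patch, which is Lukas's
'migration/colo.c: Use event instead of semaphore'.

```c
#include "qemu/osdep.h"
#include "qemu/thread.h"

/*
 * Semaphore: every miscompared packet adds one wakeup, so a burst of N
 * notifications leads to N back-to-back checkpoints.
 */
void notify_with_semaphore(QemuSemaphore *sem, int miscompares)
{
    for (int i = 0; i < miscompares; i++) {
        qemu_sem_post(sem);             /* internal counter becomes N */
    }
    /* checkpoint loop: qemu_sem_wait(sem) returns N times -> N checkpoints */
}

/*
 * Event: it is either set or not, so the same burst collapses into a
 * single pending checkpoint request.
 */
void notify_with_event(QemuEvent *ev, int miscompares)
{
    for (int i = 0; i < miscompares; i++) {
        qemu_event_set(ev);             /* idempotent while already set */
    }
    /*
     * checkpoint loop: qemu_event_wait(ev); qemu_event_reset(ev);
     * -> one checkpoint for the whole burst
     */
}
```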

> Thanks
> Zhang Chen
> 
> >
> > Besides, please update this patch based on [PATCH 0/6] colo: migration
> > related bugfixes series which Has modified the same place.
> >
> >
> >
> > > +timer_mod(s->colo_delay_timer, now +
> > > +  s->parameters.x_checkpoint_delay);
> > > +s->colo_checkpoint_time = now;
> > > +}
> > >  }
> > >
> > >  void migrate_start_colo_process(MigrationState *s)
> > > --
> > > 2.17.1




RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request into one.

2020-06-02 Thread Zhanghailiang



> -Original Message-
> From: Zhang Chen [mailto:chen.zh...@intel.com]
> Sent: Friday, May 15, 2020 12:28 PM
> To: Dr . David Alan Gilbert ; Juan Quintela
> ; Zhanghailiang ;
> qemu-dev 
> Cc: Zhang Chen ; Jason Wang
> ; Zhang Chen 
> Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint request into
> one.
> 
> From: Zhang Chen 
> 
> When COLO guest occur issues, COLO-compare will catch lots of different
> network packet and trigger notification multi times, force periodic may
> happen at the same time. So this can be efficient merge checkpoint request
> within COLO_CHECKPOINT_INTERVAL.
> 
> Signed-off-by: Zhang Chen 
> ---
>  migration/colo.c | 22 --
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index
> d5bced22cb..e6a7d8c6e2 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -47,6 +47,9 @@ static COLOMode last_colo_mode;
> 
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> 
> +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define
> +COLO_CHECKPOINT_INTERVAL 1000
> +
>  bool migration_in_colo_state(void)
>  {
>  MigrationState *s = migrate_get_current(); @@ -651,13 +654,20 @@
> out:
>  void colo_checkpoint_notify(void *opaque)  {
>  MigrationState *s = opaque;
> -int64_t next_notify_time;
> +int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> 
> -qemu_sem_post(&s->colo_checkpoint_sem);
> -s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> -next_notify_time = s->colo_checkpoint_time +
> -s->parameters.x_checkpoint_delay;
> -timer_mod(s->colo_delay_timer, next_notify_time);
> +/*
> + * When COLO guest occur issues, COLO-compare will catch lots of
> + * different network packet and trigger notification multi times,
> + * force periodic may happen at the same time. So this can be
> + * efficient merge checkpoint request within
> COLO_CHECKPOINT_INTERVAL.
> + */
> +if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) {
> +qemu_sem_post(&s->colo_checkpoint_sem);

This is not right; the notification itself should not be controlled by the
interval time. I see what happened here: when multiple checkpoint requests
come, colo_delay_time gets added every time and becomes a big value, which is
not what we want.

Besides, please update this patch based on the '[PATCH 0/6] colo: migration
related bugfixes' series, which has modified the same place.



> +timer_mod(s->colo_delay_timer, now +
> +  s->parameters.x_checkpoint_delay);
> +s->colo_checkpoint_time = now;
> +}
>  }
> 
>  void migrate_start_colo_process(MigrationState *s)
> --
> 2.17.1




RE: [PATCH 2/3] migration/colo: Update checkpoint time lately

2020-06-02 Thread Zhanghailiang
Reviewed-by: zhanghailiang 

Hmm, how much time does it spend on the preparation before the COLO process?

> -Original Message-
> From: Zhang Chen [mailto:chen.zh...@intel.com]
> Sent: Friday, May 15, 2020 12:28 PM
> To: Dr . David Alan Gilbert ; Juan Quintela
> ; Zhanghailiang ;
> qemu-dev 
> Cc: Zhang Chen ; Jason Wang
> ; Zhang Chen 
> Subject: [PATCH 2/3] migration/colo: Update checkpoint time lately
> 
> From: Zhang Chen 
> 
> Previous operation(like vm_start and replication_start_all) will consume
> extra time for first forced synchronization, so reduce it in this patch.
> 
> Signed-off-by: Zhang Chen 
> ---
>  migration/colo.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index
> 5ef69b885d..d5bced22cb 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -531,7 +531,6 @@ static void colo_process_checkpoint(MigrationState
> *s)  {
>  QIOChannelBuffer *bioc;
>  QEMUFile *fb = NULL;
> -int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>  Error *local_err = NULL;
>  int ret;
> 
> @@ -580,8 +579,8 @@ static void colo_process_checkpoint(MigrationState
> *s)
>  qemu_mutex_unlock_iothread();
>  trace_colo_vm_state_change("stop", "run");
> 
> -timer_mod(s->colo_delay_timer,
> -current_time + s->parameters.x_checkpoint_delay);
> +timer_mod(s->colo_delay_timer,
> qemu_clock_get_ms(QEMU_CLOCK_HOST) +
> +  s->parameters.x_checkpoint_delay);
> 
>  while (s->state == MIGRATION_STATUS_COLO) {
>  if (failover_get_state() != FAILOVER_STATUS_NONE) {
> --
> 2.17.1




RE: [PATCH 1/3] migration/colo: Optimize COLO boot code path

2020-06-01 Thread Zhanghailiang
Reviewed-by: zhanghailiang 

> -Original Message-
> From: Zhang Chen [mailto:chen.zh...@intel.com]
> Sent: Friday, May 15, 2020 12:28 PM
> To: Dr . David Alan Gilbert ; Juan Quintela
> ; Zhanghailiang ;
> qemu-dev 
> Cc: Zhang Chen ; Jason Wang
> ; Zhang Chen 
> Subject: [PATCH 1/3] migration/colo: Optimize COLO boot code path
> 
> From: Zhang Chen 
> 
> No need to reuse MIGRATION_STATUS_ACTIVE boot COLO.
> 
> Signed-off-by: Zhang Chen 
> ---
>  migration/colo.c  |  2 --
>  migration/migration.c | 17 ++---
>  2 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index
> d015d4f84e..5ef69b885d 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -669,8 +669,6 @@ void migrate_start_colo_process(MigrationState *s)
>  colo_checkpoint_notify, s);
> 
>  qemu_sem_init(>colo_exit_sem, 0);
> -migrate_set_state(>state, MIGRATION_STATUS_ACTIVE,
> -  MIGRATION_STATUS_COLO);
>  colo_process_checkpoint(s);
>  qemu_mutex_lock_iothread();
>  }
> diff --git a/migration/migration.c b/migration/migration.c index
> 0bb042a0f7..c889ef6eb7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2972,7 +2972,10 @@ static void
> migration_completion(MigrationState *s)
>  goto fail_invalidate;
>  }
> 
> -if (!migrate_colo_enabled()) {
> +if (migrate_colo_enabled()) {
> +migrate_set_state(>state, current_active_state,
> +  MIGRATION_STATUS_COLO);
> +} else {
>  migrate_set_state(>state, current_active_state,
>MIGRATION_STATUS_COMPLETED);
>  }
> @@ -3304,12 +3307,7 @@ static void
> migration_iteration_finish(MigrationState *s)
>  migration_calculate_complete(s);
>  runstate_set(RUN_STATE_POSTMIGRATE);
>  break;
> -
> -case MIGRATION_STATUS_ACTIVE:
> -/*
> - * We should really assert here, but since it's during
> - * migration, let's try to reduce the usage of assertions.
> - */
> +case MIGRATION_STATUS_COLO:
>  if (!migrate_colo_enabled()) {
>  error_report("%s: critical error: calling COLO code without "
>   "COLO enabled", __func__); @@ -3319,6
> +3317,11 @@ static void migration_iteration_finish(MigrationState *s)
>   * Fixme: we will run VM in COLO no matter its old running state.
>   * After exited COLO, we will keep running.
>   */
> +case MIGRATION_STATUS_ACTIVE:
> +/*
> + * We should really assert here, but since it's during
> + * migration, let's try to reduce the usage of assertions.
> + */
>  s->vm_was_running = true;
>  /* Fallthrough */
>  case MIGRATION_STATUS_FAILED:
> --
> 2.17.1




RE: About migration/colo issue

2020-05-15 Thread Zhanghailiang
Hi,

I can't reproduce this issue with the QEMU upstream either; it works well.

Did you use an old version ?

Thanks,
Hailiang


> -Original Message-
> From: Lukas Straub [mailto:lukasstra...@web.de]
> Sent: Friday, May 15, 2020 3:12 PM
> To: Zhang, Chen 
> Cc: Zhanghailiang ; Dr . David Alan
> Gilbert ; qemu-devel ; Li
> Zhijian ; Jason Wang 
> Subject: Re: About migration/colo issue
> 
> On Fri, 15 May 2020 03:16:18 +
> "Zhang, Chen"  wrote:
> 
> > Hi Hailiang/Dave.
> >
> > I found a urgent problem in current upstream code, COLO will stuck on
> secondary checkpoint and later.
> > The guest will stuck by this issue.
> > I have bisect upstream code, this issue caused by Hailiang's optimize patch:
> 
> Hmm, I'm on v5.0.0 (where that commit is in) and I don't have this issue in
> my testing.
> 
> Regards,
> Lukas Straub
> 
> > From 0393031a16735835a441b6d6e0495a1bd14adb90 Mon Sep 17
> 00:00:00 2001
> > From: zhanghailiang 
> > Date: Mon, 24 Feb 2020 14:54:10 +0800
> > Subject: [PATCH] COLO: Optimize memory back-up process
> >
> > This patch will reduce the downtime of VM for the initial process,
> > Previously, we copied all these memory in preparing stage of COLO
> > while we need to stop VM, which is a time-consuming process.
> > Here we optimize it by a trick, back-up every page while in migration
> > process while COLO is enabled, though it affects the speed of the
> > migration, but it obviously reduce the downtime of back-up all SVM'S
> > memory in COLO preparing stage.
> >
> > Signed-off-by: zhanghailiang 
> > Message-Id:
> <20200224065414.36524-5-zhang.zhanghaili...@huawei.com>
> > Signed-off-by: Dr. David Alan Gilbert 
> >   minor typo fixes
> >
> > Hailiang, do you have time to look into it?
> >
> > ...



RE: [PATCH 4/6] migration/colo.c: Relaunch failover even if there was an error

2020-05-15 Thread Zhanghailiang
Reviewed-by: zhanghailiang 

> -Original Message-
> From: Lukas Straub [mailto:lukasstra...@web.de]
> Sent: Monday, May 11, 2020 7:11 PM
> To: qemu-devel 
> Cc: Zhanghailiang ; Juan Quintela
> ; Dr. David Alan Gilbert 
> Subject: [PATCH 4/6] migration/colo.c: Relaunch failover even if there was an
> error
> 
> If vmstate_loading is true, secondary_vm_do_failover will set failover status
> to FAILOVER_STATUS_RELAUNCH and return success without initiating
> failover. However, if there is an error during the vmstate_loading section,
> failover isn't relaunched. Instead we then wait for failover on
> colo_incoming_sem.
> 
> Fix this by relaunching failover even if there was an error. Also, to make 
> this
> work properly, set vmstate_loading to false when returning during the
> vmstate_loading section.
> 
> Signed-off-by: Lukas Straub 
> ---
>  migration/colo.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index
> 2947363ae5..a69782efc5 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -743,6 +743,7 @@ static void
> colo_incoming_process_checkpoint(MigrationIncomingState *mis,
>  ret = qemu_load_device_state(fb);
>  if (ret < 0) {
>  error_setg(errp, "COLO: load device state failed");
> +vmstate_loading = false;
>  qemu_mutex_unlock_iothread();
>  return;
>  }
> @@ -751,6 +752,7 @@ static void
> colo_incoming_process_checkpoint(MigrationIncomingState *mis,
>  replication_get_error_all(_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
> +vmstate_loading = false;
>  qemu_mutex_unlock_iothread();
>  return;
>  }
> @@ -759,6 +761,7 @@ static void
> colo_incoming_process_checkpoint(MigrationIncomingState *mis,
>  replication_do_checkpoint_all(_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
> +vmstate_loading = false;
>  qemu_mutex_unlock_iothread();
>  return;
>  }
> @@ -770,6 +773,7 @@ static void
> colo_incoming_process_checkpoint(MigrationIncomingState *mis,
> 
>  if (local_err) {
>  error_propagate(errp, local_err);
> +vmstate_loading = false;
>  qemu_mutex_unlock_iothread();
>  return;
>  }
> @@ -780,9 +784,6 @@ static void
> colo_incoming_process_checkpoint(MigrationIncomingState *mis,
>  qemu_mutex_unlock_iothread();
> 
>  if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
> -failover_set_state(FAILOVER_STATUS_RELAUNCH,
> -FAILOVER_STATUS_NONE);
> -failover_request_active(NULL);
>  return;
>  }
> 
> @@ -881,6 +882,14 @@ void *colo_process_incoming_thread(void
> *opaque)
>  error_report_err(local_err);
>  break;
>  }
> +
> +if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
> +failover_set_state(FAILOVER_STATUS_RELAUNCH,
> +FAILOVER_STATUS_NONE);
> +failover_request_active(NULL);
> +break;
> +}
> +
>  if (failover_get_state() != FAILOVER_STATUS_NONE) {
>  error_report("failover request");
>  break;
> @@ -888,8 +897,6 @@ void *colo_process_incoming_thread(void *opaque)
>  }
> 
>  out:
> -vmstate_loading = false;
> -
>  /*
>   * There are only two reasons we can get here, some error happened
>   * or the user triggered failover.
> --
> 2.20.1




RE: About migration/colo issue

2020-05-14 Thread Zhanghailiang
Hi Zhang Chen,

From your tracing log, it seems to be hung in colo_flush_ram_cache()?
Does it run into a dead loop there?
I'll test it with the new QEMU.

Thanks,
Hailiang

From: Zhang, Chen [mailto:chen.zh...@intel.com]
Sent: Friday, May 15, 2020 11:16 AM
To: Zhanghailiang ; Dr . David Alan Gilbert 
; qemu-devel ; Li Zhijian 

Cc: Jason Wang ; Lukas Straub 
Subject: About migration/colo issue

Hi Hailiang/Dave.

I found a urgent problem in current upstream code, COLO will stuck on secondary 
checkpoint and later.
The guest will stuck by this issue.
I have bisect upstream code, this issue caused by Hailiang's optimize patch:

From 0393031a16735835a441b6d6e0495a1bd14adb90 Mon Sep 17 00:00:00 2001
From: zhanghailiang <zhang.zhanghaili...@huawei.com>
Date: Mon, 24 Feb 2020 14:54:10 +0800
Subject: [PATCH] COLO: Optimize memory back-up process

This patch will reduce the downtime of VM for the initial process,
Previously, we copied all these memory in preparing stage of COLO
while we need to stop VM, which is a time-consuming process.
Here we optimize it by a trick, back-up every page while in migration
process while COLO is enabled, though it affects the speed of the
migration, but it obviously reduce the downtime of back-up all SVM'S
memory in COLO preparing stage.

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Message-Id: <20200224065414.36524-5-zhang.zhanghaili...@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
  minor typo fixes

Hailiang, do you have time to look into it?


The detail log:
Primary node:
13322@1589511271.917346:colo_receive_message Receive 'checkpoint-ready' message
{"timestamp": {"seconds": 1589511271, "microseconds": 917406}, "event": "RESUME"}
13322@1589511271.917842:colo_vm_state_change Change 'stop' => 'run'
13322@1589511291.243346:colo_send_message Send 'checkpoint-request' message
13322@1589511291.243978:colo_receive_message Receive 'checkpoint-reply' message
{"timestamp": {"seconds": 1589511291, "microseconds": 244096}, "event": "STOP"}
13322@1589511291.24:colo_vm_state_change Change 'run' => 'stop'
13322@1589511291.244561:colo_send_message Send 'vmstate-send' message
13322@1589511291.258594:colo_send_message Send 'vmstate-size' message
13322@1589511305.412479:colo_receive_message Receive 'vmstate-received' message
13322@1589511309.031826:colo_receive_message Receive 'vmstate-loaded' message
{"timestamp": {"seconds": 1589511309, "microseconds": 31862}, "event": "RESUME"}
13322@1589511309.033075:colo_vm_state_change Change 'stop' => 'run'
{"timestamp": {"seconds": 1589511311, "microseconds": 111617}, "event": "VNC_CONNECTED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5907", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "51564", "host": "10.239.13.19", "websocket": false}}}
{"timestamp": {"seconds": 1589511311, "microseconds": 116197}, "event": "VNC_INITIALIZED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5907", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "51564", "host": "10.239.13.19", "websocket": false}}}
13322@1589511311.243271:colo_send_message Send 'checkpoint-request' message
13322@1589511311.351361:colo_receive_message Receive 'checkpoint-reply' message
{"timestamp": {"seconds": 1589511311, "microseconds": 351439}, "event": "STOP"}
13322@1589511311.415779:colo_vm_state_change

RE: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place

2020-05-14 Thread Zhanghailiang
Reviewed-by: zhanghailiang 

> -Original Message-
> From: Lukas Straub [mailto:lukasstra...@web.de]
> Sent: Monday, May 11, 2020 7:11 PM
> To: qemu-devel 
> Cc: Zhanghailiang ; Juan Quintela
> ; Dr. David Alan Gilbert 
> Subject: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event
> to the right place
> 
> If the secondary has to failover during checkpointing, it still is in the old 
> state
> (i.e. different state than primary). Thus we can't expose the primary state
> until after the checkpoint is sent.
> 
> This fixes sporadic connection reset of client connections during failover.
> 
> Signed-off-by: Lukas Straub 
> ---
>  migration/colo.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index
> a69782efc5..a3fc21e86e 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -430,12 +430,6 @@ static int
> colo_do_checkpoint_transaction(MigrationState *s,
>  goto out;
>  }
> 
> -qemu_event_reset(>colo_checkpoint_event);
> -colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> _err);
> -if (local_err) {
> -goto out;
> -}
> -
>  /* Disable block migration */
>  migrate_set_block_enabled(false, _err);
>  qemu_mutex_lock_iothread();
> @@ -494,6 +488,12 @@ static int
> colo_do_checkpoint_transaction(MigrationState *s,
>  goto out;
>  }
> 
> +qemu_event_reset(>colo_checkpoint_event);
> +colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> _err);
> +if (local_err) {
> +goto out;
> +}
> +
>  colo_receive_check_message(s->rp_state.from_dst_file,
> COLO_MESSAGE_VMSTATE_LOADED, _err);
>  if (local_err) {
> --
> 2.20.1



RE: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place

2020-05-14 Thread Zhanghailiang
> -Original Message-
> From: Lukas Straub [mailto:lukasstra...@web.de]
> Sent: Thursday, May 14, 2020 10:31 PM
> To: Zhanghailiang 
> Cc: qemu-devel ; Zhang Chen
> ; Juan Quintela ; Dr. David
> Alan Gilbert 
> Subject: Re: [PATCH 6/6] migration/colo.c: Move
> colo_notify_compares_event to the right place
> 
> On Thu, 14 May 2020 13:27:30 +
> Zhanghailiang  wrote:
> 
> > Cc: Zhang Chen 
> >
> > >
> > > If the secondary has to failover during checkpointing, it still is
> > > in the old state (i.e. different state than primary). Thus we can't
> > > expose the primary state until after the checkpoint is sent.
> > >
> >
> > Hmm, do you mean we should not flush the net packages to client
> > connection until checkpointing Process almost success because it may fail
> during checkpointing ?
> 
> No.
> If the primary fails/crashes during checkpointing, the secondary is still in
> different state than the primary because it didn't receive the full 
> checkpoint.
> We can release the miscompared packets only after both primary and
> secondary are in the same state.
> 
> Example:
> 1. Client opens a TCP connection, sends SYN.
> 2. Primary accepts the connection with SYN-ACK, but due to
> nondeterministic execution the secondary is delayed.
> 3. Checkpoint happens, primary releases the SYN-ACK packet but then
> crashes while sending the checkpoint.
> 4. The Secondary fails over. At this point it is still in the old state where 
> it
> hasn't sent the SYN-ACK packet.
> 5. The client responds with ACK to the SYN-ACK packet.
> 6. Because it doesn't know the connection, the secondary responds with RST,
> connection reset.
> 

Good example. For this patch it is OK then; I will add my Reviewed-by to your
original patch.


> Regards,
> Lukas Straub
> 
> > > This fixes sporadic connection reset of client connections during 
> > > failover.
> > >
> > > Signed-off-by: Lukas Straub 
> > > ---
> > >  migration/colo.c | 12 ++--
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/migration/colo.c b/migration/colo.c index
> > > a69782efc5..a3fc21e86e 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -430,12 +430,6 @@ static int
> > > colo_do_checkpoint_transaction(MigrationState *s,
> > >  goto out;
> > >  }
> > >
> > > -qemu_event_reset(>colo_checkpoint_event);
> > > -colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> > > _err);
> > > -if (local_err) {
> > > -goto out;
> > > -}
> > > -
> > >  /* Disable block migration */
> > >  migrate_set_block_enabled(false, _err);
> > >  qemu_mutex_lock_iothread();
> > > @@ -494,6 +488,12 @@ static int
> > > colo_do_checkpoint_transaction(MigrationState *s,
> > >  goto out;
> > >  }
> > >
> > > +qemu_event_reset(>colo_checkpoint_event);
> > > +colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> > > _err);
> > > +if (local_err) {
> > > +goto out;
> > > +}
> > > +
> > >  colo_receive_check_message(s->rp_state.from_dst_file,
> > > COLO_MESSAGE_VMSTATE_LOADED,
> _err);
> > >  if (local_err) {
> > > --
> > > 2.20.1




RE: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place

2020-05-14 Thread Zhanghailiang
Cc: Zhang Chen 

> 
> If the secondary has to failover during checkpointing, it still is in the old 
> state
> (i.e. different state than primary). Thus we can't expose the primary state
> until after the checkpoint is sent.
> 

Hmm, do you mean we should not flush the network packets to the client
connection until the checkpointing process has almost succeeded, because it may
fail during checkpointing?

> This fixes sporadic connection reset of client connections during failover.
> 
> Signed-off-by: Lukas Straub 
> ---
>  migration/colo.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index
> a69782efc5..a3fc21e86e 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -430,12 +430,6 @@ static int
> colo_do_checkpoint_transaction(MigrationState *s,
>  goto out;
>  }
> 
> -qemu_event_reset(>colo_checkpoint_event);
> -colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> _err);
> -if (local_err) {
> -goto out;
> -}
> -
>  /* Disable block migration */
>  migrate_set_block_enabled(false, _err);
>  qemu_mutex_lock_iothread();
> @@ -494,6 +488,12 @@ static int
> colo_do_checkpoint_transaction(MigrationState *s,
>  goto out;
>  }
> 
> +qemu_event_reset(>colo_checkpoint_event);
> +colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> _err);
> +if (local_err) {
> +goto out;
> +}
> +
>  colo_receive_check_message(s->rp_state.from_dst_file,
> COLO_MESSAGE_VMSTATE_LOADED, _err);
>  if (local_err) {
> --
> 2.20.1


RE: [PATCH 5/6] migration/qemu-file.c: Don't ratelimit a shutdown fd

2020-05-14 Thread Zhanghailiang
> This causes the migration thread to hang if we failover during checkpoint. A
> shutdown fd won't cause network traffic anyway.
> 

I'm not quite sure whether this modification has side effects on the normal
migration process; there are several places calling it.

Maybe Juan and Dave can help ;)
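
To make the concern concrete, here is a rough sketch of the pattern the iteration
callers follow (send_some_pages() is only a hypothetical stand-in, not a real QEMU
function): a non-zero return from qemu_file_rate_limit() means "stop sending for
now", so with the patch a shut-down fd no longer pauses the loop and the failure
is noticed through the error check instead.

    /* rough sketch only; send_some_pages() is a made-up helper */
    while (qemu_file_rate_limit(f) == 0) {
        if (send_some_pages(f) == 0) {
            break;                  /* nothing left to send */
        }
        if (qemu_file_get_error(f)) {
            break;                  /* a shutdown surfaces as an error here */
        }
    }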

> Signed-off-by: Lukas Straub 
> ---
>  migration/qemu-file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c index
> 1c3a358a14..0748b5810f 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -660,7 +660,7 @@ int64_t qemu_ftell(QEMUFile *f)  int
> qemu_file_rate_limit(QEMUFile *f)  {
>  if (f->shutdown) {
> -return 1;
> +return 0;
>  }
>  if (qemu_file_get_error(f)) {
>  return 1;
> --
> 2.20.1



RE: [PATCH 3/6] migration/colo.c: Flush ram cache only after receiving device state

2020-05-14 Thread Zhanghailiang
Reviewed-by: zhanghailiang 

> 
> If we suceed in receiving ram state, but fail receiving the device state, 
> there
> will be a mismatch between the two.
> 
> Fix this by flushing the ram cache only after the vmstate has been received.
> 
> Signed-off-by: Lukas Straub 
> ---
>  migration/colo.c | 1 +
>  migration/ram.c  | 5 +
>  migration/ram.h  | 1 +
>  3 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index
> 6b2ad35aa4..2947363ae5 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -739,6 +739,7 @@ static void
> colo_incoming_process_checkpoint(MigrationIncomingState *mis,
> 
>  qemu_mutex_lock_iothread();
>  vmstate_loading = true;
> +colo_flush_ram_cache();
>  ret = qemu_load_device_state(fb);
>  if (ret < 0) {
>  error_setg(errp, "COLO: load device state failed"); diff --git
> a/migration/ram.c b/migration/ram.c index 04f13feb2e..5baec5fce9 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3313,7 +3313,7 @@ static bool postcopy_is_running(void)
>   * Flush content of RAM cache into SVM's memory.
>   * Only flush the pages that be dirtied by PVM or SVM or both.
>   */
> -static void colo_flush_ram_cache(void)
> +void colo_flush_ram_cache(void)
>  {
>  RAMBlock *block = NULL;
>  void *dst_host;
> @@ -3585,9 +3585,6 @@ static int ram_load(QEMUFile *f, void *opaque,
> int version_id)
>  }
>  trace_ram_load_complete(ret, seq_iter);
> 
> -if (!ret  && migration_incoming_in_colo_state()) {
> -colo_flush_ram_cache();
> -}
>  return ret;
>  }
> 
> diff --git a/migration/ram.h b/migration/ram.h index 5ceaff7cb4..2eeaacfa13
> 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -65,6 +65,7 @@ int ram_dirty_bitmap_reload(MigrationState *s,
> RAMBlock *rb);
> 
>  /* ram cache */
>  int colo_init_ram_cache(void);
> +void colo_flush_ram_cache(void);
>  void colo_release_ram_cache(void);
>  void colo_incoming_start_dirty_log(void);
> 
> --
> 2.20.1



RE: [PATCH 1/6] migration/colo.c: Use event instead of semaphore

2020-05-13 Thread Zhanghailiang
> If multiple packets miscompare in a short timeframe, the semaphore value
> will be increased multiple times. This causes multiple checkpoints even if one
> would be sufficient.
> 

You're right, good catch ;)

Reviewed-by: zhanghailiang 

> Fix this by using a event instead of a semaphore for triggering checkpoints.
> Now, checkpoint requests will be ignored until the checkpoint event is sent
> to colo-compare (which releases the miscompared packets).
> 
> Benchmark results (iperf3):
> Client-to-server tcp:
> without patch: ~66 Mbit/s
> with patch: ~61 Mbit/s
> Server-to-client tcp:
> without patch: ~702 Kbit/s
> with patch: ~16 Mbit/s
> 
> Signed-off-by: Lukas Straub 
> ---
>  migration/colo.c  | 9 +
>  migration/migration.h | 4 ++--
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index
> a54ac84f41..09168627bc 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -430,6 +430,7 @@ static int
> colo_do_checkpoint_transaction(MigrationState *s,
>  goto out;
>  }
> 
> +qemu_event_reset(&s->colo_checkpoint_event);
>  colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> &local_err);
>  if (local_err) {
>  goto out;
> @@ -580,7 +581,7 @@ static void colo_process_checkpoint(MigrationState
> *s)
>  goto out;
>  }
> 
> -qemu_sem_wait(&s->colo_checkpoint_sem);
> +qemu_event_wait(&s->colo_checkpoint_event);
> 
>  if (s->state != MIGRATION_STATUS_COLO) {
>  goto out;
> @@ -628,7 +629,7 @@ out:
>  colo_compare_unregister_notifier(_compare_notifier);
>  timer_del(s->colo_delay_timer);
>  timer_free(s->colo_delay_timer);
> -qemu_sem_destroy(&s->colo_checkpoint_sem);
> +qemu_event_destroy(&s->colo_checkpoint_event);
> 
>  /*
>   * Must be called after failover BH is completed, @@ -645,7 +646,7
> @@ void colo_checkpoint_notify(void *opaque)
>  MigrationState *s = opaque;
>  int64_t next_notify_time;
> 
> -qemu_sem_post(&s->colo_checkpoint_sem);
> +qemu_event_set(&s->colo_checkpoint_event);
>  s->colo_checkpoint_time =
> qemu_clock_get_ms(QEMU_CLOCK_HOST);
>  next_notify_time = s->colo_checkpoint_time +
>  s->parameters.x_checkpoint_delay; @@ -655,7
> +656,7 @@ void colo_checkpoint_notify(void *opaque)  void
> migrate_start_colo_process(MigrationState *s)  {
>  qemu_mutex_unlock_iothread();
> -qemu_sem_init(&s->colo_checkpoint_sem, 0);
> +qemu_event_init(&s->colo_checkpoint_event, false);
>  s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
>  colo_checkpoint_notify, s);
> 
> diff --git a/migration/migration.h b/migration/migration.h index
> 507284e563..f617960522 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -215,8 +215,8 @@ struct MigrationState
>  /* The semaphore is used to notify COLO thread that failover is finished
> */
>  QemuSemaphore colo_exit_sem;
> 
> -/* The semaphore is used to notify COLO thread to do checkpoint */
> -QemuSemaphore colo_checkpoint_sem;
> +/* The event is used to notify COLO thread to do checkpoint */
> +QemuEvent colo_checkpoint_event;
>  int64_t colo_checkpoint_time;
>  QEMUTimer *colo_delay_timer;
> 
> --
> 2.20.1



RE: [PATCH 11/11] migration/colo: Fix qmp_xen_colo_do_checkpoint() error handling

2020-04-20 Thread Zhanghailiang
Reviewed-by: zhanghailiang 

> -Original Message-
> From: Markus Armbruster [mailto:arm...@redhat.com]
> Sent: Monday, April 20, 2020 4:33 PM
> To: qemu-devel@nongnu.org
> Cc: Zhang Chen ; Zhanghailiang
> 
> Subject: [PATCH 11/11] migration/colo: Fix qmp_xen_colo_do_checkpoint()
> error handling
> 
> The Error ** argument must be NULL, _abort, _fatal, or a
> pointer to a variable containing NULL.  Passing an argument of the
> latter kind twice without clearing it in between is wrong: if the
> first call sets an error, it no longer points to NULL for the second
> call.
> 
> qmp_xen_colo_do_checkpoint() passes @errp first to
> replication_do_checkpoint_all(), and then to
> colo_notify_filters_event().  If both fail, this will trip the
> assertion in error_setv().
> 
> Similar code in secondary_vm_do_failover() calls
> colo_notify_filters_event() only after replication_do_checkpoint_all()
> succeeded.  Do the same here.
> 
> Fixes: 0e8818f023616677416840d6ddc880db8de3c967
> Cc: Zhang Chen 
> Cc: zhanghailiang 
> Signed-off-by: Markus Armbruster 
> ---
>  migration/colo.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index a54ac84f41..1b3493729b 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -263,7 +263,13 @@ ReplicationStatus
> *qmp_query_xen_replication_status(Error **errp)
> 
>  void qmp_xen_colo_do_checkpoint(Error **errp)
>  {
> -replication_do_checkpoint_all(errp);
> +Error *err = NULL;
> +
> +replication_do_checkpoint_all(&err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
>  /* Notify all filters of all NIC to do checkpoint */
>  colo_notify_filters_event(COLO_EVENT_CHECKPOINT, errp);
>  }
> --
> 2.21.1




RE: colo: qemu 4.2.0 vs. qemu 5.0.0-rc2 performance regression

2020-04-12 Thread Zhanghailiang
Hi,

The patch "COLO: Optimize memory back-up process" should only affect the VM's
migration process before COLO compare starts to work.
Have you tried reverting this patch to see if it affects your tests?

As for the memory used by the secondary QEMU, we only need one backup copy of the
VM's RAM, so it should be double the amount.
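
To put a number on the expected footprint, here is a minimal sketch (loop shape
taken from the patches in this thread) of how the cache overhead scales: one
colo_cache buffer per RAMBlock, each as large as the block itself, so the
secondary should need roughly guest RAM plus one full copy, i.e. about double.

    uint64_t colo_cache_overhead = 0;

    RAMBLOCK_FOREACH_NOT_IGNORED(block) {
        /* size of block->colo_cache allocated in colo_init_ram_cache() */
        colo_cache_overhead += block->used_length;
    }
    /* expected secondary footprint ~= guest RAM + colo_cache_overhead */

So the double amount you see with 5.0.0-rc2 matches the expectation; the earlier
"thrice the amount" with 4.2.0 would be the anomalous case.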


Thanks,
Hailiang


-邮件原件-
发件人: Lukas Straub [mailto:lukasstra...@web.de] 
发送时间: 2020年4月12日 1:17
收件人: qemu-devel@nongnu.org
抄送: dgilb...@redhat.com; quint...@redhat.com; Zhanghailiang 
; Zhang Chen 
主题: colo: qemu 4.2.0 vs. qemu 5.0.0-rc2 performance regression

Hello Everyone,
I did some Benchmarking with iperf3 and memtester (to dirty some guest memory) 
of colo performance in qemu 4.2.0 and in qemu 5.0.0-rc2 with my bugfixes on 
top.( https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg01432.html )

I have taken the average over 4 runs.
Client-to-server tcp bandwidth rose slightly from ~83.98 Mbit/s to ~89.40 Mbits.
Server-to-client tcp bandwidth fell from ~9.73 Mbit/s to ~1.79 Mbit/s.
Client-to-server udp bandwidth stayed the same at 1.05 Mbit/s and jitter rose 
from ~5.12 ms to ~10.77 ms.
Server-to-client udp bandwidth fell from ~380.5 Kbit/s to ~33.6 Kbit/s and 
jitter rose from ~41.74 ms to ~83976.15 ms (!).

I haven't looked closely into it, but i think
0393031a16735835a441b6d6e0495a1bd14adb90 "COLO: Optimize memory back-up process"
is the culprint as it reduces vm downtime for the checkpoints but increases the 
overall checkpoint time and we can only release miscompared primary packets 
after the checkpoint is completely finished.

Another thing that I noticed: With 4.2.0, the secondary qemu uses thrice the 
amount of gest memory. With 5.0.0-rc2 it's just double the amount of guest 
memory. So maybe the ram cache isn't working properly?

Regards,
Lukas Straub


RE: [PATCH V2 4/8] COLO: Optimize memory back-up process

2020-02-24 Thread Zhanghailiang
Hi,


> -Original Message-
> From: Daniel Cho [mailto:daniel...@qnap.com]
> Sent: Tuesday, February 25, 2020 10:53 AM
> To: Zhanghailiang 
> Cc: qemu-devel@nongnu.org; quint...@redhat.com; Dr. David Alan Gilbert
> 
> Subject: Re: [PATCH V2 4/8] COLO: Optimize memory back-up process
> 
> Hi Hailiang,
> 
> With version 2, the code in migration/ram.c
> 
> +if (migration_incoming_colo_enabled()) {
> +if (migration_incoming_in_colo_state()) {
> +/* In COLO stage, put all pages into cache
> temporarily */
> +host = colo_cache_from_block_offset(block, addr);
> +} else {
> +   /*
> +* In migration stage but before COLO stage,
> +* Put all pages into both cache and SVM's memory.
> +*/
> +host_bak = colo_cache_from_block_offset(block,
> addr);
> +}
>  }
>  if (!host) {
>  error_report("Illegal RAM offset " RAM_ADDR_FMT,
> addr);
>  ret = -EINVAL;
>  break;
>  }
> 
> host = colo_cache_from_block_offset(block, addr);
> host_bak = colo_cache_from_block_offset(block, addr);
> Does it cause the "if(!host)" will go break if the condition goes
> "host_bak = colo_cache_from_block_offset(block, addr);" ?
> 

That will not happen; you may have missed this part:

@@ -3379,20 +3393,35 @@ static int ram_load_precopy(QEMUFile *f)
  RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
 RAMBlock *block = ram_block_from_stream(f, flags);
 
+host = host_from_ram_block_offset(block, addr);
 /*

We give host a value unconditionally there, so the later "if (!host)" check still applies.
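
Putting the pieces together, the resulting flow in ram_load_precopy() looks
roughly like this (a condensed fragment based on the hunks posted in this series;
error handling and the rest of the loop are omitted):

    host = host_from_ram_block_offset(block, addr);
    if (migration_incoming_colo_enabled()) {
        if (migration_incoming_in_colo_state()) {
            /* In COLO stage, put all pages into the cache temporarily */
            host = colo_cache_from_block_offset(block, addr);
        } else {
            /*
             * In migration stage but before COLO stage,
             * put the page into both the cache and the SVM's memory.
             */
            host_bak = colo_cache_from_block_offset(block, addr);
        }
    }
    if (!host) {
        error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
        ret = -EINVAL;
        break;
    }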


> Best regards,
> Daniel Cho
> 
> zhanghailiang  wrote on Monday, February 24, 2020 at 2:55 PM:
> >
> > This patch will reduce the downtime of VM for the initial process,
> > Privously, we copied all these memory in preparing stage of COLO
> > while we need to stop VM, which is a time-consuming process.
> > Here we optimize it by a trick, back-up every page while in migration
> > process while COLO is enabled, though it affects the speed of the
> > migration, but it obviously reduce the downtime of back-up all SVM'S
> > memory in COLO preparing stage.
> >
> > Signed-off-by: zhanghailiang 
> > ---
> >  migration/colo.c |  3 +++
> >  migration/ram.c  | 68
> +++-
> >  migration/ram.h  |  1 +
> >  3 files changed, 54 insertions(+), 18 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c
> > index 93c5a452fb..44942c4e23 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -26,6 +26,7 @@
> >  #include "qemu/main-loop.h"
> >  #include "qemu/rcu.h"
> >  #include "migration/failover.h"
> > +#include "migration/ram.h"
> >  #ifdef CONFIG_REPLICATION
> >  #include "replication.h"
> >  #endif
> > @@ -845,6 +846,8 @@ void *colo_process_incoming_thread(void
> *opaque)
> >   */
> >  qemu_file_set_blocking(mis->from_src_file, true);
> >
> > +colo_incoming_start_dirty_log();
> > +
> >  bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
> >  fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
> >  object_unref(OBJECT(bioc));
> > diff --git a/migration/ram.c b/migration/ram.c
> > index ed23ed1c7c..ebf9e6ba51 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2277,6 +2277,7 @@ static void ram_list_init_bitmaps(void)
> >   * dirty_memory[DIRTY_MEMORY_MIGRATION] don't
> include the whole
> >   * guest memory.
> >   */
> > +
> >  block->bmap = bitmap_new(pages);
> >  bitmap_set(block->bmap, 0, pages);
> >  block->clear_bmap_shift = shift;
> > @@ -2986,7 +2987,6 @@ int colo_init_ram_cache(void)
> >  }
> >  return -errno;
> >  }
> > -memcpy(block->colo_cache, block->host,
> block->used_length);
> >  }
> >  }
> >
> > @@ -3000,19 +3000,36 @@ int colo_init_ram_cache(void)
> >
> >  RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> >  unsigned long pages = block->max_length >>
> TARGET_PAGE_BITS;
> > -
> >  block->bmap = bitmap_new(pages);
> > -bitmap_set(

RE: [PATCH V2 7/8] COLO: Migrate dirty pages during the gap of checkpointing

2020-02-24 Thread Zhanghailiang



> -Original Message-
> From: Eric Blake [mailto:ebl...@redhat.com]
> Sent: Monday, February 24, 2020 11:19 PM
> To: Zhanghailiang ;
> qemu-devel@nongnu.org
> Cc: daniel...@qnap.com; dgilb...@redhat.com; quint...@redhat.com
> Subject: Re: [PATCH V2 7/8] COLO: Migrate dirty pages during the gap of
> checkpointing
> 
> On 2/24/20 12:54 AM, zhanghailiang wrote:
> > We can migrate some dirty pages during the gap of checkpointing,
> > by this way, we can reduce the amount of ram migrated during
> checkpointing.
> >
> > Signed-off-by: zhanghailiang 
> > ---
> 
> > +++ b/qapi/migration.json
> > @@ -977,12 +977,14 @@
> >   #
> >   # @vmstate-loaded: VM's state has been loaded by SVM.
> >   #
> > +# @migrate-ram-background: Send some dirty pages during the gap of
> COLO checkpoint
> 
> Missing a '(since 5.0)' tag.
> 

OK, I will add this in the next version. I forgot to fix it in this version, even
though you reminded me about it in the previous one. :(

> > +#
> >   # Since: 2.8
> >   ##
> >   { 'enum': 'COLOMessage',
> > 'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
> >   'vmstate-send', 'vmstate-size', 'vmstate-received',
> > -'vmstate-loaded' ] }
> > +'vmstate-loaded', 'migrate-ram-background' ] }
> >
> >   ##
> >   # @COLOMode:
> >
> 
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org




RE: The issues about architecture of the COLO checkpoint

2020-02-23 Thread Zhanghailiang
Hi Daniel,

I have fixed this problem and sent V2; please refer to that series.

Thanks,

From: Daniel Cho [mailto:daniel...@qnap.com]
Sent: Thursday, February 20, 2020 11:52 AM
To: Zhang, Chen 
Cc: Dr. David Alan Gilbert ; Zhanghailiang 
; qemu-devel@nongnu.org; Jason Wang 

Subject: Re: The issues about architecture of the COLO checkpoint

Hi Hailiang,

I have already patched the file to my branch, but there is a problem while 
doing migration.
Here is the error message from SVM
"qemu-system-x86_64: /root/download/qemu-4.1.0/memory.c:1079: 
memory_region_transaction_commit: Assertion `qemu_mutex_iothread_locked()' 
failed."

Do you have this problem?

Best regards,
Daniel Cho

Daniel Cho mailto:daniel...@qnap.com>> wrote on Thursday, February 20, 2020 at 11:49 AM:
Hi Zhang,

Thanks, I will configure on code for testing first.
However, if you have free time, could you please send the patch file to us, 
Thanks.

Best Regard,
Daniel Cho


Zhang, Chen mailto:chen.zh...@intel.com>> wrote on Thursday, February 20, 2020 at 11:07 AM:


On 2/18/2020 5:22 PM, Daniel Cho wrote:
Hi Hailiang,
Thanks for your help. If we have any problems we will contact you for your 
favor.


Hi Zhang,

" If colo-compare got a primary packet without related secondary packet in a 
certain time , it will automatically trigger checkpoint.  "
As you said, the colo-compare will trigger checkpoint, but does it need to 
limit checkpoint times?
There is a problem about doing many checkpoints while we use fio to random 
write files. Then it will cause low throughput on PVM.
Is this situation is normal on COLO?



Hi Daniel,

The checkpoint time is designed to be user adjustable based on user 
environment(workload/network status/business conditions...).

In net/colo-compare.c

/* TODO: Should be configurable */
#define REGULAR_PACKET_CHECK_MS 3000

If you need, I can send a patch for this issue. Make users can change the value 
by QMP and qemu monitor commands.

Thanks

Zhang Chen



Best regards,
Daniel Cho

Zhang, Chen mailto:chen.zh...@intel.com>> wrote on Monday, February 17, 2020 at 1:36 PM:


On 2/15/2020 11:35 AM, Daniel Cho wrote:
Hi Dave,

Yes, I agree with you, it does need a timeout.



Hi Daniel and Dave,

Current colo-compare already have the timeout mechanism.

Named packet_check_timer,  It will scan primary packet queue to make sure all 
the primary packet not stay too long time.

If colo-compare got a primary packet without related secondary packet in a 
certain time , it will automatic trigger checkpoint.

https://github.com/qemu/qemu/blob/master/net/colo-compare.c#L847



Thanks

Zhang Chen



Hi Hailiang,

We base on qemu-4.1.0 for using COLO feature, in your patch, we found a lot of 
difference  between your version and ours.
Could you give us a latest release version which is close your developing code?

Thanks.

Regards
Daniel Cho

Dr. David Alan Gilbert mailto:dgilb...@redhat.com>> wrote on Thursday, February 13, 2020 at 6:38 PM:
* Daniel Cho (daniel...@qnap.com<mailto:daniel...@qnap.com>) wrote:
> Hi Hailiang,
>
> 1.
> OK, we will try the patch
> “0001-COLO-Optimize-memory-back-up-process.patch”,
> and thanks for your help.
>
> 2.
> We understand the reason to compare PVM and SVM's packet. However, the
> empty of SVM's packet queue might happened on setting COLO feature and SVM
> broken.
>
> On situation 1 ( setting COLO feature ):
> We could force do checkpoint after setting COLO feature finish, then it
> will protect the state of PVM and SVM . As the Zhang Chen said.
>
> On situation 2 ( SVM broken ):
> COLO will do failover for PVM, so it might not cause any wrong on PVM.
>
> However, those situations are our views, so there might be a big difference
> between reality and our views.
> If we have any wrong views and opinions, please let us know, and correct
> us.

It does need a timeout; the SVM being broken or being in a state where
it never sends the corresponding packet (because of a state difference)
can happen and COLO needs to timeout when the packet hasn't arrived
after a while and trigger the checkpoint.

Dave

> Thanks.
>
> Best regards,
> Daniel Cho
>
> Zhang, Chen mailto:chen.zh...@intel.com>> wrote on Thursday, February 13, 2020 at 10:17 AM:
>
> > Add cc Jason Wang, he is a network expert.
> >
> > In case some network things goes wrong.
> >
> >
> >
> > Thanks
> >
> > Zhang Chen
> >
> >
> >
> > *From:* Zhang, Chen
> > *Sent:* Thursday, February 13, 2020 10:10 AM
> > *To:* 'Zhanghailiang' 
> > mailto:zhang.zhanghaili...@huawei.com>>; 
> > Daniel Cho <
> > daniel...@qnap.com<mailto:daniel...@qnap.com>>
> > *Cc:* Dr. David Alan Gilbert 
> > mailto:dgilb...@redhat.com>>; 
> > qemu-devel@nongnu.org<mailto:qemu-devel@nongnu.org>
> > *Subject:* RE: The issues about architecture of the 

[PATCH V2 7/8] COLO: Migrate dirty pages during the gap of checkpointing

2020-02-23 Thread zhanghailiang
We can migrate some dirty pages during the gap between checkpoints;
this way, we can reduce the amount of RAM migrated during checkpointing.

Signed-off-by: zhanghailiang 
---
 migration/colo.c   | 73 --
 migration/migration.h  |  1 +
 migration/trace-events |  1 +
 qapi/migration.json|  4 ++-
 4 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 44942c4e23..c36d94072f 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -47,6 +47,13 @@ static COLOMode last_colo_mode;
 
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
+#define DEFAULT_RAM_PENDING_CHECK 1000
+
+/* should be calculated by bandwidth and max downtime ? */
+#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL)
+
+static int checkpoint_request;
+
 bool migration_in_colo_state(void)
 {
 MigrationState *s = migrate_get_current();
@@ -517,6 +524,20 @@ static void colo_compare_notify_checkpoint(Notifier 
*notifier, void *data)
 colo_checkpoint_notify(data);
 }
 
+static bool colo_need_migrate_ram_background(MigrationState *s)
+{
+uint64_t pending_size, pend_pre, pend_compat, pend_post;
+int64_t max_size = THRESHOLD_PENDING_SIZE;
+
+qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre,
+  &pend_compat, &pend_post);
+pending_size = pend_pre + pend_compat + pend_post;
+
+trace_colo_need_migrate_ram_background(pending_size);
+return (pending_size >= max_size);
+}
+
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 QIOChannelBuffer *bioc;
@@ -572,6 +593,8 @@ static void colo_process_checkpoint(MigrationState *s)
 
 timer_mod(s->colo_delay_timer,
 current_time + s->parameters.x_checkpoint_delay);
+timer_mod(s->pending_ram_check_timer,
+current_time + DEFAULT_RAM_PENDING_CHECK);
 
 while (s->state == MIGRATION_STATUS_COLO) {
 if (failover_get_state() != FAILOVER_STATUS_NONE) {
@@ -584,9 +607,30 @@ static void colo_process_checkpoint(MigrationState *s)
 if (s->state != MIGRATION_STATUS_COLO) {
 goto out;
 }
-ret = colo_do_checkpoint_transaction(s, bioc, fb);
-if (ret < 0) {
-goto out;
+if (atomic_xchg(_request, 0)) {
+/* start a colo checkpoint */
+ret = colo_do_checkpoint_transaction(s, bioc, fb);
+if (ret < 0) {
+goto out;
+}
+} else {
+if (colo_need_migrate_ram_background(s)) {
+colo_send_message(s->to_dst_file,
+  COLO_MESSAGE_MIGRATE_RAM_BACKGROUND,
+  &local_err);
+if (local_err) {
+goto out;
+}
+
+qemu_savevm_state_iterate(s->to_dst_file, false);
+qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+error_setg_errno(&local_err, -ret,
+"Failed to send dirty pages backgroud");
+goto out;
+}
+}
 }
 }
 
@@ -627,6 +671,8 @@ out:
 colo_compare_unregister_notifier(_compare_notifier);
 timer_del(s->colo_delay_timer);
 timer_free(s->colo_delay_timer);
+timer_del(s->pending_ram_check_timer);
+timer_free(s->pending_ram_check_timer);
 qemu_sem_destroy(&s->colo_checkpoint_sem);
 
 /*
@@ -644,6 +690,7 @@ void colo_checkpoint_notify(void *opaque)
 MigrationState *s = opaque;
 int64_t next_notify_time;
 
+atomic_inc(_request);
 qemu_sem_post(&s->colo_checkpoint_sem);
 s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 next_notify_time = s->colo_checkpoint_time +
@@ -651,6 +698,19 @@ void colo_checkpoint_notify(void *opaque)
 timer_mod(s->colo_delay_timer, next_notify_time);
 }
 
+static void colo_pending_ram_check_notify(void *opaque)
+{
+int64_t next_notify_time;
+MigrationState *s = opaque;
+
+if (migration_in_colo_state()) {
+next_notify_time = DEFAULT_RAM_PENDING_CHECK +
+   qemu_clock_get_ms(QEMU_CLOCK_HOST);
+timer_mod(s->pending_ram_check_timer, next_notify_time);
+qemu_sem_post(&s->colo_checkpoint_sem);
+}
+}
+
 void migrate_start_colo_process(MigrationState *s)
 {
 qemu_mutex_unlock_iothread();
@@ -658,6 +718,8 @@ void migrate_start_colo_process(MigrationState *s)
 s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
 colo_checkpoint_notify, s);
 
+s->pending_ram_check_timer = timer_new_ms(QEMU_CLOCK_HOST,
+colo_pending_ram_check_notify, s);
 qemu_sem_init(&s->colo_exit_sem, 0);
 migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STA

[PATCH V2 8/8] migration/colo: Only flush ram cache while do checkpoint

2020-02-23 Thread zhanghailiang
After adding background RAM migration, ram_load will be called for that process,
but we should not flush the RAM cache during it. Move the flush action to the
right place.

Signed-off-by: zhanghailiang 
---
 migration/colo.c | 1 +
 migration/ram.c  | 5 +
 migration/ram.h  | 1 +
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c36d94072f..18df8289f8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -799,6 +799,7 @@ static void 
colo_incoming_process_checkpoint(MigrationIncomingState *mis,
 
 qemu_mutex_lock_iothread();
 vmstate_loading = true;
+colo_flush_ram_cache();
 ret = qemu_load_device_state(fb);
 if (ret < 0) {
 error_setg(errp, "COLO: load device state failed");
diff --git a/migration/ram.c b/migration/ram.c
index 1b3f423351..7bc841d14f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3305,7 +3305,7 @@ static bool postcopy_is_running(void)
  * Flush content of RAM cache into SVM's memory.
  * Only flush the pages that be dirtied by PVM or SVM or both.
  */
-static void colo_flush_ram_cache(void)
+void colo_flush_ram_cache(void)
 {
 RAMBlock *block = NULL;
 void *dst_host;
@@ -3576,9 +3576,6 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 }
 trace_ram_load_complete(ret, seq_iter);
 
-if (!ret  && migration_incoming_in_colo_state()) {
-colo_flush_ram_cache();
-}
 return ret;
 }
 
diff --git a/migration/ram.h b/migration/ram.h
index 5ceaff7cb4..ae14341482 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -67,5 +67,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
 void colo_incoming_start_dirty_log(void);
+void colo_flush_ram_cache(void);
 
 #endif
-- 
2.21.0





[PATCH V2 4/8] COLO: Optimize memory back-up process

2020-02-23 Thread zhanghailiang
This patch reduces the VM downtime of the initial process.
Previously, we copied all this memory in the COLO preparing stage
while the VM had to be stopped, which is a time-consuming process.
Here we optimize it with a trick: back up every page during the
migration process while COLO is enabled. Although this affects the
speed of the migration, it obviously reduces the downtime of backing
up all of the SVM's memory in the COLO preparing stage.

Signed-off-by: zhanghailiang 
---
 migration/colo.c |  3 +++
 migration/ram.c  | 68 +++-
 migration/ram.h  |  1 +
 3 files changed, 54 insertions(+), 18 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 93c5a452fb..44942c4e23 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -26,6 +26,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/rcu.h"
 #include "migration/failover.h"
+#include "migration/ram.h"
 #ifdef CONFIG_REPLICATION
 #include "replication.h"
 #endif
@@ -845,6 +846,8 @@ void *colo_process_incoming_thread(void *opaque)
  */
 qemu_file_set_blocking(mis->from_src_file, true);
 
+colo_incoming_start_dirty_log();
+
 bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
 fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
diff --git a/migration/ram.c b/migration/ram.c
index ed23ed1c7c..ebf9e6ba51 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2277,6 +2277,7 @@ static void ram_list_init_bitmaps(void)
  * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
  * guest memory.
  */
+
 block->bmap = bitmap_new(pages);
 bitmap_set(block->bmap, 0, pages);
 block->clear_bmap_shift = shift;
@@ -2986,7 +2987,6 @@ int colo_init_ram_cache(void)
 }
 return -errno;
 }
-memcpy(block->colo_cache, block->host, block->used_length);
 }
 }
 
@@ -3000,19 +3000,36 @@ int colo_init_ram_cache(void)
 
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
-
 block->bmap = bitmap_new(pages);
-bitmap_set(block->bmap, 0, pages);
 }
 }
-ram_state = g_new0(RAMState, 1);
-ram_state->migration_dirty_pages = 0;
-qemu_mutex_init(&ram_state->bitmap_mutex);
-memory_global_dirty_log_start();
 
+ram_state_init(&ram_state);
 return 0;
 }
 
+/* TODO: duplicated with ram_init_bitmaps */
+void colo_incoming_start_dirty_log(void)
+{
+RAMBlock *block = NULL;
+/* For memory_global_dirty_log_start below. */
+qemu_mutex_lock_iothread();
+qemu_mutex_lock_ramlist();
+
+memory_global_dirty_log_sync();
+WITH_RCU_READ_LOCK_GUARD() {
+RAMBLOCK_FOREACH_NOT_IGNORED(block) {
+ramblock_sync_dirty_bitmap(ram_state, block);
+/* Discard this dirty bitmap record */
+bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
+}
+memory_global_dirty_log_start();
+}
+ram_state->migration_dirty_pages = 0;
+qemu_mutex_unlock_ramlist();
+qemu_mutex_unlock_iothread();
+}
+
 /* It is need to hold the global lock to call this helper */
 void colo_release_ram_cache(void)
 {
@@ -3032,9 +3049,7 @@ void colo_release_ram_cache(void)
 }
 }
 }
-qemu_mutex_destroy(&ram_state->bitmap_mutex);
-g_free(ram_state);
-ram_state = NULL;
+ram_state_cleanup(&ram_state);
 }
 
 /**
@@ -3302,7 +3317,6 @@ static void colo_flush_ram_cache(void)
 ramblock_sync_dirty_bitmap(ram_state, block);
 }
 }
-
 trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
 WITH_RCU_READ_LOCK_GUARD() {
 block = QLIST_FIRST_RCU(&ram_list.blocks);
@@ -3348,7 +3362,7 @@ static int ram_load_precopy(QEMUFile *f)
 
 while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
 ram_addr_t addr, total_ram_bytes;
-void *host = NULL;
+void *host = NULL, *host_bak = NULL;
 uint8_t ch;
 
 /*
@@ -3379,20 +3393,35 @@ static int ram_load_precopy(QEMUFile *f)
  RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
 RAMBlock *block = ram_block_from_stream(f, flags);
 
+host = host_from_ram_block_offset(block, addr);
 /*
- * After going into COLO, we should load the Page into colo_cache.
+ * After going into COLO stage, we should not load the page
+ * into SVM's memory diretly, we put them into colo_cache firstly.
+ * NOTE: We need to keep a copy of SVM's ram in colo_cache.
+ * Privously, we copied all these memory in preparing stage of COLO
+ * while we need to stop VM, which is a time-consuming process.
+ * Here we optimize it by a trick, bac

[PATCH V2 3/8] savevm: Don't call colo_init_ram_cache twice

2020-02-23 Thread zhanghailiang
This helper was called twice, which is wrong.
Keep only the call that is made when we get the COLO-enable
message from the source side.

Signed-off-by: zhanghailiang 
---
 migration/migration.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 06d1ff9d56..e8c62c6e2e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -484,11 +484,6 @@ static void process_incoming_migration_co(void *opaque)
 goto fail;
 }
 
-if (colo_init_ram_cache() < 0) {
-error_report("Init ram cache failed");
-goto fail;
-}
-
 qemu_thread_create(>colo_incoming_thread, "COLO incoming",
  colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
 mis->have_colo_incoming_thread = true;
-- 
2.21.0





[PATCH V2 2/8] migration/colo: wrap incoming checkpoint process into new helper

2020-02-23 Thread zhanghailiang
Split checkpoint incoming process into a helper.

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/colo.c | 260 ---
 1 file changed, 133 insertions(+), 127 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 2c88aa57a2..93c5a452fb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -664,13 +664,138 @@ void migrate_start_colo_process(MigrationState *s)
 qemu_mutex_lock_iothread();
 }
 
-static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request,
- Error **errp)
+static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
+  QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
+{
+uint64_t total_size;
+uint64_t value;
+Error *local_err = NULL;
+int ret;
+
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+trace_colo_vm_state_change("run", "stop");
+qemu_mutex_unlock_iothread();
+
+/* FIXME: This is unnecessary for periodic checkpoint mode */
+colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
+ &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+colo_receive_check_message(mis->from_src_file,
+   COLO_MESSAGE_VMSTATE_SEND, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+qemu_mutex_lock_iothread();
+cpu_synchronize_all_pre_loadvm();
+ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+qemu_mutex_unlock_iothread();
+
+if (ret < 0) {
+error_setg(errp, "Load VM's live state (ram) error");
+return;
+}
+
+value = colo_receive_message_value(mis->from_src_file,
+ COLO_MESSAGE_VMSTATE_SIZE, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+/*
+ * Read VM device state data into channel buffer,
+ * It's better to re-use the memory allocated.
+ * Here we need to handle the channel buffer directly.
+ */
+if (value > bioc->capacity) {
+bioc->capacity = value;
+bioc->data = g_realloc(bioc->data, bioc->capacity);
+}
+total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value);
+if (total_size != value) {
+error_setg(errp, "Got %" PRIu64 " VMState data, less than expected"
+" %" PRIu64, total_size, value);
+return;
+}
+bioc->usage = total_size;
+qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL);
+
+colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
+ &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+qemu_mutex_lock_iothread();
+vmstate_loading = true;
+ret = qemu_load_device_state(fb);
+if (ret < 0) {
+error_setg(errp, "COLO: load device state failed");
+qemu_mutex_unlock_iothread();
+return;
+}
+
+#ifdef CONFIG_REPLICATION
+replication_get_error_all(&local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+qemu_mutex_unlock_iothread();
+return;
+}
+
+/* discard colo disk buffer */
+replication_do_checkpoint_all(&local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+qemu_mutex_unlock_iothread();
+return;
+}
+#else
+abort();
+#endif
+/* Notify all filters of all NIC to do checkpoint */
+colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
+
+if (local_err) {
+error_propagate(errp, local_err);
+qemu_mutex_unlock_iothread();
+return;
+}
+
+vmstate_loading = false;
+vm_start();
+trace_colo_vm_state_change("stop", "run");
+qemu_mutex_unlock_iothread();
+
+if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+failover_set_state(FAILOVER_STATUS_RELAUNCH,
+FAILOVER_STATUS_NONE);
+failover_request_active(NULL);
+return;
+}
+
+colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
+ &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+}
+}
+
+static void colo_wait_handle_message(MigrationIncomingState *mis,
+QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp)
 {
 COLOMessage msg;
 Error *local_err = NULL;
 
-msg = colo_receive_message(f, &local_err);
+msg = colo_receive_message(mis->from_src_file, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -678,10 +803,9 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 
 switch (msg) {
 case COLO_MESSAGE_CHECKPOINT_REQUEST:
-*checkp

[PATCH V2 0/8] Optimize VM's downtime while do checkpoint in COLO

2020-02-23 Thread zhanghailiang
This series tries to reduce the VM's pause time while doing a checkpoint in COLO
state.

Here, we use two methods to reduce the downtime during the COLO stage:
The first one is to reduce the time of backing up the PVM's memory into the cache.
Instead of backing up all of the PVM's memory at once while the VM is stopped, we
back it up during the live migration time.

Secondly, we reduce the total number of dirty pages sent while the VM is paused
for a checkpoint: instead of sending all dirty pages while the VM is paused, we
send part of the dirty pages during the gap between two checkpoints, while the
SVM and PVM are running.

V1 -> V2:
- Fix tested problem found by Daniel Cho
- Fix a degradation after rebase to master (first patch)

Please review, thanks.

Hailiang Zhang (8):
  migration: fix COLO broken caused by a previous commit
  migration/colo: wrap incoming checkpoint process into new helper
  savevm: Don't call colo_init_ram_cache twice
  COLO: Optimize memory back-up process
  ram/colo: only record bitmap of dirty pages in COLO stage
  migration: recognize COLO as part of activating process
  COLO: Migrate dirty pages during the gap of checkpointing
  migration/colo: Only flush ram cache while do checkpoint

 migration/colo.c   | 337 +
 migration/migration.c  |   7 +-
 migration/migration.h  |   1 +
 migration/ram.c|  78 +++---
 migration/ram.h|   2 +
 migration/trace-events |   1 +
 qapi/migration.json|   4 +-
 7 files changed, 269 insertions(+), 161 deletions(-)

--
2.21.0





[PATCH V2 1/8] migration: fix COLO broken caused by a previous commit

2020-02-23 Thread zhanghailiang
The commit "migration: Create migration_is_running()" broke
COLO, because the following call chain is broken by it:

colo_process_checkpoint
 ->colo_do_checkpoint_transaction
   ->migrate_set_block_enabled
     ->qmp_migrate_set_capabilities

It can be fixed by treating the COLO process as an exception;
maybe we need a better way to fix it.

Cc: Juan Quintela 
Signed-off-by: zhanghailiang 
---
 migration/migration.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8fb68795dc..06d1ff9d56 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -865,7 +865,6 @@ bool migration_is_running(int state)
 case MIGRATION_STATUS_DEVICE:
 case MIGRATION_STATUS_WAIT_UNPLUG:
 case MIGRATION_STATUS_CANCELLING:
-case MIGRATION_STATUS_COLO:
 return true;
 
 default:
-- 
2.21.0





[PATCH V2 6/8] migration: recognize COLO as part of activating process

2020-02-23 Thread zhanghailiang
We will migrate part of the dirty pages in the background during the gap
between two checkpoints. Without this modification it will not work,
because ram_save_iterate() checks this state before sending RAM_SAVE_FLAG_EOS
at the end of it.

Signed-off-by: zhanghailiang 
---
 migration/migration.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/migration.c b/migration/migration.c
index e8c62c6e2e..f71c337600 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -840,6 +840,7 @@ bool migration_is_setup_or_active(int state)
 case MIGRATION_STATUS_PRE_SWITCHOVER:
 case MIGRATION_STATUS_DEVICE:
 case MIGRATION_STATUS_WAIT_UNPLUG:
+case MIGRATION_STATUS_COLO:
 return true;
 
 default:
-- 
2.21.0





[PATCH V2 5/8] ram/colo: only record bitmap of dirty pages in COLO stage

2020-02-23 Thread zhanghailiang
We only need to record the bitmap of dirty pages after entering
the COLO stage.

Signed-off-by: zhanghailiang 
---
 migration/ram.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ebf9e6ba51..1b3f423351 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2735,7 +2735,7 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
 }
 
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
- ram_addr_t offset)
+ ram_addr_t offset, bool record_bitmap)
 {
 if (!offset_in_ramblock(block, offset)) {
 return NULL;
@@ -2751,7 +2751,8 @@ static inline void *colo_cache_from_block_offset(RAMBlock 
*block,
 * It help us to decide which pages in ram cache should be flushed
 * into VM's RAM later.
 */
-if (!test_and_set_bit(offset >> TARGET_PAGE_BITS, block->bmap)) {
+if (record_bitmap &&
+!test_and_set_bit(offset >> TARGET_PAGE_BITS, block->bmap)) {
 ram_state->migration_dirty_pages++;
 }
 return block->colo_cache + offset;
@@ -3408,13 +3409,13 @@ static int ram_load_precopy(QEMUFile *f)
 if (migration_incoming_colo_enabled()) {
 if (migration_incoming_in_colo_state()) {
 /* In COLO stage, put all pages into cache temporarily */
-host = colo_cache_from_block_offset(block, addr);
+host = colo_cache_from_block_offset(block, addr, true);
 } else {
/*
 * In migration stage but before COLO stage,
 * Put all pages into both cache and SVM's memory.
 */
-host_bak = colo_cache_from_block_offset(block, addr);
+host_bak = colo_cache_from_block_offset(block, addr, 
false);
 }
 }
 if (!host) {
-- 
2.21.0





RE: [PATCH 3/3] COLO: Optimize memory back-up process

2020-02-23 Thread Zhanghailiang
Hi Dave,

> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: Friday, February 21, 2020 2:25 AM
> To: Zhanghailiang 
> Cc: qemu-devel@nongnu.org; quint...@redhat.com; chen.zh...@intel.com;
> daniel...@qnap.com
> Subject: Re: [PATCH 3/3] COLO: Optimize memory back-up process
> 
> * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> > This patch will reduce the downtime of VM for the initial process,
> > Privously, we copied all these memory in preparing stage of COLO
> > while we need to stop VM, which is a time-consuming process.
> > Here we optimize it by a trick, back-up every page while in migration
> > process while COLO is enabled, though it affects the speed of the
> > migration, but it obviously reduce the downtime of back-up all SVM'S
> > memory in COLO preparing stage.
> >
> > Signed-off-by: Hailiang Zhang 
> 
> OK, I think this is right, but it took me quite a while to understand,
> I think one of the comments below might not be right:
> 

> > ---
> >  migration/colo.c |  3 +++
> >  migration/ram.c  | 35 +++
> >  migration/ram.h  |  1 +
> >  3 files changed, 31 insertions(+), 8 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c
> > index d30c6bc4ad..febf010571 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -26,6 +26,7 @@
> >  #include "qemu/main-loop.h"
> >  #include "qemu/rcu.h"
> >  #include "migration/failover.h"
> > +#include "migration/ram.h"
> >  #ifdef CONFIG_REPLICATION
> >  #include "replication.h"
> >  #endif
> > @@ -906,6 +907,8 @@ void *colo_process_incoming_thread(void
> *opaque)
> >   */
> >  qemu_file_set_blocking(mis->from_src_file, true);
> >
> > +colo_incoming_start_dirty_log();
> > +
> >  bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
> >  fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
> >  object_unref(OBJECT(bioc));
> > diff --git a/migration/ram.c b/migration/ram.c
> > index ed23ed1c7c..24a8aa3527 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2986,7 +2986,6 @@ int colo_init_ram_cache(void)
> >  }
> >  return -errno;
> >  }
> > -memcpy(block->colo_cache, block->host,
> block->used_length);
> >  }
> >  }
> >
> > @@ -3005,12 +3004,16 @@ int colo_init_ram_cache(void)
> >  bitmap_set(block->bmap, 0, pages);
> >  }
> >  }
> > +
> > +return 0;
> > +}
> > +
> > +void colo_incoming_start_dirty_log(void)
> > +{
> >  ram_state = g_new0(RAMState, 1);
> >  ram_state->migration_dirty_pages = 0;
> >  qemu_mutex_init(_state->bitmap_mutex);
> >  memory_global_dirty_log_start();
> > -
> > -return 0;
> >  }
> >
> >  /* It is need to hold the global lock to call this helper */
> > @@ -3348,7 +3351,7 @@ static int ram_load_precopy(QEMUFile *f)
> >
> >  while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> >  ram_addr_t addr, total_ram_bytes;
> > -void *host = NULL;
> > +void *host = NULL, *host_bak = NULL;
> >  uint8_t ch;
> >
> >  /*
> > @@ -3378,13 +3381,26 @@ static int ram_load_precopy(QEMUFile *f)
> >  if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
> >   RAM_SAVE_FLAG_COMPRESS_PAGE |
> RAM_SAVE_FLAG_XBZRLE)) {
> >  RAMBlock *block = ram_block_from_stream(f, flags);
> > -
> >  /*
> > - * After going into COLO, we should load the Page into
> colo_cache.
> > + * After going into COLO, we should load the Page into
> colo_cache
> > + * NOTE: We need to keep a copy of SVM's ram in
> colo_cache.
> > + * Privously, we copied all these memory in preparing stage
> of COLO
> > + * while we need to stop VM, which is a time-consuming
> process.
> > + * Here we optimize it by a trick, back-up every page while
> in
> > + * migration process while COLO is enabled, though it
> affects the
> > + * speed of the migration, but it obviously reduce the
> downtime of
> > + * back-up all SVM'S memory in COLO preparing stage.
> >   */
> > -if (migration_incoming_in_colo_state()) {
> &

RE: [PATCH 2/3] COLO: Migrate dirty pages during the gap of checkpointing

2020-02-23 Thread Zhanghailiang
> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: Thursday, February 20, 2020 2:51 AM
> To: Zhanghailiang 
> Cc: qemu-devel@nongnu.org; quint...@redhat.com; chen.zh...@intel.com;
> daniel...@qnap.com
> Subject: Re: [PATCH 2/3] COLO: Migrate dirty pages during the gap of
> checkpointing
> 
> * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> > We can migrate some dirty pages during the gap of checkpointing,
> > by this way, we can reduce the amount of ram migrated during
> checkpointing.
> >
> > Signed-off-by: Hailiang Zhang 
> > ---
> >  migration/colo.c   | 69
> +++---
> >  migration/migration.h  |  1 +
> >  migration/trace-events |  1 +
> >  qapi/migration.json|  4 ++-
> >  4 files changed, 70 insertions(+), 5 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c
> > index 93c5a452fb..d30c6bc4ad 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -46,6 +46,13 @@ static COLOMode last_colo_mode;
> >
> >  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> >
> > +#define DEFAULT_RAM_PENDING_CHECK 1000
> > +
> > +/* should be calculated by bandwidth and max downtime ? */
> > +#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL)
> 
> Turn both of these magic constants into parameters.
> 

Good idea, will do this in later patches.
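
To make the intent concrete, a rough sketch of what I have in mind; the parameter
names x_pending_ram_check_interval and x_pending_ram_threshold are only
placeholders, not existing QEMU options. The idea is to read them from
s->parameters just like the existing code reads s->parameters.x_checkpoint_delay.

    static bool colo_need_migrate_ram_background(MigrationState *s)
    {
        uint64_t pending_size, pend_pre, pend_compat, pend_post;
        /* placeholder parameter instead of THRESHOLD_PENDING_SIZE */
        int64_t max_size = s->parameters.x_pending_ram_threshold;

        qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre,
                                  &pend_compat, &pend_post);
        pending_size = pend_pre + pend_compat + pend_post;
        return pending_size >= max_size;
    }

    /* and in colo_process_checkpoint(), instead of DEFAULT_RAM_PENDING_CHECK: */
    timer_mod(s->pending_ram_check_timer,
              current_time + s->parameters.x_pending_ram_check_interval);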

> > +static int checkpoint_request;
> > +
> >  bool migration_in_colo_state(void)
> >  {
> >  MigrationState *s = migrate_get_current();
> > @@ -516,6 +523,20 @@ static void
> colo_compare_notify_checkpoint(Notifier *notifier, void *data)
> >  colo_checkpoint_notify(data);
> >  }
> >
> > +static bool colo_need_migrate_ram_background(MigrationState *s)
> > +{
> > +uint64_t pending_size, pend_pre, pend_compat, pend_post;
> > +int64_t max_size = THRESHOLD_PENDING_SIZE;
> > +
> > +qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre,
> > +  &pend_compat, &pend_post);
> > +pending_size = pend_pre + pend_compat + pend_post;
> > +
> > +trace_colo_need_migrate_ram_background(pending_size);
> > +return (pending_size >= max_size);
> > +}
> > +
> > +
> >  static void colo_process_checkpoint(MigrationState *s)
> >  {
> >  QIOChannelBuffer *bioc;
> > @@ -571,6 +592,8 @@ static void
> colo_process_checkpoint(MigrationState *s)
> >
> >  timer_mod(s->colo_delay_timer,
> >  current_time + s->parameters.x_checkpoint_delay);
> > +timer_mod(s->pending_ram_check_timer,
> > +current_time + DEFAULT_RAM_PENDING_CHECK);
> 
> What happens if the iterate takes a while and this triggers in the
> middle of the iterate?
> 

It will trigger another iteration after this one has finished.

> >  while (s->state == MIGRATION_STATUS_COLO) {
> >  if (failover_get_state() != FAILOVER_STATUS_NONE) {
> > @@ -583,10 +606,25 @@ static void
> colo_process_checkpoint(MigrationState *s)
> >  if (s->state != MIGRATION_STATUS_COLO) {
> >  goto out;
> >  }
> > -ret = colo_do_checkpoint_transaction(s, bioc, fb);
> > -if (ret < 0) {
> > -goto out;
> > -}
> > +if (atomic_xchg(_request, 0)) {
> > +/* start a colo checkpoint */
> > +ret = colo_do_checkpoint_transaction(s, bioc, fb);
> > +if (ret < 0) {
> > +goto out;
> > +}
> > +} else {
> > +if (colo_need_migrate_ram_background(s)) {
> > +colo_send_message(s->to_dst_file,
> > +
> COLO_MESSAGE_MIGRATE_RAM_BACKGROUND,
> > +  &local_err);
> > +if (local_err) {
> > +goto out;
> > +}
> > +
> > +qemu_savevm_state_iterate(s->to_dst_file, false);
> > +qemu_put_byte(s->to_dst_file, QEMU_VM_EOF);
> 
> Maybe you should do a qemu_file_get_error(..) at this point to check
> it's OK.

Agreed, we should check it.

> 
> > +}
> > + }
> >  }
> >
> >  out:
> > @@ -626,6 +664,8 @@ out:
> >  colo_compare_unregister_notifier(_compare_notifier);
> >  timer_del(s->colo_delay_timer);
> >  timer_free(s->colo_delay_timer);
> > +timer_del(s->pending_ram_check_timer);
> > +timer_free(s->pending

RE: The issues about architecture of the COLO checkpoint

2020-02-16 Thread Zhanghailiang
Hi Daniel,

I have rebased these patches onto the newest upstream version, as the series
"Optimize VM's downtime while do checkpoint in COLO".
It has not been tested yet, so please let me know if there are any problems.

Thanks,
Hailiang

From: Daniel Cho [mailto:daniel...@qnap.com]
Sent: Saturday, February 15, 2020 11:36 AM
To: Dr. David Alan Gilbert 
Cc: Zhang, Chen ; Zhanghailiang 
; qemu-devel@nongnu.org; Jason Wang 

Subject: Re: The issues about architecture of the COLO checkpoint

Hi Dave,

Yes, I agree with you, it does need a timeout.

Hi Hailiang,

We base on qemu-4.1.0 for using COLO feature, in your patch, we found a lot of 
difference  between your version and ours.
Could you give us a latest release version which is close your developing code?

Thanks.

Regards
Daniel Cho

Dr. David Alan Gilbert mailto:dgilb...@redhat.com>> wrote on Thursday, February 13, 2020 at 6:38 PM:
* Daniel Cho (daniel...@qnap.com<mailto:daniel...@qnap.com>) wrote:
> Hi Hailiang,
>
> 1.
> OK, we will try the patch
> “0001-COLO-Optimize-memory-back-up-process.patch”,
> and thanks for your help.
>
> 2.
> We understand the reason to compare PVM and SVM's packet. However, the
> empty of SVM's packet queue might happened on setting COLO feature and SVM
> broken.
>
> On situation 1 ( setting COLO feature ):
> We could force do checkpoint after setting COLO feature finish, then it
> will protect the state of PVM and SVM . As the Zhang Chen said.
>
> On situation 2 ( SVM broken ):
> COLO will do failover for PVM, so it might not cause any wrong on PVM.
>
> However, those situations are our views, so there might be a big difference
> between reality and our views.
> If we have any wrong views and opinions, please let us know, and correct
> us.

It does need a timeout; the SVM being broken or being in a state where
it never sends the corresponding packet (because of a state difference)
can happen and COLO needs to timeout when the packet hasn't arrived
after a while and trigger the checkpoint.

Dave

> Thanks.
>
> Best regards,
> Daniel Cho
>
> Zhang, Chen mailto:chen.zh...@intel.com>> wrote on Thursday, February 13, 2020 at 10:17 AM:
>
> > Add cc Jason Wang, he is a network expert.
> >
> > In case some network things goes wrong.
> >
> >
> >
> > Thanks
> >
> > Zhang Chen
> >
> >
> >
> > *From:* Zhang, Chen
> > *Sent:* Thursday, February 13, 2020 10:10 AM
> > *To:* 'Zhanghailiang' 
> > mailto:zhang.zhanghaili...@huawei.com>>; 
> > Daniel Cho <
> > daniel...@qnap.com<mailto:daniel...@qnap.com>>
> > *Cc:* Dr. David Alan Gilbert 
> > mailto:dgilb...@redhat.com>>; 
> > qemu-devel@nongnu.org<mailto:qemu-devel@nongnu.org>
> > *Subject:* RE: The issues about architecture of the COLO checkpoint
> >
> >
> >
> > For the issue 2:
> >
> >
> >
> > COLO need use the network packets to confirm PVM and SVM in the same state,
> >
> > Generally speaking, we can’t send PVM packets without compared with SVM
> > packets.
> >
> > But to prevent jamming, I think COLO can do force checkpoint and send the
> > PVM packets in this case.
> >
> >
> >
> > Thanks
> >
> > Zhang Chen
> >
> >
> >
> > *From:* Zhanghailiang 
> > mailto:zhang.zhanghaili...@huawei.com>>
> > *Sent:* Thursday, February 13, 2020 9:45 AM
> > *To:* Daniel Cho mailto:daniel...@qnap.com>>
> > *Cc:* Dr. David Alan Gilbert 
> > mailto:dgilb...@redhat.com>>; 
> > qemu-devel@nongnu.org<mailto:qemu-devel@nongnu.org>;
> > Zhang, Chen mailto:chen.zh...@intel.com>>
> > *Subject:* RE: The issues about architecture of the COLO checkpoint
> >
> >
> >
> > Hi,
> >
> >
> >
> > 1.   After re-walked through the codes, yes, you are right, actually,
> > after the first migration, we will keep dirty log on in primary side,
> >
> > And only send the dirty pages in PVM to SVM. The ram cache in secondary
> > side is always a backup of PVM, so we don’t have to
> >
> > Re-send the none-dirtied pages.
> >
> > The reason why the first checkpoint takes longer time is we have to backup
> > the whole VM’s ram into ram cache, that is colo_init_ram_cache().
> >
> > It is time consuming, but I have optimized in the second patch
> > “0001-COLO-Optimize-memory-back-up-process.patch” which you can find in my
> > previous reply.
> >
> >
> >
> > Besides, I found that, In my previous reply “We can only copy the pages
> > that dirtied by PVM and SVM in last checkpoint.”,
> >
> > We have done this opti

RE: The issues about architecture of the COLO checkpoint

2020-02-12 Thread Zhanghailiang
Hi,


1. After re-walking through the code, yes, you are right. Actually, after the
first migration, we keep dirty logging on in the primary side, and only send the
pages dirtied in the PVM to the SVM. The ram cache on the secondary side is
always a backup of the PVM, so we don't have to re-send the non-dirtied pages.

The reason why the first checkpoint takes longer is that we have to back up the
whole VM's RAM into the ram cache, that is, colo_init_ram_cache(). It is time
consuming, but I have optimized it in the second patch
"0001-COLO-Optimize-memory-back-up-process.patch", which you can find in my
previous reply.
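
For reference, this is essentially what the un-optimized preparation stage boils
down to (simplified from the hunks in that patch); it is the part that scales
with guest memory size and is removed by the optimization:

    /* old colo_init_ram_cache() preparation, done with the VM stopped */
    RAMBLOCK_FOREACH_NOT_IGNORED(block) {
        memcpy(block->colo_cache, block->host, block->used_length);
    }

With the optimization, each page is instead backed up into colo_cache as it
arrives during live migration, so this stop-the-VM copy disappears.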



Besides, I found that the optimization I mentioned in my previous reply ("We can
only copy the pages that were dirtied by PVM and SVM in the last checkpoint") has
already been done in the current upstream code.

2. I don't quite understand this question. For COLO, we always need both the
PVM's and the SVM's network packets to compare before sending a packet to the
client; this is how we decide whether or not the PVM and SVM are in the same state.
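
To make point 2 concrete, here is an illustrative pseudocode sketch only (the
helper names are made up and this is not the actual net/colo-compare.c code),
showing why both streams are needed before anything reaches the client:

    /* pseudocode sketch, not the real colo-compare implementation */
    void handle_primary_packet(Packet *ppkt)
    {
        Packet *spkt = find_matching_secondary_packet(ppkt); /* made-up helper */

        if (!spkt) {
            hold_packet(ppkt);          /* wait; a timeout may force a checkpoint */
            return;
        }
        if (payload_equal(ppkt, spkt)) {
            release_to_client(ppkt);    /* PVM and SVM are in the same state */
        } else {
            trigger_checkpoint();       /* states diverged, resync SVM */
            release_to_client(ppkt);
        }
    }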

Thanks,
hailiang
From: Daniel Cho [mailto:daniel...@qnap.com]
Sent: Wednesday, February 12, 2020 4:37 PM
To: Zhang, Chen 
Cc: Zhanghailiang ; Dr. David Alan Gilbert 
; qemu-devel@nongnu.org
Subject: Re: The issues about architecture of the COLO checkpoint

Hi Hailiang,

Thanks for your replaying and explain in detail.
We will try to use the attachments to enhance memory copy.

However, we have some questions for your replying.

1.  As you said, "for each checkpoint, we have to send the whole PVM's pages To 
SVM", why the only first checkpoint will takes more pause time?
In our observing, the first checkpoint will take more time for pausing, then 
other checkpoints will takes a few time for pausing. Does it means only the 
first checkpoint will send the whole pages to SVM, and the other checkpoints 
send the dirty pages to SVM for reloading?

2. We notice the COLO-COMPARE component will stuck the packet until receive 
packets from PVM and SVM, as this rule, when we add the COLO-COMPARE to PVM, 
its network will stuck until SVM start. So it is an other issue to make PVM 
stuck while setting COLO feature. With this issue, could we let colo-compare to 
pass the PVM's packet when the SVM's packet queue is empty? Then, the PVM's 
network won't stock, and "if PVM runs firstly, it still need to wait for The 
network packets from SVM to compare before send it to client side" won't 
happened either.

Best regard,
Daniel Cho

Zhang, Chen mailto:chen.zh...@intel.com>> wrote on Wednesday, February 12, 2020 at 1:45 PM:


> -Original Message-
> From: Zhanghailiang 
> mailto:zhang.zhanghaili...@huawei.com>>
> Sent: Wednesday, February 12, 2020 11:18 AM
> To: Dr. David Alan Gilbert mailto:dgilb...@redhat.com>>; 
> Daniel Cho
> mailto:daniel...@qnap.com>>; Zhang, Chen 
> mailto:chen.zh...@intel.com>>
> Cc: qemu-devel@nongnu.org<mailto:qemu-devel@nongnu.org>
> Subject: RE: The issues about architecture of the COLO checkpoint
>
> Hi,
>
> Thank you Dave,
>
> I'll reply here directly.
>
> -Original Message-
> From: Dr. David Alan Gilbert 
> [mailto:dgilb...@redhat.com<mailto:dgilb...@redhat.com>]
> Sent: Wednesday, February 12, 2020 1:48 AM
> To: Daniel Cho mailto:daniel...@qnap.com>>; 
> chen.zh...@intel.com<mailto:chen.zh...@intel.com>;
> Zhanghailiang 
> mailto:zhang.zhanghaili...@huawei.com>>
> Cc: qemu-devel@nongnu.org<mailto:qemu-devel@nongnu.org>
> Subject: Re: The issues about architecture of the COLO checkpoint
>
>
> cc'ing in COLO people:
>
>
> * Daniel Cho (daniel...@qnap.com<mailto:daniel...@qnap.com>) wrote:
> > Hi everyone,
> >  We have some issues about setting COLO feature. Hope somebody
> > could give us some advice.
> >
> > Issue 1:
> >  We dynamic to set COLO feature for PVM(2 core, 16G memory),  but
> > the Primary VM will pause a long time(based on memory size) for
> > waiting SVM start. Does it have any idea to reduce the pause time?
> >
>
> Yes, we do have some ideas to optimize this downtime.
>
> The main problem for current version is, for each checkpoint, we have to
> send the whole PVM's pages
> To SVM, and then copy the whole VM's state into SVM from ram cache, in
> this process, we need both of them be paused.
> Just as you said, the downtime is based on memory size.
>
> So firstly, we need to reduce the sending data while do checkpoint, actually,
> we can migrate parts of PVM's dirty pages in background
> While both of VMs are running. And then we load these pages into ram
> cache (backup memory) in SVM temporarily. While do checkpoint,
> We just send the last dirty pages of PVM to slave side and then copy the ram
> cache into SVM. Further on, we don't have
> To send the whole PVM's dirty pages, we can

RE: The issues about architecture of the COLO checkpoint

2020-02-12 Thread Zhanghailiang
Hi Zhang Chen,

> -Original Message-
> From: Zhang, Chen [mailto:chen.zh...@intel.com]
> Sent: Wednesday, February 12, 2020 1:45 PM
> To: Zhanghailiang ; Dr. David Alan
> Gilbert ; Daniel Cho 
> Cc: qemu-devel@nongnu.org
> Subject: RE: The issues about architecture of the COLO checkpoint
> 
> 
> 
> > -Original Message-
> > From: Zhanghailiang 
> > Sent: Wednesday, February 12, 2020 11:18 AM
> > To: Dr. David Alan Gilbert ; Daniel Cho
> > ; Zhang, Chen 
> > Cc: qemu-devel@nongnu.org
> > Subject: RE: The issues about architecture of the COLO checkpoint
> >
> > Hi,
> >
> > Thank you Dave,
> >
> > I'll reply here directly.
> >
> > -Original Message-
> > From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> > Sent: Wednesday, February 12, 2020 1:48 AM
> > To: Daniel Cho ; chen.zh...@intel.com;
> > Zhanghailiang 
> > Cc: qemu-devel@nongnu.org
> > Subject: Re: The issues about architecture of the COLO checkpoint
> >
> >
> > cc'ing in COLO people:
> >
> >
> > * Daniel Cho (daniel...@qnap.com) wrote:
> > > Hi everyone,
> > >  We have some issues about setting COLO feature. Hope somebody
> > > could give us some advice.
> > >
> > > Issue 1:
> > >  We dynamic to set COLO feature for PVM(2 core, 16G memory),
> > > but the Primary VM will pause a long time(based on memory size) for
> > > waiting SVM start. Does it have any idea to reduce the pause time?
> > >
> >
> > Yes, we do have some ideas to optimize this downtime.
> >
> > The main problem for current version is, for each checkpoint, we have
> > to send the whole PVM's pages To SVM, and then copy the whole VM's
> > state into SVM from ram cache, in this process, we need both of them
> > be paused.
> > Just as you said, the downtime is based on memory size.
> >
> > So firstly, we need to reduce the sending data while do checkpoint,
> > actually, we can migrate parts of PVM's dirty pages in background
> > While both of VMs are running. And then we load these pages into ram
> > cache (backup memory) in SVM temporarily. While do checkpoint, We just
> > send the last dirty pages of PVM to slave side and then copy the ram
> > cache into SVM. Further on, we don't have To send the whole PVM's
> > dirty pages, we can only send the pages that dirtied by PVM or SVM
> > during two checkpoints. (Because If one page is not dirtied by both
> > PVM and SVM, the data of this pages will keep same in SVM, PVM, backup
> > memory). This method can reduce the time that consumed in sending
> > data.
> >
> > For the second problem, we can reduce the memory copy by two methods,
> > first one, we don't have to copy the whole pages in ram cache, We can
> > only copy the pages that dirtied by PVM and SVM in last checkpoint.
> > Second, we can use userfault missing function to reduce the Time
> > consumed in memory copy. (For the second time, in theory, we can
> > reduce time consumed in memory into ms level).
> >
> > You can find the first optimization in attachment, it is based on an
> > old qemu version (qemu-2.6), it should not be difficult to rebase it
> > Into master or your version. And please feel free to send the new
> > version if you want into community ;)
> >
> >
> 
> Thanks Hailiang!
> By the way, Do you have time to push the patches to upstream?
> I think this is a better and faster option.
> 

Yes, I can do this. For the second optimization, we need some time to implement
and test it.

Thanks

> Thanks
> Zhang Chen
> 
> > >
> > > Issue 2:
> > >  In
> > > https://github.com/qemu/qemu/blob/master/migration/colo.c#L503,
> > > could we move start_vm() before Line 488? Because at first
> > > checkpoint PVM will wait for SVM's reply, it cause PVM stop for a while.
> > >
> >
> > No, that makes no sense, because if PVM runs firstly, it still need to
> > wait for The network packets from SVM to compare before send it to client
> side.
> >
> >
> > Thanks,
> > Hailiang
> >
> > >  We set the COLO feature on running VM, so we hope the running
> > > VM could continuous service for users.
> > > Do you have any suggestions for those issues?
> > >
> > > Best regards,
> > > Daniel Cho
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




RE: The issues about architecture of the COLO checkpoint

2020-02-11 Thread Zhanghailiang
Hi,

Thank you Dave,

I'll reply here directly.

-Original Message-
From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com] 
Sent: Wednesday, February 12, 2020 1:48 AM
To: Daniel Cho ; chen.zh...@intel.com; Zhanghailiang 

Cc: qemu-devel@nongnu.org
Subject: Re: The issues about architecture of the COLO checkpoint


cc'ing in COLO people:


* Daniel Cho (daniel...@qnap.com) wrote:
> Hi everyone,
>  We have some issues about setting COLO feature. Hope somebody 
> could give us some advice.
> 
> Issue 1:
>  We dynamic to set COLO feature for PVM(2 core, 16G memory),  but 
> the Primary VM will pause a long time(based on memory size) for 
> waiting SVM start. Does it have any idea to reduce the pause time?
> 

Yes, we do have some ideas to optimize this downtime.

The main problem with the current version is that, for each checkpoint, we have
to send the whole of the PVM's pages to the SVM and then copy the whole VM state
into the SVM from the ram cache; during this process, both of them have to be
paused.
Just as you said, the downtime depends on the memory size.

So firstly, we need to reduce the amount of data sent at checkpoint time.
Actually, we can migrate part of the PVM's dirty pages in the background while
both VMs are running, and load these pages into the ram cache (backup memory) of
the SVM temporarily. At checkpoint time, we only send the PVM's remaining dirty
pages to the slave side and then copy the ram cache into the SVM. Further on, we
don't have to send the whole of the PVM's dirty pages; we can send only the pages
dirtied by the PVM or the SVM between two checkpoints. (If a page has been
dirtied by neither the PVM nor the SVM, its data stays the same in the SVM, the
PVM and the backup memory.) This method reduces the time spent sending data.

For the second problem, we can reduce the memory copy in two ways. First, we
don't have to copy all the pages in the ram cache; we can copy only the pages
dirtied by the PVM and the SVM since the last checkpoint. Second, we can use the
userfault missing function to reduce the time consumed by the memory copy. (With
the second method, in theory, we can reduce the memory-copy time to the
millisecond level.)

You can find the first optimization in the attachment. It is based on an old qemu
version (qemu-2.6), so it should not be difficult to rebase it onto master or
your version. And please feel free to send the new version to the community if
you want ;)
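
To make the first optimization a bit more concrete, here is a minimal
stand-alone sketch of the flush step. The function name, the flat dirty-flag
array and the raw pointers are illustrative only and do not match the actual
QEMU symbols used in the attached patch:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only -- not the QEMU implementation. */
static void flush_ram_cache_sketch(uint8_t *svm_ram, const uint8_t *ram_cache,
                                   uint8_t *dirty, size_t nr_pages,
                                   size_t page_size)
{
    for (size_t page = 0; page < nr_pages; page++) {
        if (!dirty[page]) {
            continue;  /* untouched by PVM and SVM: cache and SVM already match */
        }
        memcpy(svm_ram + page * page_size,
               ram_cache + page * page_size, page_size);
        dirty[page] = 0;  /* clear the mark for the next checkpoint round */
    }
}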


> 
> Issue 2:
>  In 
> https://github.com/qemu/qemu/blob/master/migration/colo.c#L503,
> could we move start_vm() before Line 488? Because at first checkpoint 
> PVM will wait for SVM's reply, it cause PVM stop for a while.
> 

No, that makes no sense, because if the PVM runs first, it still needs to wait for
the network packets from the SVM to compare before sending them to the client side.


Thanks,
Hailiang

>  We set the COLO feature on running VM, so we hope the running VM 
> could continuous service for users.
> Do you have any suggestions for those issues?
> 
> Best regards,
> Daniel Cho
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



0001-COLO-Migrate-dirty-pages-during-the-gap-of-checkpoin.patch
Description: 0001-COLO-Migrate-dirty-pages-during-the-gap-of-checkpoin.patch


0001-COLO-Optimize-memory-back-up-process.patch
Description: 0001-COLO-Optimize-memory-back-up-process.patch


[Qemu-devel] [BUG] Windows 7 got stuck easily while run PCMark10 application

2017-12-01 Thread Zhanghailiang
Hi,

We hit a bug in our tests while running PCMark 10 in a Windows 7 VM:
the VM got stuck and its wallclock hung after PCMark 10 had been running in it
for several minutes.
The bug is quite easy to reproduce with upstream KVM and QEMU.

We found that KVM cannot inject any RTC irq into the VM after it hangs; it fails
to deliver the irq in ioapic_set_irq() because the RTC irq is still pending in
ioapic->irr.

static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
  int irq_level, bool line_status)
{
… …
 if (!irq_level) {
  ioapic->irr &= ~mask;
  ret = 1;
  goto out;
 }
… …
 if ((edge && old_irr == ioapic->irr) ||
 (!edge && entry.fields.remote_irr)) {
  ret = 0;
  goto out;
 }

According to the RTC spec, after the RTC injects a high-level irq, the OS reads
CMOS register C to clear the irq flag and pull the irq line back down.

In QEMU, we emulate the read operation in cmos_ioport_read(), but the guest OS
first issues a write operation to tell which register will be read after this
write; we use s->cmos_index to record which register the following read targets.

But in our test we found a situation in which a vCPU fails to read RTC_REG_C to
clear the irq. This can happen when two vCPUs are writing/reading the registers
at the same time. For example, vcpu0 wants to read RTC_REG_C, so it writes
RTC_REG_C first, which sets s->cmos_index to RTC_REG_C; but before it gets to
read register C, vcpu1 starts to read RTC_YEAR and changes s->cmos_index to
RTC_YEAR with its own write. The next operation of vcpu0 therefore ends up
reading RTC_YEAR. In this case we miss calling qemu_irq_lower(s->irq) to clear
the irq. After this, KVM never injects another RTC irq, and the Windows VM hangs.
static void cmos_ioport_write(void *opaque, hwaddr addr,
  uint64_t data, unsigned size)
{
RTCState *s = opaque;

if ((addr & 1) == 0) {
s->cmos_index = data & 0x7f;
}
……
static uint64_t cmos_ioport_read(void *opaque, hwaddr addr,
 unsigned size)
{
RTCState *s = opaque;
int ret;
if ((addr & 1) == 0) {
return 0xff;
} else {
switch(s->cmos_index) {

According to the CMOS spec, 'any write to PORT 0070h should be followed by an
action to PORT 0071h or the RTC will be left in an unknown state', but it seems
that we cannot ensure this sequence in QEMU/KVM.

Any ideas ?

Thanks,
Hailiang




[Qemu-devel] [PATCH RESEND v2 17/18] filter-rewriter: handle checkpoint and failover event

2017-04-22 Thread zhanghailiang
After one round of checkpoint, the states of PVM and SVM become consistent, so it
is unnecessary to adjust the sequence of net packets for old connections. Besides,
when failover happens, filter-rewriter needs to check whether it still needs to
adjust the sequence of net packets.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 net/filter-rewriter.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index c9a6d43..0a90b11 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -22,6 +22,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/iov.h"
 #include "net/checksum.h"
+#include "net/colo.h"
 
 #define FILTER_COLO_REWRITER(obj) \
 OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
@@ -270,6 +271,43 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState 
*nf,
 return 0;
 }
 
+static void reset_seq_offset(gpointer key, gpointer value, gpointer user_data)
+{
+Connection *conn = (Connection *)value;
+
+conn->offset = 0;
+}
+
+static gboolean offset_is_nonzero(gpointer key,
+  gpointer value,
+  gpointer user_data)
+{
+Connection *conn = (Connection *)value;
+
+return conn->offset ? true : false;
+}
+
+static void colo_rewriter_handle_event(NetFilterState *nf, int event,
+   Error **errp)
+{
+RewriterState *rs = FILTER_COLO_REWRITER(nf);
+
+switch (event) {
+case COLO_CHECKPOINT:
+g_hash_table_foreach(rs->connection_track_table,
+reset_seq_offset, NULL);
+break;
+case COLO_FAILOVER:
+if (!g_hash_table_find(rs->connection_track_table,
+  offset_is_nonzero, NULL)) {
+object_property_set_str(OBJECT(nf), "off", "status", errp);
+}
+break;
+default:
+break;
+}
+}
+
 static void colo_rewriter_cleanup(NetFilterState *nf)
 {
 RewriterState *s = FILTER_COLO_REWRITER(nf);
@@ -299,6 +337,7 @@ static void colo_rewriter_class_init(ObjectClass *oc, void 
*data)
 nfc->setup = colo_rewriter_setup;
 nfc->cleanup = colo_rewriter_cleanup;
 nfc->receive_iov = colo_rewriter_receive_iov;
+nfc->handle_event = colo_rewriter_handle_event;
 }
 
 static const TypeInfo colo_rewriter_info = {
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 16/18] filter: Add handle_event method for NetFilterClass

2017-04-22 Thread zhanghailiang
Filters need to process the checkpoint/failover events and any other events
passed by the COLO frame.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 include/net/filter.h |  5 +
 net/filter.c | 16 
 net/net.c| 28 
 3 files changed, 49 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index 0c4a2ea..df4510d 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -37,6 +37,8 @@ typedef ssize_t (FilterReceiveIOV)(NetFilterState *nc,
 
 typedef void (FilterStatusChanged) (NetFilterState *nf, Error **errp);
 
+typedef void (FilterHandleEvent) (NetFilterState *nf, int event, Error **errp);
+
 typedef struct NetFilterClass {
 ObjectClass parent_class;
 
@@ -44,6 +46,7 @@ typedef struct NetFilterClass {
 FilterSetup *setup;
 FilterCleanup *cleanup;
 FilterStatusChanged *status_changed;
+FilterHandleEvent *handle_event;
 /* mandatory */
 FilterReceiveIOV *receive_iov;
 } NetFilterClass;
@@ -76,4 +79,6 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 int iovcnt,
 void *opaque);
 
+void colo_notify_filters_event(int event, Error **errp);
+
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index 1dfd2ca..993b35e 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -17,6 +17,7 @@
 #include "net/vhost_net.h"
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
+#include "net/colo.h"
 
 static inline bool qemu_can_skip_netfilter(NetFilterState *nf)
 {
@@ -245,11 +246,26 @@ static void netfilter_finalize(Object *obj)
 g_free(nf->netdev_id);
 }
 
+static void dummy_handle_event(NetFilterState *nf, int event, Error **errp)
+{
+switch (event) {
+case COLO_CHECKPOINT:
+break;
+case COLO_FAILOVER:
+object_property_set_str(OBJECT(nf), "off", "status", errp);
+break;
+default:
+break;
+}
+}
+
 static void netfilter_class_init(ObjectClass *oc, void *data)
 {
 UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+NetFilterClass *nfc = NETFILTER_CLASS(oc);
 
 ucc->complete = netfilter_complete;
+nfc->handle_event = dummy_handle_event;
 }
 
 static const TypeInfo netfilter_info = {
diff --git a/net/net.c b/net/net.c
index 0ac3b9e..1373f63 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1373,6 +1373,34 @@ void hmp_info_network(Monitor *mon, const QDict *qdict)
 }
 }
 
+void colo_notify_filters_event(int event, Error **errp)
+{
+NetClientState *nc, *peer;
+NetClientDriver type;
+NetFilterState *nf;
+NetFilterClass *nfc = NULL;
+Error *local_err = NULL;
+
+QTAILQ_FOREACH(nc, &net_clients, next) {
+peer = nc->peer;
+type = nc->info->type;
+if (!peer || type != NET_CLIENT_DRIVER_NIC) {
+continue;
+}
+QTAILQ_FOREACH(nf, &nc->filters, next) {
+nfc =  NETFILTER_GET_CLASS(OBJECT(nf));
+if (!nfc->handle_event) {
+continue;
+}
+nfc->handle_event(nf, event, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+}
+
 void qmp_set_link(const char *name, bool up, Error **errp)
 {
 NetClientState *ncs[MAX_QUEUE_NUM];
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 09/18] COLO: Flush memory data from ram cache

2017-04-22 Thread zhanghailiang
While the VM is running, the PVM may dirty some pages; we transfer the PVM's dirty
pages to the SVM and store them into the SVM's RAM cache at the next checkpoint
time. So the content of the SVM's RAM cache is always the same as the PVM's memory
after a checkpoint.

Instead of flushing the whole content of the RAM cache into the SVM's memory,
we do this in a more efficient way:
only flush the pages that were dirtied by the PVM since the last checkpoint.
In this way, we can ensure the SVM's memory is the same as the PVM's.

Besides, we must ensure the RAM cache is flushed before loading the device state.

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 include/migration/migration.h |  1 +
 migration/ram.c   | 40 
 migration/trace-events|  2 ++
 3 files changed, 43 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba765eb..2aa7654 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -364,4 +364,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_flush_ram_cache(void);
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 0653a24..df10d4b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2602,6 +2602,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 bool postcopy_running = postcopy_state_get() >= 
POSTCOPY_INCOMING_LISTENING;
 /* ADVISE is earlier, it shows the source has the postcopy capability on */
 bool postcopy_advised = postcopy_state_get() >= POSTCOPY_INCOMING_ADVISE;
+bool need_flush = false;
 
 seq_iter++;
 
@@ -2636,6 +2637,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 /* After going into COLO, we should load the Page into colo_cache 
*/
 if (migration_incoming_in_colo_state()) {
 host = colo_cache_from_block_offset(block, addr);
+need_flush = true;
 } else {
 host = host_from_ram_block_offset(block, addr);
 }
@@ -2742,6 +2744,10 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 wait_for_decompress_done();
 rcu_read_unlock();
 trace_ram_load_complete(ret, seq_iter);
+
+if (!ret  && ram_cache_enable && need_flush) {
+colo_flush_ram_cache();
+}
 return ret;
 }
 
@@ -2810,6 +2816,40 @@ void colo_release_ram_cache(void)
 rcu_read_unlock();
 }
 
+/*
+ * Flush content of RAM cache into SVM's memory.
+ * Only flush the pages that be dirtied by PVM or SVM or both.
+ */
+void colo_flush_ram_cache(void)
+{
+RAMBlock *block = NULL;
+void *dst_host;
+void *src_host;
+unsigned long offset = 0;
+
+trace_colo_flush_ram_cache_begin(ram_state.migration_dirty_pages);
+rcu_read_lock();
+block = QLIST_FIRST_RCU(&ram_list.blocks);
+
+while (block) {
+offset = migration_bitmap_find_dirty(&ram_state, block, offset);
+migration_bitmap_clear_dirty(&ram_state, block, offset);
+
+if (offset << TARGET_PAGE_BITS >= block->used_length) {
+offset = 0;
+block = QLIST_NEXT_RCU(block, next);
+} else {
+dst_host = block->host + (offset << TARGET_PAGE_BITS);
+src_host = block->colo_cache + (offset << TARGET_PAGE_BITS);
+memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+}
+}
+
+rcu_read_unlock();
+trace_colo_flush_ram_cache_end();
+assert(ram_state.migration_dirty_pages == 0);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
 .save_live_setup = ram_save_setup,
 .save_live_iterate = ram_save_iterate,
diff --git a/migration/trace-events b/migration/trace-events
index b8f01a2..93f4337 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -72,6 +72,8 @@ ram_discard_range(const char *rbname, uint64_t start, size_t 
len) "%s: start: %"
 ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 
%zx len: %zx"
+colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
+colo_flush_ram_cache_end(void) ""
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 15/18] COLO: flush host dirty ram from cache

2017-04-22 Thread zhanghailiang
There is no need to flush the whole of the VM's RAM from the cache; only flush
the pages dirtied since the last checkpoint.

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
v2:
 - stop dirty log after exit from COLO state. (Dave)
---
 migration/ram.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index f171a82..7bf3515 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2775,6 +2775,7 @@ int colo_init_ram_cache(void)
 ram_state.ram_bitmap = g_new0(RAMBitmap, 1);
 ram_state.ram_bitmap->bmap = bitmap_new(last_ram_page());
 ram_state.migration_dirty_pages = 0;
+memory_global_dirty_log_start();
 
 return 0;
 
@@ -2798,6 +2799,7 @@ void colo_release_ram_cache(void)
 
 atomic_rcu_set(&ram_state.ram_bitmap, NULL);
 if (bitmap) {
+memory_global_dirty_log_stop();
 call_rcu(bitmap, migration_bitmap_free, rcu);
 }
 
@@ -2822,6 +2824,16 @@ void colo_flush_ram_cache(void)
 void *src_host;
 unsigned long offset = 0;
 
+memory_global_dirty_log_sync();
+qemu_mutex_lock(&ram_state.bitmap_mutex);
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+migration_bitmap_sync_range(&ram_state, block, block->offset,
+block->used_length);
+}
+rcu_read_unlock();
+qemu_mutex_unlock(&ram_state.bitmap_mutex);
+
 trace_colo_flush_ram_cache_begin(ram_state.migration_dirty_pages);
 rcu_read_lock();
 block = QLIST_FIRST_RCU(&ram_list.blocks);
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 10/18] qmp event: Add COLO_EXIT event to notify users while exited COLO

2017-04-22 Thread zhanghailiang
If an error happens during the VM's COLO FT stage, it's important to notify the
users of this event. Together with 'x_colo_lost_heartbeat', users can intervene
in COLO's failover work immediately.
If users don't want to get involved in COLO's failover verdict, it is still
necessary to notify them that we exited COLO mode.

Cc: Markus Armbruster <arm...@redhat.com>
Cc: Michael Roth <mdr...@linux.vnet.ibm.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 migration/colo.c | 19 +++
 qapi-schema.json | 14 ++
 qapi/event.json  | 21 +
 3 files changed, 54 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 9949293..e62da93 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -516,6 +516,18 @@ out:
 qemu_fclose(fb);
 }
 
+/*
+ * There are only two reasons we can go here, some error happened.
+ * Or the user triggered failover.
+ */
+if (failover_get_state() == FAILOVER_STATUS_NONE) {
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+  COLO_EXIT_REASON_ERROR, NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+  COLO_EXIT_REASON_REQUEST, NULL);
+}
+
 /* Hope this not to be too long to wait here */
 qemu_sem_wait(&s->colo_exit_sem);
 qemu_sem_destroy(&s->colo_exit_sem);
@@ -757,6 +769,13 @@ out:
 if (local_err) {
 error_report_err(local_err);
 }
+if (failover_get_state() == FAILOVER_STATUS_NONE) {
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+  COLO_EXIT_REASON_ERROR, NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+  COLO_EXIT_REASON_REQUEST, NULL);
+}
 
 if (fb) {
 qemu_fclose(fb);
diff --git a/qapi-schema.json b/qapi-schema.json
index 4b3e1b7..460ca53 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1233,6 +1233,20 @@
   'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
 
 ##
+# @COLOExitReason:
+#
+# The reason for a COLO exit
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.10
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'request', 'error' ] }
+
+##
 # @x-colo-lost-heartbeat:
 #
 # Tell qemu that heartbeat is lost, request it to do takeover procedures.
diff --git a/qapi/event.json b/qapi/event.json
index e80f3f4..924bc6f 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -441,6 +441,27 @@
   'data': { 'pass': 'int' } }
 
 ##
+# @COLO_EXIT:
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# at the request of users.
+#
+# @mode: which COLO mode the VM was in when it exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 2.10
+#
+# Example:
+#
+# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
+#  "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } 
}
+#
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
 # @ACPI_DEVICE_OST:
 #
 # Emitted when guest executes ACPI _OST method.
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 06/18] COLO: Add block replication into colo process

2017-04-22 Thread zhanghailiang
Make sure the master starts block replication after the slave's block replication
has started.

Besides, we need to activate the VM's blocks before it goes into COLO state.

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Max Reitz <mre...@redhat.com>
Cc: Xie Changlong <xiechanglon...@gmail.com>
---
 migration/colo.c  | 50 ++
 migration/migration.c | 16 
 2 files changed, 66 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index c4fc865..9949293 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -23,6 +23,9 @@
 #include "qmp-commands.h"
 #include "net/colo-compare.h"
 #include "net/colo.h"
+#include "qapi-event.h"
+#include "block/block.h"
+#include "replication.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -57,6 +60,7 @@ static void secondary_vm_do_failover(void)
 {
 int old_state;
 MigrationIncomingState *mis = migration_incoming_get_current();
+Error *local_err = NULL;
 
 /* Can not do failover during the process of VM's loading VMstate, Or
  * it will break the secondary VM.
@@ -74,6 +78,11 @@ static void secondary_vm_do_failover(void)
 migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
 
+replication_stop_all(true, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+
 if (!autostart) {
 error_report("\"-S\" qemu option will be ignored in secondary side");
 /* recover runstate to normal migration finish state */
@@ -111,6 +120,7 @@ static void primary_vm_do_failover(void)
 {
 MigrationState *s = migrate_get_current();
 int old_state;
+Error *local_err = NULL;
 
 migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
@@ -134,6 +144,13 @@ static void primary_vm_do_failover(void)
  FailoverStatus_lookup[old_state]);
 return;
 }
+
+replication_stop_all(true, &local_err);
+if (local_err) {
+error_report_err(local_err);
+local_err = NULL;
+}
+
 /* Notify COLO thread that failover work is finished */
 qemu_sem_post(&s->colo_exit_sem);
 }
@@ -345,6 +362,15 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 s->params.shared = 0;
 qemu_savevm_state_header(fb);
 qemu_savevm_state_begin(fb, &s->params);
+
+/* We call this API although this may do nothing on primary side. */
+qemu_mutex_lock_iothread();
+replication_do_checkpoint_all(&local_err);
+qemu_mutex_unlock_iothread();
+if (local_err) {
+goto out;
+}
+
 qemu_mutex_lock_iothread();
 qemu_savevm_state_complete_precopy(fb, false);
 qemu_mutex_unlock_iothread();
@@ -451,6 +477,12 @@ static void colo_process_checkpoint(MigrationState *s)
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
+replication_start_all(REPLICATION_MODE_PRIMARY, &local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vm_start();
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("stop", "run");
@@ -554,6 +586,7 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 case COLO_MESSAGE_GUEST_SHUTDOWN:
 qemu_mutex_lock_iothread();
 vm_stop_force_state(RUN_STATE_COLO);
+replication_stop_all(false, NULL);
 qemu_system_shutdown_request_core();
 qemu_mutex_unlock_iothread();
 /*
@@ -602,6 +635,11 @@ void *colo_process_incoming_thread(void *opaque)
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
+replication_start_all(REPLICATION_MODE_SECONDARY, &local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
 vm_start();
 trace_colo_vm_state_change("stop", "run");
 qemu_mutex_unlock_iothread();
@@ -682,6 +720,18 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+replication_get_error_all(&local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+/* discard colo disk buffer */
+replication_do_checkpoint_all(&local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vmstate_loading = false;
 vm_start();
 trace_colo_vm_state_change("stop", "run");
diff --git a/migration/migration.c b/migration/migration.c
index 2ade2aa..755ea54 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -394,6 +394,7 @@ static void process_incoming_migration_c

[Qemu-devel] [PATCH RESEND v2 02/18] colo-compare: implement the process of checkpoint

2017-04-22 Thread zhanghailiang
While doing a checkpoint, we need to flush all the unhandled packets. By using
the filter notifier mechanism, we can easily notify every compare object to do
this; the processing runs inside the compare threads as a coroutine.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
 net/colo-compare.c | 78 ++
 net/colo-compare.h |  6 +
 2 files changed, 84 insertions(+)
 create mode 100644 net/colo-compare.h

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 97bf0e5..3adccfb 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -29,17 +29,24 @@
 #include "qemu/sockets.h"
 #include "qapi-visit.h"
 #include "net/colo.h"
+#include "net/colo-compare.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
+static QTAILQ_HEAD(, CompareState) net_compares =
+   QTAILQ_HEAD_INITIALIZER(net_compares);
+
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
 /* TODO: Should be configurable */
 #define REGULAR_PACKET_CHECK_MS 3000
 
+static QemuMutex event_mtx = { .lock = PTHREAD_MUTEX_INITIALIZER };
+static QemuCond event_complete_cond = { .cond = PTHREAD_COND_INITIALIZER };
+static int event_unhandled_count;
 /*
   + CompareState ++
   |   |
@@ -87,6 +94,10 @@ typedef struct CompareState {
 
 GMainContext *worker_context;
 GMainLoop *compare_loop;
+/* Used for COLO to notify compare to do something */
+FilterNotifier *notifier;
+
+QTAILQ_ENTRY(CompareState) next;
 } CompareState;
 
 typedef struct CompareClass {
@@ -417,6 +428,11 @@ static void colo_compare_connection(void *opaque, void 
*user_data)
 while (!g_queue_is_empty(&conn->primary_list) &&
!g_queue_is_empty(&conn->secondary_list)) {
 pkt = g_queue_pop_tail(&conn->primary_list);
+if (!pkt) {
+error_report("colo-compare pop pkt failed");
+return;
+}
+
 switch (conn->ip_proto) {
 case IPPROTO_TCP:
 result = g_queue_find_custom(&conn->secondary_list,
@@ -538,6 +554,53 @@ static gboolean check_old_packet_regular(void *opaque)
 return TRUE;
 }
 
+/* Public API, Used for COLO frame to notify compare event */
+void colo_notify_compares_event(void *opaque, int event, Error **errp)
+{
+CompareState *s;
+int ret;
+
+qemu_mutex_lock(&event_mtx);
+QTAILQ_FOREACH(s, &net_compares, next) {
+ret = filter_notifier_set(s->notifier, event);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write value to eventfd");
+goto fail;
+}
+event_unhandled_count++;
+}
+/* Wait all compare threads to finish handling this event */
+while (event_unhandled_count > 0) {
+qemu_cond_wait(&event_complete_cond, &event_mtx);
+}
+
+fail:
+qemu_mutex_unlock(&event_mtx);
+}
+
+static void colo_flush_packets(void *opaque, void *user_data);
+
+static void colo_compare_handle_event(void *opaque, int event)
+{
+FilterNotifier *notify = opaque;
+CompareState *s = notify->opaque;
+
+switch (event) {
+case COLO_CHECKPOINT:
+g_queue_foreach(&s->conn_list, colo_flush_packets, s);
+break;
+case COLO_FAILOVER:
+break;
+default:
+break;
+}
+qemu_mutex_lock(&event_mtx);
+assert(event_unhandled_count > 0);
+event_unhandled_count--;
+qemu_cond_broadcast(&event_complete_cond);
+qemu_mutex_unlock(&event_mtx);
+}
+
 static void *colo_compare_thread(void *opaque)
 {
 CompareState *s = opaque;
@@ -558,10 +621,15 @@ static void *colo_compare_thread(void *opaque)
   (GSourceFunc)check_old_packet_regular, s, NULL);
 g_source_attach(timeout_source, s->worker_context);
 
+s->notifier = filter_notifier_new(colo_compare_handle_event, s, NULL);
+g_source_attach(&s->notifier->source, s->worker_context);
+
 qemu_sem_post(&s->thread_ready);
 
 g_main_loop_run(s->compare_loop);
 
+g_source_destroy(&s->notifier->source);
+g_source_unref(&s->notifier->source);
 g_source_destroy(timeout_source);
 g_source_unref(timeout_source);
 
@@ -706,6 +774,8 @@ static void colo_compare_complete(UserCreatable *uc, Error 
**errp)
 net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize);
 net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
 
+QTAILQ_INSERT_TAIL(&net_compares, s, next);
+
 g_queue_init(&s->conn_list);
 
 s->connection_track_table = g_hash_table_new_full(connection_key_hash,
@@ -765,6 +835,7 @@ static void colo_compare_init(Object *obj)
 static void colo_compare_finalize(Object *obj)
 {
 CompareState *s = COLO_COMPARE(obj);
+CompareState *tmp = NULL;
 
 qemu_chr_f

[Qemu-devel] [PATCH RESEND v2 07/18] COLO: Load dirty pages into SVM's RAM cache firstly

2017-04-22 Thread zhanghailiang
We should not load the PVM's state directly into the SVM, because errors may
happen while the SVM is receiving data, which would break the SVM.

We need to ensure that all data has been received before loading the state into
the SVM, so we use extra memory to cache that data (the PVM's RAM). The RAM cache
on the secondary side is initially the same as the SVM/PVM's memory. During a
checkpoint, we first cache the PVM's dirty pages in this RAM cache, so the RAM
cache is always the same as the PVM's memory at every checkpoint; then we flush
this cached RAM into the SVM after we have received all of the PVM's state.

Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
---
v2:
- Move colo_init_ram_cache() and colo_release_ram_cache() out of
  incoming thread since both of them need the global lock, if we keep
  colo_release_ram_cache() in incoming thread, there are potential
  dead-lock.
- Remove bool ram_cache_enable flag, use migration_incoming_in_state() instead.
- Remove the Reviewd-by tag because of the above changes.
---
 include/exec/ram_addr.h   |  1 +
 include/migration/migration.h |  4 +++
 migration/migration.c |  6 
 migration/ram.c   | 71 ++-
 4 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c9ddcd0..0b3d77c 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -27,6 +27,7 @@ struct RAMBlock {
 struct rcu_head rcu;
 struct MemoryRegion *mr;
 uint8_t *host;
+uint8_t *colo_cache; /* For colo, VM's ram cache */
 ram_addr_t offset;
 ram_addr_t used_length;
 ram_addr_t max_length;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba1a16c..ba765eb 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -360,4 +360,8 @@ uint64_t ram_pagesize_summary(void);
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
+
+/* ram cache */
+int colo_init_ram_cache(void);
+void colo_release_ram_cache(void);
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 755ea54..7419404 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -441,6 +441,10 @@ static void process_incoming_migration_co(void *opaque)
 error_report_err(local_err);
 exit(EXIT_FAILURE);
 }
+if (colo_init_ram_cache() < 0) {
+error_report("Init ram cache failed");
+exit(EXIT_FAILURE);
+}
 mis->migration_incoming_co = qemu_coroutine_self();
 qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
  colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
@@ -449,6 +453,8 @@ static void process_incoming_migration_co(void *opaque)
 
 /* Wait checkpoint incoming thread exit before free resource */
 qemu_thread_join(&mis->colo_incoming_thread);
+/* We hold the global iothread lock, so it is safe here */
+colo_release_ram_cache();
 }
 
 if (ret < 0) {
diff --git a/migration/ram.c b/migration/ram.c
index f48664e..05d1b06 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2265,6 +2265,20 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
 return block->host + offset;
 }
 
+static inline void *colo_cache_from_block_offset(RAMBlock *block,
+ ram_addr_t offset)
+{
+if (!offset_in_ramblock(block, offset)) {
+return NULL;
+}
+if (!block->colo_cache) {
+error_report("%s: colo_cache is NULL in block :%s",
+ __func__, block->idstr);
+return NULL;
+}
+return block->colo_cache + offset;
+}
+
 /**
  * ram_handle_compressed: handle the zero page case
  *
@@ -2605,7 +2619,12 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
  RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
 RAMBlock *block = ram_block_from_stream(f, flags);
 
-host = host_from_ram_block_offset(block, addr);
+/* After going into COLO, we should load the Page into colo_cache 
*/
+if (migration_incoming_in_colo_state()) {
+host = colo_cache_from_block_offset(block, addr);
+} else {
+host = host_from_ram_block_offset(block, addr);
+}
 if (!host) {
 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
 ret = -EINVAL;
@@ -2712,6 +2731,56 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 return ret;
 }
 
+/*
+ * colo cache: this is for secondary VM, we cache the whole
+ * memory of the secondary VM, it is nee

[Qemu-devel] [PATCH RESEND v2 18/18] COLO: notify net filters about checkpoint/failover event

2017-04-22 Thread zhanghailiang
Notify all net filters about the checkpoint and failover event.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 migration/colo.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 66bb5b2..62f58c6 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -26,6 +26,7 @@
 #include "qapi-event.h"
 #include "block/block.h"
 #include "replication.h"
+#include "net/filter.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -82,6 +83,11 @@ static void secondary_vm_do_failover(void)
 if (local_err) {
 error_report_err(local_err);
 }
+/* Notify all filters of all NIC to do checkpoint */
+colo_notify_filters_event(COLO_FAILOVER, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
 
 if (!autostart) {
 error_report("\"-S\" qemu option will be ignored in secondary side");
@@ -794,6 +800,13 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+/* Notify all filters of all NIC to do checkpoint */
+colo_notify_filters_event(COLO_CHECKPOINT, &local_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vmstate_loading = false;
 vm_start();
 trace_colo_vm_state_change("stop", "run");
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 08/18] ram/COLO: Record the dirty pages that SVM received

2017-04-22 Thread zhanghailiang
We record the addresses of the dirty pages that are received;
this helps with flushing the pages cached into the SVM.

The trick here is that we record dirty pages by re-using the migration
dirty bitmap. In a later patch, we will start dirty logging
for the SVM, just like migration; in this way, we can record the dirty
pages caused by both the PVM and the SVM, and we only flush those dirty
pages from the RAM cache while doing a checkpoint.

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/ram.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 05d1b06..0653a24 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2268,6 +2268,9 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
  ram_addr_t offset)
 {
+unsigned long *bitmap;
+long k;
+
 if (!offset_in_ramblock(block, offset)) {
 return NULL;
 }
@@ -2276,6 +2279,17 @@ static inline void 
*colo_cache_from_block_offset(RAMBlock *block,
  __func__, block->idstr);
 return NULL;
 }
+
+k = (memory_region_get_ram_addr(block->mr) + offset) >> TARGET_PAGE_BITS;
+bitmap = atomic_rcu_read(&ram_state.ram_bitmap)->bmap;
+/*
+* During colo checkpoint, we need bitmap of these migrated pages.
+* It help us to decide which pages in ram cache should be flushed
+* into VM's RAM later.
+*/
+if (!test_and_set_bit(k, bitmap)) {
+ram_state.migration_dirty_pages++;
+}
 return block->colo_cache + offset;
 }
 
@@ -2752,6 +2766,15 @@ int colo_init_ram_cache(void)
 memcpy(block->colo_cache, block->host, block->used_length);
 }
 rcu_read_unlock();
+/*
+* Record the dirty pages that sent by PVM, we use this dirty bitmap 
together
+* with to decide which page in cache should be flushed into SVM's RAM. Here
+* we use the same name 'ram_bitmap' as for migration.
+*/
+ram_state.ram_bitmap = g_new0(RAMBitmap, 1);
+ram_state.ram_bitmap->bmap = bitmap_new(last_ram_page());
+ram_state.migration_dirty_pages = 0;
+
 return 0;
 
 out_locked:
@@ -2770,6 +2793,12 @@ out_locked:
 void colo_release_ram_cache(void)
 {
 RAMBlock *block;
+RAMBitmap *bitmap = ram_state.ram_bitmap;
+
+atomic_rcu_set(&ram_state.ram_bitmap, NULL);
+if (bitmap) {
+call_rcu(bitmap, migration_bitmap_free, rcu);
+}
 
 rcu_read_lock();
 QLIST_FOREACH_RCU(block, _list.blocks, next) {
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 00/18] COLO: integrate colo frame with block replication and net compare

2017-04-22 Thread zhanghailiang
Hi,

(Sorry, I have misspelled Dave's email address, resend this series.)

The COLO frame, block replication and COLO net compare have existed in QEMU for a
long time; it's time to integrate these three parts to make COLO really work.

In this series, we have some optimizations for the COLO frame, including
separating the process of saving RAM and device state, and using a COLO_EXIT
event to notify users that the VM has exited COLO. Most of these parts were
reviewed long ago in an older version, but since this series has just been
rebased on upstream, which merged a new migration series, parts of the patches in
this series deserve review again.

We use a notifier/callback method for COLO compare to notify the COLO frame about
the inconsistent-packets event, and add a handle_event method to NetFilterClass
to help the COLO frame notify filters and colo-compare about checkpoint/failover
events; it is flexible.
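
As a quick illustration of how that fan-out works, here is a tiny stand-alone
model; the types and names below are made up, not the actual
NetFilterClass/NetFilterState definitions added by patches 16-18 of this series:

#include <stdio.h>

/* Illustrative model only -- see patches 16-18 for the real QEMU code. */
enum { EVENT_CHECKPOINT, EVENT_FAILOVER };

struct filter {
    const char *name;
    void (*handle_event)(struct filter *f, int event);
};

static void rewriter_handle_event(struct filter *f, int event)
{
    if (event == EVENT_CHECKPOINT) {
        printf("%s: reset per-connection seq offsets\n", f->name);
    }
}

/* The COLO frame walks every filter and calls its handle_event hook. */
static void notify_filters(struct filter **filters, int n, int event)
{
    for (int i = 0; i < n; i++) {
        if (filters[i]->handle_event) {
            filters[i]->handle_event(filters[i], event);
        }
    }
}

int main(void)
{
    struct filter rewriter = { "filter-rewriter", rewriter_handle_event };
    struct filter *all[] = { &rewriter };

    notify_filters(all, 1, EVENT_CHECKPOINT);   /* checkpoint time */
    return 0;
}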

Besides, this series is on top of '[PATCH 0/3] colo-compare: fix three bugs' 
series.

For the neweset version, please refer to:
https://github.com/coloft/qemu/tree/colo-for-qemu-2.10-2017-4-22

Please review, thanks.

Cc: Dong eddie <eddie.d...@intel.com>
Cc: Jiang yunhong <yunhong.ji...@intel.com>
Cc: Xu Quan <xuqu...@huawei.com>
Cc: Jason Wang <jasow...@redhat.com>

zhanghailiang (18):
  net/colo: Add notifier/callback related helpers for filter
  colo-compare: implement the process of checkpoint
  colo-compare: use notifier to notify packets comparing result
  COLO: integrate colo compare with colo frame
  COLO: Handle shutdown command for VM in COLO state
  COLO: Add block replication into colo process
  COLO: Load dirty pages into SVM's RAM cache firstly
  ram/COLO: Record the dirty pages that SVM received
  COLO: Flush memory data from ram cache
  qmp event: Add COLO_EXIT event to notify users while exited COLO
  savevm: split save/find loadvm_handlers entry into two helper
functions
  savevm: split the process of different stages for loadvm/savevm
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  COLO: flush host dirty ram from cache
  filter: Add handle_event method for NetFilterClass
  filter-rewriter: handle checkpoint and failover event
  COLO: notify net filters about checkpoint/failover event

 include/exec/ram_addr.h   |   1 +
 include/migration/colo.h  |   1 +
 include/migration/migration.h |   5 +
 include/net/filter.h  |   5 +
 include/sysemu/sysemu.h   |   9 ++
 migration/colo.c  | 242 +++---
 migration/migration.c |  24 -
 migration/ram.c   | 147 -
 migration/savevm.c| 113 
 migration/trace-events|   2 +
 net/colo-compare.c| 110 ++-
 net/colo-compare.h|   8 ++
 net/colo.c| 105 ++
 net/colo.h|  19 
 net/filter-rewriter.c |  39 +++
 net/filter.c  |  16 +++
 net/net.c |  28 +
 qapi-schema.json  |  18 +++-
 qapi/event.json   |  21 
 vl.c  |  19 +++-
 20 files changed, 886 insertions(+), 46 deletions(-)
 create mode 100644 net/colo-compare.h

-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 04/18] COLO: integrate colo compare with colo frame

2017-04-22 Thread zhanghailiang
For COLO FT, both the PVM and the SVM run at the same time,
and the state is only synced when needed.

So here, let the SVM run while not doing a checkpoint, and change
DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100.

Besides, we forgot to release colo_checkpoint_sem and
colo_delay_timer; fix them here.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/colo.c  | 42 --
 migration/migration.c |  2 +-
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c19eb3f..a3344ce 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -21,8 +21,11 @@
 #include "migration/failover.h"
 #include "replication.h"
 #include "qmp-commands.h"
+#include "net/colo-compare.h"
+#include "net/colo.h"
 
 static bool vmstate_loading;
+static Notifier packets_compare_notifier;
 
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -332,6 +335,11 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+colo_notify_compares_event(NULL, COLO_CHECKPOINT, &local_err);
+if (local_err) {
+goto out;
+}
+
 /* Disable block migration */
 s->params.blk = 0;
 s->params.shared = 0;
@@ -390,6 +398,11 @@ out:
 return ret;
 }
 
+static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
+{
+colo_checkpoint_notify(data);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 QIOChannelBuffer *bioc;
@@ -406,6 +419,9 @@ static void colo_process_checkpoint(MigrationState *s)
 goto out;
 }
 
+packets_compare_notifier.notify = colo_compare_notify_checkpoint;
+colo_compare_register_notifier(&packets_compare_notifier);
+
 /*
  * Wait for Secondary finish loading VM states and enter COLO
  * restore.
@@ -451,11 +467,21 @@ out:
 qemu_fclose(fb);
 }
 
-timer_del(s->colo_delay_timer);
-
 /* Hope this not to be too long to wait here */
 qemu_sem_wait(&s->colo_exit_sem);
 qemu_sem_destroy(&s->colo_exit_sem);
+
+/*
+ * It is safe to unregister the notifier after failover has finished.
+ * Besides, colo_delay_timer and colo_checkpoint_sem can't be
+ * released before unregistering the notifier, or there will be a
+ * use-after-free error.
+ */
+colo_compare_unregister_notifier(&packets_compare_notifier);
+timer_del(s->colo_delay_timer);
+timer_free(s->colo_delay_timer);
+qemu_sem_destroy(&s->colo_checkpoint_sem);
+
 /*
  * Must be called after failover BH is completed,
  * Or the failover BH may shutdown the wrong fd that
@@ -548,6 +574,11 @@ void *colo_process_incoming_thread(void *opaque)
 fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
 
+qemu_mutex_lock_iothread();
+vm_start();
+trace_colo_vm_state_change("stop", "run");
+qemu_mutex_unlock_iothread();
+
 colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
  &local_err);
 if (local_err) {
@@ -567,6 +598,11 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+trace_colo_vm_state_change("run", "stop");
+qemu_mutex_unlock_iothread();
+
 /* FIXME: This is unnecessary for periodic checkpoint mode */
 colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
 &local_err);
@@ -620,6 +656,8 @@ void *colo_process_incoming_thread(void *opaque)
 }
 
 vmstate_loading = false;
+vm_start();
+trace_colo_vm_state_change("stop", "run");
 qemu_mutex_unlock_iothread();
 
 if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
diff --git a/migration/migration.c b/migration/migration.c
index 353f272..2ade2aa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -70,7 +70,7 @@
 /* The delay time (in ms) between two COLO checkpoints
  * Note: Please change this default value to 1 when we support hybrid mode.
  */
-#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
+#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
 
 static NotifierList migration_state_notifiers =
 NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 03/18] colo-compare: use notifier to notify packets comparing result

2017-04-22 Thread zhanghailiang
It's a good idea to use a notifier to notify the COLO frame of an
inconsistent packet comparison.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 net/colo-compare.c | 32 
 net/colo-compare.h |  2 ++
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 3adccfb..bb234dd 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -30,6 +30,7 @@
 #include "qapi-visit.h"
 #include "net/colo.h"
 #include "net/colo-compare.h"
+#include "migration/migration.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
@@ -38,6 +39,9 @@
 static QTAILQ_HEAD(, CompareState) net_compares =
QTAILQ_HEAD_INITIALIZER(net_compares);
 
+static NotifierList colo_compare_notifiers =
+NOTIFIER_LIST_INITIALIZER(colo_compare_notifiers);
+
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
@@ -384,6 +388,22 @@ static int colo_old_packet_check_one(Packet *pkt, int64_t 
*check_time)
 }
 }
 
+static void colo_compare_inconsistent_notify(void)
+{
+notifier_list_notify(&colo_compare_notifiers,
+migrate_get_current());
+}
+
+void colo_compare_register_notifier(Notifier *notify)
+{
+notifier_list_add(&colo_compare_notifiers, notify);
+}
+
+void colo_compare_unregister_notifier(Notifier *notify)
+{
+notifier_remove(notify);
+}
+
 static void colo_old_packet_check_one_conn(void *opaque,
void *user_data)
 {
@@ -397,7 +417,7 @@ static void colo_old_packet_check_one_conn(void *opaque,
 
 if (result) {
 /* do checkpoint will flush old packet */
-/* TODO: colo_notify_checkpoint();*/
+colo_compare_inconsistent_notify();
 }
 }
 
@@ -415,7 +435,10 @@ static void colo_old_packet_check(void *opaque)
 
 /*
  * Called from the compare thread on the primary
- * for compare connection
+ * for compare connection.
+ * TODO: Reconstruct this function, we should hold the max handled sequence
+ * of the connect, Don't trigger a checkpoint request if we only get packets
+ * from one side (primary or secondary).
  */
 static void colo_compare_connection(void *opaque, void *user_data)
 {
@@ -464,11 +487,12 @@ static void colo_compare_connection(void *opaque, void 
*user_data)
 /*
  * If one packet arrive late, the secondary_list or
  * primary_list will be empty, so we can't compare it
- * until next comparison.
+ * until next comparison. If the packets in the list are
+ * timeout, it will trigger a checkpoint request.
  */
 trace_colo_compare_main("packet different");
 g_queue_push_tail(&conn->primary_list, pkt);
-/* TODO: colo_notify_checkpoint();*/
+colo_compare_inconsistent_notify();
 break;
 }
 }
diff --git a/net/colo-compare.h b/net/colo-compare.h
index c9c62f5..a0b573e 100644
--- a/net/colo-compare.h
+++ b/net/colo-compare.h
@@ -2,5 +2,7 @@
 #define QEMU_COLO_COMPARE_H
 
 void colo_notify_compares_event(void *opaque, int event, Error **errp);
+void colo_compare_register_notifier(Notifier *notify);
+void colo_compare_unregister_notifier(Notifier *notify);
 
 #endif /* QEMU_COLO_COMPARE_H */
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 03/18] colo-compare: use notifier to notify packets comparing result

2017-04-22 Thread zhanghailiang
It's a good idea to use a notifier to notify the COLO frame of an
inconsistent packet comparison.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 net/colo-compare.c | 32 
 net/colo-compare.h |  2 ++
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 3adccfb..bb234dd 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -30,6 +30,7 @@
 #include "qapi-visit.h"
 #include "net/colo.h"
 #include "net/colo-compare.h"
+#include "migration/migration.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
@@ -38,6 +39,9 @@
 static QTAILQ_HEAD(, CompareState) net_compares =
QTAILQ_HEAD_INITIALIZER(net_compares);
 
+static NotifierList colo_compare_notifiers =
+NOTIFIER_LIST_INITIALIZER(colo_compare_notifiers);
+
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
@@ -384,6 +388,22 @@ static int colo_old_packet_check_one(Packet *pkt, int64_t 
*check_time)
 }
 }
 
+static void colo_compare_inconsistent_notify(void)
+{
+notifier_list_notify(&colo_compare_notifiers,
+migrate_get_current());
+}
+
+void colo_compare_register_notifier(Notifier *notify)
+{
+notifier_list_add(&colo_compare_notifiers, notify);
+}
+
+void colo_compare_unregister_notifier(Notifier *notify)
+{
+notifier_remove(notify);
+}
+
 static void colo_old_packet_check_one_conn(void *opaque,
void *user_data)
 {
@@ -397,7 +417,7 @@ static void colo_old_packet_check_one_conn(void *opaque,
 
 if (result) {
 /* do checkpoint will flush old packet */
-/* TODO: colo_notify_checkpoint();*/
+colo_compare_inconsistent_notify();
 }
 }
 
@@ -415,7 +435,10 @@ static void colo_old_packet_check(void *opaque)
 
 /*
  * Called from the compare thread on the primary
- * for compare connection
+ * for compare connection.
+ * TODO: Reconstruct this function, we should hold the max handled sequence
+ * of the connect, Don't trigger a checkpoint request if we only get packets
+ * from one side (primary or secondary).
  */
 static void colo_compare_connection(void *opaque, void *user_data)
 {
@@ -464,11 +487,12 @@ static void colo_compare_connection(void *opaque, void 
*user_data)
 /*
  * If one packet arrive late, the secondary_list or
  * primary_list will be empty, so we can't compare it
- * until next comparison.
+ * until next comparison. If the packets in the list are
+ * timeout, it will trigger a checkpoint request.
  */
 trace_colo_compare_main("packet different");
 g_queue_push_tail(&conn->primary_list, pkt);
-/* TODO: colo_notify_checkpoint();*/
+colo_compare_inconsistent_notify();
 break;
 }
 }
diff --git a/net/colo-compare.h b/net/colo-compare.h
index c9c62f5..a0b573e 100644
--- a/net/colo-compare.h
+++ b/net/colo-compare.h
@@ -2,5 +2,7 @@
 #define QEMU_COLO_COMPARE_H
 
 void colo_notify_compares_event(void *opaque, int event, Error **errp);
+void colo_compare_register_notifier(Notifier *notify);
+void colo_compare_unregister_notifier(Notifier *notify);
 
 #endif /* QEMU_COLO_COMPARE_H */
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 05/18] COLO: Handle shutdown command for VM in COLO state

2017-04-22 Thread zhanghailiang
If the VM is in COLO FT state, we need to do some extra work before
starting the normal shutdown process.

The Secondary VM will ignore the shutdown command if users issue it directly
to the Secondary VM. COLO captures the shutdown command and, after the shutdown
request from the user, notifies the Secondary VM to shut down as well.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 include/migration/colo.h |  1 +
 include/sysemu/sysemu.h  |  3 +++
 migration/colo.c | 46 +-
 qapi-schema.json |  4 +++-
 vl.c | 19 ---
 5 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 2bbff9e..aadd040 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -37,4 +37,5 @@ COLOMode get_colo_mode(void);
 void colo_do_failover(MigrationState *s);
 
 void colo_checkpoint_notify(void *opaque);
+bool colo_handle_shutdown(void);
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 16175f7..8054f53 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -49,6 +49,8 @@ typedef enum WakeupReason {
 QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -56,6 +58,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo.c b/migration/colo.c
index a3344ce..c4fc865 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -384,6 +384,21 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+if (colo_shutdown_requested) {
+colo_send_message(s->to_dst_file, COLO_MESSAGE_GUEST_SHUTDOWN,
+  _err);
+if (local_err) {
+error_free(local_err);
+/* Go on the shutdown process and throw the error message */
+error_report("Failed to send shutdown message to SVM");
+}
+qemu_fflush(s->to_dst_file);
+colo_shutdown_requested = 0;
+qemu_system_shutdown_request_core();
+/* Fix me: Just let the colo thread exit ? */
+qemu_thread_exit(0);
+}
+
 ret = 0;
 
 qemu_mutex_lock_iothread();
@@ -449,7 +464,9 @@ static void colo_process_checkpoint(MigrationState *s)
 goto out;
 }
 
-qemu_sem_wait(>colo_checkpoint_sem);
+if (!colo_shutdown_requested) {
+qemu_sem_wait(>colo_checkpoint_sem);
+}
 
 ret = colo_do_checkpoint_transaction(s, bioc, fb);
 if (ret < 0) {
@@ -534,6 +551,16 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 case COLO_MESSAGE_CHECKPOINT_REQUEST:
 *checkpoint_request = 1;
 break;
+case COLO_MESSAGE_GUEST_SHUTDOWN:
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_system_shutdown_request_core();
+qemu_mutex_unlock_iothread();
+/*
+ * The main thread will be exit and terminate the whole
+ * process, do need some cleanup ?
+ */
+qemu_thread_exit(0);
 default:
 *checkpoint_request = 0;
 error_setg(errp, "Got unknown COLO message: %d", msg);
@@ -696,3 +723,20 @@ out:
 
 return NULL;
 }
+
+bool colo_handle_shutdown(void)
+{
+/*
+ * If VM is in COLO-FT mode, we need do some significant work before
+ * respond to the shutdown request. Besides, Secondary VM will ignore
+ * the shutdown request from users.
+ */
+if (migration_incoming_in_colo_state()) {
+return true;
+}
+if (migration_in_colo_state()) {
+colo_shutdown_requested = 1;
+return true;
+}
+return false;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index 01b087f..4b3e1b7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1187,12 +1187,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+# @guest-shutdown: shutdown requested from PVM to SVM. (Since 2.9)
+#
 # Since: 2.8
 ##
 { 'enum': 'COLOMessage',
   'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
 'vmstate-send', 'vmstate-size', 'vmstate-received',
-'vmstate-loaded' ] }
+'vmstate-loaded', 'guest-shutdown' ] }
 
 ##
 # @COLOMode:
diff --git a/vl.c b/vl.c
index 0b4ed

[Qemu-devel] [PATCH RESEND v2 14/18] COLO: Split qemu_savevm_state_begin out of checkpoint process

2017-04-22 Thread zhanghailiang
It is unnecessary to call qemu_savevm_state_begin() in every checkpoint process.
It mainly sets up devices and does the first device state pass, and this data
does not change during later checkpoints. So we split it out of
colo_do_checkpoint_transaction(); this way we avoid re-transferring that data
in subsequent checkpoints.

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/colo.c | 51 ---
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 8e27a4c..66bb5b2 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -362,16 +362,6 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
-/* Disable block migration */
-s->params.blk = 0;
-s->params.shared = 0;
-qemu_savevm_state_begin(s->to_dst_file, >params);
-ret = qemu_file_get_error(s->to_dst_file);
-if (ret < 0) {
-error_report("Save VM state begin error");
-goto out;
-}
-
 /* We call this API although this may do nothing on primary side. */
 qemu_mutex_lock_iothread();
 replication_do_checkpoint_all(_err);
@@ -459,6 +449,21 @@ static void colo_compare_notify_checkpoint(Notifier 
*notifier, void *data)
 colo_checkpoint_notify(data);
 }
 
+static int colo_prepare_before_save(MigrationState *s)
+{
+int ret;
+
+/* Disable block migration */
+s->params.blk = 0;
+s->params.shared = 0;
+qemu_savevm_state_begin(s->to_dst_file, >params);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+error_report("Save VM state begin error");
+}
+return ret;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 QIOChannelBuffer *bioc;
@@ -478,6 +483,11 @@ static void colo_process_checkpoint(MigrationState *s)
 packets_compare_notifier.notify = colo_compare_notify_checkpoint;
 colo_compare_register_notifier(_compare_notifier);
 
+ret = colo_prepare_before_save(s);
+if (ret < 0) {
+goto out;
+}
+
 /*
  * Wait for Secondary finish loading VM states and enter COLO
  * restore.
@@ -628,6 +638,17 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 }
 }
 
+static int colo_prepare_before_load(QEMUFile *f)
+{
+int ret;
+
+ret = qemu_loadvm_state_begin(f);
+if (ret < 0) {
+error_report("Load VM state begin error, ret = %d", ret);
+}
+return ret;
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
 MigrationIncomingState *mis = opaque;
@@ -662,6 +683,11 @@ void *colo_process_incoming_thread(void *opaque)
 fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
 
+ret = colo_prepare_before_load(mis->from_src_file);
+if (ret < 0) {
+goto out;
+}
+
 qemu_mutex_lock_iothread();
 replication_start_all(REPLICATION_MODE_SECONDARY, _err);
 if (local_err) {
@@ -709,11 +735,6 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
-ret = qemu_loadvm_state_begin(mis->from_src_file);
-if (ret < 0) {
-error_report("Load vm state begin error, ret=%d", ret);
-goto out;
-}
 ret = qemu_loadvm_state_main(mis->from_src_file, mis);
 if (ret < 0) {
 error_report("Load VM's live state (ram) error");
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 12/18] savevm: split the process of different stages for loadvm/savevm

2017-04-22 Thread zhanghailiang
There are several stages in the loadvm/savevm process, and in each stage the
migration incoming side processes different types of sections.
We want to control these stages more precisely; it benefits COLO performance
because we don't have to save QEMU_VM_SECTION_START sections at every
checkpoint. Besides, we want to separate the process of saving/loading
memory and device state.

So we add three new helper functions: qemu_loadvm_state_begin(),
qemu_load_device_state() and qemu_savevm_live_state() to handle these
different stages during migration.

Besides, we make qemu_loadvm_state_main() and qemu_save_device_state()
public, and simplify the code of qemu_save_device_state() by calling the
wrapper qemu_savevm_state_header().
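
For orientation, a minimal sketch of how the new load-side helpers are meant
to be combined by COLO on the secondary (colo_secondary_load_checkpoint() is
an illustrative name only; the real call sites are added by later patches in
this series, and qemu_loadvm_state_begin() is called just once, before the
checkpoint loop):

/* Sketch only: per-checkpoint load order on the secondary side. */
static int colo_secondary_load_checkpoint(QEMUFile *from_src, QEMUFile *fb,
                                          MigrationIncomingState *mis)
{
    int ret;

    /* Live RAM stream for this checkpoint (SECTION_PART/SECTION_END). */
    ret = qemu_loadvm_state_main(from_src, mis);
    if (ret < 0) {
        return ret;
    }
    /* Device state, buffered on the primary and sent as one blob. */
    return qemu_load_device_state(fb);
}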

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
v2:
 - Use the wrapper qemu_savevm_state_header() to simplify the code
  of qemu_save_device_state() (Dave's suggestion)
---
 include/sysemu/sysemu.h |  6 ++
 migration/savevm.c  | 54 ++---
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8054f53..0255c4e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -132,7 +132,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, 
const char *name,
uint64_t *start_list,
uint64_t *length_list);
 
+void qemu_savevm_live_state(QEMUFile *f);
+int qemu_save_device_state(QEMUFile *f);
+
 int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state_begin(QEMUFile *f);
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
+int qemu_load_device_state(QEMUFile *f);
 
 extern int autostart;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index f87cd8d..8c2ce0b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -54,6 +54,7 @@
 #include "qemu/cutils.h"
 #include "io/channel-buffer.h"
 #include "io/channel-file.h"
+#include "migration/colo.h"
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
@@ -1285,13 +1286,20 @@ done:
 return ret;
 }
 
-static int qemu_save_device_state(QEMUFile *f)
+void qemu_savevm_live_state(QEMUFile *f)
 {
-SaveStateEntry *se;
+/* save QEMU_VM_SECTION_END section */
+qemu_savevm_state_complete_precopy(f, true);
+qemu_put_byte(f, QEMU_VM_EOF);
+}
 
-qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+int qemu_save_device_state(QEMUFile *f)
+{
+SaveStateEntry *se;
 
+if (!migration_in_colo_state()) {
+qemu_savevm_state_header(f);
+}
 cpu_synchronize_all_states();
 
 QTAILQ_FOREACH(se, _state.handlers, entry) {
@@ -1342,8 +1350,6 @@ enum LoadVMExitCodes {
 LOADVM_QUIT =  1,
 };
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
-
 /* -- incoming postcopy messages -- */
 /* 'advise' arrives before any transfers just to tell us that a postcopy
  * *might* happen - it might be skipped if precopy transferred everything
@@ -1957,7 +1963,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, 
MigrationIncomingState *mis)
 return 0;
 }
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
 uint8_t section_type;
 int ret = 0;
@@ -2095,6 +2101,40 @@ int qemu_loadvm_state(QEMUFile *f)
 return ret;
 }
 
+int qemu_loadvm_state_begin(QEMUFile *f)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+Error *local_err = NULL;
+int ret;
+
+if (qemu_savevm_state_blocked(_err)) {
+error_report_err(local_err);
+return -EINVAL;
+}
+/* Load QEMU_VM_SECTION_START section */
+ret = qemu_loadvm_state_main(f, mis);
+if (ret < 0) {
+error_report("Failed to loadvm begin work: %d", ret);
+}
+return ret;
+}
+
+int qemu_load_device_state(QEMUFile *f)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+int ret;
+
+/* Load QEMU_VM_SECTION_FULL section */
+ret = qemu_loadvm_state_main(f, mis);
+if (ret < 0) {
+error_report("Failed to load device state: %d", ret);
+return ret;
+}
+
+cpu_synchronize_all_post_init();
+return 0;
+}
+
 int save_vmstate(Monitor *mon, const char *name)
 {
 BlockDriverState *bs, *bs1;
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 10/18] qmp event: Add COLO_EXIT event to notify users while exited COLO

2017-04-22 Thread zhanghailiang
If an error happens during the VM's COLO FT stage, it's important to
notify the users of this event. Together with 'x_colo_lost_heartbeat',
users can intervene in COLO's failover work immediately.
Even if users don't want to get involved in COLO's failover verdict,
it is still necessary to notify them that we have exited COLO mode.

Cc: Markus Armbruster <arm...@redhat.com>
Cc: Michael Roth <mdr...@linux.vnet.ibm.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 migration/colo.c | 19 +++
 qapi-schema.json | 14 ++
 qapi/event.json  | 21 +
 3 files changed, 54 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 9949293..e62da93 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -516,6 +516,18 @@ out:
 qemu_fclose(fb);
 }
 
+/*
+ * There are only two reasons we can go here, some error happened.
+ * Or the user triggered failover.
+ */
+if (failover_get_state() == FAILOVER_STATUS_NONE) {
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+  COLO_EXIT_REASON_ERROR, NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+  COLO_EXIT_REASON_REQUEST, NULL);
+}
+
 /* Hope this not to be too long to wait here */
 qemu_sem_wait(>colo_exit_sem);
 qemu_sem_destroy(>colo_exit_sem);
@@ -757,6 +769,13 @@ out:
 if (local_err) {
 error_report_err(local_err);
 }
+if (failover_get_state() == FAILOVER_STATUS_NONE) {
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+  COLO_EXIT_REASON_ERROR, NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+  COLO_EXIT_REASON_REQUEST, NULL);
+}
 
 if (fb) {
 qemu_fclose(fb);
diff --git a/qapi-schema.json b/qapi-schema.json
index 4b3e1b7..460ca53 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1233,6 +1233,20 @@
   'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
 
 ##
+# @COLOExitReason:
+#
+# The reason for a COLO exit
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.10
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'request', 'error' ] }
+
+##
 # @x-colo-lost-heartbeat:
 #
 # Tell qemu that heartbeat is lost, request it to do takeover procedures.
diff --git a/qapi/event.json b/qapi/event.json
index e80f3f4..924bc6f 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -441,6 +441,27 @@
   'data': { 'pass': 'int' } }
 
 ##
+# @COLO_EXIT:
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# at the request of users.
+#
+# @mode: which COLO mode the VM was in when it exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 2.10
+#
+# Example:
+#
+# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
+#  "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } 
}
+#
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
 # @ACPI_DEVICE_OST:
 #
 # Emitted when guest executes ACPI _OST method.
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 18/18] COLO: notify net filters about checkpoint/failover event

2017-04-22 Thread zhanghailiang
Notify all net filters about the checkpoint and failover event.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 migration/colo.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 66bb5b2..62f58c6 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -26,6 +26,7 @@
 #include "qapi-event.h"
 #include "block/block.h"
 #include "replication.h"
+#include "net/filter.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -82,6 +83,11 @@ static void secondary_vm_do_failover(void)
 if (local_err) {
 error_report_err(local_err);
 }
+/* Notify all filters of all NIC to do checkpoint */
+colo_notify_filters_event(COLO_FAILOVER, _err);
+if (local_err) {
+error_report_err(local_err);
+}
 
 if (!autostart) {
 error_report("\"-S\" qemu option will be ignored in secondary side");
@@ -794,6 +800,13 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+/* Notify all filters of all NIC to do checkpoint */
+colo_notify_filters_event(COLO_CHECKPOINT, _err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vmstate_loading = false;
 vm_start();
 trace_colo_vm_state_change("stop", "run");
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 11/18] savevm: split save/find loadvm_handlers entry into two helper functions

2017-04-22 Thread zhanghailiang
COLO's checkpoint process is based on the migration process;
every time we do a checkpoint we repeat the savevm and loadvm process.

So qemu_loadvm_section_start_full() is called repeatedly, and it adds every
migration section's information to the loadvm_handlers list each time,
which leads to a memory leak.

To fix it, we split the process of saving and finding a section entry into two
helper functions, and check whether the section info already exists in the
loadvm_handlers list before saving it.

This modification has no side effect on normal migration.

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/savevm.c | 55 +++---
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 03ae1bd..f87cd8d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1836,6 +1836,37 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
 }
 }
 
+static LoadStateEntry *loadvm_add_section_entry(MigrationIncomingState *mis,
+ SaveStateEntry *se,
+ uint32_t section_id,
+ uint32_t version_id)
+{
+LoadStateEntry *le;
+
+/* Add entry */
+le = g_malloc0(sizeof(*le));
+
+le->se = se;
+le->section_id = section_id;
+le->version_id = version_id;
+QLIST_INSERT_HEAD(>loadvm_handlers, le, entry);
+return le;
+}
+
+static LoadStateEntry *loadvm_find_section_entry(MigrationIncomingState *mis,
+ uint32_t section_id)
+{
+LoadStateEntry *le;
+
+QLIST_FOREACH(le, >loadvm_handlers, entry) {
+if (le->section_id == section_id) {
+break;
+}
+}
+
+return le;
+}
+
 static int
 qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
 {
@@ -1878,15 +1909,12 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis)
 return -EINVAL;
 }
 
-/* Add entry */
-le = g_malloc0(sizeof(*le));
-
-le->se = se;
-le->section_id = section_id;
-le->version_id = version_id;
-QLIST_INSERT_HEAD(>loadvm_handlers, le, entry);
-
-ret = vmstate_load(f, le->se, le->version_id);
+ /* Check if we have saved this section info before, if not, save it */
+le = loadvm_find_section_entry(mis, section_id);
+if (!le) {
+le = loadvm_add_section_entry(mis, se, section_id, version_id);
+}
+ret = vmstate_load(f, se, version_id);
 if (ret < 0) {
 error_report("error while loading state for instance 0x%x of"
  " device '%s'", instance_id, idstr);
@@ -1909,12 +1937,9 @@ qemu_loadvm_section_part_end(QEMUFile *f, 
MigrationIncomingState *mis)
 section_id = qemu_get_be32(f);
 
 trace_qemu_loadvm_state_section_partend(section_id);
-QLIST_FOREACH(le, >loadvm_handlers, entry) {
-if (le->section_id == section_id) {
-break;
-}
-}
-if (le == NULL) {
+
+le = loadvm_find_section_entry(mis, section_id);
+if (!le) {
 error_report("Unknown savevm section %d", section_id);
 return -EINVAL;
 }
-- 
1.8.3.1




[Qemu-devel] [PATCH RESEND v2 13/18] COLO: Separate the process of saving/loading ram and device state

2017-04-22 Thread zhanghailiang
We separate the process of saving/loading RAM and device state when doing a
checkpoint, and add new helpers to save/load RAM and device state. With this
change, we can transfer RAM directly from the primary side to the secondary
side without using a channel buffer as an intermediary, which also reduces
the amount of extra memory used during a checkpoint.

Besides, we move colo_flush_ram_cache() to the proper position after the
above change.

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/colo.c   | 49 +++--
 migration/ram.c|  5 -
 migration/savevm.c |  4 
 3 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index e62da93..8e27a4c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -357,11 +357,20 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, _err);
+if (local_err) {
+goto out;
+}
+
 /* Disable block migration */
 s->params.blk = 0;
 s->params.shared = 0;
-qemu_savevm_state_header(fb);
-qemu_savevm_state_begin(fb, >params);
+qemu_savevm_state_begin(s->to_dst_file, >params);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+error_report("Save VM state begin error");
+goto out;
+}
 
 /* We call this API although this may do nothing on primary side. */
 qemu_mutex_lock_iothread();
@@ -372,15 +381,21 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 }
 
 qemu_mutex_lock_iothread();
-qemu_savevm_state_complete_precopy(fb, false);
+/*
+ * Only save VM's live state, which not including device state.
+ * TODO: We may need a timeout mechanism to prevent COLO process
+ * to be blocked here.
+ */
+qemu_savevm_live_state(s->to_dst_file);
+/* Note: device state is saved into buffer */
+ret = qemu_save_device_state(fb);
 qemu_mutex_unlock_iothread();
-
-qemu_fflush(fb);
-
-colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, _err);
-if (local_err) {
+if (ret < 0) {
+error_report("Save device state error");
 goto out;
 }
+qemu_fflush(fb);
+
 /*
  * We need the size of the VMstate data in Secondary side,
  * With which we can decide how much data should be read.
@@ -621,6 +636,7 @@ void *colo_process_incoming_thread(void *opaque)
 uint64_t total_size;
 uint64_t value;
 Error *local_err = NULL;
+int ret;
 
 qemu_sem_init(>colo_incoming_sem, 0);
 
@@ -693,6 +709,17 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+ret = qemu_loadvm_state_begin(mis->from_src_file);
+if (ret < 0) {
+error_report("Load vm state begin error, ret=%d", ret);
+goto out;
+}
+ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+if (ret < 0) {
+error_report("Load VM's live state (ram) error");
+goto out;
+}
+
 value = colo_receive_message_value(mis->from_src_file,
  COLO_MESSAGE_VMSTATE_SIZE, _err);
 if (local_err) {
@@ -726,8 +753,10 @@ void *colo_process_incoming_thread(void *opaque)
 qemu_mutex_lock_iothread();
 qemu_system_reset(VMRESET_SILENT);
 vmstate_loading = true;
-if (qemu_loadvm_state(fb) < 0) {
-error_report("COLO: loadvm failed");
+colo_flush_ram_cache();
+ret = qemu_load_device_state(fb);
+if (ret < 0) {
+error_report("COLO: load device state failed");
 qemu_mutex_unlock_iothread();
 goto out;
 }
diff --git a/migration/ram.c b/migration/ram.c
index df10d4b..f171a82 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2602,7 +2602,6 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 bool postcopy_running = postcopy_state_get() >= 
POSTCOPY_INCOMING_LISTENING;
 /* ADVISE is earlier, it shows the source has the postcopy capability on */
 bool postcopy_advised = postcopy_state_get() >= POSTCOPY_INCOMING_ADVISE;
-bool need_flush = false;
 
 seq_iter++;
 
@@ -2637,7 +2636,6 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 /* After going into COLO, we should load the Page into colo_cache 
*/
 if (migration_incoming_in_colo_state()) {
 host = colo_cache_from_block_offset(block, addr);
-need_flush = true;
 } else {
 host = host_from_ram_block_offset(block, addr);
  

[Qemu-devel] [PATCH v2 16/18] filter: Add handle_event method for NetFilterClass

2017-04-22 Thread zhanghailiang
Filters need to process checkpoint/failover or other events
passed by the COLO frame.
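
To illustrate the new hook, a minimal sketch of a filter wiring it up (the
my_filter_* names are made up for this example; the real users in this series
are the default dummy handler below and filter-rewriter):

static void my_filter_handle_event(NetFilterState *nf, int event, Error **errp)
{
    switch (event) {
    case COLO_CHECKPOINT:
        /* e.g. reset per-connection state kept by this filter */
        break;
    case COLO_FAILOVER:
        /* e.g. turn the filter off once the guest has taken over */
        object_property_set_str(OBJECT(nf), "off", "status", errp);
        break;
    default:
        break;
    }
}

static void my_filter_class_init(ObjectClass *oc, void *data)
{
    NetFilterClass *nfc = NETFILTER_CLASS(oc);

    nfc->handle_event = my_filter_handle_event;
}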

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 include/net/filter.h |  5 +
 net/filter.c | 16 
 net/net.c| 28 
 3 files changed, 49 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index 0c4a2ea..df4510d 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -37,6 +37,8 @@ typedef ssize_t (FilterReceiveIOV)(NetFilterState *nc,
 
 typedef void (FilterStatusChanged) (NetFilterState *nf, Error **errp);
 
+typedef void (FilterHandleEvent) (NetFilterState *nf, int event, Error **errp);
+
 typedef struct NetFilterClass {
 ObjectClass parent_class;
 
@@ -44,6 +46,7 @@ typedef struct NetFilterClass {
 FilterSetup *setup;
 FilterCleanup *cleanup;
 FilterStatusChanged *status_changed;
+FilterHandleEvent *handle_event;
 /* mandatory */
 FilterReceiveIOV *receive_iov;
 } NetFilterClass;
@@ -76,4 +79,6 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 int iovcnt,
 void *opaque);
 
+void colo_notify_filters_event(int event, Error **errp);
+
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index 1dfd2ca..993b35e 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -17,6 +17,7 @@
 #include "net/vhost_net.h"
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
+#include "net/colo.h"
 
 static inline bool qemu_can_skip_netfilter(NetFilterState *nf)
 {
@@ -245,11 +246,26 @@ static void netfilter_finalize(Object *obj)
 g_free(nf->netdev_id);
 }
 
+static void dummy_handle_event(NetFilterState *nf, int event, Error **errp)
+{
+switch (event) {
+case COLO_CHECKPOINT:
+break;
+case COLO_FAILOVER:
+object_property_set_str(OBJECT(nf), "off", "status", errp);
+break;
+default:
+break;
+}
+}
+
 static void netfilter_class_init(ObjectClass *oc, void *data)
 {
 UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+NetFilterClass *nfc = NETFILTER_CLASS(oc);
 
 ucc->complete = netfilter_complete;
+nfc->handle_event = dummy_handle_event;
 }
 
 static const TypeInfo netfilter_info = {
diff --git a/net/net.c b/net/net.c
index 0ac3b9e..1373f63 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1373,6 +1373,34 @@ void hmp_info_network(Monitor *mon, const QDict *qdict)
 }
 }
 
+void colo_notify_filters_event(int event, Error **errp)
+{
+NetClientState *nc, *peer;
+NetClientDriver type;
+NetFilterState *nf;
+NetFilterClass *nfc = NULL;
+Error *local_err = NULL;
+
+QTAILQ_FOREACH(nc, _clients, next) {
+peer = nc->peer;
+type = nc->info->type;
+if (!peer || type != NET_CLIENT_DRIVER_NIC) {
+continue;
+}
+QTAILQ_FOREACH(nf, >filters, next) {
+nfc =  NETFILTER_GET_CLASS(OBJECT(nf));
+if (!nfc->handle_event) {
+continue;
+}
+nfc->handle_event(nf, event, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+}
+
 void qmp_set_link(const char *name, bool up, Error **errp)
 {
 NetClientState *ncs[MAX_QUEUE_NUM];
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 07/18] COLO: Load dirty pages into SVM's RAM cache firstly

2017-04-22 Thread zhanghailiang
We should not load the PVM's state directly into the SVM, because errors may
happen while the SVM is receiving data, which would break the SVM.

We need to ensure all data has been received before loading the state into the
SVM. We use extra memory to cache this data (the PVM's RAM). The RAM cache on
the secondary side is initially the same as the SVM's/PVM's memory. During each
checkpoint, we first cache the PVM's dirty pages into this RAM cache, so the
RAM cache is always the same as the PVM's memory at every checkpoint; then we
flush this cached RAM into the SVM after we have received all of the PVM's
state.
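
As an overview of the flow described above, a sketch condensed from the hunks
below (svm_ram_cache_lifecycle_sketch() is an illustrative name only; the
flush step itself arrives in a later patch of this series):

/* Sketch only: lifecycle of the secondary-side RAM cache. */
static void svm_ram_cache_lifecycle_sketch(void)
{
    /* 1. Before the COLO incoming thread starts: clone the SVM's RAM. */
    if (colo_init_ram_cache() < 0) {
        error_report("Init ram cache failed");
        return;
    }

    /*
     * 2. While in COLO state, ram_load() writes the PVM's dirty pages into
     *    block->colo_cache instead of block->host, so the cache tracks the
     *    PVM's memory; it is flushed into the SVM only after a complete
     *    checkpoint has been received.
     */

    /* 3. After the incoming thread exits (global lock held): free the cache. */
    colo_release_ram_cache();
}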

Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
---
v2:
- Move colo_init_ram_cache() and colo_release_ram_cache() out of the
  incoming thread since both of them need the global lock; if we kept
  colo_release_ram_cache() in the incoming thread, there would be a
  potential deadlock.
- Remove the bool ram_cache_enable flag, use migration_incoming_in_colo_state()
  instead.
- Remove the Reviewed-by tag because of the above changes.
---
 include/exec/ram_addr.h   |  1 +
 include/migration/migration.h |  4 +++
 migration/migration.c |  6 
 migration/ram.c   | 71 ++-
 4 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c9ddcd0..0b3d77c 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -27,6 +27,7 @@ struct RAMBlock {
 struct rcu_head rcu;
 struct MemoryRegion *mr;
 uint8_t *host;
+uint8_t *colo_cache; /* For colo, VM's ram cache */
 ram_addr_t offset;
 ram_addr_t used_length;
 ram_addr_t max_length;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba1a16c..ba765eb 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -360,4 +360,8 @@ uint64_t ram_pagesize_summary(void);
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
+
+/* ram cache */
+int colo_init_ram_cache(void);
+void colo_release_ram_cache(void);
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 755ea54..7419404 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -441,6 +441,10 @@ static void process_incoming_migration_co(void *opaque)
 error_report_err(local_err);
 exit(EXIT_FAILURE);
 }
+if (colo_init_ram_cache() < 0) {
+error_report("Init ram cache failed");
+exit(EXIT_FAILURE);
+}
 mis->migration_incoming_co = qemu_coroutine_self();
 qemu_thread_create(>colo_incoming_thread, "COLO incoming",
  colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
@@ -449,6 +453,8 @@ static void process_incoming_migration_co(void *opaque)
 
 /* Wait checkpoint incoming thread exit before free resource */
 qemu_thread_join(>colo_incoming_thread);
+/* We hold the global iothread lock, so it is safe here */
+colo_release_ram_cache();
 }
 
 if (ret < 0) {
diff --git a/migration/ram.c b/migration/ram.c
index f48664e..05d1b06 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2265,6 +2265,20 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
 return block->host + offset;
 }
 
+static inline void *colo_cache_from_block_offset(RAMBlock *block,
+ ram_addr_t offset)
+{
+if (!offset_in_ramblock(block, offset)) {
+return NULL;
+}
+if (!block->colo_cache) {
+error_report("%s: colo_cache is NULL in block :%s",
+ __func__, block->idstr);
+return NULL;
+}
+return block->colo_cache + offset;
+}
+
 /**
  * ram_handle_compressed: handle the zero page case
  *
@@ -2605,7 +2619,12 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
  RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
 RAMBlock *block = ram_block_from_stream(f, flags);
 
-host = host_from_ram_block_offset(block, addr);
+/* After going into COLO, we should load the Page into colo_cache 
*/
+if (migration_incoming_in_colo_state()) {
+host = colo_cache_from_block_offset(block, addr);
+} else {
+host = host_from_ram_block_offset(block, addr);
+}
 if (!host) {
 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
 ret = -EINVAL;
@@ -2712,6 +2731,56 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 return ret;
 }
 
+/*
+ * colo cache: this is for secondary VM, we cache the whole
+ * memory of the secondary VM, it is nee

[Qemu-devel] [PATCH RESEND v2 05/18] COLO: Handle shutdown command for VM in COLO state

2017-04-22 Thread zhanghailiang
If the VM is in COLO FT state, we need to do some extra work before
starting the normal shutdown process.

The Secondary VM will ignore the shutdown command if users issue it
directly to the Secondary VM. COLO captures the shutdown command on the
Primary VM and forwards the shutdown request to the Secondary VM.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 include/migration/colo.h |  1 +
 include/sysemu/sysemu.h  |  3 +++
 migration/colo.c | 46 +-
 qapi-schema.json |  4 +++-
 vl.c | 19 ---
 5 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 2bbff9e..aadd040 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -37,4 +37,5 @@ COLOMode get_colo_mode(void);
 void colo_do_failover(MigrationState *s);
 
 void colo_checkpoint_notify(void *opaque);
+bool colo_handle_shutdown(void);
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 16175f7..8054f53 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -49,6 +49,8 @@ typedef enum WakeupReason {
 QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -56,6 +58,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo.c b/migration/colo.c
index a3344ce..c4fc865 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -384,6 +384,21 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+if (colo_shutdown_requested) {
+colo_send_message(s->to_dst_file, COLO_MESSAGE_GUEST_SHUTDOWN,
+  _err);
+if (local_err) {
+error_free(local_err);
+/* Go on the shutdown process and throw the error message */
+error_report("Failed to send shutdown message to SVM");
+}
+qemu_fflush(s->to_dst_file);
+colo_shutdown_requested = 0;
+qemu_system_shutdown_request_core();
+/* Fix me: Just let the colo thread exit ? */
+qemu_thread_exit(0);
+}
+
 ret = 0;
 
 qemu_mutex_lock_iothread();
@@ -449,7 +464,9 @@ static void colo_process_checkpoint(MigrationState *s)
 goto out;
 }
 
-qemu_sem_wait(>colo_checkpoint_sem);
+if (!colo_shutdown_requested) {
+qemu_sem_wait(>colo_checkpoint_sem);
+}
 
 ret = colo_do_checkpoint_transaction(s, bioc, fb);
 if (ret < 0) {
@@ -534,6 +551,16 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 case COLO_MESSAGE_CHECKPOINT_REQUEST:
 *checkpoint_request = 1;
 break;
+case COLO_MESSAGE_GUEST_SHUTDOWN:
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_system_shutdown_request_core();
+qemu_mutex_unlock_iothread();
+/*
+ * The main thread will be exit and terminate the whole
+ * process, do need some cleanup ?
+ */
+qemu_thread_exit(0);
 default:
 *checkpoint_request = 0;
 error_setg(errp, "Got unknown COLO message: %d", msg);
@@ -696,3 +723,20 @@ out:
 
 return NULL;
 }
+
+bool colo_handle_shutdown(void)
+{
+/*
+ * If VM is in COLO-FT mode, we need do some significant work before
+ * respond to the shutdown request. Besides, Secondary VM will ignore
+ * the shutdown request from users.
+ */
+if (migration_incoming_in_colo_state()) {
+return true;
+}
+if (migration_in_colo_state()) {
+colo_shutdown_requested = 1;
+return true;
+}
+return false;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index 01b087f..4b3e1b7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1187,12 +1187,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+# @guest-shutdown: shutdown requested from PVM to SVM. (Since 2.9)
+#
 # Since: 2.8
 ##
 { 'enum': 'COLOMessage',
   'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
 'vmstate-send', 'vmstate-size', 'vmstate-received',
-'vmstate-loaded' ] }
+'vmstate-loaded', 'guest-shutdown' ] }
 
 ##
 # @COLOMode:
diff --git a/vl.c b/vl.c
index 0b4ed

[Qemu-devel] [PATCH RESEND v2 01/18] net/colo: Add notifier/callback related helpers for filter

2017-04-22 Thread zhanghailiang
We will use this notifier to help COLO notify filter objects
to do something, like perform a checkpoint or process a failover event.
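
A minimal usage sketch of the new helpers, condensed from how colo-compare
uses them later in this series (my_event_cb and attach_notifier are made-up
names for this example):

static void my_event_cb(void *opaque, int value)
{
    /*
     * 'opaque' is the FilterNotifier itself; its ->opaque field holds the
     * state pointer that was passed to filter_notifier_new().
     */
    FilterNotifier *notifier = opaque;
    void *state = notifier->opaque;

    (void)state; /* handle 'value', e.g. COLO_CHECKPOINT or COLO_FAILOVER */
}

static void attach_notifier(GMainContext *ctx, void *state, Error **errp)
{
    FilterNotifier *notifier = filter_notifier_new(my_event_cb, state, errp);

    if (notifier) {
        g_source_attach(&notifier->source, ctx);
    }
}

/* From another thread, wake the callback up with an event value:
 *     filter_notifier_set(notifier, COLO_CHECKPOINT);
 */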

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
---
 net/colo.c | 105 +
 net/colo.h |  19 +++
 2 files changed, 124 insertions(+)

diff --git a/net/colo.c b/net/colo.c
index 8cc166b..8aef670 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -15,6 +15,7 @@
 #include "qemu/osdep.h"
 #include "trace.h"
 #include "net/colo.h"
+#include "qapi/error.h"
 
 uint32_t connection_key_hash(const void *opaque)
 {
@@ -209,3 +210,107 @@ Connection *connection_get(GHashTable 
*connection_track_table,
 
 return conn;
 }
+
+static gboolean
+filter_notify_prepare(GSource *source, gint *timeout)
+{
+*timeout = -1;
+
+return FALSE;
+}
+
+static gboolean
+filter_notify_check(GSource *source)
+{
+FilterNotifier *notify = (FilterNotifier *)source;
+
+return notify->pfd.revents & (G_IO_IN | G_IO_HUP | G_IO_ERR);
+}
+
+static gboolean
+filter_notify_dispatch(GSource *source,
+   GSourceFunc callback,
+   gpointer user_data)
+{
+FilterNotifier *notify = (FilterNotifier *)source;
+int revents;
+uint64_t value;
+int ret;
+
+revents = notify->pfd.revents & notify->pfd.events;
+if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
+ret = filter_notifier_get(notify, );
+if (notify->cb && !ret) {
+notify->cb(notify, value);
+}
+}
+return TRUE;
+}
+
+static void
+filter_notify_finalize(GSource *source)
+{
+FilterNotifier *notify = (FilterNotifier *)source;
+
+event_notifier_cleanup(>event);
+}
+
+static GSourceFuncs notifier_source_funcs = {
+filter_notify_prepare,
+filter_notify_check,
+filter_notify_dispatch,
+filter_notify_finalize,
+};
+
+FilterNotifier *filter_notifier_new(FilterNotifierCallback *cb,
+void *opaque, Error **errp)
+{
+FilterNotifier *notify;
+int ret;
+
+notify = (FilterNotifier *)g_source_new(_source_funcs,
+sizeof(FilterNotifier));
+ret = event_notifier_init(>event, false);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to initialize event notifier");
+goto fail;
+}
+notify->pfd.fd = event_notifier_get_fd(>event);
+notify->pfd.events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+notify->cb = cb;
+notify->opaque = opaque;
+g_source_add_poll(>source, >pfd);
+
+return notify;
+
+fail:
+g_source_destroy(>source);
+return NULL;
+}
+
+int filter_notifier_set(FilterNotifier *notify, uint64_t value)
+{
+ssize_t ret;
+
+do {
+ret = write(notify->event.wfd, , sizeof(value));
+} while (ret < 0 && errno == EINTR);
+
+/* EAGAIN is fine, a read must be pending.  */
+if (ret < 0 && errno != EAGAIN) {
+return -errno;
+}
+return 0;
+}
+
+int filter_notifier_get(FilterNotifier *notify, uint64_t *value)
+{
+ssize_t len;
+
+/* Drain the notify pipe.  For eventfd, only 8 bytes will be read.  */
+do {
+len = read(notify->event.rfd, value, sizeof(*value));
+} while ((len == -1 && errno == EINTR));
+
+return len != sizeof(*value) ? -1 : 0;
+}
diff --git a/net/colo.h b/net/colo.h
index cd9027f..b586db3 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -19,6 +19,7 @@
 #include "qemu/jhash.h"
 #include "qemu/timer.h"
 #include "slirp/tcp.h"
+#include "qemu/event_notifier.h"
 
 #define HASHTABLE_MAX_SIZE 16384
 
@@ -89,4 +90,22 @@ void connection_hashtable_reset(GHashTable 
*connection_track_table);
 Packet *packet_new(const void *data, int size);
 void packet_destroy(void *opaque, void *user_data);
 
+typedef void FilterNotifierCallback(void *opaque, int value);
+typedef struct FilterNotifier {
+GSource source;
+EventNotifier event;
+GPollFD pfd;
+FilterNotifierCallback *cb;
+void *opaque;
+} FilterNotifier;
+
+FilterNotifier *filter_notifier_new(FilterNotifierCallback *cb,
+void *opaque, Error **errp);
+int filter_notifier_set(FilterNotifier *notify, uint64_t value);
+int filter_notifier_get(FilterNotifier *notify, uint64_t *value);
+
+enum {
+COLO_CHECKPOINT = 2,
+COLO_FAILOVER,
+};
 #endif /* QEMU_COLO_PROXY_H */
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 01/18] net/colo: Add notifier/callback related helpers for filter

2017-04-22 Thread zhanghailiang
We will use this notifier to help COLO notify filter objects
to do something, like perform a checkpoint or process a failover event.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
---
 net/colo.c | 105 +
 net/colo.h |  19 +++
 2 files changed, 124 insertions(+)

diff --git a/net/colo.c b/net/colo.c
index 8cc166b..8aef670 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -15,6 +15,7 @@
 #include "qemu/osdep.h"
 #include "trace.h"
 #include "net/colo.h"
+#include "qapi/error.h"
 
 uint32_t connection_key_hash(const void *opaque)
 {
@@ -209,3 +210,107 @@ Connection *connection_get(GHashTable 
*connection_track_table,
 
 return conn;
 }
+
+static gboolean
+filter_notify_prepare(GSource *source, gint *timeout)
+{
+*timeout = -1;
+
+return FALSE;
+}
+
+static gboolean
+filter_notify_check(GSource *source)
+{
+FilterNotifier *notify = (FilterNotifier *)source;
+
+return notify->pfd.revents & (G_IO_IN | G_IO_HUP | G_IO_ERR);
+}
+
+static gboolean
+filter_notify_dispatch(GSource *source,
+   GSourceFunc callback,
+   gpointer user_data)
+{
+FilterNotifier *notify = (FilterNotifier *)source;
+int revents;
+uint64_t value;
+int ret;
+
+revents = notify->pfd.revents & notify->pfd.events;
+if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
+ret = filter_notifier_get(notify, );
+if (notify->cb && !ret) {
+notify->cb(notify, value);
+}
+}
+return TRUE;
+}
+
+static void
+filter_notify_finalize(GSource *source)
+{
+FilterNotifier *notify = (FilterNotifier *)source;
+
+event_notifier_cleanup(>event);
+}
+
+static GSourceFuncs notifier_source_funcs = {
+filter_notify_prepare,
+filter_notify_check,
+filter_notify_dispatch,
+filter_notify_finalize,
+};
+
+FilterNotifier *filter_notifier_new(FilterNotifierCallback *cb,
+void *opaque, Error **errp)
+{
+FilterNotifier *notify;
+int ret;
+
+notify = (FilterNotifier *)g_source_new(_source_funcs,
+sizeof(FilterNotifier));
+ret = event_notifier_init(>event, false);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to initialize event notifier");
+goto fail;
+}
+notify->pfd.fd = event_notifier_get_fd(>event);
+notify->pfd.events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+notify->cb = cb;
+notify->opaque = opaque;
+g_source_add_poll(>source, >pfd);
+
+return notify;
+
+fail:
+g_source_destroy(>source);
+return NULL;
+}
+
+int filter_notifier_set(FilterNotifier *notify, uint64_t value)
+{
+ssize_t ret;
+
+do {
+ret = write(notify->event.wfd, , sizeof(value));
+} while (ret < 0 && errno == EINTR);
+
+/* EAGAIN is fine, a read must be pending.  */
+if (ret < 0 && errno != EAGAIN) {
+return -errno;
+}
+return 0;
+}
+
+int filter_notifier_get(FilterNotifier *notify, uint64_t *value)
+{
+ssize_t len;
+
+/* Drain the notify pipe.  For eventfd, only 8 bytes will be read.  */
+do {
+len = read(notify->event.rfd, value, sizeof(*value));
+} while ((len == -1 && errno == EINTR));
+
+return len != sizeof(*value) ? -1 : 0;
+}
diff --git a/net/colo.h b/net/colo.h
index cd9027f..b586db3 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -19,6 +19,7 @@
 #include "qemu/jhash.h"
 #include "qemu/timer.h"
 #include "slirp/tcp.h"
+#include "qemu/event_notifier.h"
 
 #define HASHTABLE_MAX_SIZE 16384
 
@@ -89,4 +90,22 @@ void connection_hashtable_reset(GHashTable 
*connection_track_table);
 Packet *packet_new(const void *data, int size);
 void packet_destroy(void *opaque, void *user_data);
 
+typedef void FilterNotifierCallback(void *opaque, int value);
+typedef struct FilterNotifier {
+GSource source;
+EventNotifier event;
+GPollFD pfd;
+FilterNotifierCallback *cb;
+void *opaque;
+} FilterNotifier;
+
+FilterNotifier *filter_notifier_new(FilterNotifierCallback *cb,
+void *opaque, Error **errp);
+int filter_notifier_set(FilterNotifier *notify, uint64_t value);
+int filter_notifier_get(FilterNotifier *notify, uint64_t *value);
+
+enum {
+COLO_CHECKPOINT = 2,
+COLO_FAILOVER,
+};
 #endif /* QEMU_COLO_PROXY_H */
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 17/18] filter-rewriter: handle checkpoint and failover event

2017-04-22 Thread zhanghailiang
After one round of checkpoint, the states of the PVM and the SVM
become consistent, so it is unnecessary to adjust the sequence
of net packets for old connections. Besides, when failover
happens, filter-rewriter needs to check whether it still needs to
adjust the sequence of net packets.

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 net/filter-rewriter.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index c9a6d43..0a90b11 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -22,6 +22,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/iov.h"
 #include "net/checksum.h"
+#include "net/colo.h"
 
 #define FILTER_COLO_REWRITER(obj) \
 OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
@@ -270,6 +271,43 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState 
*nf,
 return 0;
 }
 
+static void reset_seq_offset(gpointer key, gpointer value, gpointer user_data)
+{
+Connection *conn = (Connection *)value;
+
+conn->offset = 0;
+}
+
+static gboolean offset_is_nonzero(gpointer key,
+  gpointer value,
+  gpointer user_data)
+{
+Connection *conn = (Connection *)value;
+
+return conn->offset ? true : false;
+}
+
+static void colo_rewriter_handle_event(NetFilterState *nf, int event,
+   Error **errp)
+{
+RewriterState *rs = FILTER_COLO_REWRITER(nf);
+
+switch (event) {
+case COLO_CHECKPOINT:
+g_hash_table_foreach(rs->connection_track_table,
+reset_seq_offset, NULL);
+break;
+case COLO_FAILOVER:
+if (!g_hash_table_find(rs->connection_track_table,
+  offset_is_nonzero, NULL)) {
+object_property_set_str(OBJECT(nf), "off", "status", errp);
+}
+break;
+default:
+break;
+}
+}
+
 static void colo_rewriter_cleanup(NetFilterState *nf)
 {
 RewriterState *s = FILTER_COLO_REWRITER(nf);
@@ -299,6 +337,7 @@ static void colo_rewriter_class_init(ObjectClass *oc, void 
*data)
 nfc->setup = colo_rewriter_setup;
 nfc->cleanup = colo_rewriter_cleanup;
 nfc->receive_iov = colo_rewriter_receive_iov;
+nfc->handle_event = colo_rewriter_handle_event;
 }
 
 static const TypeInfo colo_rewriter_info = {
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 02/18] colo-compare: implement the process of checkpoint

2017-04-22 Thread zhanghailiang
While doing a checkpoint, we need to flush all the unhandled packets.
By using the filter notifier mechanism, we can easily notify
every compare object to do this; the flush is handled inside
the compare threads' main loop.
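
For illustration, a minimal sketch of how the COLO frame is expected to use
this API at checkpoint time (colo_flush_compare_packets() is a made-up
wrapper name; the real call site is added later in this series):

static void colo_flush_compare_packets(Error **errp)
{
    Error *local_err = NULL;

    /* Blocks until every compare thread has flushed its queued packets. */
    colo_notify_compares_event(NULL, COLO_CHECKPOINT, &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
    }
}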

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
 net/colo-compare.c | 78 ++
 net/colo-compare.h |  6 +
 2 files changed, 84 insertions(+)
 create mode 100644 net/colo-compare.h

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 97bf0e5..3adccfb 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -29,17 +29,24 @@
 #include "qemu/sockets.h"
 #include "qapi-visit.h"
 #include "net/colo.h"
+#include "net/colo-compare.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
+static QTAILQ_HEAD(, CompareState) net_compares =
+   QTAILQ_HEAD_INITIALIZER(net_compares);
+
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
 /* TODO: Should be configurable */
 #define REGULAR_PACKET_CHECK_MS 3000
 
+static QemuMutex event_mtx = { .lock = PTHREAD_MUTEX_INITIALIZER };
+static QemuCond event_complete_cond = { .cond = PTHREAD_COND_INITIALIZER };
+static int event_unhandled_count;
 /*
   + CompareState ++
   |   |
@@ -87,6 +94,10 @@ typedef struct CompareState {
 
 GMainContext *worker_context;
 GMainLoop *compare_loop;
+/* Used for COLO to notify compare to do something */
+FilterNotifier *notifier;
+
+QTAILQ_ENTRY(CompareState) next;
 } CompareState;
 
 typedef struct CompareClass {
@@ -417,6 +428,11 @@ static void colo_compare_connection(void *opaque, void 
*user_data)
 while (!g_queue_is_empty(>primary_list) &&
!g_queue_is_empty(>secondary_list)) {
 pkt = g_queue_pop_tail(>primary_list);
+if (!pkt) {
+error_report("colo-compare pop pkt failed");
+return;
+}
+
 switch (conn->ip_proto) {
 case IPPROTO_TCP:
 result = g_queue_find_custom(>secondary_list,
@@ -538,6 +554,53 @@ static gboolean check_old_packet_regular(void *opaque)
 return TRUE;
 }
 
+/* Public API, Used for COLO frame to notify compare event */
+void colo_notify_compares_event(void *opaque, int event, Error **errp)
+{
+CompareState *s;
+int ret;
+
+qemu_mutex_lock(_mtx);
+QTAILQ_FOREACH(s, _compares, next) {
+ret = filter_notifier_set(s->notifier, event);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to write value to eventfd");
+goto fail;
+}
+event_unhandled_count++;
+}
+/* Wait all compare threads to finish handling this event */
+while (event_unhandled_count > 0) {
+qemu_cond_wait(_complete_cond, _mtx);
+}
+
+fail:
+qemu_mutex_unlock(_mtx);
+}
+
+static void colo_flush_packets(void *opaque, void *user_data);
+
+static void colo_compare_handle_event(void *opaque, int event)
+{
+FilterNotifier *notify = opaque;
+CompareState *s = notify->opaque;
+
+switch (event) {
+case COLO_CHECKPOINT:
+g_queue_foreach(>conn_list, colo_flush_packets, s);
+break;
+case COLO_FAILOVER:
+break;
+default:
+break;
+}
+qemu_mutex_lock(_mtx);
+assert(event_unhandled_count > 0);
+event_unhandled_count--;
+qemu_cond_broadcast(_complete_cond);
+qemu_mutex_unlock(_mtx);
+}
+
 static void *colo_compare_thread(void *opaque)
 {
 CompareState *s = opaque;
@@ -558,10 +621,15 @@ static void *colo_compare_thread(void *opaque)
   (GSourceFunc)check_old_packet_regular, s, NULL);
 g_source_attach(timeout_source, s->worker_context);
 
+s->notifier = filter_notifier_new(colo_compare_handle_event, s, NULL);
+g_source_attach(>notifier->source, s->worker_context);
+
 qemu_sem_post(>thread_ready);
 
 g_main_loop_run(s->compare_loop);
 
+g_source_destroy(>notifier->source);
+g_source_unref(>notifier->source);
 g_source_destroy(timeout_source);
 g_source_unref(timeout_source);
 
@@ -706,6 +774,8 @@ static void colo_compare_complete(UserCreatable *uc, Error 
**errp)
 net_socket_rs_init(>pri_rs, compare_pri_rs_finalize);
 net_socket_rs_init(>sec_rs, compare_sec_rs_finalize);
 
+QTAILQ_INSERT_TAIL(_compares, s, next);
+
 g_queue_init(>conn_list);
 
 s->connection_track_table = g_hash_table_new_full(connection_key_hash,
@@ -765,6 +835,7 @@ static void colo_compare_init(Object *obj)
 static void colo_compare_finalize(Object *obj)
 {
 CompareState *s = COLO_COMPARE(obj);
+CompareState *tmp = NULL;
 
 qemu_chr_f

[Qemu-devel] [PATCH v2 15/18] COLO: flush host dirty ram from cache

2017-04-22 Thread zhanghailiang
We don't need to flush all of the VM's RAM from the cache; only
flush the pages dirtied since the last checkpoint.

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
v2:
 - stop dirty log after exit from COLO state. (Dave)
---
 migration/ram.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index f171a82..7bf3515 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2775,6 +2775,7 @@ int colo_init_ram_cache(void)
 ram_state.ram_bitmap = g_new0(RAMBitmap, 1);
 ram_state.ram_bitmap->bmap = bitmap_new(last_ram_page());
 ram_state.migration_dirty_pages = 0;
+memory_global_dirty_log_start();
 
 return 0;
 
@@ -2798,6 +2799,7 @@ void colo_release_ram_cache(void)
 
 atomic_rcu_set(_state.ram_bitmap, NULL);
 if (bitmap) {
+memory_global_dirty_log_stop();
 call_rcu(bitmap, migration_bitmap_free, rcu);
 }
 
@@ -2822,6 +2824,16 @@ void colo_flush_ram_cache(void)
 void *src_host;
 unsigned long offset = 0;
 
+memory_global_dirty_log_sync();
+qemu_mutex_lock(_state.bitmap_mutex);
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, _list.blocks, next) {
+migration_bitmap_sync_range(_state, block, block->offset,
+block->used_length);
+}
+rcu_read_unlock();
+qemu_mutex_unlock(_state.bitmap_mutex);
+
 trace_colo_flush_ram_cache_begin(ram_state.migration_dirty_pages);
 rcu_read_lock();
 block = QLIST_FIRST_RCU(_list.blocks);
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 13/18] COLO: Separate the process of saving/loading ram and device state

2017-04-22 Thread zhanghailiang
We separate the process of saving/loading RAM and device state when doing a
checkpoint, and add new helpers to save/load RAM and device state. With this
change, we can transfer RAM directly from the primary side to the secondary
side without using a channel buffer as an intermediary, which also reduces
the amount of extra memory used during a checkpoint.

Besides, we move colo_flush_ram_cache() to the proper position after the
above change.

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/colo.c   | 49 +++--
 migration/ram.c|  5 -
 migration/savevm.c |  4 
 3 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index e62da93..8e27a4c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -357,11 +357,20 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, _err);
+if (local_err) {
+goto out;
+}
+
 /* Disable block migration */
 s->params.blk = 0;
 s->params.shared = 0;
-qemu_savevm_state_header(fb);
-qemu_savevm_state_begin(fb, >params);
+qemu_savevm_state_begin(s->to_dst_file, >params);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+error_report("Save VM state begin error");
+goto out;
+}
 
 /* We call this API although this may do nothing on primary side. */
 qemu_mutex_lock_iothread();
@@ -372,15 +381,21 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 }
 
 qemu_mutex_lock_iothread();
-qemu_savevm_state_complete_precopy(fb, false);
+/*
+ * Only save VM's live state, which not including device state.
+ * TODO: We may need a timeout mechanism to prevent COLO process
+ * to be blocked here.
+ */
+qemu_savevm_live_state(s->to_dst_file);
+/* Note: device state is saved into buffer */
+ret = qemu_save_device_state(fb);
 qemu_mutex_unlock_iothread();
-
-qemu_fflush(fb);
-
-colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, _err);
-if (local_err) {
+if (ret < 0) {
+error_report("Save device state error");
 goto out;
 }
+qemu_fflush(fb);
+
 /*
  * We need the size of the VMstate data in Secondary side,
  * With which we can decide how much data should be read.
@@ -621,6 +636,7 @@ void *colo_process_incoming_thread(void *opaque)
 uint64_t total_size;
 uint64_t value;
 Error *local_err = NULL;
+int ret;
 
 qemu_sem_init(>colo_incoming_sem, 0);
 
@@ -693,6 +709,17 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+ret = qemu_loadvm_state_begin(mis->from_src_file);
+if (ret < 0) {
+error_report("Load vm state begin error, ret=%d", ret);
+goto out;
+}
+ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+if (ret < 0) {
+error_report("Load VM's live state (ram) error");
+goto out;
+}
+
 value = colo_receive_message_value(mis->from_src_file,
  COLO_MESSAGE_VMSTATE_SIZE, _err);
 if (local_err) {
@@ -726,8 +753,10 @@ void *colo_process_incoming_thread(void *opaque)
 qemu_mutex_lock_iothread();
 qemu_system_reset(VMRESET_SILENT);
 vmstate_loading = true;
-if (qemu_loadvm_state(fb) < 0) {
-error_report("COLO: loadvm failed");
+colo_flush_ram_cache();
+ret = qemu_load_device_state(fb);
+if (ret < 0) {
+error_report("COLO: load device state failed");
 qemu_mutex_unlock_iothread();
 goto out;
 }
diff --git a/migration/ram.c b/migration/ram.c
index df10d4b..f171a82 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2602,7 +2602,6 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 bool postcopy_running = postcopy_state_get() >= 
POSTCOPY_INCOMING_LISTENING;
 /* ADVISE is earlier, it shows the source has the postcopy capability on */
 bool postcopy_advised = postcopy_state_get() >= POSTCOPY_INCOMING_ADVISE;
-bool need_flush = false;
 
 seq_iter++;
 
@@ -2637,7 +2636,6 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 /* After going into COLO, we should load the Page into colo_cache 
*/
 if (migration_incoming_in_colo_state()) {
 host = colo_cache_from_block_offset(block, addr);
-need_flush = true;
 } else {
 host = host_from_ram_block_offset(block, addr);
  

[Qemu-devel] [PATCH v2 06/18] COLO: Add block replication into colo process

2017-04-22 Thread zhanghailiang
Make sure the master starts block replication after the slave's block
replication has started.

Besides, we need to activate the VM's blocks before going into
COLO state.

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Kevin Wolf <kw...@redhat.com>
Cc: Max Reitz <mre...@redhat.com>
Cc: Xie Changlong <xiechanglon...@gmail.com>
---
 migration/colo.c  | 50 ++
 migration/migration.c | 16 
 2 files changed, 66 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index c4fc865..9949293 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -23,6 +23,9 @@
 #include "qmp-commands.h"
 #include "net/colo-compare.h"
 #include "net/colo.h"
+#include "qapi-event.h"
+#include "block/block.h"
+#include "replication.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -57,6 +60,7 @@ static void secondary_vm_do_failover(void)
 {
 int old_state;
 MigrationIncomingState *mis = migration_incoming_get_current();
+Error *local_err = NULL;
 
 /* Can not do failover during the process of VM's loading VMstate, Or
  * it will break the secondary VM.
@@ -74,6 +78,11 @@ static void secondary_vm_do_failover(void)
 migrate_set_state(>state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
 
+replication_stop_all(true, _err);
+if (local_err) {
+error_report_err(local_err);
+}
+
 if (!autostart) {
 error_report("\"-S\" qemu option will be ignored in secondary side");
 /* recover runstate to normal migration finish state */
@@ -111,6 +120,7 @@ static void primary_vm_do_failover(void)
 {
 MigrationState *s = migrate_get_current();
 int old_state;
+Error *local_err = NULL;
 
 migrate_set_state(>state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
@@ -134,6 +144,13 @@ static void primary_vm_do_failover(void)
  FailoverStatus_lookup[old_state]);
 return;
 }
+
+replication_stop_all(true, _err);
+if (local_err) {
+error_report_err(local_err);
+local_err = NULL;
+}
+
 /* Notify COLO thread that failover work is finished */
 qemu_sem_post(>colo_exit_sem);
 }
@@ -345,6 +362,15 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 s->params.shared = 0;
 qemu_savevm_state_header(fb);
 qemu_savevm_state_begin(fb, >params);
+
+/* We call this API although this may do nothing on primary side. */
+qemu_mutex_lock_iothread();
+replication_do_checkpoint_all(_err);
+qemu_mutex_unlock_iothread();
+if (local_err) {
+goto out;
+}
+
 qemu_mutex_lock_iothread();
 qemu_savevm_state_complete_precopy(fb, false);
 qemu_mutex_unlock_iothread();
@@ -451,6 +477,12 @@ static void colo_process_checkpoint(MigrationState *s)
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
+replication_start_all(REPLICATION_MODE_PRIMARY, _err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vm_start();
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("stop", "run");
@@ -554,6 +586,7 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 case COLO_MESSAGE_GUEST_SHUTDOWN:
 qemu_mutex_lock_iothread();
 vm_stop_force_state(RUN_STATE_COLO);
+replication_stop_all(false, NULL);
 qemu_system_shutdown_request_core();
 qemu_mutex_unlock_iothread();
 /*
@@ -602,6 +635,11 @@ void *colo_process_incoming_thread(void *opaque)
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
+replication_start_all(REPLICATION_MODE_SECONDARY, _err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
 vm_start();
 trace_colo_vm_state_change("stop", "run");
 qemu_mutex_unlock_iothread();
@@ -682,6 +720,18 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+replication_get_error_all(_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+/* discard colo disk buffer */
+replication_do_checkpoint_all(_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vmstate_loading = false;
 vm_start();
 trace_colo_vm_state_change("stop", "run");
diff --git a/migration/migration.c b/migration/migration.c
index 2ade2aa..755ea54 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -394,6 +394,7 @@ static void process_incoming_migration_c

[Qemu-devel] [PATCH v2 14/18] COLO: Split qemu_savevm_state_begin out of checkpoint process

2017-04-22 Thread zhanghailiang
It is unnecessary to call qemu_savevm_state_begin() in every checkpoint process.
It mainly sets up devices and does the first device state pass, and this data
does not change during the later checkpoints. So we split it out of
colo_do_checkpoint_transaction(); this way we avoid re-transferring this data
in every subsequent checkpoint.
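
To illustrate the shape of this change outside of QEMU, here is a minimal
standalone sketch (stub functions only, not the real migration code): the
begin/setup pass runs once before the checkpoint loop instead of inside every
iteration.

#include <stdio.h>
#include <stdbool.h>

static int save_state_begin(void)
{
    printf("send device setup and first pass (once)\n");
    return 0;
}

static int do_checkpoint(int n)
{
    printf("checkpoint %d: send RAM and device state\n", n);
    return 0;
}

static bool colo_running(int n)
{
    return n < 3;                  /* stand-in for the real loop condition */
}

int main(void)
{
    if (save_state_begin() < 0) {  /* previously repeated per checkpoint */
        return 1;
    }
    for (int n = 0; colo_running(n); n++) {
        if (do_checkpoint(n) < 0) {
            return 1;
        }
    }
    return 0;
}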

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/colo.c | 51 ---
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 8e27a4c..66bb5b2 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -362,16 +362,6 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
-/* Disable block migration */
-s->params.blk = 0;
-s->params.shared = 0;
-qemu_savevm_state_begin(s->to_dst_file, >params);
-ret = qemu_file_get_error(s->to_dst_file);
-if (ret < 0) {
-error_report("Save VM state begin error");
-goto out;
-}
-
 /* We call this API although this may do nothing on primary side. */
 qemu_mutex_lock_iothread();
 replication_do_checkpoint_all(_err);
@@ -459,6 +449,21 @@ static void colo_compare_notify_checkpoint(Notifier 
*notifier, void *data)
 colo_checkpoint_notify(data);
 }
 
+static int colo_prepare_before_save(MigrationState *s)
+{
+int ret;
+
+/* Disable block migration */
+s->params.blk = 0;
+s->params.shared = 0;
+qemu_savevm_state_begin(s->to_dst_file, >params);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+error_report("Save VM state begin error");
+}
+return ret;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 QIOChannelBuffer *bioc;
@@ -478,6 +483,11 @@ static void colo_process_checkpoint(MigrationState *s)
 packets_compare_notifier.notify = colo_compare_notify_checkpoint;
 colo_compare_register_notifier(_compare_notifier);
 
+ret = colo_prepare_before_save(s);
+if (ret < 0) {
+goto out;
+}
+
 /*
  * Wait for Secondary finish loading VM states and enter COLO
  * restore.
@@ -628,6 +638,17 @@ static void colo_wait_handle_message(QEMUFile *f, int 
*checkpoint_request,
 }
 }
 
+static int colo_prepare_before_load(QEMUFile *f)
+{
+int ret;
+
+ret = qemu_loadvm_state_begin(f);
+if (ret < 0) {
+error_report("Load VM state begin error, ret = %d", ret);
+}
+return ret;
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
 MigrationIncomingState *mis = opaque;
@@ -662,6 +683,11 @@ void *colo_process_incoming_thread(void *opaque)
 fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
 
+ret = colo_prepare_before_load(mis->from_src_file);
+if (ret < 0) {
+goto out;
+}
+
 qemu_mutex_lock_iothread();
 replication_start_all(REPLICATION_MODE_SECONDARY, _err);
 if (local_err) {
@@ -709,11 +735,6 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
-ret = qemu_loadvm_state_begin(mis->from_src_file);
-if (ret < 0) {
-error_report("Load vm state begin error, ret=%d", ret);
-goto out;
-}
 ret = qemu_loadvm_state_main(mis->from_src_file, mis);
 if (ret < 0) {
 error_report("Load VM's live state (ram) error");
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 11/18] savevm: split save/find loadvm_handlers entry into two helper functions

2017-04-22 Thread zhanghailiang
COLO's checkpoint process is based on the migration process; every time we do
a checkpoint we repeat the savevm and loadvm steps.

So qemu_loadvm_section_start_full() is called repeatedly, and it adds every
migration section's information to the loadvm_handlers list each time, which
leads to a memory leak.

To fix it, we split saving and finding a section entry into two helper
functions, and check whether the section info already exists in the
loadvm_handlers list before saving it.

These modifications have no side effect on normal migration.
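
The same look-up-before-insert idea, reduced to a standalone sketch (a plain C
list with made-up names, not the QEMU LoadStateEntry code), shows why repeated
loads of the same section no longer leak:

#include <stdio.h>
#include <stdlib.h>

struct entry {
    unsigned id;
    struct entry *next;
};

static struct entry *handlers;               /* head of the handlers list */

static struct entry *find_entry(unsigned id)
{
    for (struct entry *e = handlers; e; e = e->next) {
        if (e->id == id) {
            return e;
        }
    }
    return NULL;
}

static struct entry *add_entry(unsigned id)
{
    struct entry *e = calloc(1, sizeof(*e));
    e->id = id;
    e->next = handlers;
    handlers = e;
    return e;
}

int main(void)
{
    for (int round = 0; round < 3; round++) {  /* three "checkpoints" */
        struct entry *e = find_entry(42);
        if (!e) {
            e = add_entry(42);                 /* allocated only the first time */
        }
        printf("round %d uses entry %p\n", round, (void *)e);
    }
    while (handlers) {                         /* cleanup */
        struct entry *e = handlers;
        handlers = e->next;
        free(e);
    }
    return 0;
}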

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/savevm.c | 55 +++---
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 03ae1bd..f87cd8d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1836,6 +1836,37 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
 }
 }
 
+static LoadStateEntry *loadvm_add_section_entry(MigrationIncomingState *mis,
+ SaveStateEntry *se,
+ uint32_t section_id,
+ uint32_t version_id)
+{
+LoadStateEntry *le;
+
+/* Add entry */
+le = g_malloc0(sizeof(*le));
+
+le->se = se;
+le->section_id = section_id;
+le->version_id = version_id;
+QLIST_INSERT_HEAD(>loadvm_handlers, le, entry);
+return le;
+}
+
+static LoadStateEntry *loadvm_find_section_entry(MigrationIncomingState *mis,
+ uint32_t section_id)
+{
+LoadStateEntry *le;
+
+QLIST_FOREACH(le, >loadvm_handlers, entry) {
+if (le->section_id == section_id) {
+break;
+}
+}
+
+return le;
+}
+
 static int
 qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
 {
@@ -1878,15 +1909,12 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis)
 return -EINVAL;
 }
 
-/* Add entry */
-le = g_malloc0(sizeof(*le));
-
-le->se = se;
-le->section_id = section_id;
-le->version_id = version_id;
-QLIST_INSERT_HEAD(>loadvm_handlers, le, entry);
-
-ret = vmstate_load(f, le->se, le->version_id);
+ /* Check if we have saved this section info before, if not, save it */
+le = loadvm_find_section_entry(mis, section_id);
+if (!le) {
+le = loadvm_add_section_entry(mis, se, section_id, version_id);
+}
+ret = vmstate_load(f, se, version_id);
 if (ret < 0) {
 error_report("error while loading state for instance 0x%x of"
  " device '%s'", instance_id, idstr);
@@ -1909,12 +1937,9 @@ qemu_loadvm_section_part_end(QEMUFile *f, 
MigrationIncomingState *mis)
 section_id = qemu_get_be32(f);
 
 trace_qemu_loadvm_state_section_partend(section_id);
-QLIST_FOREACH(le, >loadvm_handlers, entry) {
-if (le->section_id == section_id) {
-break;
-}
-}
-if (le == NULL) {
+
+le = loadvm_find_section_entry(mis, section_id);
+if (!le) {
 error_report("Unknown savevm section %d", section_id);
 return -EINVAL;
 }
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 12/18] savevm: split the process of different stages for loadvm/savevm

2017-04-22 Thread zhanghailiang
There are several stages in the loadvm/savevm process, and in each stage the
migration incoming side processes different types of sections.
We want to control these stages more accurately; this benefits COLO
performance, because we don't have to save QEMU_VM_SECTION_START
sections on every checkpoint, and it lets us separate
the process of saving/loading memory and device state.

So we add three new helper functions: qemu_loadvm_state_begin(),
qemu_load_device_state() and qemu_savevm_live_state() to handle
these different stages during migration.

Besides, we make qemu_loadvm_state_main() and qemu_save_device_state()
public, and simplify the code of qemu_save_device_state() by calling the
wrapper qemu_savevm_state_header().
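
As a rough standalone sketch (stub functions with made-up names, not the real
helpers), the COLO secondary is expected to drive these stages in roughly this
order, matching how the later patches in this series call
qemu_loadvm_state_begin(), qemu_loadvm_state_main(), colo_flush_ram_cache()
and qemu_load_device_state():

#include <stdio.h>

static int loadvm_state_begin(void)  { printf("load SECTION_START sections (once)\n"); return 0; }
static int loadvm_state_main(void)   { printf("load live RAM into the cache\n"); return 0; }
static int flush_ram_cache(void)     { printf("flush dirty cache pages into memory\n"); return 0; }
static int load_device_state(void)   { printf("load SECTION_FULL device state\n"); return 0; }

int main(void)
{
    if (loadvm_state_begin() < 0) {            /* once, when COLO starts */
        return 1;
    }
    for (int checkpoint = 0; checkpoint < 3; checkpoint++) {
        if (loadvm_state_main() < 0 ||
            flush_ram_cache() < 0 ||
            load_device_state() < 0) {
            return 1;
        }
    }
    return 0;
}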

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
v2:
 - Use the wrapper qemu_savevm_state_header() to simplify the codes
  of qemu_save_device_state() (Dave's suggestion)
---
 include/sysemu/sysemu.h |  6 ++
 migration/savevm.c  | 54 ++---
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8054f53..0255c4e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -132,7 +132,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, 
const char *name,
uint64_t *start_list,
uint64_t *length_list);
 
+void qemu_savevm_live_state(QEMUFile *f);
+int qemu_save_device_state(QEMUFile *f);
+
 int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state_begin(QEMUFile *f);
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
+int qemu_load_device_state(QEMUFile *f);
 
 extern int autostart;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index f87cd8d..8c2ce0b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -54,6 +54,7 @@
 #include "qemu/cutils.h"
 #include "io/channel-buffer.h"
 #include "io/channel-file.h"
+#include "migration/colo.h"
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
@@ -1285,13 +1286,20 @@ done:
 return ret;
 }
 
-static int qemu_save_device_state(QEMUFile *f)
+void qemu_savevm_live_state(QEMUFile *f)
 {
-SaveStateEntry *se;
+/* save QEMU_VM_SECTION_END section */
+qemu_savevm_state_complete_precopy(f, true);
+qemu_put_byte(f, QEMU_VM_EOF);
+}
 
-qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+int qemu_save_device_state(QEMUFile *f)
+{
+SaveStateEntry *se;
 
+if (!migration_in_colo_state()) {
+qemu_savevm_state_header(f);
+}
 cpu_synchronize_all_states();
 
 QTAILQ_FOREACH(se, _state.handlers, entry) {
@@ -1342,8 +1350,6 @@ enum LoadVMExitCodes {
 LOADVM_QUIT =  1,
 };
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
-
 /* -- incoming postcopy messages -- */
 /* 'advise' arrives before any transfers just to tell us that a postcopy
  * *might* happen - it might be skipped if precopy transferred everything
@@ -1957,7 +1963,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, 
MigrationIncomingState *mis)
 return 0;
 }
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
 uint8_t section_type;
 int ret = 0;
@@ -2095,6 +2101,40 @@ int qemu_loadvm_state(QEMUFile *f)
 return ret;
 }
 
+int qemu_loadvm_state_begin(QEMUFile *f)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+Error *local_err = NULL;
+int ret;
+
+if (qemu_savevm_state_blocked(_err)) {
+error_report_err(local_err);
+return -EINVAL;
+}
+/* Load QEMU_VM_SECTION_START section */
+ret = qemu_loadvm_state_main(f, mis);
+if (ret < 0) {
+error_report("Failed to loadvm begin work: %d", ret);
+}
+return ret;
+}
+
+int qemu_load_device_state(QEMUFile *f)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+int ret;
+
+/* Load QEMU_VM_SECTION_FULL section */
+ret = qemu_loadvm_state_main(f, mis);
+if (ret < 0) {
+error_report("Failed to load device state: %d", ret);
+return ret;
+}
+
+cpu_synchronize_all_post_init();
+return 0;
+}
+
 int save_vmstate(Monitor *mon, const char *name)
 {
 BlockDriverState *bs, *bs1;
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 09/18] COLO: Flush memory data from ram cache

2017-04-22 Thread zhanghailiang
While the VM is running, the PVM may dirty some pages; we will transfer
the PVM's dirty pages to the SVM and store them into the SVM's RAM cache at the
next checkpoint time. So, the content of the SVM's RAM cache will always be the
same as the PVM's memory after a checkpoint.

Instead of flushing the entire content of the RAM cache into the SVM's memory,
we do this in a more efficient way:
only flush the pages that were dirtied by the PVM since the last checkpoint.
In this way, we can ensure the SVM's memory is the same as the PVM's.

Besides, we must ensure the RAM cache is flushed before loading the device state.
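
A minimal standalone sketch of the flush idea (tiny fake pages and a
single-word bitmap, not the QEMU code): only pages whose dirty bit is set are
copied from the cache, and each bit is cleared as it is flushed.

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE  4
#define NR_PAGES   8

static unsigned char memory[NR_PAGES * PAGE_SIZE];
static unsigned char cache[NR_PAGES * PAGE_SIZE];
static unsigned long dirty_bitmap;               /* bit n == page n is dirty */

static void flush_cache(void)
{
    for (int page = 0; page < NR_PAGES; page++) {
        if (dirty_bitmap & (1UL << page)) {
            memcpy(memory + page * PAGE_SIZE,
                   cache + page * PAGE_SIZE, PAGE_SIZE);
            dirty_bitmap &= ~(1UL << page);      /* clear after flushing */
        }
    }
}

int main(void)
{
    memset(cache, 0xab, sizeof(cache));
    dirty_bitmap = (1UL << 2) | (1UL << 5);      /* pages 2 and 5 were dirtied */
    flush_cache();
    printf("page 2 byte: %#x, page 3 byte: %#x\n",
           memory[2 * PAGE_SIZE], memory[3 * PAGE_SIZE]);
    return 0;
}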

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 include/migration/migration.h |  1 +
 migration/ram.c   | 40 
 migration/trace-events|  2 ++
 3 files changed, 43 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index ba765eb..2aa7654 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -364,4 +364,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_flush_ram_cache(void);
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 0653a24..df10d4b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2602,6 +2602,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 bool postcopy_running = postcopy_state_get() >= 
POSTCOPY_INCOMING_LISTENING;
 /* ADVISE is earlier, it shows the source has the postcopy capability on */
 bool postcopy_advised = postcopy_state_get() >= POSTCOPY_INCOMING_ADVISE;
+bool need_flush = false;
 
 seq_iter++;
 
@@ -2636,6 +2637,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 /* After going into COLO, we should load the Page into colo_cache 
*/
 if (migration_incoming_in_colo_state()) {
 host = colo_cache_from_block_offset(block, addr);
+need_flush = true;
 } else {
 host = host_from_ram_block_offset(block, addr);
 }
@@ -2742,6 +2744,10 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 wait_for_decompress_done();
 rcu_read_unlock();
 trace_ram_load_complete(ret, seq_iter);
+
+if (!ret  && ram_cache_enable && need_flush) {
+colo_flush_ram_cache();
+}
 return ret;
 }
 
@@ -2810,6 +2816,40 @@ void colo_release_ram_cache(void)
 rcu_read_unlock();
 }
 
+/*
+ * Flush content of RAM cache into SVM's memory.
+ * Only flush the pages that be dirtied by PVM or SVM or both.
+ */
+void colo_flush_ram_cache(void)
+{
+RAMBlock *block = NULL;
+void *dst_host;
+void *src_host;
+unsigned long offset = 0;
+
+trace_colo_flush_ram_cache_begin(ram_state.migration_dirty_pages);
+rcu_read_lock();
+block = QLIST_FIRST_RCU(_list.blocks);
+
+while (block) {
+offset = migration_bitmap_find_dirty(_state, block, offset);
+migration_bitmap_clear_dirty(_state, block, offset);
+
+if (offset << TARGET_PAGE_BITS >= block->used_length) {
+offset = 0;
+block = QLIST_NEXT_RCU(block, next);
+} else {
+dst_host = block->host + (offset << TARGET_PAGE_BITS);
+src_host = block->colo_cache + (offset << TARGET_PAGE_BITS);
+memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+}
+}
+
+rcu_read_unlock();
+trace_colo_flush_ram_cache_end();
+assert(ram_state.migration_dirty_pages == 0);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
 .save_live_setup = ram_save_setup,
 .save_live_iterate = ram_save_iterate,
diff --git a/migration/trace-events b/migration/trace-events
index b8f01a2..93f4337 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -72,6 +72,8 @@ ram_discard_range(const char *rbname, uint64_t start, size_t 
len) "%s: start: %"
 ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 
%zx len: %zx"
+colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
+colo_flush_ram_cache_end(void) ""
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 00/18] COLO: integrate colo frame with block replication and net compare

2017-04-22 Thread zhanghailiang
Hi,

The COLO framework, block replication and COLO net compare have existed in QEMU
for a long time; it is time to integrate these three parts to make COLO really
work.

In this series, we have some optimizations for the COLO framework, including
separating the process of saving RAM and device state, and using a COLO_EXIT
event to notify users that the VM has exited COLO. Most of these parts were
reviewed long ago in older versions, but since this series has just been
rebased on upstream, which merged a new migration series, parts of the patches
in this series deserve review again.

We use a notifier/callback method for COLO compare to notify the COLO framework
about inconsistent network packet events, and add a handle_event method to
NetFilterClass to help the COLO framework notify filters and colo-compare about
checkpoint/failover events; it is flexible.

Besides, this series is on top of the '[PATCH 0/3] colo-compare: fix three bugs'
series.

For the newest version, please refer to:
https://github.com/coloft/qemu/tree/colo-for-qemu-2.10-2017-4-22

Please review, thanks.

Cc: Dong eddie <eddie.d...@intel.com>
Cc: Jiang yunhong <yunhong.ji...@intel.com>
Cc: Xu Quan <xuqu...@huawei.com>
Cc: Jason Wang <jasow...@redhat.com> 

zhanghailiang (18):
  net/colo: Add notifier/callback related helpers for filter
  colo-compare: implement the process of checkpoint
  colo-compare: use notifier to notify packets comparing result
  COLO: integrate colo compare with colo frame
  COLO: Handle shutdown command for VM in COLO state
  COLO: Add block replication into colo process
  COLO: Load dirty pages into SVM's RAM cache firstly
  ram/COLO: Record the dirty pages that SVM received
  COLO: Flush memory data from ram cache
  qmp event: Add COLO_EXIT event to notify users while exited COLO
  savevm: split save/find loadvm_handlers entry into two helper
functions
  savevm: split the process of different stages for loadvm/savevm
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  COLO: flush host dirty ram from cache
  filter: Add handle_event method for NetFilterClass
  filter-rewriter: handle checkpoint and failover event
  COLO: notify net filters about checkpoint/failover event

 include/exec/ram_addr.h   |   1 +
 include/migration/colo.h  |   1 +
 include/migration/migration.h |   5 +
 include/net/filter.h  |   5 +
 include/sysemu/sysemu.h   |   9 ++
 migration/colo.c  | 242 +++---
 migration/migration.c |  24 -
 migration/ram.c   | 147 -
 migration/savevm.c| 113 
 migration/trace-events|   2 +
 net/colo-compare.c| 110 ++-
 net/colo-compare.h|   8 ++
 net/colo.c| 105 ++
 net/colo.h|  19 
 net/filter-rewriter.c |  39 +++
 net/filter.c  |  16 +++
 net/net.c |  28 +
 qapi-schema.json  |  18 +++-
 qapi/event.json   |  21 
 vl.c  |  19 +++-
 20 files changed, 886 insertions(+), 46 deletions(-)
 create mode 100644 net/colo-compare.h

-- 
1.8.3.1




[Qemu-devel] [PATCH v2 08/18] ram/COLO: Record the dirty pages that SVM received

2017-04-22 Thread zhanghailiang
We record the addresses of the dirty pages that are received;
this will help flush the pages that are cached into the SVM.

Here is a trick: we record dirty pages by re-using the migration
dirty bitmap. In a later patch, we will start dirty logging
for the SVM, just like migration; in this way, we can record both
the dirty pages caused by the PVM and the SVM, and we only flush those dirty
pages from the RAM cache while doing a checkpoint.
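
The receive-side bookkeeping, as a standalone sketch (made-up names, not the
QEMU bitmap helpers): set the page's bit when it lands in the cache and count
it as dirty only the first time.

#include <stdbool.h>
#include <stdio.h>

static unsigned long dirty_bitmap;
static unsigned long dirty_pages;

static bool test_and_set(unsigned long *bitmap, int bit)
{
    bool was_set = *bitmap & (1UL << bit);
    *bitmap |= 1UL << bit;
    return was_set;
}

static void page_received(int page)
{
    if (!test_and_set(&dirty_bitmap, page)) {
        dirty_pages++;                   /* counted once per page */
    }
}

int main(void)
{
    page_received(3);
    page_received(3);                    /* same page sent again */
    page_received(7);
    printf("dirty pages: %lu\n", dirty_pages);   /* prints 2 */
    return 0;
}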

Cc: Juan Quintela <quint...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/ram.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 05d1b06..0653a24 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2268,6 +2268,9 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
  ram_addr_t offset)
 {
+unsigned long *bitmap;
+long k;
+
 if (!offset_in_ramblock(block, offset)) {
 return NULL;
 }
@@ -2276,6 +2279,17 @@ static inline void 
*colo_cache_from_block_offset(RAMBlock *block,
  __func__, block->idstr);
 return NULL;
 }
+
+k = (memory_region_get_ram_addr(block->mr) + offset) >> TARGET_PAGE_BITS;
+bitmap = atomic_rcu_read(_state.ram_bitmap)->bmap;
+/*
+* During colo checkpoint, we need bitmap of these migrated pages.
+* It help us to decide which pages in ram cache should be flushed
+* into VM's RAM later.
+*/
+if (!test_and_set_bit(k, bitmap)) {
+ram_state.migration_dirty_pages++;
+}
 return block->colo_cache + offset;
 }
 
@@ -2752,6 +2766,15 @@ int colo_init_ram_cache(void)
 memcpy(block->colo_cache, block->host, block->used_length);
 }
 rcu_read_unlock();
+/*
+* Record the dirty pages that sent by PVM, we use this dirty bitmap 
together
+* with to decide which page in cache should be flushed into SVM's RAM. Here
+* we use the same name 'ram_bitmap' as for migration.
+*/
+ram_state.ram_bitmap = g_new0(RAMBitmap, 1);
+ram_state.ram_bitmap->bmap = bitmap_new(last_ram_page());
+ram_state.migration_dirty_pages = 0;
+
 return 0;
 
 out_locked:
@@ -2770,6 +2793,12 @@ out_locked:
 void colo_release_ram_cache(void)
 {
 RAMBlock *block;
+RAMBitmap *bitmap = ram_state.ram_bitmap;
+
+atomic_rcu_set(_state.ram_bitmap, NULL);
+if (bitmap) {
+call_rcu(bitmap, migration_bitmap_free, rcu);
+}
 
 rcu_read_lock();
 QLIST_FOREACH_RCU(block, _list.blocks, next) {
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 04/18] COLO: integrate colo compare with colo frame

2017-04-22 Thread zhanghailiang
For COLO FT, both the PVM and SVM run at the same time, and we
only synchronize their state when needed.

So here, let the SVM run while we are not doing a checkpoint, and change
DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100.

Besides, we forgot to release colo_checkpoint_sem and
colo_delay_timer; fix them here.
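
The notifier wiring can be pictured with a standalone sketch (a bare function
pointer, not QEMU's Notifier API): colo-compare invokes whatever callback the
COLO frame registered when it sees diverging packets, and that callback
requests a checkpoint.

#include <stdio.h>

typedef void (*notify_fn)(void *opaque);

static notify_fn checkpoint_notifier;        /* registered by the COLO frame */

static void register_checkpoint_notifier(notify_fn fn)
{
    checkpoint_notifier = fn;
}

/* colo-compare side: called when primary/secondary packets differ */
static void packets_diverged(void *opaque)
{
    if (checkpoint_notifier) {
        checkpoint_notifier(opaque);
    }
}

/* COLO frame side: request an immediate checkpoint */
static void colo_checkpoint_notify(void *opaque)
{
    (void)opaque;
    printf("checkpoint requested\n");
}

int main(void)
{
    register_checkpoint_notifier(colo_checkpoint_notify);
    packets_diverged(NULL);                  /* simulate a miscompared packet */
    return 0;
}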

Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
 migration/colo.c  | 42 --
 migration/migration.c |  2 +-
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c19eb3f..a3344ce 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -21,8 +21,11 @@
 #include "migration/failover.h"
 #include "replication.h"
 #include "qmp-commands.h"
+#include "net/colo-compare.h"
+#include "net/colo.h"
 
 static bool vmstate_loading;
+static Notifier packets_compare_notifier;
 
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -332,6 +335,11 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+colo_notify_compares_event(NULL, COLO_CHECKPOINT, _err);
+if (local_err) {
+goto out;
+}
+
 /* Disable block migration */
 s->params.blk = 0;
 s->params.shared = 0;
@@ -390,6 +398,11 @@ out:
 return ret;
 }
 
+static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
+{
+colo_checkpoint_notify(data);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 QIOChannelBuffer *bioc;
@@ -406,6 +419,9 @@ static void colo_process_checkpoint(MigrationState *s)
 goto out;
 }
 
+packets_compare_notifier.notify = colo_compare_notify_checkpoint;
+colo_compare_register_notifier(_compare_notifier);
+
 /*
  * Wait for Secondary finish loading VM states and enter COLO
  * restore.
@@ -451,11 +467,21 @@ out:
 qemu_fclose(fb);
 }
 
-timer_del(s->colo_delay_timer);
-
 /* Hope this not to be too long to wait here */
 qemu_sem_wait(>colo_exit_sem);
 qemu_sem_destroy(>colo_exit_sem);
+
+/*
+ * It is safe to unregister notifier after failover finished.
+ * Besides, colo_delay_timer and colo_checkpoint_sem can't be
+ * released befor unregister notifier, or there will be use-after-free
+ * error.
+ */
+colo_compare_unregister_notifier(_compare_notifier);
+timer_del(s->colo_delay_timer);
+timer_free(s->colo_delay_timer);
+qemu_sem_destroy(>colo_checkpoint_sem);
+
 /*
  * Must be called after failover BH is completed,
  * Or the failover BH may shutdown the wrong fd that
@@ -548,6 +574,11 @@ void *colo_process_incoming_thread(void *opaque)
 fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
 object_unref(OBJECT(bioc));
 
+qemu_mutex_lock_iothread();
+vm_start();
+trace_colo_vm_state_change("stop", "run");
+qemu_mutex_unlock_iothread();
+
 colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
   _err);
 if (local_err) {
@@ -567,6 +598,11 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+trace_colo_vm_state_change("run", "stop");
+qemu_mutex_unlock_iothread();
+
 /* FIXME: This is unnecessary for periodic checkpoint mode */
 colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
  _err);
@@ -620,6 +656,8 @@ void *colo_process_incoming_thread(void *opaque)
 }
 
 vmstate_loading = false;
+vm_start();
+trace_colo_vm_state_change("stop", "run");
 qemu_mutex_unlock_iothread();
 
 if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
diff --git a/migration/migration.c b/migration/migration.c
index 353f272..2ade2aa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -70,7 +70,7 @@
 /* The delay time (in ms) between two COLO checkpoints
  * Note: Please change this default value to 1 when we support hybrid mode.
  */
-#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
+#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
 
 static NotifierList migration_state_notifiers =
 NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
-- 
1.8.3.1




[Qemu-devel] [PATCH 0/3] colo-compare: fix three bugs

2017-04-20 Thread zhanghailiang
Hi, 

This series fixes three bugs found in our tests; please review.

Thanks.

zhanghailiang (3):
  colo-compare: serialize compare thread's initialization with main
thread
  colo-compare: Check main_loop value before call g_main_loop_quit
  colo-compare: fix a memory leak

 net/colo-compare.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

-- 
1.8.3.1





[Qemu-devel] [PATCH 3/3] colo-compare: fix a memory leak

2017-04-20 Thread zhanghailiang
g_timeout_source_new() initializes the GSource's reference count to 1,
and g_source_attach() increases the count by 1, so it is not enough
to call just g_source_unref(), which only decreases the value by 1.
That leads to a memory leak.

We need to call g_source_destroy() before g_source_unref().
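
For reference, a standalone GLib sketch (not the colo-compare code) of the
intended lifecycle: the source carries our reference from
g_timeout_source_new() plus the context's reference from g_source_attach(),
so it needs both g_source_destroy() and g_source_unref(). Build with:
gcc demo.c $(pkg-config --cflags --libs glib-2.0)

#include <glib.h>

static gboolean on_timeout(gpointer loop)
{
    g_main_loop_quit(loop);        /* stop the loop after one tick */
    return G_SOURCE_CONTINUE;
}

int main(void)
{
    GMainContext *ctx = g_main_context_new();
    GMainLoop *loop = g_main_loop_new(ctx, FALSE);

    GSource *timeout = g_timeout_source_new(10);          /* ref count: 1 */
    g_source_set_callback(timeout, on_timeout, loop, NULL);
    g_source_attach(timeout, ctx);                        /* ref count: 2 */

    g_main_loop_run(loop);

    g_source_destroy(timeout);     /* detach: context drops its reference */
    g_source_unref(timeout);       /* drop our own reference */

    g_main_loop_unref(loop);
    g_main_context_unref(ctx);
    return 0;
}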

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 net/colo-compare.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index d6a5e4c..97bf0e5 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -562,7 +562,9 @@ static void *colo_compare_thread(void *opaque)
 
 g_main_loop_run(s->compare_loop);
 
+g_source_destroy(timeout_source);
 g_source_unref(timeout_source);
+
 g_main_loop_unref(s->compare_loop);
 g_main_context_unref(s->worker_context);
 return NULL;
-- 
1.8.3.1





[Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread

2017-04-20 Thread zhanghailiang
We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is used
to detach the watched fd from the default main context, so the main thread
and the compare thread can end up handling the same watched fd concurrently,
which triggers an error report:
"qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src == ((void *)0)'
failed."

Fix it by serializing the compare thread's initialization with the main thread.
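
The handshake itself is just a semaphore, as in this standalone sketch (POSIX
threads instead of QEMU's qemu_thread/qemu_sem wrappers): the creating thread
blocks until the worker reports that its setup is finished, so the two threads
never race on the setup phase. Build with: gcc demo.c -pthread

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t ready;

static void *worker(void *arg)
{
    (void)arg;
    /* ... thread-local setup that must finish before main continues ... */
    printf("worker: setup done\n");
    sem_post(&ready);              /* signal "ready", like qemu_sem_post() */
    /* ... the worker's main loop would run here ... */
    return NULL;
}

int main(void)
{
    pthread_t tid;

    sem_init(&ready, 0, 0);
    pthread_create(&tid, NULL, worker, NULL);
    sem_wait(&ready);              /* block until the worker's setup finished */
    printf("main: worker is ready, continuing\n");

    pthread_join(tid, NULL);
    sem_destroy(&ready);
    return 0;
}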

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 net/colo-compare.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 54e6d40..a6bf419 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -83,6 +83,7 @@ typedef struct CompareState {
 GHashTable *connection_track_table;
 /* compare thread, a thread for each NIC */
 QemuThread thread;
+QemuSemaphore thread_ready;
 
 GMainContext *worker_context;
 GMainLoop *compare_loop;
@@ -557,6 +558,8 @@ static void *colo_compare_thread(void *opaque)
   (GSourceFunc)check_old_packet_regular, s, NULL);
 g_source_attach(timeout_source, s->worker_context);
 
+qemu_sem_post(>thread_ready);
+
 g_main_loop_run(s->compare_loop);
 
 g_source_unref(timeout_source);
@@ -707,12 +710,15 @@ static void colo_compare_complete(UserCreatable *uc, 
Error **errp)
   connection_key_equal,
   g_free,
   connection_destroy);
+qemu_sem_init(>thread_ready, 0);
 
 sprintf(thread_name, "colo-compare %d", compare_id);
 qemu_thread_create(>thread, thread_name,
colo_compare_thread, s,
QEMU_THREAD_JOINABLE);
 compare_id++;
+qemu_sem_wait(>thread_ready);
+qemu_sem_destroy(>thread_ready);
 
 return;
 }
-- 
1.8.3.1





[Qemu-devel] [PATCH 2/3] colo-compare: Check main_loop value before call g_main_loop_quit

2017-04-20 Thread zhanghailiang
If some error happens before the compare loop is initialized in the
colo-compare thread, QEMU will go into the finalizing process, where
we call g_main_loop_quit(s->compare_loop); if compare_loop is still NULL,
there will be an error report:
"(process:14861): GLib-CRITICAL **: g_main_loop_quit: assertion 'loop != NULL'
failed".

We need to check whether compare_loop is NULL before calling g_main_loop_quit().

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 net/colo-compare.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index a6bf419..d6a5e4c 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -770,7 +770,9 @@ static void colo_compare_finalize(Object *obj)
  s->worker_context, true);
 qemu_chr_fe_deinit(>chr_out);
 
-g_main_loop_quit(s->compare_loop);
+if (s->compare_loop) {
+g_main_loop_quit(s->compare_loop);
+}
 qemu_thread_join(>thread);
 
 /* Release all unhandled packets after compare thead exited */
-- 
1.8.3.1





[Qemu-devel] [PATCH v3] char: Fix removing wrong GSource that be found by fd_in_tag

2017-04-18 Thread zhanghailiang
We use fd_in_tag to find a GSource; fd_in_tag is the return value of
g_source_attach(GSource *source, GMainContext *context), and that return
value is unique only within the same context, so we may get the same
value with different 'context' parameters.

It is no problem to find the right GSource by using
 g_main_context_find_source_by_id(GMainContext *context, guint source_id)
while there is only one default main context.

But colo-compare tries to create and use its own context, and if we pass the
wrong 'context' parameter with the right fd_in_tag, we will find the wrong
GSource to handle.
We tried to fix the related code in commit
b43decb015a6efeb9e3cdbdb80f6547ad7248a4c,
but it didn't fix the bug completely, because some code still didn't pass the
*right* context parameter to remove_fd_in_watch().

Let's fix it by recording the GSource directly instead of fd_in_tag.
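
A standalone GLib sketch (not the chardev code) of why the tag alone is
ambiguous: IDs returned by g_source_attach() are only unique within one
GMainContext, so two contexts will typically hand out the same ID. Build with:
gcc demo.c $(pkg-config --cflags --libs glib-2.0)

#include <glib.h>
#include <stdio.h>

static gboolean noop(gpointer data)
{
    (void)data;
    return G_SOURCE_REMOVE;
}

int main(void)
{
    GMainContext *ctx_a = g_main_context_new();
    GMainContext *ctx_b = g_main_context_new();

    GSource *src_a = g_idle_source_new();
    GSource *src_b = g_idle_source_new();
    g_source_set_callback(src_a, noop, NULL, NULL);
    g_source_set_callback(src_b, noop, NULL, NULL);

    guint tag_a = g_source_attach(src_a, ctx_a);
    guint tag_b = g_source_attach(src_b, ctx_b);

    /* Both IDs are typically 1 here: looking one of them up in the
     * wrong context would return the other source. */
    printf("tag_a=%u tag_b=%u\n", tag_a, tag_b);

    g_source_destroy(src_a);
    g_source_destroy(src_b);
    g_source_unref(src_a);
    g_source_unref(src_b);
    g_main_context_unref(ctx_a);
    g_main_context_unref(ctx_b);
    return 0;
}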

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lur...@redhat.com>
---
v3:
 - Use the full hash (Marc-André Lureau)
 - Use a simpler name 'gsource' instead of 'chr_gsource' (Marc-André Lureau)
---
 chardev/char-fd.c |  8 
 chardev/char-io.c | 23 ---
 chardev/char-io.h |  4 ++--
 chardev/char-pty.c|  6 +++---
 chardev/char-socket.c |  8 
 chardev/char-udp.c|  8 
 chardev/char.c|  2 +-
 include/sysemu/char.h |  2 +-
 8 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/chardev/char-fd.c b/chardev/char-fd.c
index 548dd4c..0b182c5 100644
--- a/chardev/char-fd.c
+++ b/chardev/char-fd.c
@@ -58,7 +58,7 @@ static gboolean fd_chr_read(QIOChannel *chan, GIOCondition 
cond, void *opaque)
 ret = qio_channel_read(
 chan, (gchar *)buf, len, NULL);
 if (ret == 0) {
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
 qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 return FALSE;
 }
@@ -89,9 +89,9 @@ static void fd_chr_update_read_handler(Chardev *chr,
 {
 FDChardev *s = FD_CHARDEV(chr);
 
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
 if (s->ioc_in) {
-chr->fd_in_tag = io_add_watch_poll(chr, s->ioc_in,
+chr->gsource = io_add_watch_poll(chr, s->ioc_in,
fd_chr_read_poll,
fd_chr_read, chr,
context);
@@ -103,7 +103,7 @@ static void char_fd_finalize(Object *obj)
 Chardev *chr = CHARDEV(obj);
 FDChardev *s = FD_CHARDEV(obj);
 
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
 if (s->ioc_in) {
 object_unref(OBJECT(s->ioc_in));
 }
diff --git a/chardev/char-io.c b/chardev/char-io.c
index b4bb094..b5708ee 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -98,7 +98,7 @@ static GSourceFuncs io_watch_poll_funcs = {
 .finalize = io_watch_poll_finalize,
 };
 
-guint io_add_watch_poll(Chardev *chr,
+GSource *io_add_watch_poll(Chardev *chr,
 QIOChannel *ioc,
 IOCanReadHandler *fd_can_read,
 QIOChannelFunc fd_read,
@@ -106,7 +106,6 @@ guint io_add_watch_poll(Chardev *chr,
 GMainContext *context)
 {
 IOWatchPoll *iwp;
-int tag;
 char *name;
 
 iwp = (IOWatchPoll *) g_source_new(_watch_poll_funcs,
@@ -122,21 +121,15 @@ guint io_add_watch_poll(Chardev *chr,
 g_source_set_name((GSource *)iwp, name);
 g_free(name);
 
-tag = g_source_attach(>parent, context);
+g_source_attach(>parent, context);
 g_source_unref(>parent);
-return tag;
+return (GSource *)iwp;
 }
 
-static void io_remove_watch_poll(guint tag, GMainContext *context)
+static void io_remove_watch_poll(GSource *source)
 {
-GSource *source;
 IOWatchPoll *iwp;
 
-g_return_if_fail(tag > 0);
-
-source = g_main_context_find_source_by_id(context, tag);
-g_return_if_fail(source != NULL);
-
 iwp = io_watch_poll_from_source(source);
 if (iwp->src) {
 g_source_destroy(iwp->src);
@@ -146,11 +139,11 @@ static void io_remove_watch_poll(guint tag, GMainContext 
*context)
 g_source_destroy(>parent);
 }
 
-void remove_fd_in_watch(Chardev *chr, GMainContext *context)
+void remove_fd_in_watch(Chardev *chr)
 {
-if (chr->fd_in_tag) {
-io_remove_watch_poll(chr->fd_in_tag, context);
-chr->fd_in_tag = 0;
+if (chr->gsource) {
+io_remove_watch_poll(chr->gsource);
+chr->gsource = NULL;
 }
 }
 
diff --git a/chardev/char-io.h b/chardev/char-io.h
index 842be56..55973a7 100644
--- a/chardev/char-io.h
+++ b/chardev/char-io.h
@@ -29,14 +29,14 @@
 #include "sysemu/char.h"
 
 /* Can only be used for read */
-guint io_add_watch_poll(Chardev *chr,
+GSource *io_add_watch_poll(Chardev *chr,
 QIOChannel *ioc,
 

[Qemu-devel] [PATCH v2] char: Fix removing wrong GSource that be found by fd_in_tag

2017-04-17 Thread zhanghailiang
We use fd_in_tag to find a GSource, fd_in_tag is return value of
g_source_attach(GSource *source, GMainContext *context), the return
value is unique only in the same context, so we may get the same
values with different 'context' parameters.

It is no problem to find the right fd_in_tag by using
 g_main_context_find_source_by_id(GMainContext *context, guint source_id)
while there is only one default main context.

But colo-compare tries to create and use its own context, and if we pass the
wrong 'context' parameter with the right fd_in_tag, we will find the wrong
GSource to handle. We tried to fix the related code in commit b43dec, but it
didn't fix the bug completely, because some code still didn't pass the *right*
context parameter to remove_fd_in_watch().

Let's fix it by recording the GSource directly instead of fd_in_tag.

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
v2:
- Fix minor comments from Marc-André Lureau
---
 chardev/char-fd.c |  8 
 chardev/char-io.c | 23 ---
 chardev/char-io.h |  4 ++--
 chardev/char-pty.c|  6 +++---
 chardev/char-socket.c |  8 
 chardev/char-udp.c|  8 
 chardev/char.c|  2 +-
 include/sysemu/char.h |  2 +-
 8 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/chardev/char-fd.c b/chardev/char-fd.c
index 548dd4c..7f0169d 100644
--- a/chardev/char-fd.c
+++ b/chardev/char-fd.c
@@ -58,7 +58,7 @@ static gboolean fd_chr_read(QIOChannel *chan, GIOCondition 
cond, void *opaque)
 ret = qio_channel_read(
 chan, (gchar *)buf, len, NULL);
 if (ret == 0) {
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
 qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 return FALSE;
 }
@@ -89,9 +89,9 @@ static void fd_chr_update_read_handler(Chardev *chr,
 {
 FDChardev *s = FD_CHARDEV(chr);
 
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
 if (s->ioc_in) {
-chr->fd_in_tag = io_add_watch_poll(chr, s->ioc_in,
+chr->chr_gsource = io_add_watch_poll(chr, s->ioc_in,
fd_chr_read_poll,
fd_chr_read, chr,
context);
@@ -103,7 +103,7 @@ static void char_fd_finalize(Object *obj)
 Chardev *chr = CHARDEV(obj);
 FDChardev *s = FD_CHARDEV(obj);
 
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
 if (s->ioc_in) {
 object_unref(OBJECT(s->ioc_in));
 }
diff --git a/chardev/char-io.c b/chardev/char-io.c
index b4bb094..d781ad6 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -98,7 +98,7 @@ static GSourceFuncs io_watch_poll_funcs = {
 .finalize = io_watch_poll_finalize,
 };
 
-guint io_add_watch_poll(Chardev *chr,
+GSource *io_add_watch_poll(Chardev *chr,
 QIOChannel *ioc,
 IOCanReadHandler *fd_can_read,
 QIOChannelFunc fd_read,
@@ -106,7 +106,6 @@ guint io_add_watch_poll(Chardev *chr,
 GMainContext *context)
 {
 IOWatchPoll *iwp;
-int tag;
 char *name;
 
 iwp = (IOWatchPoll *) g_source_new(_watch_poll_funcs,
@@ -122,21 +121,15 @@ guint io_add_watch_poll(Chardev *chr,
 g_source_set_name((GSource *)iwp, name);
 g_free(name);
 
-tag = g_source_attach(>parent, context);
+g_source_attach(>parent, context);
 g_source_unref(>parent);
-return tag;
+return (GSource *)iwp;
 }
 
-static void io_remove_watch_poll(guint tag, GMainContext *context)
+static void io_remove_watch_poll(GSource *source)
 {
-GSource *source;
 IOWatchPoll *iwp;
 
-g_return_if_fail(tag > 0);
-
-source = g_main_context_find_source_by_id(context, tag);
-g_return_if_fail(source != NULL);
-
 iwp = io_watch_poll_from_source(source);
 if (iwp->src) {
 g_source_destroy(iwp->src);
@@ -146,11 +139,11 @@ static void io_remove_watch_poll(guint tag, GMainContext 
*context)
 g_source_destroy(>parent);
 }
 
-void remove_fd_in_watch(Chardev *chr, GMainContext *context)
+void remove_fd_in_watch(Chardev *chr)
 {
-if (chr->fd_in_tag) {
-io_remove_watch_poll(chr->fd_in_tag, context);
-chr->fd_in_tag = 0;
+if (chr->chr_gsource) {
+io_remove_watch_poll(chr->chr_gsource);
+chr->chr_gsource = NULL;
 }
 }
 
diff --git a/chardev/char-io.h b/chardev/char-io.h
index 842be56..55973a7 100644
--- a/chardev/char-io.h
+++ b/chardev/char-io.h
@@ -29,14 +29,14 @@
 #include "sysemu/char.h"
 
 /* Can only be used for read */
-guint io_add_watch_poll(Chardev *chr,
+GSource *io_add_watch_poll(Chardev *chr,
 QIOChannel *ioc,
 IOCanReadHandler *fd_can_read,
 QIOChannelFunc fd_read,
 gpointer user_data,
 GMainConte

[Qemu-devel] [PATCH] char: Fix removing wrong GSource that be found by fd_in_tag

2017-04-13 Thread zhanghailiang
We use fd_in_tag to find a GSource, fd_in_tag is return value of
g_source_attach(GSource *source, GMainContext *context), the return
value is unique only in the same context, so we may get the same
values with different 'context' parameters.

It is no problem to find the right fd_in_tag by using
 g_main_context_find_source_by_id(GMainContext *context, guint source_id)
while there is only one default main context.

But colo-compare tries to create and use its own context, and if we pass the
wrong 'context' parameter with the right fd_in_tag, we will find the wrong
GSource to handle. We tried to fix the related code in commit b43dec, but it
didn't fix the bug completely, because some code still didn't pass the *right*
context parameter to remove_fd_in_watch().

Let's fix it by recording the GSource directly instead of fd_in_tag.

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 chardev/char-fd.c |  8 
 chardev/char-io.c | 23 ---
 chardev/char-io.h |  4 ++--
 chardev/char-pty.c|  6 +++---
 chardev/char-socket.c |  8 
 chardev/char-udp.c|  8 
 chardev/char.c|  2 +-
 include/sysemu/char.h |  2 +-
 8 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/chardev/char-fd.c b/chardev/char-fd.c
index 548dd4c..7f0169d 100644
--- a/chardev/char-fd.c
+++ b/chardev/char-fd.c
@@ -58,7 +58,7 @@ static gboolean fd_chr_read(QIOChannel *chan, GIOCondition 
cond, void *opaque)
 ret = qio_channel_read(
 chan, (gchar *)buf, len, NULL);
 if (ret == 0) {
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
 qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 return FALSE;
 }
@@ -89,9 +89,9 @@ static void fd_chr_update_read_handler(Chardev *chr,
 {
 FDChardev *s = FD_CHARDEV(chr);
 
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
 if (s->ioc_in) {
-chr->fd_in_tag = io_add_watch_poll(chr, s->ioc_in,
+chr->chr_gsource = io_add_watch_poll(chr, s->ioc_in,
fd_chr_read_poll,
fd_chr_read, chr,
context);
@@ -103,7 +103,7 @@ static void char_fd_finalize(Object *obj)
 Chardev *chr = CHARDEV(obj);
 FDChardev *s = FD_CHARDEV(obj);
 
-remove_fd_in_watch(chr, NULL);
+remove_fd_in_watch(chr);
 if (s->ioc_in) {
 object_unref(OBJECT(s->ioc_in));
 }
diff --git a/chardev/char-io.c b/chardev/char-io.c
index b4bb094..6deb193 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -98,7 +98,7 @@ static GSourceFuncs io_watch_poll_funcs = {
 .finalize = io_watch_poll_finalize,
 };
 
-guint io_add_watch_poll(Chardev *chr,
+GSource *io_add_watch_poll(Chardev *chr,
 QIOChannel *ioc,
 IOCanReadHandler *fd_can_read,
 QIOChannelFunc fd_read,
@@ -106,7 +106,6 @@ guint io_add_watch_poll(Chardev *chr,
 GMainContext *context)
 {
 IOWatchPoll *iwp;
-int tag;
 char *name;
 
 iwp = (IOWatchPoll *) g_source_new(_watch_poll_funcs,
@@ -122,21 +121,15 @@ guint io_add_watch_poll(Chardev *chr,
 g_source_set_name((GSource *)iwp, name);
 g_free(name);
 
-tag = g_source_attach(>parent, context);
+g_source_attach(>parent, context);
 g_source_unref(>parent);
-return tag;
+return (GSource *)iwp;
 }
 
-static void io_remove_watch_poll(guint tag, GMainContext *context)
+static void io_remove_watch_poll(GSource *source)
 {
-GSource *source;
 IOWatchPoll *iwp;
 
-g_return_if_fail(tag > 0);
-
-source = g_main_context_find_source_by_id(context, tag);
-g_return_if_fail(source != NULL);
-
 iwp = io_watch_poll_from_source(source);
 if (iwp->src) {
 g_source_destroy(iwp->src);
@@ -146,11 +139,11 @@ static void io_remove_watch_poll(guint tag, GMainContext 
*context)
 g_source_destroy(>parent);
 }
 
-void remove_fd_in_watch(Chardev *chr, GMainContext *context)
+void remove_fd_in_watch(Chardev *chr)
 {
-if (chr->fd_in_tag) {
-io_remove_watch_poll(chr->fd_in_tag, context);
-chr->fd_in_tag = 0;
+if (chr->chr_gsource) {
+io_remove_watch_poll(chr->chr_gsource);
+chr->chr_gsource = 0;
 }
 }
 
diff --git a/chardev/char-io.h b/chardev/char-io.h
index 842be56..55973a7 100644
--- a/chardev/char-io.h
+++ b/chardev/char-io.h
@@ -29,14 +29,14 @@
 #include "sysemu/char.h"
 
 /* Can only be used for read */
-guint io_add_watch_poll(Chardev *chr,
+GSource *io_add_watch_poll(Chardev *chr,
 QIOChannel *ioc,
 IOCanReadHandler *fd_can_read,
 QIOChannelFunc fd_read,
 gpointer user_data,
 GMainContext *context);
 
-void remove_fd_in_watch(Chardev 

[Qemu-devel] RE: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster

2017-04-13 Thread Zhanghailiang
Hi,

-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of
Xiao Guangrong
Sent: April 13, 2017 16:53
To: Paolo Bonzini; m...@redhat.com; mtosa...@redhat.com
Cc: qemu-devel@nongnu.org; k...@vger.kernel.org; yunfang...@tencent.com; Xiao
Guangrong
Subject: Re: [PATCH 0/5] mc146818rtc: fix Windows VM clock faster



On 04/13/2017 04:39 PM, Xiao Guangrong wrote:
>
>
> On 04/13/2017 02:37 PM, Paolo Bonzini wrote:
>>
>>
>> On 12/04/2017 17:51, guangrong.x...@gmail.com wrote:
>>> The root cause is that the clock will be lost if the periodic period 
>>> is changed as currently code counts the next periodic time like this:
>>>   next_irq_clock = (cur_clock & ~(period - 1)) + period;
>>>
>>> consider the case if cur_clock = 0x11FF and period = 0x100, then the 
>>> next_irq_clock is 0x1200, however, there is only 1 clock left to 
>>> trigger the next irq. Unfortunately, Windows guests (at least 
>>> Windows7) change the period very frequently if it runs the attached 
>>> code, so that the lost clock is accumulated, the wall-time become 
>>> faster and faster
>>
>> Very interesting.
>>
>
> Yes, indeed.
>
>> However, I think that the above should be exactly how the RTC should 
>> work.  The original RTC circuit had 22 divider stages (see page 13 of 
>> the datasheet[1], at the bottom right), and the periodic interrupt 
>> taps the rising edge of one of the dividers (page 16, second 
>> paragraph).  The datasheet also never mentions a comparator being 
>> used to trigger the periodic interrupts.
>>
>
> That was my thought before, however, after more test, i am not sure if 
> re-configuring RegA changes these divider stages internal...
>
>> Have you checked that this Windows bug doesn't happen on real 
>> hardware too?  Or is the combination of driftfix=slew and changing 
>> periods that is a problem?
>>
>
> I have two physical windows 7 machines, both of them have 
> 'useplatformclock = off' and ntp disabled, the wall time is really 
> accurate. The difference is that the physical machines are using Intel
> Q87 LPC chipset which is mc146818rtc compatible. However, on VM, the 
> issue is easily be reproduced just in ~10 mins.
>
> Our test mostly focus on 'driftfix=slew' and after this patchset the 
> time is accurate and stable.
>
> I will do the test for dropping 'slew' and see what will happen...
>

> Well, the time is easily observed to be faster if 'driftfix=slew' is not 
> used. :(

You mean it only fixes the case where 'driftfix=slew' is used?
We encountered this problem too; I tried to fix it a long time ago:
https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06937.html
(It seems that your solution is more useful.)
But it seems impossible to fix completely: we would need to emulate the
behavior of the real hardware, but we didn't find any clear description of it.
And it seems that other virtualization platforms
have this problem too:
VMware:
https://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
Hyper-V:
https://blogs.msdn.microsoft.com/virtual_pc_guy/2010/11/19/time-synchronization-in-hyper-v/


Hailiang


[Qemu-devel] [PATCH v4 0/6] COLO block replication supports shared disk case

2017-04-12 Thread zhanghailiang
COLO block replication doesn't support the shared disk case;
here we try to implement it, and this is the 4th version.

Please review; any comments are welcome.

Cc: Dr. David Alan Gilbert (git) <dgilb...@redhat.com>
Cc: eddie.d...@intel.com

v4:
- Add proper comment for primary_disk in patch 2 (Stefan)
- Call bdrv_invalidate_cache() while do checkpoint for shared disk in patch 5

v3:
- Fix some comments from Stefan and Eric

v2:
- Drop the patch which add a blk_root() helper
- Fix some comments from Changlong

zhanghailiang (6):
  docs/block-replication: Add description for shared-disk case
  replication: add shared-disk and shared-disk-id options
  replication: Split out backup_do_checkpoint() from
secondary_do_checkpoint()
  replication: fix code logic with the new shared_disk option
  replication: Implement block replication for shared disk case
  nbd/replication: implement .bdrv_get_info() for nbd and replication
driver

 block/nbd.c|  12 +++
 block/replication.c| 198 ++---
 docs/block-replication.txt | 139 ++-
 qapi/block-core.json   |  10 ++-
 4 files changed, 306 insertions(+), 53 deletions(-)

-- 
1.8.3.1





[Qemu-devel] [PATCH v4 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint()

2017-04-12 Thread zhanghailiang
The helper backup_do_checkpoint() will be used by primary-related
code. Here we split it out of secondary_do_checkpoint().

Besides, it is unnecessary to call backup_do_checkpoint() in the
replication-start and normal replication-stop paths.
We only need to call it while doing a real checkpoint.

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
Reviewed-by: Changlong Xie <xiecl.f...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 block/replication.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 418b81b..b021215 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -352,20 +352,8 @@ static bool 
replication_recurse_is_first_non_filter(BlockDriverState *bs,
 
 static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
 {
-Error *local_err = NULL;
 int ret;
 
-if (!s->secondary_disk->bs->job) {
-error_setg(errp, "Backup job was cancelled unexpectedly");
-return;
-}
-
-backup_do_checkpoint(s->secondary_disk->bs->job, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
 ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
 if (ret < 0) {
 error_setg(errp, "Cannot make active disk empty");
@@ -578,6 +566,8 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
 return;
 }
 block_job_start(job);
+
+secondary_do_checkpoint(s, errp);
 break;
 default:
 aio_context_release(aio_context);
@@ -586,10 +576,6 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
 
 s->replication_state = BLOCK_REPLICATION_RUNNING;
 
-if (s->mode == REPLICATION_MODE_SECONDARY) {
-secondary_do_checkpoint(s, errp);
-}
-
 s->error = 0;
 aio_context_release(aio_context);
 }
@@ -599,13 +585,29 @@ static void replication_do_checkpoint(ReplicationState 
*rs, Error **errp)
 BlockDriverState *bs = rs->opaque;
 BDRVReplicationState *s;
 AioContext *aio_context;
+Error *local_err = NULL;
 
 aio_context = bdrv_get_aio_context(bs);
 aio_context_acquire(aio_context);
 s = bs->opaque;
 
-if (s->mode == REPLICATION_MODE_SECONDARY) {
+switch (s->mode) {
+case REPLICATION_MODE_PRIMARY:
+break;
+case REPLICATION_MODE_SECONDARY:
+if (!s->secondary_disk->bs->job) {
+error_setg(errp, "Backup job was cancelled unexpectedly");
+break;
+}
+backup_do_checkpoint(s->secondary_disk->bs->job, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+break;
+}
 secondary_do_checkpoint(s, errp);
+break;
+default:
+abort();
 }
 aio_context_release(aio_context);
 }
-- 
1.8.3.1





[Qemu-devel] [PATCH v4 4/6] replication: fix code logic with the new shared_disk option

2017-04-12 Thread zhanghailiang
Some code logic is only needed in the non-shared disk case; here
we adjust this code to prepare for the shared disk scenario.

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 block/replication.c | 73 ++---
 1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index b021215..3a35471 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -539,33 +539,40 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
 return;
 }
 
-/* start backup job now */
-error_setg(>blocker,
-   "Block device is in use by internal backup job");
-
-top_bs = bdrv_lookup_bs(s->top_id, s->top_id, NULL);
-if (!top_bs || !bdrv_is_root_node(top_bs) ||
-!check_top_bs(top_bs, bs)) {
-error_setg(errp, "No top_bs or it is invalid");
-reopen_backing_file(bs, false, NULL);
-aio_context_release(aio_context);
-return;
-}
-bdrv_op_block_all(top_bs, s->blocker);
-bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
-
-job = backup_job_create(NULL, s->secondary_disk->bs, 
s->hidden_disk->bs,
-0, MIRROR_SYNC_MODE_NONE, NULL, false,
+/*
+ * Only in the case of non-shared disk,
+ * the backup job is in the secondary side
+ */
+if (!s->is_shared_disk) {
+/* start backup job now */
+error_setg(>blocker,
+"Block device is in use by internal backup job");
+
+top_bs = bdrv_lookup_bs(s->top_id, s->top_id, NULL);
+if (!top_bs || !bdrv_is_root_node(top_bs) ||
+!check_top_bs(top_bs, bs)) {
+error_setg(errp, "No top_bs or it is invalid");
+reopen_backing_file(bs, false, NULL);
+aio_context_release(aio_context);
+return;
+}
+
+bdrv_op_block_all(top_bs, s->blocker);
+bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+job = backup_job_create(NULL, s->secondary_disk->bs,
+s->hidden_disk->bs, 0,
+MIRROR_SYNC_MODE_NONE, NULL, false,
 BLOCKDEV_ON_ERROR_REPORT,
 BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
 backup_job_completed, bs, NULL, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-backup_job_cleanup(bs);
-aio_context_release(aio_context);
-return;
+if (local_err) {
+error_propagate(errp, local_err);
+backup_job_cleanup(bs);
+aio_context_release(aio_context);
+return;
+}
+block_job_start(job);
 }
-block_job_start(job);
 
 secondary_do_checkpoint(s, errp);
 break;
@@ -595,14 +602,16 @@ static void replication_do_checkpoint(ReplicationState 
*rs, Error **errp)
 case REPLICATION_MODE_PRIMARY:
 break;
 case REPLICATION_MODE_SECONDARY:
-if (!s->secondary_disk->bs->job) {
-error_setg(errp, "Backup job was cancelled unexpectedly");
-break;
-}
-backup_do_checkpoint(s->secondary_disk->bs->job, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-break;
+if (!s->is_shared_disk) {
+if (!s->secondary_disk->bs->job) {
+error_setg(errp, "Backup job was cancelled unexpectedly");
+break;
+}
+backup_do_checkpoint(s->secondary_disk->bs->job, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+break;
+}
 }
 secondary_do_checkpoint(s, errp);
 break;
@@ -683,7 +692,7 @@ static void replication_stop(ReplicationState *rs, bool 
failover, Error **errp)
  * before the BDS is closed, because we will access hidden
  * disk, secondary disk in backup_job_completed().
  */
-if (s->secondary_disk->bs->job) {
+if (!s->is_shared_disk && s->secondary_disk->bs->job) {
 block_job_cancel_sync(s->secondary_disk->bs->job);
 }
 
-- 
1.8.3.1





[Qemu-devel] [PATCH v4 1/6] docs/block-replication: Add description for shared-disk case

2017-04-12 Thread zhanghailiang
Introduce the scenario of shared-disk block replication
and how to use it.

Reviewed-by: Changlong Xie <xiecl.f...@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
 docs/block-replication.txt | 139 +++--
 1 file changed, 135 insertions(+), 4 deletions(-)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..fbfe005 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
 effort during a vmstate checkpoint, the disk modification operations of
 the Primary disk are asynchronously forwarded to the Secondary node.
 
-== Workflow ==
+== Non-shared disk workflow ==
 The following is the image of block replication workflow:
 
 +--+++
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
 4) Secondary write requests will be buffered in the Disk buffer and it
will overwrite the existing sector content in the buffer.
 
-== Architecture ==
+== Non-shared disk architecture ==
 We are going to implement block replication from many basic
 blocks that are already in QEMU.
 
@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
 of the NBD server into the secondary disk. So before block replication,
 the primary disk and secondary disk should contain the same data.
 
+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
++--+++
+|Primary Write Requests||Secondary Write Requests|
++--+++
+  |   |
+  |  (4)
+  |   V
+  |  /-\
+  | (2)Forward and write through | |
+  | +--> | Disk Buffer |
+  | || |
+  | |\-/
+  | |(1)read   |
+  | |  |
+   (3)write   | |  | backing file
+  V |  |
+ +-+   |
+ | Shared Disk | <-+
+ +-+
+
+1) Primary writes will read original data and forward it to Secondary
+   QEMU.
+2) Before Primary write requests are written to Shared disk, the
+   original sector content will be read from Shared disk and
+   forwarded and buffered in the Disk buffer on the secondary site,
+   but it will not overwrite the existing sector content (it could be
+   from either "Secondary Write Requests" or previous COW of "Primary
+   Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Shared disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+          virtio-blk            ||                               .----------
+              /                 ||                               | Secondary
+             /                  ||                               '----------
+            /                   ||                                  virtio-blk
+           /                    ||                                      |
+           |                    ||                                replication(5)
+           |     NBD  ------->  NBD   (2)                               |
+           |   client           ||  server ---> hidden disk <-- active disk(4)
+           |      ^             ||                   |
+           |  replication(1)    ||                   |
+           |      |             ||                   |
+           |   +--'             ||                   |
+          (3)  |drive-backup sync=none  ||           |
+ .---------    |   +-------------+      ||           |
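
The copy-before-write behaviour described in steps 1)-3) of the workflow can be illustrated with a small, self-contained C sketch. The in-memory arrays and forward_to_secondary() below are hypothetical stand-ins for the shared disk, the Secondary's Disk buffer and the NBD forwarding path; this is an illustration of the idea, not QEMU code.

    /* Copy-before-write sketch: before the Primary overwrites a sector of
     * the shared disk, the old content is read and forwarded, and the
     * Secondary buffers it unless it already holds data for that sector.
     * All names are hypothetical; this is not QEMU code. */
    #include <stdio.h>
    #include <string.h>

    #define SECTORS      4
    #define SECTOR_SIZE  8

    static char shared_disk[SECTORS][SECTOR_SIZE];   /* the shared LUN      */
    static char disk_buffer[SECTORS][SECTOR_SIZE];   /* Secondary's buffer  */
    static int  buffered[SECTORS];                   /* COW already done?   */

    /* Stand-in for the NBD client -> server forwarding (step 2). */
    static void forward_to_secondary(int sector, const char *old_data)
    {
        if (!buffered[sector]) {          /* don't overwrite existing data */
            memcpy(disk_buffer[sector], old_data, SECTOR_SIZE);
            buffered[sector] = 1;
        }
    }

    /* Primary write path: read old data, forward it, then write (step 3). */
    static void primary_write(int sector, const char *new_data)
    {
        forward_to_secondary(sector, shared_disk[sector]);  /* steps 1 + 2 */
        memcpy(shared_disk[sector], new_data, SECTOR_SIZE); /* step 3      */
    }

    int main(void)
    {
        memcpy(shared_disk[0], "old-data", SECTOR_SIZE);
        primary_write(0, "new-data");
        printf("shared disk: %.8s, secondary buffer: %.8s\n",
               shared_disk[0], disk_buffer[0]);
        return 0;
    }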

[Qemu-devel] [PATCH v4 6/6] nbd/replication: implement .bdrv_get_info() for nbd and replication driver

2017-04-12 Thread zhanghailiang
Without this callback, there will be an error report on the primary side:
"qemu-system-x86_64: Couldn't determine the cluster size of the target image,
which has no backing file: Operation not supported
Aborting, since this may create an unusable destination image"

The nbd driver doesn't report a cluster size, so here we return
a fake value for it.

This patch should be dropped if Eric's nbd patch is merged.
https://lists.gnu.org/archive/html/qemu-block/2017-02/msg00825.html
'[PATCH v4 7/8] nbd: Implement NBD_INFO_BLOCK_SIZE on server'.

Cc: Eric Blake <ebl...@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
---
 block/nbd.c | 12 
 block/replication.c |  6 ++
 2 files changed, 18 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 814ab26d..fceb14b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -43,6 +43,8 @@
 
 #define EN_OPTSTR ":exportname="
 
+#define NBD_FAKE_CLUSTER_SIZE 512
+
 typedef struct BDRVNBDState {
 NBDClientSession client;
 
@@ -561,6 +563,13 @@ static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
 bs->full_open_options = opts;
 }
 
+static int nbd_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+bdi->cluster_size  = NBD_FAKE_CLUSTER_SIZE;
+
+return 0;
+}
+
 static BlockDriver bdrv_nbd = {
 .format_name= "nbd",
 .protocol_name  = "nbd",
@@ -578,6 +587,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_get_info  = nbd_get_info,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -597,6 +607,7 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_get_info  = nbd_get_info,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -616,6 +627,7 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_get_info  = nbd_get_info,
 };
 
 static void bdrv_nbd_init(void)
diff --git a/block/replication.c b/block/replication.c
index fb604e5..7371caa 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -761,6 +761,11 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
 aio_context_release(aio_context);
 }
 
+static int replication_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+return bdrv_get_info(bs->file->bs, bdi);
+}
+
 BlockDriver bdrv_replication = {
 .format_name= "replication",
 .protocol_name  = "replication",
@@ -774,6 +779,7 @@ BlockDriver bdrv_replication = {
 .bdrv_co_readv  = replication_co_readv,
 .bdrv_co_writev = replication_co_writev,
 
+.bdrv_get_info  = replication_get_info,
 .is_filter  = true,
 .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
 
-- 
1.8.3.1





[Qemu-devel] [PATCH v4 5/6] replication: Implement block replication for shared disk case

2017-04-12 Thread zhanghailiang
As in the non-shared disk block replication scenario,
we are going to implement block replication from many basic
blocks that are already in QEMU.
The architecture is:

          virtio-blk            ||                               .----------
              /                 ||                               | Secondary
             /                  ||                               '----------
            /                   ||                                  virtio-blk
           /                    ||                                      |
           |                    ||                                replication(5)
           |     NBD  ------->  NBD   (2)                               |
           |   client           ||  server ---> hidden disk <-- active disk(4)
           |      ^             ||                   |
           |  replication(1)    ||                   |
           |      |             ||                   |
           |   +--'             ||                   |
          (3)  |drive-backup sync=none  ||           |
 .---------    |   +-------------+      ||           |
 Primary  |    |   |             ||        backing   |
 '---------    |   |             ||                  |
          V    |   |                                 |
         +--------------------+                      |
         |    shared disk     | <--------------------+
         +--------------------+

1) Primary writes will read original data and forward it to Secondary
   QEMU.
2) The hidden-disk is created automatically. It buffers the original content
   that is modified by the primary VM. It should also be an empty disk, and
   the driver supports bdrv_make_empty() and backing file.
3) Primary write requests will be written to Shared disk.
4) Secondary write requests will be buffered in the active disk and it
   will overwrite the existing sector content in the buffer.

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
v4:
 - Call bdrv_invalidate_cache() while do checkpoint for shared disk
---
 block/replication.c | 58 +++--
 1 file changed, 52 insertions(+), 6 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 3a35471..fb604e5 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -253,7 +253,7 @@ static coroutine_fn int replication_co_readv(BlockDriverState *bs,
  QEMUIOVector *qiov)
 {
 BDRVReplicationState *s = bs->opaque;
-BdrvChild *child = s->secondary_disk;
+BdrvChild *child = s->is_shared_disk ? s->primary_disk : s->secondary_disk;
 BlockJob *job = NULL;
 CowRequest req;
 int ret;
@@ -435,7 +435,12 @@ static void backup_job_completed(void *opaque, int ret)
 s->error = -EIO;
 }
 
-backup_job_cleanup(bs);
+if (s->mode == REPLICATION_MODE_PRIMARY) {
+s->replication_state = BLOCK_REPLICATION_DONE;
+s->error = 0;
+} else {
+backup_job_cleanup(bs);
+}
 }
 
 static bool check_top_bs(BlockDriverState *top_bs, BlockDriverState *bs)
@@ -487,6 +492,19 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
 
 switch (s->mode) {
 case REPLICATION_MODE_PRIMARY:
+if (s->is_shared_disk) {
+job = backup_job_create(NULL, s->primary_disk->bs, bs, 0,
+MIRROR_SYNC_MODE_NONE, NULL, false, BLOCKDEV_ON_ERROR_REPORT,
+BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
+backup_job_completed, bs, NULL, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+backup_job_cleanup(bs);
+aio_context_release(aio_context);
+return;
+}
+block_job_start(job);
+}
 break;
 case REPLICATION_MODE_SECONDARY:
 s->active_disk = bs->file;
@@ -505,7 +523,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
 }
 
 s->secondary_disk = s->hidden_disk->bs->backing;
-if (!s->secondary_disk->bs || !bdrv_has_blk(s->secondary_disk->bs)) {
+if (!s->secondary_disk->bs ||
+(!s->is_shared_disk && !bdrv_has_blk(s->secondary_disk->bs))) {
 error_setg(errp, "The secondary disk doesn't have block backend");
 aio_context_release(aio_context);
 return;
@@ -600,11 +619,24 @@ static void replication_do_checkpoint(ReplicationState *r
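
Why Secondary writes (step 4 above) never reach the shared disk: the Secondary's active disk overlays the hidden disk, which in turn backs onto the shared disk, so Secondary reads fall through the backing chain while Secondary writes stay in the top overlay. A toy, self-contained C model of that fall-through behaviour (not QEMU code):

    /* Toy model of the Secondary backing chain: active -> hidden -> shared.
     * Writes go to the top layer only; reads fall through to the first
     * layer that has the sector allocated. */
    #include <stdio.h>

    #define SECTORS 4

    typedef struct Layer {
        int allocated[SECTORS];
        int data[SECTORS];
        struct Layer *backing;
    } Layer;

    static void layer_write(Layer *l, int sector, int value)
    {
        l->data[sector] = value;
        l->allocated[sector] = 1;
    }

    static int layer_read(const Layer *l, int sector)
    {
        for (; l; l = l->backing) {
            if (l->allocated[sector]) {
                return l->data[sector];
            }
        }
        return 0;   /* unallocated everywhere */
    }

    int main(void)
    {
        Layer shared = { .allocated = {1, 1, 1, 1}, .data = {10, 20, 30, 40} };
        Layer hidden = { .backing = &shared };   /* buffers Primary COW data */
        Layer active = { .backing = &hidden };   /* buffers Secondary writes */

        layer_write(&active, 1, 99);             /* Secondary guest write    */

        printf("secondary sees sector 1 = %d, shared disk still has %d\n",
               layer_read(&active, 1), layer_read(&shared, 1));
        return 0;
    }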

[Qemu-devel] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options

2017-04-12 Thread zhanghailiang
We use these two options to identify which disk is shared.

Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
 block/replication.c  | 43 +--
 qapi/block-core.json | 10 +-
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index bf3c395..418b81b 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@
 typedef struct BDRVReplicationState {
 ReplicationMode mode;
 int replication_state;
+bool is_shared_disk;
+char *shared_disk_id;
 BdrvChild *active_disk;
 BdrvChild *hidden_disk;
 BdrvChild *secondary_disk;
+BdrvChild *primary_disk;
 char *top_id;
 ReplicationState *rs;
 Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool failover,
 
 #define REPLICATION_MODE"mode"
 #define REPLICATION_TOP_ID  "top-id"
+#define REPLICATION_SHARED_DISK "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
 static QemuOptsList replication_runtime_opts = {
 .name = "replication",
 .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
 .name = REPLICATION_TOP_ID,
 .type = QEMU_OPT_STRING,
 },
+{
+.name = REPLICATION_SHARED_DISK_ID,
+.type = QEMU_OPT_STRING,
+},
+{
+.name = REPLICATION_SHARED_DISK,
+.type = QEMU_OPT_BOOL,
+},
 { /* end of list */ }
 },
 };
@@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict *options,
 QemuOpts *opts = NULL;
 const char *mode;
 const char *top_id;
+const char *shared_disk_id;
+BlockBackend *blk;
+BlockDriverState *tmp_bs;
 
 bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
false, errp);
@@ -125,12 +142,33 @@ static int replication_open(BlockDriverState *bs, QDict *options,
"The option mode's value should be primary or secondary");
 goto fail;
 }
+s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+  false);
+if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+if (!shared_disk_id) {
+error_setg(_err, "Missing shared disk blk option");
+goto fail;
+}
+s->shared_disk_id = g_strdup(shared_disk_id);
+blk = blk_by_name(s->shared_disk_id);
+if (!blk) {
+error_setg(_err, "There is no %s block", s->shared_disk_id);
+goto fail;
+}
+/* We have a BlockBackend for the primary disk but use BdrvChild for
+ * consistency - active_disk, secondary_disk, etc are also BdrvChild.
+ */
+tmp_bs = blk_bs(blk);
+s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
+}
 
 s->rs = replication_new(bs, &replication_ops);
 
-ret = 0;
-
+qemu_opts_del(opts);
+return 0;
 fail:
+g_free(s->shared_disk_id);
 qemu_opts_del(opts);
 error_propagate(errp, local_err);
 
@@ -141,6 +179,7 @@ static void replication_close(BlockDriverState *bs)
 {
 BDRVReplicationState *s = bs->opaque;
 
+g_free(s->shared_disk_id);
 if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
 replication_stop(s->rs, false, NULL);
 }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 033457c..361c932 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2661,12 +2661,20 @@
 #  node who owns the replication node chain. Must not be given in
 #  primary mode.
 #
+# @shared-disk-id: Id of the shared disk when in replication mode. If @shared-disk
+#  is true, this option is required. (Since: 2.10)
+#
+# @shared-disk: To indicate whether or not a disk is shared by primary VM
+#   and secondary VM. (The default is false) (Since: 2.10)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsReplication',
   'base': 'BlockdevOptionsGenericFormat',
   'data': { 'mode': 'ReplicationMode',
-'*top-id': 'str' } }
+'*top-id': 'str',
+'*shared-disk-id': 'str',
+'*shared-disk': 'bool' } }
 
 ##
 # @NFSTransport:
-- 
1.8.3.1
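
The lookup added above (blk_by_name() -> blk_bs() -> first parent BdrvChild) can be read as one small step-by-step helper. The function below is a hypothetical restatement using the same calls as the patch; it is not part of the series:

    /* Hypothetical helper equivalent to the lookup done in
     * replication_open(): resolve the shared disk's BlockBackend by name
     * and return its BdrvChild, so the shared (primary) disk can be
     * handled like active_disk, hidden_disk and secondary_disk. */
    static BdrvChild *replication_find_shared_disk(const char *id,
                                                   Error **errp)
    {
        BlockBackend *blk = blk_by_name(id);
        BlockDriverState *bs;

        if (!blk) {
            error_setg(errp, "There is no %s block", id);
            return NULL;
        }
        bs = blk_bs(blk);
        /* The BlockBackend is the only parent of this node, so its
         * BdrvChild is the first entry in bs->parents. */
        return QLIST_FIRST(&bs->parents);
    }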





[Qemu-devel] [PATCH] virtio-serial-bus: Delete timer from list before free it

2017-03-05 Thread zhanghailiang
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 hw/char/virtio-serial-bus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index d544cd9..d797a67 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -724,6 +724,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 }
 }
 g_free(s->post_load->connected);
+timer_del(s->post_load->timer);
 timer_free(s->post_load->timer);
 g_free(s->post_load);
 s->post_load = NULL;
-- 
1.8.3.1
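
For context: at the time of this patch, timer_free() only released the timer's memory, so a timer that may still be pending has to be unlinked from the active timer list with timer_del() first, otherwise the list is left pointing at freed memory. A minimal sketch of the intended lifecycle (my_timer_cb and opaque are hypothetical; the calls are QEMU's timer API):

    QEMUTimer *t = timer_new_ms(QEMU_CLOCK_VIRTUAL, my_timer_cb, opaque);

    timer_mod(t, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);  /* arm it */
    /* ... the timer may or may not have fired yet ... */
    timer_del(t);   /* unlink it from the active timer list first        */
    timer_free(t);  /* then release it; timer_free() alone was a plain
                     * g_free() and did not remove the timer from the
                     * list at the time of this patch                    */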





[Qemu-devel] [PATCH v3 1/2] net/colo: fix memory double free error

2017-02-27 Thread zhanghailiang
The 'primary_list' and 'secondary_list' members of struct Connection
are not allocated dynamically with g_queue_new(), but we free them with
g_queue_free(), which leads to a double-free bug.

Reviewed-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
---
 net/colo.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index 6a6eacd..8cc166b 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -147,9 +147,9 @@ void connection_destroy(void *opaque)
 Connection *conn = opaque;
 
 g_queue_foreach(&conn->primary_list, packet_destroy, NULL);
-g_queue_free(&conn->primary_list);
+g_queue_clear(&conn->primary_list);
 g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);
-g_queue_free(&conn->secondary_list);
+g_queue_clear(&conn->secondary_list);
 g_slice_free(Connection, conn);
 }
 
-- 
1.8.3.1
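
The distinction the fix relies on: struct Connection embeds its two GQueues as members (they are initialized in place with g_queue_init()), so only their elements should be released; g_queue_free() is for queues allocated separately with g_queue_new(). A small self-contained GLib illustration of the two ownership models:

    /* Two GQueue ownership models: an embedded queue is cleared in place,
     * a heap-allocated queue is the one that takes g_queue_free(). */
    #include <glib.h>

    typedef struct {
        GQueue packets;              /* embedded, like Connection's lists */
    } Conn;

    static void free_elem(gpointer data, gpointer user_data)
    {
        (void)user_data;
        g_free(data);
    }

    int main(void)
    {
        Conn *conn = g_new0(Conn, 1);
        GQueue *heapq = g_queue_new();        /* separately allocated queue */

        g_queue_init(&conn->packets);
        g_queue_push_tail(&conn->packets, g_strdup("pkt"));
        g_queue_push_tail(heapq, g_strdup("pkt"));

        /* embedded queue: release the elements only, then clear in place;
         * g_queue_free() here would hand a pointer inside the struct back
         * to the allocator, which is the bug fixed above */
        g_queue_foreach(&conn->packets, free_elem, NULL);
        g_queue_clear(&conn->packets);
        g_free(conn);

        /* heap queue: g_queue_free() releases the GQueue itself */
        g_queue_foreach(heapq, free_elem, NULL);
        g_queue_free(heapq);
        return 0;
    }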




