Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-03-03 Thread Dr. David Alan Gilbert
* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> >>On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> >>>* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> >>* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> >>>From: root 
> >>>
> >>>This is the 15th version of COLO (still only supporting periodic
> >>>checkpoints).
> >>>
> >>>Here is only the COLO frame part; you can get the whole code from github:
> >>>https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> >>>
> >>>There are few changes in this series except for the network-related
> >>>part.
> >>
> >>I was looking at the time the guest is paused during COLO and
> >>was surprised to find one of the larger chunks was the time to reset
> >>the guest before loading each checkpoint;  I've traced it part way, the
> >>biggest contributors for my test VM seem to be:
> >>
> >>   3.8ms  pcibus_reset: VGA
> >>   1.8ms  pcibus_reset: virtio-net-pci
> >>   1.5ms  pcibus_reset: virtio-blk-pci
> >>   1.5ms  qemu_devices_reset: piix4_reset
> >>   1.1ms  pcibus_reset: piix3-ide
> >>   1.1ms  pcibus_reset: virtio-rng-pci
> >>
> >>I've not looked deeper yet, but some of these are very silly;
> >>I'm running with -nographic, so why it's taking 3.8ms to reset VGA is
> >>going to be interesting.
> >>Also, my only block device is the virtio-blk, so while I understand the
> >>standard PC machine has the IDE controller, it's unclear why it takes
> >>over a ms to reset an unused device.
> >
> >OK, so I've dug a bit deeper, and it appears that it's the changes in
> >PCI BARs that actually take the time; every time we do a reset we
> >reset all the BARs, which causes a pci_update_mappings and
> >ends up doing a memory_region_del_subregion.
> >Then we load the config space of the PCI device as we do the
> >vmstate_load, and this recreates all the mappings again.
> >
> >I'm not sure what the fix is, but it sounds like it would
> >speed up the checkpoints usefully if we could avoid the map/remap when
> >they're the same.
> >
> 
> Interesting, and thanks for your report.
> 
> We already know qemu_system_reset() is a time-consuming function; we
> shouldn't call it here, but if we don't, there will be a bug, which we
> reported before in a previous COLO series. Below is a copy of the
> related patch comment:
> >
> >Paolo suggested one fix, see the patch below; I'm not sure if it's safe
> >(in particular if the guest changed a BAR and the device code tried to
> >access the memory while loading the state?) - but it does seem to work
> >and shaves ~10ms off the reset/load times:
> >
> 
> Nice work, I also tested it, and it is a good improvement. I'm wondering
> if it is safe here; it should be safe to apply to qemu_system_reset()
> independently (I tested that too; it shaves about 5ms off).

Yes, it seems quite nice.
I did find one VM today that won't boot with COLO with that change; it's
an Ubuntu VM that has a delay in Grub, and when it takes the first
checkpoint while Grub is still being displayed, it gets an error from
the inbound migrate.

The error, from virtio-blk, is:
  VQ 0 size 0x80 Guest index 0x2444 inconsistent with Host index 0x119e: delta 0x12a6
so maybe virtio-blk is accessing the memory during loading.
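
For reference, the check that produces this message looks roughly like
the sketch below (simplified from QEMU's virtio_load(); not verbatim).
It compares the avail index that lives in guest RAM against the
last_avail_idx carried in the device state, so it fires whenever RAM
and device state are captured at different moments:

    /* Simplified sketch of the virtio_load() sanity check. */
    uint16_t nheads = vring_avail_idx(vq) - vq->last_avail_idx;
    if (nheads > vq->vring.num) {
        error_report("VQ %d size 0x%x Guest index 0x%x "
                     "inconsistent with Host index 0x%x: delta 0x%x",
                     i, vq->vring.num, vring_avail_idx(vq),
                     vq->last_avail_idx, nheads);
        return -1;
    }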

Dave

> Hailiang
> 
> >Dave
> >
> >commit 7570b2984143860005ad9fe79f5394c75f294328
> >Author: Dr. David Alan Gilbert 
> >Date:   Tue Mar 1 12:08:14 2016 +
> >
> > COLO: Lock memory map around reset/load
> >
> > Changing the memory map appears to be expensive; we see this
> > particularly when, on loading a checkpoint, we:
> >a) reset the devices
> >   This causes PCI BARs to be reset
> >b) load the device states
> >   This causes the PCI BARs to be reloaded.
> >
> > Turning this all into a single memory_region_transaction saves
> >  ~10ms/checkpoint.
> >
> > TBD: What happens if the device code accesses the RAM during loading
> > the checkpoint?
> >
> > Signed-off-by: Dr. David Alan Gilbert 
> > Suggested-by: Paolo Bonzini 
> >
> >diff --git a/migration/colo.c b/migration/colo.c
> >index 45c3432..c44fb2a 100644
> >--- a/migration/colo.c
> >+++ b/migration/colo.c
> >@@ -22,6 +22,7 @@
> >  #include "net/colo-proxy.h"
> >  #include "net/net.h"
> >  #include "block/block_int.h"
> >+#include "exec/memory.h"
> >
> >  static bool vmstate_loading;
> >
> >@@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)

Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-03-02 Thread Hailiang Zhang

On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:

* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:

On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:

* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:

On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:

* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

From: root 

This is the 15th version of COLO (still only supporting periodic checkpoints).

Here is only the COLO frame part; you can get the whole code from github:
https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode

There are few changes in this series except for the network-related part.


I was looking at the time the guest is paused during COLO and
was surprised to find one of the larger chunks was the time to reset
the guest before loading each checkpoint;  I've traced it part way, the
biggest contributors for my test VM seem to be:

   3.8ms  pcibus_reset: VGA
   1.8ms  pcibus_reset: virtio-net-pci
   1.5ms  pcibus_reset: virtio-blk-pci
   1.5ms  qemu_devices_reset: piix4_reset
   1.1ms  pcibus_reset: piix3-ide
   1.1ms  pcibus_reset: virtio-rng-pci

I've not looked deeper yet, but some of these are very silly;
I'm running with -nographic, so why it's taking 3.8ms to reset VGA is
going to be interesting.
Also, my only block device is the virtio-blk, so while I understand the
standard PC machine has the IDE controller, it's unclear why it takes
over a ms to reset an unused device.


OK, so I've dug a bit deeper, and it appears that it's the changes in
PCI BARs that actually take the time; every time we do a reset we
reset all the BARs, which causes a pci_update_mappings and
ends up doing a memory_region_del_subregion.
Then we load the config space of the PCI device as we do the vmstate_load,
and this recreates all the mappings again.

I'm not sure what the fix is, but it sounds like it would
speed up the checkpoints usefully if we could avoid the map/remap when
they're the same.



Interesting, and thanks for your report.

We already know qemu_system_reset() is a time-consuming function; we shouldn't
call it here, but if we don't, there will be a bug, which we reported
before in a previous COLO series. Below is a copy of the related
patch comment:


Paolo suggested one fix, see the patch below; I'm not sure if it's safe
(in particular if the guest changed a BAR and the device code tried to
access the memory while loading the state?) - but it does seem to work
and shaves ~10ms off the reset/load times:



Nice work, I also tested it, and it is a good improvement. I'm wondering
if it is safe here; it should be safe to apply to qemu_system_reset()
independently (I tested that too; it shaves about 5ms off).

Hailiang


Dave

commit 7570b2984143860005ad9fe79f5394c75f294328
Author: Dr. David Alan Gilbert 
Date:   Tue Mar 1 12:08:14 2016 +

 COLO: Lock memory map around reset/load

 Changing the memory map appears to be expensive; we see this
 particularly when, on loading a checkpoint, we:
a) reset the devices
   This causes PCI BARs to be reset
b) load the device states
   This causes the PCI BARs to be reloaded.

 Turning this all into a single memory_region_transaction saves
  ~10ms/checkpoint.

 TBD: What happens if the device code accesses the RAM during loading
 the checkpoint?

 Signed-off-by: Dr. David Alan Gilbert 
 Suggested-by: Paolo Bonzini 

diff --git a/migration/colo.c b/migration/colo.c
index 45c3432..c44fb2a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -22,6 +22,7 @@
  #include "net/colo-proxy.h"
  #include "net/net.h"
  #include "block/block_int.h"
+#include "exec/memory.h"

  static bool vmstate_loading;

@@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)

  stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
  qemu_mutex_lock_iothread();
+memory_region_transaction_begin();
  qemu_system_reset(VMRESET_SILENT);
  stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
  timed_average_account(>colo_state.time_reset,
@@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
stage_time_end - stage_time_start);
  stage_time_start = stage_time_end;
  ret = qemu_load_device_state(fb);
+memory_region_transaction_commit();
  if (ret < 0) {
  error_report("COLO: load device state failed\n");
  vmstate_loading = false;

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-03-01 Thread Dr. David Alan Gilbert
* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> >>On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >>>* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> >From: root 
> >
> >This is the 15th version of COLO (still only supporting periodic
> >checkpoints).
> >
> >Here is only the COLO frame part; you can get the whole code from github:
> >https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> >
> >There are few changes in this series except for the network-related part.
> 
> I was looking at the time the guest is paused during COLO and
> was surprised to find one of the larger chunks was the time to reset
> the guest before loading each checkpoint;  I've traced it part way, the
> biggest contributors for my test VM seem to be:
> 
>    3.8ms  pcibus_reset: VGA
>    1.8ms  pcibus_reset: virtio-net-pci
>    1.5ms  pcibus_reset: virtio-blk-pci
>    1.5ms  qemu_devices_reset: piix4_reset
>    1.1ms  pcibus_reset: piix3-ide
>    1.1ms  pcibus_reset: virtio-rng-pci
> 
> I've not looked deeper yet, but some of these are very silly;
> I'm running with -nographic, so why it's taking 3.8ms to reset VGA is
> going to be interesting.
> Also, my only block device is the virtio-blk, so while I understand the
> standard PC machine has the IDE controller, it's unclear why it takes
> over a ms to reset an unused device.
> >>>
> >>>OK, so I've dug a bit deeper, and it appears that it's the changes in
> >>>PCI BARs that actually take the time; every time we do a reset we
> >>>reset all the BARs, which causes a pci_update_mappings and
> >>>ends up doing a memory_region_del_subregion.
> >>>Then we load the config space of the PCI device as we do the vmstate_load,
> >>>and this recreates all the mappings again.
> >>>
> >>>I'm not sure what the fix is, but it sounds like it would
> >>>speed up the checkpoints usefully if we could avoid the map/remap when
> >>>they're the same.
> >>>
> >>
> >>Interesting, and thanks for your report.
> >>
> >>We already know qemu_system_reset() is a time-consuming function; we
> >>shouldn't call it here, but if we don't, there will be a bug, which we
> >>reported before in a previous COLO series. Below is a copy of the
> >>related patch comment:

Paolo suggested one fix, see the patch below; I'm not sure if it's safe
(in particular if the guest changed a BAR and the device code tried to
access the memory while loading the state?) - but it does seem to work
and shaves ~10ms off the reset/load times:

Dave

commit 7570b2984143860005ad9fe79f5394c75f294328
Author: Dr. David Alan Gilbert 
Date:   Tue Mar 1 12:08:14 2016 +

COLO: Lock memory map around reset/load

Changing the memory map appears to be expensive; we see this
particularly when, on loading a checkpoint, we:
   a) reset the devices
  This causes PCI BARs to be reset
   b) load the device states
  This causes the PCI BARs to be reloaded.

Turning this all into a single memory_region_transaction saves
 ~10ms/checkpoint.

TBD: What happens if the device code accesses the RAM during loading
the checkpoint?

Signed-off-by: Dr. David Alan Gilbert 
Suggested-by: Paolo Bonzini 

diff --git a/migration/colo.c b/migration/colo.c
index 45c3432..c44fb2a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -22,6 +22,7 @@
 #include "net/colo-proxy.h"
 #include "net/net.h"
 #include "block/block_int.h"
+#include "exec/memory.h"
 
 static bool vmstate_loading;
 
@@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)
 
 stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
 qemu_mutex_lock_iothread();
+memory_region_transaction_begin();
 qemu_system_reset(VMRESET_SILENT);
 stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
 timed_average_account(>colo_state.time_reset,
@@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
   stage_time_end - stage_time_start);
 stage_time_start = stage_time_end;
 ret = qemu_load_device_state(fb);
+memory_region_transaction_commit();
 if (ret < 0) {
 error_report("COLO: load device state failed\n");
 vmstate_loading = false;

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-02-29 Thread Dr. David Alan Gilbert
* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
> >* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> >>On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >>>* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:

> >I've got a patch where I've tried to multithread the flush - it's made
> >it a little faster, but not as much as I hoped (~20ms down to ~16ms
> >using 4 cores).
> >
> 
> Hmm, that seems to be a good idea. After switching to COLO (hybrid)
> mode, in most cases we will get many more dirtied pages than in the
> periodic mode, because the delay between two checkpoints is usually
> longer.
> Multi-threaded flushing may gain much more in that case, but I suspect
> that in some bad cases users still can't bear the pause time.
> 
> Actually, we have thought about this problem for a long time.
> In our early tests based on the kernel COLO-proxy, we easily got more
> than one second of flushing time; IMHO, users can't bear such a long
> VM pause time if they choose to use COLO.

Yes, that's just too long; although solving only the 'flushing' time
isn't enough in those cases, because the same cases will probably need
to transfer lots of RAM over the wire as well.
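
For concreteness, the multithreaded flush could look roughly like the
sketch below. This is a hypothetical reconstruction, not the actual
patch; FlushJob, flush_worker() and flush_cached_ram() are illustrative
names. Each worker copies a disjoint slice of the cached dirty pages
into the SVM's RAM; since the work is memcpy()-bound, the gain tails
off once memory bandwidth is saturated, which may explain why 4 cores
only got ~20ms down to ~16ms.

    /* Hypothetical sketch of a multithreaded RAM-cache flush. */
    #include <pthread.h>
    #include <string.h>

    #define NR_FLUSH_THREADS 4

    typedef struct {
        void **cache_pages;   /* source: pages cached from the PVM */
        void **ram_pages;     /* destination: the SVM's RAM pages  */
        size_t page_size;
        size_t start, end;    /* slice of the dirty list [start, end) */
    } FlushJob;

    static void *flush_worker(void *opaque)
    {
        FlushJob *job = opaque;
        for (size_t i = job->start; i < job->end; i++) {
            memcpy(job->ram_pages[i], job->cache_pages[i], job->page_size);
        }
        return NULL;
    }

    static void flush_cached_ram(void **cache, void **ram,
                                 size_t npages, size_t page_size)
    {
        pthread_t tids[NR_FLUSH_THREADS];
        FlushJob jobs[NR_FLUSH_THREADS];
        size_t chunk = (npages + NR_FLUSH_THREADS - 1) / NR_FLUSH_THREADS;

        for (int t = 0; t < NR_FLUSH_THREADS; t++) {
            size_t start = t * chunk;
            size_t end = start + chunk < npages ? start + chunk : npages;
            jobs[t] = (FlushJob){ cache, ram, page_size, start, end };
            pthread_create(&tids[t], NULL, flush_worker, &jobs[t]);
        }
        for (int t = 0; t < NR_FLUSH_THREADS; t++) {
            pthread_join(&tids[t], NULL);
        }
    }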

> We have designed another scenario based on userfault's page-miss
> capability. The basic idea is to convert the flushing action into a
> marking action; the flush is then processed during the SVM's running
> time. For now it is only an idea, and we'd like to verify it first.
> (I'm not quite sure whether userfault's page-miss feature is designed
> for good performance when we use it to mark one page as MISS at a time.)

Yes, it's a different trade off, slower execution, but no flush time.
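
For reference, the mark-instead-of-flush idea maps naturally onto the
Linux userfaultfd(2) API. A rough sketch, with error handling omitted
and cache_lookup() as an illustrative helper (not COLO code): instead
of copying every cached page at checkpoint time, register the SVM's RAM
as "missing" and resolve each page from the checkpoint cache on first
guest access.

    #include <fcntl.h>
    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    extern void *cache_lookup(unsigned long addr);  /* illustrative */

    static void serve_misses(void *svm_ram, size_t ram_len, size_t page_size)
    {
        int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
        struct uffdio_api api = { .api = UFFD_API };
        ioctl(uffd, UFFDIO_API, &api);

        /* Instead of flushing, just register the region: touching a
         * not-yet-flushed page faults into this thread. */
        struct uffdio_register reg = {
            .range = { .start = (unsigned long)svm_ram, .len = ram_len },
            .mode  = UFFDIO_REGISTER_MODE_MISSING,
        };
        ioctl(uffd, UFFDIO_REGISTER, &reg);

        for (;;) {
            struct uffd_msg msg;
            if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
                msg.event != UFFD_EVENT_PAGEFAULT) {
                continue;
            }
            unsigned long addr = msg.arg.pagefault.address & ~(page_size - 1);
            struct uffdio_copy copy = {
                .dst = addr,
                .src = (unsigned long)cache_lookup(addr),
                .len = page_size,
            };
            ioctl(uffd, UFFDIO_COPY, &copy);   /* resolves the fault */
        }
    }

This also matches the per-page cost worry above: each miss costs a
fault, a read() and an ioctl(), so it trades the up-front flush for
per-access latency while the SVM runs.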

Dave

> 
> 
> Thanks,
> Hailiang
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-02-29 Thread Hailiang Zhang

On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:

* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:

On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:

* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

From: root 

This is the 15th version of COLO (still only supporting periodic checkpoints).

Here is only the COLO frame part; you can get the whole code from github:
https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode

There are few changes in this series except for the network-related part.


I was looking at the time the guest is paused during COLO and
was surprised to find one of the larger chunks was the time to reset
the guest before loading each checkpoint;  I've traced it part way, the
biggest contributors for my test VM seem to be:

   3.8ms  pcibus_reset: VGA
   1.8ms  pcibus_reset: virtio-net-pci
   1.5ms  pcibus_reset: virtio-blk-pci
   1.5ms  qemu_devices_reset: piix4_reset
   1.1ms  pcibus_reset: piix3-ide
   1.1ms  pcibus_reset: virtio-rng-pci

I've not looked deeper yet, but some of these are very silly;
I'm running with -nographic, so why it's taking 3.8ms to reset VGA is
going to be interesting.
Also, my only block device is the virtio-blk, so while I understand the
standard PC machine has the IDE controller, it's unclear why it takes
over a ms to reset an unused device.


OK, so I've dug a bit deeper, and it appears that it's the changes in
PCI BARs that actually take the time; every time we do a reset we
reset all the BARs, which causes a pci_update_mappings and
ends up doing a memory_region_del_subregion.
Then we load the config space of the PCI device as we do the vmstate_load,
and this recreates all the mappings again.

I'm not sure what the fix is, but it sounds like it would
speed up the checkpoints usefully if we could avoid the map/remap when
they're the same.



Interesting, and thanks for your report.

We already know qemu_system_reset() is a time-consuming function; we shouldn't
call it here, but if we don't, there will be a bug, which we reported
before in a previous COLO series. Below is a copy of the related
patch comment:

 COLO VMstate: Load VM state into qsb before restore it

 We should not destroy the state of the secondary until we receive the
 whole state from the primary, in case the primary fails in the middle
 of sending the state; so, here we cache the device state in the
 Secondary before restoring it.

 Besides, we should call qemu_system_reset() before loading the VM state,
 which can ensure the data is intact.
 Note: if we discard qemu_system_reset(), there will be some odd errors.
 For example, qemu on the slave side crashes and reports:

 KVM: entry failed, hardware error 0x7
 EAX= EBX=e000 ECX=9578 EDX=434f
 ESI=fc10 EDI=434f EBP= ESP=1fca
 EIP=9594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =0040 0400  9300
 CS =f000 000f  9b00
 SS =434f 000434f0  9300
 DS =434f 000434f0  9300
 FS =   9300
 GS =   9300
 LDT=   8200
 TR =   8b00
 GDT= 0002dcc8 0047
 IDT=  
 CR0=0010 CR2= CR3= CR4=
 DR0= DR1= DR2= 
DR3=
 DR6=0ff0 DR7=0400
 EFER=
 Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90  90 
fa fc 66 c3 66 53 66 89 c3
 ERROR: invalid runstate transition: 'internal-error' -> 'colo'

 The reason is that some of the device state will be ignored when saving
 the device state to the slave, if the corresponding data is at its
 initial value, such as 0.
 But the device state in the slave may be at an initialized value; after
 a loop of checkpoints, the values of the device state will be
 inconsistent.
 This will happen when the PVM reboots or the SVM runs ahead of the PVM
 in the startup process.
 Signed-off-by: zhanghailiang 
 Signed-off-by: Yang Hongyang 
 Signed-off-by: Gonglei 
 Reviewed-by: Dr. David Alan Gilbert 
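
The cache-then-restore flow this commit message describes looks roughly
like the sketch below. The helper names are illustrative (the actual
series stages the stream in a QEMUSizedBuffer; qemu_bufopen_from() here
is a made-up stand-in), but the ordering is the point: nothing of the
SVM's state is destroyed until the whole checkpoint has arrived.

    uint32_t total = qemu_get_be32(f);        /* size announced by the PVM */
    uint8_t *buf = g_malloc(total);
    if (qemu_get_buffer(f, buf, total) != total) {
        g_free(buf);                          /* PVM died mid-send: keep   */
        return -EIO;                          /* the SVM's state intact    */
    }
    QEMUFile *fb = qemu_bufopen_from(buf, total);   /* illustrative */
    qemu_system_reset(VMRESET_SILENT);        /* only now safe to reset */
    ret = qemu_load_device_state(fb);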

Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-02-29 Thread Dr. David Alan Gilbert
* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:
> On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
> >* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> >>* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> >>>From: root 
> >>>
> >>>This is the 15th version of COLO (still only supporting periodic checkpoints).
> >>>
> >>>Here is only the COLO frame part; you can get the whole code from github:
> >>>https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> >>>
> >>>There are few changes in this series except for the network-related part.
> >>
> >>I was looking at the time the guest is paused during COLO and
> >>was surprised to find one of the larger chunks was the time to reset
> >>the guest before loading each checkpoint;  I've traced it part way, the
> >>biggest contributors for my test VM seem to be:
> >>
> >>   3.8ms  pcibus_reset: VGA
> >>   1.8ms  pcibus_reset: virtio-net-pci
> >>   1.5ms  pcibus_reset: virtio-blk-pci
> >>   1.5ms  qemu_devices_reset: piix4_reset
> >>   1.1ms  pcibus_reset: piix3-ide
> >>   1.1ms  pcibus_reset: virtio-rng-pci
> >>
> >>I've not looked deeper yet, but some of these are very silly;
> >>I'm running with -nographic, so why it's taking 3.8ms to reset VGA is
> >>going to be interesting.
> >>Also, my only block device is the virtio-blk, so while I understand the
> >>standard PC machine has the IDE controller, it's unclear why it takes
> >>over a ms to reset an unused device.
> >
> >OK, so I've dug a bit deeper, and it appears that it's the changes in
> >PCI BARs that actually take the time; every time we do a reset we
> >reset all the BARs, which causes a pci_update_mappings and
> >ends up doing a memory_region_del_subregion.
> >Then we load the config space of the PCI device as we do the vmstate_load,
> >and this recreates all the mappings again.
> >
> >I'm not sure what the fix is, but it sounds like it would
> >speed up the checkpoints usefully if we could avoid the map/remap when
> >they're the same.
> >
> 
> Interesting, and thanks for your report.
> 
> We already know qemu_system_reset() is a time-consuming function; we
> shouldn't call it here, but if we don't, there will be a bug, which we
> reported before in a previous COLO series. Below is a copy of the
> related patch comment:
> 
> COLO VMstate: Load VM state into qsb before restore it
> 
> We should not destroy the state of the secondary until we receive the
> whole state from the primary, in case the primary fails in the middle
> of sending the state; so, here we cache the device state in the
> Secondary before restoring it.
> 
> Besides, we should call qemu_system_reset() before loading the VM state,
> which can ensure the data is intact.
> Note: if we discard qemu_system_reset(), there will be some odd errors.
> For example, qemu on the slave side crashes and reports:
> 
> KVM: entry failed, hardware error 0x7
> EAX= EBX=e000 ECX=9578 EDX=434f
> ESI=fc10 EDI=434f EBP= ESP=1fca
> EIP=9594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0040 0400  9300
> CS =f000 000f  9b00
> SS =434f 000434f0  9300
> DS =434f 000434f0  9300
> FS =   9300
> GS =   9300
> LDT=   8200
> TR =   8b00
> GDT= 0002dcc8 0047
> IDT=  
> CR0=0010 CR2= CR3= CR4=
> DR0= DR1= DR2= 
> DR3=
> DR6=0ff0 DR7=0400
> EFER=
> Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90  90 
> fa fc 66 c3 66 53 66 89 c3
> ERROR: invalid runstate transition: 'internal-error' -> 'colo'
> 
> The reason is that some of the device state will be ignored when saving
> the device state to the slave, if the corresponding data is at its
> initial value, such as 0.
> But the device state in the slave may be at an initialized value; after
> a loop of checkpoints, the values of the device state will be
> inconsistent.
> This will happen when the PVM reboots or the SVM runs ahead of the PVM
> in the startup process.
> Signed-off-by: zhanghailiang 
> Signed-off-by: Yang Hongyang 
> Signed-off-by: Gonglei 
> Reviewed-by: Dr. David Alan Gilbert  
> As described above, some values of the device state are zero, and they
> will be ignored during migration. That is no problem for normal
> migration, because for the VM on the destination the initial values
> will be zero too. But COLO performs more than one round of migration,
> so the related values may change from non-zero to zero; they will then
> be ignored in the next checkpoint,
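
The skip-when-default behaviour described here typically comes from a
vmstate subsection whose needed() callback gates on the field's value.
A minimal sketch of the pattern (FooState is hypothetical):

    typedef struct { uint32_t counter; } FooState;

    /* For one-shot migration, skipping is safe: the destination starts
     * from reset, so the field is zero there anyway.  Under COLO the
     * SVM may still hold a non-zero value from an earlier checkpoint,
     * and the skipped field is never corrected - hence the reset before
     * each load. */
    static bool foo_counter_needed(void *opaque)
    {
        FooState *s = opaque;
        return s->counter != 0;     /* zero value: subsection skipped */
    }

    static const VMStateDescription vmstate_foo_counter = {
        .name = "foo/counter",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = foo_counter_needed,
        .fields = (VMStateField[]) {
            VMSTATE_UINT32(counter, FooState),
            VMSTATE_END_OF_LIST()
        }
    };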

Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-02-26 Thread Hailiang Zhang

On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:

* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

From: root 

This is the 15th version of COLO (still only supporting periodic checkpoints).

Here is only the COLO frame part; you can get the whole code from github:
https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode

There are few changes in this series except for the network-related part.


I was looking at the time the guest is paused during COLO and
was surprised to find one of the larger chunks was the time to reset
the guest before loading each checkpoint;  I've traced it part way, the
biggest contributors for my test VM seem to be:

   3.8ms  pcibus_reset: VGA
   1.8ms  pcibus_reset: virtio-net-pci
   1.5ms  pcibus_reset: virtio-blk-pci
   1.5ms  qemu_devices_reset: piix4_reset
   1.1ms  pcibus_reset: piix3-ide
   1.1ms  pcibus_reset: virtio-rng-pci

I've not looked deeper yet, but some of these are very silly;
I'm running with -nographic, so why it's taking 3.8ms to reset VGA is
going to be interesting.
Also, my only block device is the virtio-blk, so while I understand the
standard PC machine has the IDE controller, it's unclear why it takes
over a ms to reset an unused device.


OK, so I've dug a bit deeper, and it appears that it's the changes in
PCI BARs that actually take the time; every time we do a reset we
reset all the BARs, which causes a pci_update_mappings and
ends up doing a memory_region_del_subregion.
Then we load the config space of the PCI device as we do the vmstate_load,
and this recreates all the mappings again.

I'm not sure what the fix is, but it sounds like it would
speed up the checkpoints usefully if we could avoid the map/remap when
they're the same.



Interesting, and thanks for your report.

We already know qemu_system_reset() is a time-consuming function; we shouldn't
call it here, but if we don't, there will be a bug, which we reported
before in a previous COLO series. Below is a copy of the related
patch comment:

COLO VMstate: Load VM state into qsb before restore it

We should not destroy the state of the secondary until we receive the
whole state from the primary, in case the primary fails in the middle
of sending the state; so, here we cache the device state in the
Secondary before restoring it.

Besides, we should call qemu_system_reset() before loading the VM state,
which can ensure the data is intact.
Note: if we discard qemu_system_reset(), there will be some odd errors.
For example, qemu on the slave side crashes and reports:

KVM: entry failed, hardware error 0x7
EAX= EBX=e000 ECX=9578 EDX=434f
ESI=fc10 EDI=434f EBP= ESP=1fca
EIP=9594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0040 0400  9300
CS =f000 000f  9b00
SS =434f 000434f0  9300
DS =434f 000434f0  9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT= 0002dcc8 0047
IDT=  
CR0=0010 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90  90 fa 
fc 66 c3 66 53 66 89 c3
ERROR: invalid runstate transition: 'internal-error' -> 'colo'

The reason is that some of the device state will be ignored when saving
the device state to the slave, if the corresponding data is at its
initial value, such as 0.
But the device state in the slave may be at an initialized value; after
a loop of checkpoints, the values of the device state will be
inconsistent.
This will happen when the PVM reboots or the SVM runs ahead of the PVM
in the startup process.
Signed-off-by: zhanghailiang 
Signed-off-by: Yang Hongyang 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 

Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-02-26 Thread Dr. David Alan Gilbert
* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> > From: root 
> > 
> > This is the 15th version of COLO (still only supporting periodic checkpoints).
> > 
> > Here is only the COLO frame part; you can get the whole code from github:
> > https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> > 
> > There are few changes in this series except for the network-related part.
> 
> I was looking at the time the guest is paused during COLO and
> was surprised to find one of the larger chunks was the time to reset
> the guest before loading each checkpoint;  I've traced it part way, the
> biggest contributors for my test VM seem to be:
> 
>   3.8ms  pcibus_reset: VGA
>   1.8ms  pcibus_reset: virtio-net-pci
>   1.5ms  pcibus_reset: virtio-blk-pci
>   1.5ms  qemu_devices_reset: piix4_reset
>   1.1ms  pcibus_reset: piix3-ide
>   1.1ms  pcibus_reset: virtio-rng-pci
> 
> I've not looked deeper yet, but some of these are very silly;
> I'm running with -nographic, so why it's taking 3.8ms to reset VGA is
> going to be interesting.
> Also, my only block device is the virtio-blk, so while I understand the
> standard PC machine has the IDE controller, it's unclear why it takes
> over a ms to reset an unused device.

OK, so I've dug a bit deeper, and it appears that it's the changes in
PCI BARs that actually take the time; every time we do a reset we
reset all the BARs, which causes a pci_update_mappings and
ends up doing a memory_region_del_subregion.
Then we load the config space of the PCI device as we do the vmstate_load,
and this recreates all the mappings again.

I'm not sure what the fix is, but it sounds like it would
speed up the checkpoints usefully if we could avoid the map/remap when
they're the same.

Dave

> 
> I guess reset is normally off anyone's radar since it's outside
> the time anyone cares about, but perhaps the guys trying
> to make qemu start really quickly would be interested.
> 
> Dave
> 
> > 
> > Patch status:
> > Unreviewed: patch 21,27,28,29,33,38
> > Updated: patch 31,34,35,37
> > 
> > TODO:
> > 1. Checkpoint based on proxy in qemu
> > 2. The capability of continuous FT
> > 3. Optimize the VM's downtime during checkpoint
> > 
> > v15:
> >  - Continue the shutdown process if an error is encountered while
> >sending the shutdown message to the SVM. (patch 24)
> >  - Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip and remove
> >some useless comments. (patch 31, Jason)
> >  - Call object_new_with_props() directly to add filter in
> >colo_add_buffer_filter. (patch 34, Jason)
> >  - Re-implement colo_set_filter_status() based on COLOBufferFilters
> >list. (patch 35)
> >  - Re-implement colo_flush_filter_packets() based on COLOBufferFilters
> >list. (patch 37) 
> > v14:
> >  - Re-implement the network processing based on netfilter (Jason Wang)
> >  - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
> >  - Split two new patches (patch 27/28) from patch 29
> >  - Fix some other comments from Dave and Markus.
> > 
> > v13:
> >  - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
> >   instead of return value to indicate success or failure. (patch 10)
> >  - Remove the optional error message for COLO_EXIT event. (patch 25)
> >  - Use semaphore to notify colo/colo incoming loop that failover work is
> >finished. (patch 26)
> >  - Move COLO shutdown related codes to colo.c file. (patch 28)
> >  - Fix memory leak bug for colo incoming loop. (new patch 31)
> >  - Re-use some existing helper functions to implement the process of
> >saving/loading ram and device state. (patch 32)
> >  - Fix some other comments from Dave and Markus.
> > 
> > zhanghailiang (38):
> >   configure: Add parameter for configure to enable/disable COLO support
> >   migration: Introduce capability 'x-colo' to migration
> >   COLO: migrate colo related info to secondary node
> >   migration: Integrate COLO checkpoint process into migration
> >   migration: Integrate COLO checkpoint process into loadvm
> >   COLO/migration: Create a new communication path from destination to
> > source
> >   COLO: Implement colo checkpoint protocol
> >   COLO: Add a new RunState RUN_STATE_COLO
> >   QEMUSizedBuffer: Introduce two help functions for qsb
> >   COLO: Save PVM state to secondary side when do checkpoint
> >   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
> >   ram/COLO: Record the dirty pages that SVM received
> >   COLO: Load VMState into qsb before restore it
> >   COLO: Flush PVM's cached RAM into SVM's memory
> >   COLO: Add checkpoint-delay parameter for migrate-set-parameters
> >   COLO: synchronize PVM's state to SVM periodically
> >   COLO failover: Introduce a new command to trigger a failover
> >   COLO failover: Introduce state to record failover process
> >   COLO: Implement failover work for Primary VM
> >   COLO: Implement failover work for Secondary VM
> >   qmp event: 

Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-02-25 Thread Dr. David Alan Gilbert
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> From: root 
> 
> This is the 15th version of COLO (still only supporting periodic checkpoints).
> 
> Here is only the COLO frame part; you can get the whole code from github:
> https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
> 
> There are few changes in this series except for the network-related part.

I was looking at the time the guest is paused during COLO and
was surprised to find one of the larger chunks was the time to reset
the guest before loading each checkpoint;  I've traced it part way, the
biggest contributors for my test VM seem to be:

  3.8ms  pcibus_reset: VGA
  1.8ms  pcibus_reset: virtio-net-pci
  1.5ms  pcibus_reset: virtio-blk-pci
  1.5ms  qemu_devices_reset: piix4_reset
  1.1ms  pcibus_reset: piix3-ide
  1.1ms  pcibus_reset: virtio-rng-pci

I've not looked deeper yet, but some of these are very silly;
I'm running with -nographic, so why it's taking 3.8ms to reset VGA is
going to be interesting.
Also, my only block device is the virtio-blk, so while I understand the
standard PC machine has the IDE controller, it's unclear why it takes
over a ms to reset an unused device.

I guess reset is normally off anyone's radar since it's outside
the time anyone cares about, but perhaps the guys trying
to make qemu start really quickly would be interested.

Dave

> 
> Patch status:
> Unreviewed: patch 21,27,28,29,33,38
> Updated: patch 31,34,35,37
> 
> TODO:
> 1. Checkpoint based on proxy in qemu
> 2. The capability of continuous FT
> 3. Optimize the VM's downtime during checkpoint
> 
> v15:
>  - Continue the shutdown process if an error is encountered while
>sending the shutdown message to the SVM. (patch 24)
>  - Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip and remove
>some useless comments. (patch 31, Jason)
>  - Call object_new_with_props() directly to add filter in
>colo_add_buffer_filter. (patch 34, Jason)
>  - Re-implement colo_set_filter_status() based on COLOBufferFilters
>list. (patch 35)
>  - Re-implement colo_flush_filter_packets() based on COLOBufferFilters
>list. (patch 37) 
> v14:
>  - Re-implement the network processing based on netfilter (Jason Wang)
>  - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
>  - Split two new patches (patch 27/28) from patch 29
>  - Fix some other comments from Dave and Markus.
> 
> v13:
>  - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
>   instead of return value to indicate success or failure. (patch 10)
>  - Remove the optional error message for COLO_EXIT event. (patch 25)
>  - Use semaphore to notify colo/colo incoming loop that failover work is
>finished. (patch 26)
>  - Move COLO shutdown related codes to colo.c file. (patch 28)
>  - Fix memory leak bug for colo incoming loop. (new patch 31)
>  - Re-use some existing helper functions to implement the process of
>saving/loading ram and device state. (patch 32)
>  - Fix some other comments from Dave and Markus.
> 
> zhanghailiang (38):
>   configure: Add parameter for configure to enable/disable COLO support
>   migration: Introduce capability 'x-colo' to migration
>   COLO: migrate colo related info to secondary node
>   migration: Integrate COLO checkpoint process into migration
>   migration: Integrate COLO checkpoint process into loadvm
>   COLO/migration: Create a new communication path from destination to
> source
>   COLO: Implement colo checkpoint protocol
>   COLO: Add a new RunState RUN_STATE_COLO
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   COLO: Save PVM state to secondary side when do checkpoint
>   COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
>   ram/COLO: Record the dirty pages that SVM received
>   COLO: Load VMState into qsb before restore it
>   COLO: Flush PVM's cached RAM into SVM's memory
>   COLO: Add checkpoint-delay parameter for migrate-set-parameters
>   COLO: synchronize PVM's state to SVM periodically
>   COLO failover: Introduce a new command to trigger a failover
>   COLO failover: Introduce state to record failover process
>   COLO: Implement failover work for Primary VM
>   COLO: Implement failover work for Secondary VM
>   qmp event: Add COLO_EXIT event to notify users while exited from COLO
>   COLO failover: Shutdown related socket fd when do failover
>   COLO failover: Don't do failover during loading VM's state
>   COLO: Process shutdown command for VM in COLO state
>   COLO: Update the global runstate after going into colo state
>   savevm: Introduce two helper functions for save/find loadvm_handlers
> entry
>   migration/savevm: Add new helpers to process the different stages of
> loadvm
>   migration/savevm: Export two helper functions for savevm process
>   COLO: Separate the process of saving/loading ram and device state
>   COLO: Split qemu_savevm_state_begin out of checkpoint process
>   net/filter: Add a 'status' property for filter object
>   filter-buffer: Accept 

[Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-02-21 Thread zhanghailiang
From: root 

This is the 15th version of COLO (still only supporting periodic checkpoints).

Here is only the COLO frame part; you can get the whole code from github:
https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode

There are few changes in this series except for the network-related part.

Patch status:
Unreviewed: patch 21,27,28,29,33,38
Updated: patch 31,34,35,37

TODO:
1. Checkpoint based on proxy in qemu
2. The capability of continuous FT
3. Optimize the VM's downtime during checkpoint

v15:
 - Continue the shutdown process if an error is encountered while sending
   the shutdown message to the SVM. (patch 24)
 - Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip and remove
   some useless comments. (patch 31, Jason)
 - Call object_new_with_props() directly to add filter in
   colo_add_buffer_filter. (patch 34, Jason)
 - Re-implement colo_set_filter_status() based on COLOBufferFilters
   list. (patch 35)
 - Re-implement colo_flush_filter_packets() based on COLOBufferFilters
   list. (patch 37) 
v14:
 - Re-implement the network processing based on netfilter (Jason Wang)
 - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
 - Split two new patches (patch 27/28) from patch 29
 - Fix some other comments from Dave and Markus.

v13:
 - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
  instead of return value to indicate success or failure. (patch 10)
 - Remove the optional error message for COLO_EXIT event. (patch 25)
 - Use semaphore to notify colo/colo incoming loop that failover work is
   finished. (patch 26)
 - Move COLO shutdown related codes to colo.c file. (patch 28)
 - Fix memory leak bug for colo incoming loop. (new patch 31)
 - Re-use some existing helper functions to implement the process of
   saving/loading ram and device state. (patch 32)
 - Fix some other comments from Dave and Markus.

zhanghailiang (38):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  COLO/migration: Create a new communication path from destination to
source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  ram/COLO: Record the dirty pages that SVM received
  COLO: Load VMState into qsb before restore it
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  qmp event: Add COLO_EXIT event to notify users while exited from COLO
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Introduce two helper functions for save/find loadvm_handlers
entry
  migration/savevm: Add new helpers to process the different stages of
loadvm
  migration/savevm: Export two helper functions for savevm process
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  net/filter: Add a 'status' property for filter object
  filter-buffer: Accept zero interval
  net: Add notifier/callback for netdev init
  COLO/filter: add each netdev a buffer filter
  COLO: manage the status of buffer filters for PVM
  filter-buffer: make filter_buffer_flush() public
  COLO: flush buffered packets in checkpoint process or exit COLO
  COLO: Add block replication into colo process

 configure |  11 +
 docs/qmp-events.txt   |  16 +
 hmp-commands.hx   |  15 +
 hmp.c |  15 +
 hmp.h |   1 +
 include/exec/ram_addr.h   |   1 +
 include/migration/colo.h  |  42 ++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  16 +
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h  |   5 +
 include/net/net.h |   4 +
 include/sysemu/sysemu.h   |   9 +
 migration/Makefile.objs   |   2 +
 migration/colo-comm.c |  76 
 migration/colo-failover.c |  83 
 migration/colo.c  | 866 ++
 migration/migration.c | 109 +-
 migration/qemu-file-buf.c |  61 +++
 migration/ram.c   | 175 -
 migration/savevm.c| 114