On Thu, Jun 12, 2025 at 12:35:46PM +0200, Mario Casquero wrote:
> Hello Peter,

Hi, Mario,

> 
> Thanks for pointing this out! I retested it with the series you
> mentioned and everything works fine.
> 
> Booted up 2 VMs as usual, one in source and one in destination with
> -incoming defer. Set the postcopy-blocktime and postcopy-ram
> capabilities and query them to verify that they are enabled.
> 
> (qemu) migrate_set_capability postcopy-ram on
> (qemu) migrate_set_capability postcopy-blocktime on
> (qemu) info migrate_capabilities
> 
> ...
> postcopy-ram: on
> ...
> postcopy-blocktime: on
> ...
> 
> Do migration with postcopy, this time check the full info migrate in source.
> (qemu) info migrate  -a
> Status: postcopy-active
> Time (ms): total=6522, setup=33, down=16
> RAM info:
>   Throughput (Mbps): 949.60
>   Sizes: pagesize=4 KiB, total=16 GiB
>   Transfers: transferred=703 MiB, remain=5.4 GiB
>     Channels: precopy=111 MiB, multifd=0 B, postcopy=592 MiB
>     Page Types: normal=178447, zero=508031
>   Page Rates (pps): transfer=167581
>   Others: dirty_syncs=2, postcopy_req=1652
> Globals:
>   store-global-state: on
>   only-migratable: off
>   send-configuration: on
>   send-section-footer: on
>   send-switchover-start: on
>   clear-bitmap-shift: 18
> 
> Once migration is completed compare the differences in destination
> about the postcopy blocktime.
> 
> (qemu) info migrate -a
> Status: completed
> Globals:
> ...
> Postcopy Blocktime (ms): 712
> Postcopy vCPU Blocktime (ms):
>  [1633, 1635, 1710, 2097, 2595, 1993, 1958, 1214]
> 
> With all the series applied and same VM:
> 
> (qemu) info migrate -a
> Status: completed
> Globals:
> ...
> Postcopy Blocktime (ms): 134
> Postcopy vCPU Blocktime (ms):
>  [1310, 1064, 1112, 1400, 1334, 756, 1216, 1420]
> Postcopy Latency (us): 16075

Here the latency is 16ms. My fault: I forgot to ask you to enable
postcopy-preempt as well, sorry.

The optimization won't help much without preempt, because it operates at
the level of tens of microseconds, so it can easily be buried in the noise
without preempt mode.  It's suggested to always enable preempt mode for a
postcopy migration whenever it is available.
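
For example, on both QEMUs before starting the migration, the same way as
the other capabilities above:

  (qemu) migrate_set_capability postcopy-preempt on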

> Postcopy non-vCPU Latencies (us): 14743
> Postcopy vCPU Latencies (us):
>  [24730, 25350, 27125, 25930, 23825, 29110, 22960, 26304]
> 
> Indeed the Postcopy Blocktime has been reduced a lot :)

I hadn't compared blocktime before, so I'm surprised it changed that much.
Though maybe you didn't run any workload inside the guest?  In that case
the results can be unpredictable.  The perf test makes more sense if you
run some load, so that the majority of the faults triggered are not ad-hoc
system probes but something more predictable.  I normally use mig_mon [1]
with something like this:

[1] https://github.com/xzpeter/mig_mon

$ ./mig_mon mm_dirty -m 13G -p random

This first pre-faults the whole memory using all the CPUs, then dirties
the 13G of memory single threaded, as fast as possible, in a random
fashion.
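
Conceptually the workload looks like the sketch below (not mig_mon's
actual code: mig_mon pre-faults with all CPUs in parallel, while this
simplification is single threaded, and the buffer size / iteration count
are placeholders):

```c
/*
 * Rough sketch of a random-dirty workload: pre-fault the whole buffer
 * once so every page is populated, then keep writing to random pages
 * as fast as possible so the dirty set stays hot and unpredictable.
 */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096UL

/* Returns the number of random page writes performed. */
static size_t dirty_random(uint8_t *buf, size_t size, size_t iters)
{
    size_t npages = size / PAGE_SIZE;

    memset(buf, 1, size);                  /* pre-fault every page */
    for (size_t i = 0; i < iters; i++) {
        size_t pg = (size_t)rand() % npages;
        buf[pg * PAGE_SIZE]++;             /* one write dirties the page */
    }
    return iters;
}
```

During postcopy, each of those random writes on a not-yet-migrated page
becomes a userfault the source has to service, which is what makes the
latency/blocktime numbers meaningful.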

What I did for the test was apply both series, then revert the last patch
of the 1st series, since the "postcopy-latency" metric wasn't around
before the 2nd series; otherwise you'd need to use some kernel
tracepoints.

This is definitely an awkward series to test with the two entangled.
Again, feel free to skip that, just FYI!

Thanks,

-- 
Peter Xu

