On Mon, May 12, 2025 at 03:16:34PM +0000, Chaney, Ben wrote:
> Hello,
> 
> When live migrating to a destination host with pmem there is a very
> long downtime where the guest is paused. In some cases, this can be as
> high as 5 minutes, compared to less than one second in the good case.
> 
> Profiling suggests very high activity in this code path:
> 
> ffffffffa2956de6 clean_cache_range+0x26 ([kernel.kallsyms])
> ffffffffa2359b0f dax_writeback_mapping_range+0x1ef ([kernel.kallsyms])
> ffffffffc0c6336d ext4_dax_writepages+0x7d ([kernel.kallsyms])
> ffffffffa2242dac do_writepages+0xbc ([kernel.kallsyms])
> ffffffffa2235ea6 filemap_fdatawrite_wbc+0x66 ([kernel.kallsyms])
> ffffffffa223a896 __filemap_fdatawrite_range+0x46 ([kernel.kallsyms])
> ffffffffa223af73 file_write_and_wait_range+0x43 ([kernel.kallsyms])
> ffffffffc0c57ecb ext4_sync_file+0xfb ([kernel.kallsyms])
> ffffffffa228a331 __do_sys_msync+0x1c1 ([kernel.kallsyms])
> ffffffffa2997fe6 do_syscall_64+0x56 ([kernel.kallsyms])
> ffffffffa2a00126 entry_SYSCALL_64_after_hwframe+0x6e ([kernel.kallsyms])
> 11ec5f msync+0x4f (/usr/lib/x86_64-linux-gnu/libc.so.6)
> 675ada qemu_ram_msync+0x8a (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
> 6873c7 xbzrle_load_cleanup+0x37 (inlined)
> 6873c7 ram_load_cleanup+0x37 (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
> 4ff375 qemu_loadvm_state_cleanup+0x55 (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
> 500f0b qemu_loadvm_state+0x15b (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
> 4ecf85 process_incoming_migration_co+0x95 (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
> 8b6412 qemu_coroutine_self+0x2 (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
> ffffffffffffffff [unknown] ([unknown])
> 
> I was able to resolve the performance issue by removing the call to
> qemu_ram_block_writeback in ram_load_cleanup. This causes the performance
> to return to normal. It looks like this code path was initially added to
> ensure the memory was synchronized if the persistent memory region is
> backed by an NVDIMM device. Does it serve any purpose if pmem is instead
> backed by standard DRAM?
> 
> I'm also curious about the intended use of this code path in the NVDIMM
> case. It seems like it would run into a few issues. This on its own seems
> insufficient to restore the VM state if the host crashes after a live
> migration. The memory region being synced is only the guest memory. It
> doesn't save the driver state on the host side. Also, once the migration
> completes, the guest can redirty the pages. If the host crashes after that
> point, the guest memory will still be in an inconsistent state unless the
> crash is exceptionally well timed. Does anyone have any insight into why
> this sync operation was introduced?
> 
> Thank you,
> Ben Chaney
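For context, that cleanup path boils down to one msync(MS_SYNC) over the
entire pmem-backed RAM block, which the trace above shows ending up in
dax_writeback_mapping_range()/clean_cache_range(), i.e. a cache flush of
every page the incoming migration just dirtied. The cost is easy to
reproduce outside QEMU with a standalone sketch like the one below
(illustration only, not QEMU code; the /mnt/pmem/testfile path and the
1 GiB size are placeholders for a file on a DAX mount):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* Placeholder path: any file on a DAX (fsdax) mount will do. */
    const char *path = argc > 1 ? argv[1] : "/mnt/pmem/testfile";
    size_t len = 1UL << 30;                     /* 1 GiB, adjust to taste */

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, len) < 0) {
        perror("open/ftruncate");
        return 1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Dirty every page, the way an incoming migration dirties all guest RAM. */
    memset(p, 0xab, len);

    /* Time the msync(MS_SYNC) that shows up under qemu_ram_msync() above. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (msync(p, len, MS_SYNC) < 0)
        perror("msync");
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("msync of %zu bytes took %.3f s\n", len,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

    munmap(p, len);
    close(fd);
    return 0;
}

The sync time grows with the amount of dirtied memory, which is consistent
with the downtime scaling up to minutes for large guests.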
Was added here:

commit 56eb90af39abf66c0e80588a9f50c31e7df7320b
Author: Junyan He <junyan...@intel.com>
Date:   Wed Jul 18 15:48:03 2018 +0800

    migration/ram: ensure write persistence on loading all data to PMEM.

    Because we need to make sure the pmem kind memory data is synced
    after migration, we choose to call pmem_persist() when the migration
    finish. This will make sure the data of pmem is safe and will not
    lose if power is off.

    Signed-off-by: Junyan He <junyan...@intel.com>
    Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
    Reviewed-by: Igor Mammedov <imamm...@redhat.com>
    Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
    Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

it kind of sounded reasonable ... but I don't remember.
Also CC Haozhong Zhang who worked in this area.