Peter Maydell <peter.mayd...@linaro.org> wrote:
> On 21 July 2017 at 10:13, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
>> I don't fully understand the way memory_region_do_invalidate_mmio_ptr
>> works; I see it dropping the memory region; if that's also dropping
>> the RAMBlock then it will upset migration.  Even if the CPU is stopped
>> I don't think that stops the migration thread walking through the list
>> of RAMBlocks.
>
> memory_region_do_invalidate_mmio_ptr() calls memory_region_unref(),
> which will eventually result in memory_region_finalize() being
> called, which will call the MR destructor, which in this case is
> memory_region_destructor_ram(), which calls qemu_ram_free() on
> the RAMBlock, which removes the RAMBlock from the list (after
> taking the ramlist lock).
>
>> Even then, the problem is migration keeps a 'dirty_pages' count which is
>> calculated at the start of migration and updated as we dirty and send
>> pages; if we add/remove a RAMBlock then that dirty_pages count is wrong
>> and we either never finish migration (since dirty_pages never reaches
>> zero) or finish early with some unsent data.
>> And then there's the 'received' bitmap currently being added for
>> postcopy which tracks each page that's been received (that's not in yet
>> though).
>
> It sounds like we really need to make migration robust against
> RAMBlock changes -- in the hotplug case it's certainly possible
> for RAMBlocks to be newly created or destroyed while migration
> is in progress.
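To make that teardown chain concrete, here is a minimal standalone C model of
the pattern Peter describes (illustrative names only, not QEMU's actual
source): dropping the last reference runs a destructor that unlinks the block
from a global list under a lock.

#include <pthread.h>
#include <stdlib.h>

typedef struct Block {
    struct Block *next;
} Block;

static Block *block_list;                    /* stand-in for the RAMBlock list */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct Region {
    int refcount;
    Block *block;
} Region;

/* Stand-in for qemu_ram_free(): unlink the block while holding the list lock. */
static void block_free(Block *b)
{
    pthread_mutex_lock(&list_lock);
    for (Block **p = &block_list; *p; p = &(*p)->next) {
        if (*p == b) {
            *p = b->next;                    /* remove from the list */
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);
    free(b);
}

/* Stand-in for memory_region_unref(): when the last reference goes away,
 * the "destructor" frees the block, mirroring the chain quoted above. */
static void region_unref(Region *r)
{
    if (--r->refcount == 0) {
        block_free(r->block);
        free(r);
    }
}

int main(void)
{
    Block *b = calloc(1, sizeof(*b));
    Region *r = calloc(1, sizeof(*r));
    r->refcount = 1;
    r->block = b;
    b->next = block_list;
    block_list = b;

    region_unref(r);                         /* drops the region and the block */
    return 0;
}

The point for migration is the last step: the block disappears from the list
even though another thread may still be iterating over it.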
There is code to disable hotplug while we are migrating.  For 2.10 we
disabled *all* hotplug/unplug.  If there are operations that are safe,
we can re-enable them as we verify them.

The problem with RAMBlocks is that we do the equivalent of:

    foreach ramblock
        for each page in this ramblock
            if page is dirty, send page

But we can spend a lot of time (and many rounds) sending a single
RAMBlock, because we keep going back and forth through the top-level
migration functions/loops.

Later, Juan.
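A minimal standalone C sketch of that scan (illustrative only, not QEMU's
actual migration code; the names migrate_one_pass, test_and_clear_dirty and
the bitmap layout are assumptions) showing why the block list and the
dirty_pages count have to stay consistent for the whole migration:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

typedef struct Block {
    struct Block *next;
    unsigned long *dirty_bitmap;     /* one bit per page */
    size_t npages;
} Block;

static Block *block_list;            /* the (mutable) list of blocks */
static unsigned long dirty_pages;    /* counted at migration start,
                                        decremented as pages are sent */

static bool test_and_clear_dirty(Block *b, size_t page)
{
    unsigned long *word = &b->dirty_bitmap[page / BITS_PER_LONG];
    unsigned long mask = 1UL << (page % BITS_PER_LONG);
    bool was_dirty = *word & mask;
    *word &= ~mask;
    return was_dirty;
}

static void send_page(Block *b, size_t page)
{
    /* ... stream the page to the destination ... */
    printf("sent page %zu of block %p\n", page, (void *)b);
}

/* One pass of the scan described above: foreach block, foreach page,
 * send it if it is dirty.  Migration converges when dirty_pages hits
 * zero, which is why adding or removing a block mid-migration leaves
 * that count (and the iteration itself) wrong. */
static void migrate_one_pass(void)
{
    for (Block *b = block_list; b; b = b->next) {
        for (size_t page = 0; page < b->npages; page++) {
            if (test_and_clear_dirty(b, page)) {
                send_page(b, page);
                dirty_pages--;
            }
        }
    }
}

int main(void)
{
    unsigned long bitmap[2] = { 0x5UL, 0 };   /* pages 0 and 2 start dirty */
    Block b = { .next = NULL, .dirty_bitmap = bitmap, .npages = 64 };
    block_list = &b;
    dirty_pages = 2;

    migrate_one_pass();
    printf("dirty_pages now %lu\n", dirty_pages);
    return 0;
}

Because the real code returns to the top-level migration loop between chunks,
many such passes run over the same (assumed-stable) block list before
dirty_pages reaches zero.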