On 02/06/2014 03:45 AM, Paolo Bonzini wrote:
> On 05/02/2014 17:42, Dr. David Alan Gilbert wrote:
>> Because:
>> * the code is still running and keeps redirtying a small handful of
>> pages
>> * but because we've underestimated our available bandwidth we never stop
>> it and just throw those pages across immediately
>
> Ok, I thought Alexey was saying we are not redirtying that handful of pages.
Every iteration we read the dirty map from KVM and send all dirty pages
across the stream.

> And in turn, this is because the max downtime we have is too low
> (especially for the default 32 MB/sec default bandwidth; that's also pretty
> low).

My understanding now is that in order to finish migration, QEMU waits for
the first 100ms window (BUFFER_DELAY) of continuously low traffic. But
because those pages get dirty every time we read the dirty map, we transfer
more in these 100ms than we are actually allowed (>32MB/s, i.e. more than
3.2MB per 100ms). So we transfer-transfer-transfer, detect that we have
transferred too much, do delay(), and if max_size (calculated from the
actual transfer rate and the downtime) for the next iteration happens, by
luck, to exceed the size of those 96 pages (uncompressed), we finish.

Increasing speed and/or downtime will help, but still: we would not need
that if migration did not assume that all 96 pages have to be sent in full
and instead had some smart way to detect that many of them are empty (and
so compress well). Literally, move is_zero_range() from ram_save_block() to
migration_bitmap_sync() and store this bit in some new pages_zero_map, for
example. But does it make a lot of sense?

-- 
Alexey
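
For illustration only, the arithmetic described above can be sketched in C.
This is not QEMU's code: the function names, the 4096-byte page size, the
30ms downtime and the 9-byte cost of a zero page are all assumptions picked
for the example. Only BUFFER_DELAY (100ms), the 32MB/s limit and the 96
dirty pages come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUFFER_DELAY_MS 100u /* window length named in the thread */

/* Bytes allowed in one BUFFER_DELAY window at the given bandwidth limit. */
static uint64_t window_budget(uint64_t limit_bytes_per_sec)
{
    return limit_bytes_per_sec * BUFFER_DELAY_MS / 1000u;
}

/* Bytes that fit into the allowed downtime at the measured transfer rate. */
static uint64_t max_size(uint64_t transferred_bytes, uint64_t elapsed_ms,
                         uint64_t max_downtime_ms)
{
    assert(elapsed_ms > 0);
    return transferred_bytes * max_downtime_ms / elapsed_ms;
}

/*
 * Estimate of what is still pending. zero_cost is a hypothetical per-page
 * cost for a page known to be all zeroes; passing zero_pages == 0 models
 * an estimate that counts every dirty page at full size.
 */
static uint64_t pending_bytes(uint32_t dirty_pages, uint32_t zero_pages,
                              uint64_t page_size, uint64_t zero_cost)
{
    assert(zero_pages <= dirty_pages);
    return (uint64_t)(dirty_pages - zero_pages) * page_size
           + (uint64_t)zero_pages * zero_cost;
}

/* Migration may complete once the pending data fits into the downtime. */
static bool can_complete(uint64_t pending, uint64_t max)
{
    return pending < max;
}
```

With these assumed numbers, 96 dirty pages counted at full size come to
393216 bytes, while the same 96 pages with 90 of them known to be zero come
to 25386 bytes. For a max_size of 300000 bytes (1000000 bytes measured over
100ms with a 30ms downtime), only the second estimate lets the migration
complete, which is the effect a pages_zero_map would be after.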