Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
> On Tue, Jul 24, 2012 at 08:36:25PM +0200, Juan Quintela wrote:
>> Hi
>>
>> This series are on top of the migration-next-v5 series just posted.
>>
>> First of all, this is an RFC/Work in progress. Just a lot of people
>> asked for it, and I would like review of the design.
>>
>> It does:
>> - get a new bitmap for migration, and that bitmap uses 1 bit by page
>> - it unfolds migration_buffered_file. Only one user existed.
>> - it simplifies buffered_file a lot.
>>
>> - About the migration thread, special attention was giving to try to
>>   get the series review-able (reviewers would tell if I got it).
>>
>> Basic design:
>> - we create a new thread instead of a timer function
>> - we move all the migration work to that thread (but run everything
>>   except the waits with the iothread lock.
>> - we move all the writting to outside the iothread lock. i.e.
>>   we walk the state with the iothread hold, and copy everything to one
>>   buffer.
>>   then we write that buffer to the sockets outside the iothread lock.
>> - once here, we move to writting synchronously to the sockets.
>> - this allows us to simplify quite a lot.
>>
>> And basically, that is it. Notice that we still do the iterate page
>> walking with the iothread held. Light testing show that we got
>
> Is the plan to eventually hold the iothread lock only to re-sync the
> dirty bitmap, and then rely on qemu_mutex_lock_ramlist() to walk the
> ramlist?

Yeap. I want to walk the RAM without the iothread lock, but I am not
there yet. This series basically does the following:

- all migration operations are done in their own thread
- all copying from "guest" to "buffer" is done with the iothread lock
  held (to call it something)
- all writes from that buffer to the socket/fd are done synchronously
  and without the iothread lock
- we measure bandwidth/downtime more correctly (still not perfect)

> It seems like the non-MRU block list, "local" migration
> bitmap copy, and ramlist mutex are all toward this end, but we still
> have basically the same locking protocol as before. If not, can you elaborate
> on what purpose they serve in this series?

The problem with moving the "live part" outside the iothread lock is
that we still have some unresolved locking issues.

Later, Juan.
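
For readers following the thread, the split described above (copy from guest to buffer with the lock held, then write the buffer synchronously without it) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the series: `big_lock` merely plays the role of the iothread lock, and `guest_ram`, `dirty_bitmap`, `mark_dirty`, `stage_dirty_pages`, `flush_buffer`, `struct migration_args` and the `DEMO_*` constants are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE 16 /* tiny hypothetical page size */
#define DEMO_NUM_PAGES 8

/* Hypothetical stand-ins; none of these are QEMU's real data structures. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the iothread lock */
static unsigned char guest_ram[DEMO_NUM_PAGES][DEMO_PAGE_SIZE];
static unsigned char dirty_bitmap[(DEMO_NUM_PAGES + 7) / 8]; /* 1 bit per page */

static void mark_dirty(size_t page)
{
    assert(page < DEMO_NUM_PAGES);
    dirty_bitmap[page / 8] |= (unsigned char)(1u << (page % 8));
}

/* Step 1: with the lock held, copy dirty pages into buf and clear their bits.
 * Returns the number of bytes staged. */
static size_t stage_dirty_pages(unsigned char *buf, size_t cap)
{
    size_t used = 0;

    pthread_mutex_lock(&big_lock);
    for (size_t p = 0; p < DEMO_NUM_PAGES && used + DEMO_PAGE_SIZE <= cap; p++) {
        unsigned char bit = (unsigned char)(1u << (p % 8));
        if (dirty_bitmap[p / 8] & bit) {
            memcpy(buf + used, guest_ram[p], DEMO_PAGE_SIZE);
            dirty_bitmap[p / 8] &= (unsigned char)~bit;
            used += DEMO_PAGE_SIZE;
        }
    }
    pthread_mutex_unlock(&big_lock);
    return used;
}

/* Step 2: without the lock, write the staged buffer synchronously. */
static int flush_buffer(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted, retry the write */
            }
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

struct migration_args {
    int fd;            /* destination socket/fd */
    size_t bytes_sent; /* running total, e.g. for bandwidth accounting */
};

/* Thread body: alternate locked staging and unlocked writing until clean. */
static void *migration_thread(void *opaque)
{
    struct migration_args *args = opaque;
    unsigned char buf[DEMO_NUM_PAGES * DEMO_PAGE_SIZE];
    size_t n;

    while ((n = stage_dirty_pages(buf, sizeof(buf))) > 0) {
        if (flush_buffer(args->fd, buf, n) < 0) {
            break;
        }
        args->bytes_sent += n;
    }
    return NULL;
}
```

In a real program the loop would be started with `pthread_create(&tid, NULL, migration_thread, &args)`. The point of the sketch is only that the blocking `write()` never happens while `big_lock` is held, so a slow socket does not stall whoever else needs the lock.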