On Thu, Nov 30, 2017 at 08:00:54PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > Tree is pushed here for better reference and testing:
> >   github.com/xzpeter postcopy-recovery-support
>
> Hi Peter,
>   Do you have a git with this code + your OOB world in?
> I'd like to play with doing recovery and see what happens;
> I still worry a bit about whether the (potentially hung) main loop
> is needed for the new incoming connection to be accepted by the
> destination.

Good question... I thought it was okay.  The reason is that as long as
we run the migrate-incoming command with run-oob=true, it runs in the
iothread, and our iothread implementation has this in iothread_run():

    g_main_context_push_thread_default(iothread->worker_context);

This _should_ mean that from then on a NULL context is replaced with
iothread->worker_context (which is the monitor context, no longer the
main thread's) in most cases.  I say "most" because there are corner
cases where glib ignores this thread-local variable and still uses the
global one, though I guessed that would not apply to us.

I tried to confirm this by breaking at the entry of
socket_accept_incoming_migration() on the destination side.  Sadly, I
was wrong: it is still running under main(), i.e. in the main thread.

I found that the problem is that g_source_attach() still uses
g_main_context_default() rather than
g_main_context_get_thread_default() when context=NULL is passed in.  I
don't know whether this is a glib bug:

    guint
    g_source_attach (GSource *source, GMainContext *context)
    {
      guint result = 0;
      ...
      if (!context)
        context = g_main_context_default ();
      ...
    }

I'm CCing some more people who may know glib better than I do.

For now, I think a simple solution is to call
g_main_context_get_thread_default() explicitly in the QIO code.  But
I'd also like to hear what other people think.

I'll prepare a branch soon, including both series (postcopy recovery +
oob), once the solution is settled.

Thanks,

-- 
Peter Xu