Am 11.02.2013 13:42, schrieb Paolo Bonzini: > Il 11/02/2013 13:29, Kevin Wolf ha scritto: >> The bug is not in this function but in process_incoming_migration(). It >> should never blindly enter a coroutine which is in an unknown state. > > process_incoming_migration() is the function that _creates_ the > coroutine and enters it for the first time, so there shouldn't be any > problem there. > > The function we're looking at should be, as you write correctly, > enter_migration_coroutine().
Yes, Stefan's commit message confused me. >> If it can reenter here, it can reenter anywhere in block drivers, and >> adding a workaround to one place that just yields again if it got an >> early reentrance doesn't fix the real bug. >> >> Which is the yield that corresponds to the enter in >> enter_migration_coroutine()? We need to add some state that can be used >> to make sure the enter happens only for this specific yield. > > In this case the corresponding yield is in qemu-coroutine-io.c, and it > is driven by read-availability of the migration file descriptor. So it > is possible to test before entering the coroutine, but really ugly (not > just because it would cost one extra system call). So the problem is that qemu-coroutine-io.c has a bad interface that yields without scheduling the reenter itself. Wouldn't it better if it was qemu_co_sendv_recvv() which registered an fd handler and handled the yield/reenter stuff transparently? > But why is enter_migration_coroutine() being called at all? bdrv_write > should be a synchronous call, it should run its own qemu_aio_wait() > event loop and that loop doesn't include the migration file descriptor. Why doesn't it include the migration fd? Wouldn't we get such behaviour only if we had different AioContexts? Kevin