On Fri, Oct 26, 2018 at 09:10:19PM +0800, Fei Li wrote:
>
>
> On 10/25/2018 08:58 PM, Peter Xu wrote:
> > On Thu, Oct 25, 2018 at 05:04:00PM +0800, Fei Li wrote:
> >
> > [...]
> >
> > > @@ -1325,22 +1325,24 @@ bool multifd_recv_all_channels_created(void)
> > >  /* Return true if multifd is ready for the migration, otherwise false */
> > >  bool multifd_recv_new_channel(QIOChannel *ioc)
> > >  {
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > >      MultiFDRecvParams *p;
> > >      Error *local_err = NULL;
> > >      int id;
> > >
> > >      id = multifd_recv_initial_packet(ioc, &local_err);
> > >      if (id < 0) {
> > > -        multifd_recv_terminate_threads(local_err);
> > > -        return false;
> > > +        error_reportf_err(local_err,
> > > +                          "failed to receive packet via multifd channel %x: ",
> > > +                          multifd_recv_state->count);
> > > +        goto fail;
> > >      }
> > >
> > >      p = &multifd_recv_state->params[id];
> > >      if (p->c != NULL) {
> > >          error_setg(&local_err, "multifd: received id '%d' already setup'",
> > >                     id);
> > > -        multifd_recv_terminate_threads(local_err);
> > > -        return false;
> > > +        goto fail;
> > >      }
> > >      p->c = ioc;
> > >      object_ref(OBJECT(ioc));
> > > @@ -1352,6 +1354,11 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
> > >                         QEMU_THREAD_JOINABLE);
> > >      atomic_inc(&multifd_recv_state->count);
> > >      return multifd_recv_state->count == migrate_multifd_channels();
> > > +fail:
> > > +    multifd_recv_terminate_threads(local_err);
> > > +    qemu_fclose(mis->from_src_file);
> > > +    mis->from_src_file = NULL;
> > > +    exit(EXIT_FAILURE);
> > >  }
> > Yeah I think it makes sense to at least report some details when error
> > happens, but I'm not sure whether it's good to explicitly exit() here.
> > IMHO you can add an Error** in multifd_recv_new_channel() parameter
> > list to do that, and even through migration_ioc_process_incoming().
> > What do you think?
> >
> > Regards,
> >
> You mean exit() in migration_ioc_process_incoming(), or further
> caller migration_channel_process_incoming()? Actually either is
> ok for me. :) But today I find if using postcopy and multifd together
> to do live migration, it seems the hang still occurs even with the
> above codes, so sad about that. I will keep debugging and see
> how to fix this.
Maybe you can move the error_report_err() in
migration_channel_process_incoming() out of the TLS path, so that we
report the error if either the TLS or the non-TLS case goes wrong.

And I don't even know whether multifd could work with postcopy...

Regards,

-- 
Peter Xu
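
For illustration only, here is a minimal standalone sketch of the shape
discussed in this thread: the lowest-level function takes an error
out-parameter instead of calling exit(), the middle layer only forwards
it, and the top-level caller reports once after both the TLS and the
non-TLS branch. Everything prefixed with sketch_ (and the SketchError
type) is a hypothetical stand-in written for this example. It is not
QEMU's Error API and not the actual migration code, and the return
convention (0 / -1) is an assumption that differs from the bool return
of multifd_recv_new_channel().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_CHANNELS 4

/* Hypothetical stand-in for an error object; NOT QEMU's Error type. */
typedef struct SketchError {
    char msg[128];
} SketchError;

static bool sketch_slot_used[SKETCH_CHANNELS];
static int sketch_reports;          /* how many errors were reported */

/* Store a formatted message in *errp, if the caller asked for one. */
static void sketch_error_set(SketchError **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;                     /* caller does not want the error */
    }
    assert(*errp == NULL);          /* never overwrite a pending error */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Print the error, count it, and free it. */
static void sketch_error_report(SketchError *err)
{
    fprintf(stderr, "migration: %s\n", err->msg);
    sketch_reports++;
    free(err);
}

/* Lowest level: 0 on success, -1 with *errp set on failure. No exit(). */
static int sketch_recv_new_channel(int id, SketchError **errp)
{
    if (id < 0 || id >= SKETCH_CHANNELS) {
        sketch_error_set(errp, "invalid multifd channel id %d", id);
        return -1;
    }
    if (sketch_slot_used[id]) {
        sketch_error_set(errp, "multifd channel id %d already set up", id);
        return -1;
    }
    sketch_slot_used[id] = true;
    return 0;
}

/* Middle level: forwards errp untouched and does no reporting. */
static int sketch_ioc_process_incoming(int id, SketchError **errp)
{
    return sketch_recv_new_channel(id, errp);
}

/* TLS branch: may fail on its own before the channel is processed. */
static int sketch_tls_process_incoming(int id, bool handshake_ok,
                                       SketchError **errp)
{
    if (!handshake_ok) {
        sketch_error_set(errp, "TLS setup failed on channel %d", id);
        return -1;
    }
    return sketch_ioc_process_incoming(id, errp);
}

/* Top level: both branches funnel into a single report site. */
static int sketch_channel_process_incoming(int id, bool use_tls,
                                           bool handshake_ok)
{
    SketchError *local_err = NULL;
    int ret;

    if (use_tls) {
        ret = sketch_tls_process_incoming(id, handshake_ok, &local_err);
    } else {
        ret = sketch_ioc_process_incoming(id, &local_err);
    }

    if (local_err) {
        sketch_error_report(local_err);   /* shared by TLS and non-TLS */
    }
    return ret;
}
```

Calling sketch_channel_process_incoming() twice with the same id and
use_tls set to false makes the second call fail through the non-TLS
branch, and the message is still printed by the single report at the
end of the top-level function. Whether the caller should then exit or
try to recover is left open here, as it is in the thread.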