On Tue, Dec 19, 2017 at 10:14:08AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Fri, Dec 15, 2017 at 05:16:53PM +0000, Dr. David Alan Gilbert (git)
> > wrote:
> > > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
> > >
> > > Hi,
> > >   Where a channel fails asynchronously during connect, call
> > > back through the migration code so it can clean up.
> > > In particular this causes the transition of a 'cancelling' state
> > > to 'cancelled' in the case of:
> > >
> > >   migrate -d tcp:deadhost:port
> > >   <host tries to connect>
> > >   migrate_cancel
> > >
> > > previously the status would get stuck in cancelling because
> > > the final cleanup didn't happen.
> > >
> > > This is the second part of the fix for:
> > >   https://bugzilla.redhat.com/show_bug.cgi?id=1525899
> >
> > IIUC this series tries to deliver the connection error a long way
> > until migrate_fd_connect() to handle it. But, haven't we already have
> > a function migrate_fd_error() to do that (which is faster, and
> > simpler)?
> >
> > void migrate_fd_error(MigrationState *s, const Error *error)
> > {
> >     trace_migrate_fd_error(error_get_pretty(error));
> >     assert(s->to_dst_file == NULL);
> >     migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> >                       MIGRATION_STATUS_FAILED);
> >     migrate_set_error(s, error);
> >     notifier_list_notify(&migration_state_notifiers, s);
> >     block_cleanup_parameters(s);
> > }
> >
> > I think it's not handling the case when cancelling. If we let it to
> > handle the cancelling case well, would it be a simpler fix?
> >
> > Moreover, I think this is another good example that migration is not
> > handling the cleanup "cleanly" in general... I really hope we can do
> > this better in 2.12. I'll see whether I can give it a shot, but in
> > all cases it'll be after the merging of existing patches since there
> > are already quite a lot of dangling patches.
>
> No, I think migrate_fd_error is the cause of the problem here, not the
> answer.

Could I ask why migrate_fd_error() is problematic?

Yeah, I agree that we should have a single point to clean things up.
Can we then call migrate_fd_cleanup() somehow inside
migrate_fd_error()?

The thing I don't really understand is why we want the error to be
delivered through those functions (migration_channel_connect,
migrate_fd_connect, etc.) before it is finally cleaned up.

>
> If we stick to the simple rule that a migration must always call
> migrate_fd_cleanup then the cancellation problems are fixed - I think
> that's how we make migration 'clean' - a single cleanup routine
> that always gets called.

--
Peter Xu
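
To make the "single cleanup routine" rule from the quoted reply concrete,
here is a minimal standalone sketch. It is not QEMU code: every name
below (SketchStatus, SketchMigration, sketch_cleanup, sketch_cancel,
sketch_connect_done and the two demo helpers) is hypothetical, and the
state machine is reduced to the four states mentioned in this thread.
The only point it tries to show is that if the asynchronous connect
failure path always ends in the one cleanup function, a pending cancel
is resolved to 'cancelled' instead of staying in 'cancelling'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the migration states. */
typedef enum {
    SKETCH_STATUS_SETUP,
    SKETCH_STATUS_CANCELLING,
    SKETCH_STATUS_CANCELLED,
    SKETCH_STATUS_FAILED,
} SketchStatus;

typedef struct {
    SketchStatus state;
    const char *error;     /* NULL while no error has been recorded */
    int cleanup_calls;     /* counts how often cleanup ran */
} SketchMigration;

/* The single cleanup point: every exit path is expected to end here. */
void sketch_cleanup(SketchMigration *s)
{
    s->cleanup_calls++;
    /* Final resolution of a pending cancel happens only here. */
    if (s->state == SKETCH_STATUS_CANCELLING) {
        s->state = SKETCH_STATUS_CANCELLED;
    }
}

/* User asked to cancel while the connect is still in flight. */
void sketch_cancel(SketchMigration *s)
{
    if (s->state == SKETCH_STATUS_SETUP) {
        s->state = SKETCH_STATUS_CANCELLING;
    }
}

/* Completion callback of the async connect; err is NULL on success. */
void sketch_connect_done(SketchMigration *s, const char *err)
{
    if (err != NULL) {
        s->error = err;
        /* Only a migration still in setup becomes 'failed'. */
        if (s->state == SKETCH_STATUS_SETUP) {
            s->state = SKETCH_STATUS_FAILED;
        }
        sketch_cleanup(s);  /* always called, also when cancelling */
        return;
    }
    /* Success path (starting the migration) is omitted in this sketch. */
}

/* Scenario from the thread: connect to a dead host, then cancel. */
SketchStatus sketch_demo_cancel_then_fail(void)
{
    SketchMigration s = { SKETCH_STATUS_SETUP, NULL, 0 };
    sketch_cancel(&s);
    sketch_connect_done(&s, "connect failed");
    assert(s.cleanup_calls == 1);
    return s.state;
}

/* Same failure without a cancel in between. */
SketchStatus sketch_demo_plain_failure(void)
{
    SketchMigration s = { SKETCH_STATUS_SETUP, NULL, 0 };
    sketch_connect_done(&s, "connect failed");
    assert(s.cleanup_calls == 1);
    return s.state;
}
```

In this sketch the error handling and the cleanup are separate steps,
but the error path cannot return without running the cleanup. Whether
the real fix should do that inside migrate_fd_error() or by passing the
error on to migrate_fd_connect() is exactly the open question above.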