On Fri, Mar 04, 2016 at 10:49:35AM +0100, Markus Armbruster wrote: > "Daniel P. Berrange" <berra...@redhat.com> writes: > > > Currently if an app initiates an outgoing migration, it > > application > > > may or may not, get an error reported back on failure. If > > the error occurs synchronously to the 'migrate' command > > execution, the client app will see the error message. This > > is the case for DNS lookup failures. If the error occurs > > asynchronously to the monitor command though, the error > > will be thrown away and the client left guessing about > > what went wrong. This is the case for failure to connect > > to the TCP server (eg due to wrong port, or firewall > > rules, or other similar errors). > > > > In the future we'll be adding more scope for errors to > > happen asynchronously with the TLS protocol handshake. > > TLS errors are hard to diagnose even when they are well > > reported, so discarding errors entirely will make it > > impossible to debug TLS connection problems. > > > > Management apps which do migration are already using > > 'query-migrate' / 'info migrate' to check up on progress > > of background migration operations and to see their end > > status. This is a fine place to also include the error > > message when things go wrong. > > > > This patch thus adds an 'error-desc' field to the > > MigrationInfo struct, which will be populated when > > the 'status' is set to 'failed': > > > > (qemu) migrate -d tcp:localhost:9001 > > (qemu) info migrate > > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: > > off compress: off events: off x-postcopy-ram: off > > Migration status: failed > > total time: 0 milliseconds > > error description: Error connecting to socket: Connection refused > > Perhaps: > > Migration status: failed (Error connecting to socket: Connection refused) > > Just an idea, use whatever you like better.
Yeah, that is nicer for the HMP. > > In the HMP, when doing non-detached migration, it is > > also possible to display this error message directly > > to the app. > > > > (qemu) migrate tcp:localhost:9001 > > Error connecting to socket: Connection refused > > You could include a QMP example if you like. Sure, will add it. > > @@ -853,12 +857,14 @@ static void migrate_fd_cleanup(void *opaque) > > notifier_list_notify(&migration_state_notifiers, s); > > } > > > > -void migrate_fd_error(MigrationState *s) > > +void migrate_fd_error(MigrationState *s, const Error *error) > > { > > - trace_migrate_fd_error(); > > + trace_migrate_fd_error(error ? error_get_pretty(error) : ""); > > assert(s->to_dst_file == NULL); > > migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, > > MIGRATION_STATUS_FAILED); > > + error_free(s->error); > > + s->error = error_copy(error); > > Can migrate_fd_error() be called more than once per migration attempt? I think so, but I felt it worth being paranoid against mistakes so I chose to call error_free() just in case. > > diff --git a/qapi-schema.json b/qapi-schema.json > > index 7b8f2a1..ff89747 100644 > > --- a/qapi-schema.json > > +++ b/qapi-schema.json > > @@ -484,6 +484,8 @@ > > # throttled during auto-converge. This is only present when > > auto-converge > > # has started throttling guest cpus. (Since 2.5) > > # > > +# @error-desc: the error description string, when @status == 'failed' > > (Since 2.6) > > +# > > "when @status == 'failed'" is a semantic constraint, not visible in > query-qmp-schema. Several existing members have similar constraints. > > Making this a flat union tagged by @status would turn semantic > constraints into syntax, visible in query-qmp-schema. Not sure it's > worth the churn now. > > Note that qmp_query_migrate() may not actually set @error-desc when > @status is 'failed'. Bug in either the code or the documentation. Bug in docs :-) > Further note that code uses it without also checking status. You could > assert the constraint holds. Your choice. > > Let me take a step back and examine the bigger picture. > > Migration is a long-running task with a non-trivial live cycle (see enum > MigrationStatus). Similar tasks exist in the block layer ("block > jobs"), and perhaps we can have a generic "jobs" abstraction some day. > > Originally, QMP was designed to permit doing long-running tasks as > asynchronous commands. We ended up doing them as synchronous commands + > events + status queries. Two reasons, one accidental, one fundamental. > > The accidental one is asynchronous commands never quite worked, so > nobody used them, so nobody bothered to fix them. > > The fundamental one is complex life cycles. A job that starts, runs > silently for a while, then either completes or fails can be done nicely > as asynchronous command. But when the job's life cycle is more complex, > a single command reply isn't enough. Thus commands + events + status > queries. > > In this model, the reply to a status query in a "failed" state takes the > role of the asynchronous command's error reply. It therefore makes > sense to compare the two. > > The QMP error reply is documented as follows in qmp-spec.txt: > > The format of an error response is: > > { "error": { "class": json-string, "desc": json-string }, "id": > json-value } > > Where, > > - The "class" member contains the error class name (eg. "GenericError") > - The "desc" member is a human-readable error message. Clients should > not attempt to parse this message. > - The "id" member contains the transaction identification associated with > the command execution if issued by the Client > > MigrationInfo in a "failed" contains @status and @error-desc, and may > contain additional members like @total-time (not entirely clear from > documentation, which means the documentation is too vague). > > MigrationInfo covers @desc, but not @class and @id. > > Use of @class is discouraged nowadays, and omitting it here at least > until we have a compelling use for it makes sense. > > @id ties the asynchronous reply to the command. Not necessary as long > as only one migration task can exist. > > The only thing I'd like you to add to MigrationInfo is the "Clients > should not attempt to parse this" admonition. Suggest to describe > @error-desc as "human-readable" while there. Yes, makes sense. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|