On Fri, 30 Oct 2020 15:02:09 +0100 Markus Armbruster <arm...@redhat.com> wrote:
> Lukas Straub <lukasstra...@web.de> writes: > > > On Thu, 29 Oct 2020 17:36:14 +0100 > > Markus Armbruster <arm...@redhat.com> wrote: > > > >> Nothing major, looks almost ready to me. > >> > >> Lukas Straub <lukasstra...@web.de> writes: > >> > >> > The yank feature allows to recover from hanging qemu by "yanking" > >> > at various parts. Other qemu systems can register themselves and > >> > multiple yank functions. Then all yank functions for selected > >> > instances can be called by the 'yank' out-of-band qmp command. > >> > Available instances can be queried by a 'query-yank' oob command. > >> > > >> > Signed-off-by: Lukas Straub <lukasstra...@web.de> > >> > Acked-by: Stefan Hajnoczi <stefa...@redhat.com> > >> > --- > >> > include/qemu/yank.h | 95 ++++++++++++++++++++ > >> > qapi/misc.json | 106 ++++++++++++++++++++++ > >> > util/meson.build | 1 + > >> > util/yank.c | 213 ++++++++++++++++++++++++++++++++++++++++++++ > >> > > >> > >> checkpatch.pl warns: > >> > >> WARNING: added, moved or deleted file(s), does MAINTAINERS need > >> updating? > >> > >> Can we find a maintainer for the two new files? > > > > Yes, I'm maintaining this for now, see patch 7. > > Thanks! Would it make sense to add the yank stuff to a new QAPI module > yank.json instead of misc.jaon, so the new MAINTAINERS stanza can cover > it? Yes, makes sense. Changed for the next version. > [...] > >> > diff --git a/qapi/misc.json b/qapi/misc.json > >> > index 40df513856..3b7de02a4d 100644 > >> > --- a/qapi/misc.json > >> > +++ b/qapi/misc.json > [...] > >> > +## > >> > +# @YankInstance: > >> > +# > >> > +# A yank instance can be yanked with the "yank" qmp command to recover > >> > from a > >> > +# hanging qemu. > >> > >> QEMU > >> > >> > +# > >> > +# Currently implemented yank instances: > >> > +# -nbd block device: > >> > +# Yanking it will shutdown the connection to the nbd server without > >> > +# attempting to reconnect. > >> > +# -socket chardev: > >> > +# Yanking it will shutdown the connected socket. > >> > +# -migration: > >> > +# Yanking it will shutdown all migration connections. > >> > >> To my surprise, this is recognized as bullet list markup. But please > >> put a space between the bullet and the text anyway. > >> > >> Also: "shutdown" is a noun, the verb is spelled "shut down". > > > > Both changed for the next version. > > > >> In my review of v8, I asked how yanking migration is related to command > >> migrate_cancel. Daniel explained: > >> > >> migrate_cancel will do a shutdown() on the primary migration socket > >> only. > >> In addition it will toggle the migration state. > >> > >> Yanking will do a shutdown on all migration sockets (important for > >> multifd), but won't touch migration state or any other aspect of QEMU > >> code. > >> > >> Overall yanking has less potential for things to go wrong than the > >> migrate_cancel method, as it doesn't try to do any kind of cleanup > >> or migration. > >> > >> Would it make sense to work this into the documentation? > > > > How about this? > > > > - migration: > > Yanking it will shut down all migration connections. Unlike > > @migrate_cancel, it will not notify the migration process, > > so migration will go into @failed state, instead of @cancelled > > state. > > Works for me. Advice on when to use it rather than migrate_cancel would > be nice, though. Ok, Changed for the next version. > >> > +# > >> > +# Since: 5.2 > >> > +## > >> > +{ 'union': 'YankInstance', > >> > + 'base': { 'type': 'YankInstanceType' }, > >> > + 'discriminator': 'type', > >> > + 'data': { > >> > + 'blockdev': 'YankInstanceBlockdev', > >> > + 'chardev': 'YankInstanceChardev' } } > >> > + > >> > +## > >> > +# @yank: > >> > +# > >> > +# Recover from hanging qemu by yanking the specified instances. See > >> > >> QEMU > >> > >> "Try to recover" would be more precise, I think. > > > > Changed for the next version. > > > >> > +# "YankInstance" for more information. > >> > +# > >> > +# Takes a list of @YankInstance as argument. > >> > +# > >> > +# Returns: nothing. > >> > +# > >> > +# Example: > >> > +# > >> > +# -> { "execute": "yank", > >> > +# "arguments": { > >> > +# "instances": [ > >> > +# { "type": "block-node", > >> > +# "node-name": "nbd0" } > >> > +# ] } } > >> > +# <- { "return": {} } > >> > +# > >> > +# Since: 5.2 > >> > +## > >> > +{ 'command': 'yank', > >> > + 'data': { 'instances': ['YankInstance'] }, > >> > + 'allow-oob': true } > >> > + > [...] > >> > diff --git a/util/yank.c b/util/yank.c > >> > new file mode 100644 > >> > index 0000000000..0b3a816706 > >> > --- /dev/null > >> > +++ b/util/yank.c > [...] > >> > +void qmp_yank(YankInstanceList *instances, > >> > + Error **errp) > >> > +{ > >> > + YankInstanceList *tail; > >> > + YankInstanceEntry *entry; > >> > + YankFuncAndParam *func_entry; > >> > + > >> > + qemu_mutex_lock(&yank_lock); > >> > + for (tail = instances; tail; tail = tail->next) { > >> > + entry = yank_find_entry(tail->value); > >> > + if (!entry) { > >> > + error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not > >> > found"); > >> > >> Quote error.h: > >> > >> * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is > >> * strongly discouraged. > >> > >> Any particular reason for ERROR_CLASS_DEVICE_NOT_FOUND? If not, then > >> error_setg(), please. > > > > There may be a time-to-check to time-to-use race condition here. Suppose > > the management application (via QMP) calls 'blockev-del nbd0', then QEMU > > hangs, so after a timeout it calls 'yank nbd0' while shortly before that > > QEMU unhangs for some reason and removes the blockdev. Then yank returns > > this error. > > > > QMP errors should not be parsed except for the error class, so the the > > management application can only differentiate between this (ignorable) > > error and other (possibly fatal) ones by parsing the error class. > > I see. > > DeviceNotFound doesn't really fit, but none of the other error classes > is any better. > > I think the line > > # Returns: nothing. > > in the QAPI schema (quoted above) should be expanded to something like > > > # Returns: - Nothing on success > - If the YankInstance doesn't exist, DeviceNotFound Changed for the next version. > >> > + qemu_mutex_unlock(&yank_lock); > >> > + return; > >> > + } > >> > + } > >> > + for (tail = instances; tail; tail = tail->next) { > >> > + entry = yank_find_entry(tail->value); > >> > + assert(entry); > >> > + QLIST_FOREACH(func_entry, &entry->yankfns, next) { > >> > + func_entry->func(func_entry->opaque); > >> > + } > >> > + } > >> > + qemu_mutex_unlock(&yank_lock); > >> > +} > [...] > --
pgp5McU115nlM.pgp
Description: OpenPGP digital signature