On 26.10.2021 at 19:56, John Snow wrote:
> To use the AQMP backend, Machine just needs to be a little more diligent
> about what happens when closing a QMP connection. The operation is no
> longer a freebie in the async world; it may return errors encountered in
> the async bottom half on incoming message receipt, etc.
>
> (AQMP's disconnect, ultimately, serves as the quiescence point where all
> async contexts are gathered together, and any final errors reported at
> that point.)
>
> Because async QMP continues to check for messages asynchronously, it's
> almost certainly likely that the loop will have exited due to EOF after
> issuing the last 'quit' command. That error will ultimately be bubbled
> up when attempting to close the QMP connection. The manager class here
> then is free to discard it -- if it was expected.
>
> Signed-off-by: John Snow <[email protected]>
> Reviewed-by: Hanna Reitz <[email protected]>
> ---
>  python/qemu/machine/machine.py | 48 +++++++++++++++++++++++++++++-----
>  1 file changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index 0bd40bc2f76..a0cf69786b4 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -342,9 +342,15 @@ def _post_shutdown(self) -> None:
>          # Comprehensive reset for the failed launch case:
>          self._early_cleanup()
>
> -        if self._qmp_connection:
> -            self._qmp.close()
> -            self._qmp_connection = None
> +        try:
> +            self._close_qmp_connection()
> +        except Exception as err:  # pylint: disable=broad-except
> +            LOG.warning(
> +                "Exception closing QMP connection: %s",
> +                str(err) if str(err) else type(err).__name__
> +            )
> +        finally:
> +            assert self._qmp_connection is None
>
>          self._close_qemu_log_file()
>
> @@ -420,6 +426,31 @@ def _launch(self) -> None:
>                                         close_fds=False)
>          self._post_launch()
>
> +    def _close_qmp_connection(self) -> None:
> +        """
> +        Close the underlying QMP connection, if any.
> +
> +        Dutifully report errors that occurred while closing, but assume
> +        that any error encountered indicates an abnormal termination
> +        process and not a failure to close.
> +        """
> +        if self._qmp_connection is None:
> +            return
> +
> +        try:
> +            self._qmp.close()
> +        except EOFError:
> +            # EOF can occur as an Exception here when using the Async
> +            # QMP backend. It indicates that the server closed the
> +            # stream. If we successfully issued 'quit' at any point,
> +            # then this was expected. If the remote went away without
> +            # our permission, it's worth reporting that as an abnormal
> +            # shutdown case.
> +            if not (self._user_killed or self._quit_issued):
> +                raise

Isn't this racy for those tests that expect QEMU to quit by itself and
then later call wait()? self._quit_issued is only set to True in wait(),
but whatever causes QEMU to quit happens earlier, and QEMU might actually
quit before wait() is called.

It would make sense to me if such tests had to declare that they expect
QEMU to quit before they actually perform the action. Then wait() also
becomes less weird in patch 1, because it can just assert
self._quit_issued instead of unconditionally setting it.

The other point I'm unsure about is whether you can actually kill QEMU
without getting either self._user_killed or self._quit_issued set. The
potentially problematic case I see is shutdown(hard = False) where the
soft shutdown fails. Then a hard shutdown will be performed without
setting self._user_killed (is this a bug?).

Of course, sending the 'quit' command in _soft_shutdown() will at least
set self._quit_issued, but are we absolutely sure that it can never raise
an exception before getting to qmp()? I guess in theory, closing the
console socket in _early_cleanup() could raise one? (But either way, not
relying on such subtleties would make the code easier to understand.)

Kevin
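A minimal sketch of the "declare first, then act" ordering suggested in the review, assuming a toy model rather than the real class: `SketchMachine`, `FakeQMP`, `expect_quit()` and `guest_exits_by_itself()` are hypothetical names, not part of python/qemu/machine. The test declares that it expects QEMU to exit before triggering the exit, so the EOF that the async backend surfaces at close() is judged against a flag that is already set, and wait() only asserts the declaration. It does not model real timing, the asyncio event loop, or the soft-to-hard shutdown fallback.

```python
"""Hypothetical toy model of the review suggestion; not the qemu.machine API."""
from typing import Optional


class FakeQMP:
    """Stand-in for a QMP connection whose server may already have hung up."""

    def __init__(self) -> None:
        self.server_gone = False

    def close(self) -> None:
        # Mimics the async backend reporting the stream EOF only at close().
        if self.server_gone:
            raise EOFError("server closed the stream")


class SketchMachine:
    """Hypothetical model of the shutdown bookkeeping flags."""

    def __init__(self) -> None:
        self._qmp: Optional[FakeQMP] = FakeQMP()
        self._quit_issued = False
        self._user_killed = False

    def expect_quit(self) -> None:
        # Hypothetical API: declare the expected exit *before* causing it,
        # so the flag is already set whenever the EOF is observed.
        self._quit_issued = True

    def guest_exits_by_itself(self) -> None:
        # Models an action after which QEMU quits without our 'quit' command.
        assert self._qmp is not None
        self._qmp.server_gone = True

    def _close_qmp_connection(self) -> None:
        if self._qmp is None:
            return
        try:
            self._qmp.close()
        except EOFError:
            # An EOF only counts as expected if it was declared beforehand.
            if not (self._user_killed or self._quit_issued):
                raise
        finally:
            # Mirrors the patch's invariant: the connection is always dropped.
            self._qmp = None

    def wait(self) -> None:
        # wait() checks the declaration instead of setting the flag itself.
        assert self._quit_issued, "declare the exit with expect_quit() first"
        self._close_qmp_connection()


# Declared up front: the EOF is swallowed and the connection is dropped.
vm = SketchMachine()
vm.expect_quit()
vm.guest_exits_by_itself()
vm.wait()

# Not declared: the same EOF is reported as an abnormal shutdown.
vm2 = SketchMachine()
vm2.guest_exits_by_itself()
try:
    vm2._close_qmp_connection()
except EOFError as err:
    print("abnormal shutdown:", err)  # prints "abnormal shutdown: server closed the stream"
```

In this model the outcome no longer depends on whether QEMU has already gone away by the time wait() runs: an undeclared exit is always reported, a declared one never is, and calling wait() without a declaration fails loudly instead of silently setting the flag.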
