On Thu, Jul 20, 2017 at 7:49 PM, Eduardo Habkost <ehabk...@redhat.com> wrote: > On Thu, Jul 20, 2017 at 05:09:11PM +0200, Markus Armbruster wrote: >> Amador Pahim <apa...@redhat.com> writes: >> >> > On Thu, Jul 20, 2017 at 1:49 PM, Markus Armbruster <arm...@redhat.com> >> > wrote: >> >> Amador Pahim <apa...@redhat.com> writes: >> >> >> >>> Current implementation is broken. It does not really test if the child >> >>> process is running. >> >> >> >> What usage exactly is broken by this? Got a reproducer for me? >> > >> > Problem is that 'returncode' is not set without a calling >> > poll()/wait()/communicate(), so it's only useful to test if the >> > process is running after such calls. But if we use 'poll()' instead, >> > it will, according to the docs, "Check if child process has >> > terminated. Set and return returncode attribute." >> > >> > Reproducer is: >> > >> > >>> import subprocess >> > >>> devnull = open('/dev/null', 'rb') >> > >>> p = subprocess.Popen(['qemu-system-x86_64', '-broken'], >> > stdin=devnull, stdout=devnull, stderr=devnull, shell=False) >> > >>> print p.returncode >> > None >> > >>> print p.poll() >> > 1 >> > >>> print p.returncode >> > 1 >> > >> >>> The Popen.returncode will only be set after by a poll(), wait() or >> >>> communicate(). If the Popen fails to launch a VM, the Popen.returncode >> >>> will not turn to None by itself. >> >> >> >> Hmm. What is the value of .returncode then? >> > >> > returncode starts with None and becomes the process exit code when the >> > process is over and one of that three methods is called (poll(), >> > wait() or communicate()). >> > >> > There's an error in my description though. The correct would be: "The >> > Popen.returncode will only be set after a call to poll(), wait() or >> > communicate(). If the Popen fails to launch a VM, the Popen.returncode >> > will not turn from None to the actual return code by itself." >> >> Suggest to add ", and is_running() continues to report True". >> >> >>> Instead of using Popen.returncode, let's use Popen.poll(), which >> >>> actually checks if child process has terminated. >> >>> >> >>> Signed-off-by: Amador Pahim <apa...@redhat.com> >> >>> Reviewed-by: Eduardo Habkost <ehabk...@redhat.com> >> >>> Reviewed-by: Fam Zheng <f...@redhat.com> >> >>> --- >> >>> scripts/qemu.py | 2 +- >> >>> 1 file changed, 1 insertion(+), 1 deletion(-) >> >>> >> >>> diff --git a/scripts/qemu.py b/scripts/qemu.py >> >>> index 880e3e8219..f0fade32bd 100644 >> >>> --- a/scripts/qemu.py >> >>> +++ b/scripts/qemu.py >> >>> @@ -86,7 +86,7 @@ class QEMUMachine(object): >> >>> raise >> >>> >> >>> def is_running(self): >> >>> - return self._popen and (self._popen.returncode is None) >> >>> + return self._popen and (self._popen.poll() is None) >> >>> > > After re-reading shutdown(), I think this is _not_ OK: if > is_running() return False before we call .wait(), we will never > load the log file or run _post_shutdown() if QEMU exits between > the launch() and shutdown() calls.
Yes, I just noticed that while cleaning up the code. > > Yes, it's fragile. > > The root problem on both launch() and shutdown() seems to be > coupling the external "is QEMU running?" state with the internal > "did we load the log file and ran _post_shutdown() already?" > state. > > I see two possible approaches for this: > > 1) Benefit from the fact that the internal Popen state will not > change under our feet unless we explicitly call > poll()/wait()/etc, and keep the existing code. (Not my > favorite option) > > 2) Rewrite the code so that we don't depend on the subtle Popen > internal state rules, and track our own internal state in > a QEMUMachine attribute. e.g.: +1 for this approach. I'm working on something similar, thanks for the detailed "e.g." code here. > > def _handle_shutdown(self): > '''Load log file and call _post_shutdown() hook if necessary''' > # Must be called only after QEMU actually exited. > assert not self.is_running() > if self._shutdown_pending: > if self.exitcode() < 0: > sys.stderr.write('qemu received signal %i: %s\n' % > (-exitcode, ' '.join(self._args))) > self._load_io_log() > self._post_shutdown() > self._shutdown_pending = False > > def _terminate(self): > '''Terminate QEMU if it's still running''' > if self.is_running(): > try: > self._qmp.cmd('quit') > self._qmp.close() > except: > self._popen.kill() > self._popen.wait() > > def _launch(self): > '''Launch the VM and establish a QMP connection''' > devnull = open('/dev/null', 'rb') > qemulog = open(self._qemu_log_path, 'wb') > self._shutdown_pending = True > self._pre_launch() > args = self._wrapper + [self._binary] + self._base_args() + self._args > self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog, > stderr=subprocess.STDOUT, shell=False) > self._post_launch() > > def launch(self): > try: > self._launch() > except: > self._terminate() > self._handle_shutdown() > raise > > def shutdown(self): > '''Terminate the VM and clean up''' > self._terminate() > self._handle_shutdown() > > > Signed-off-by: Eduardo Habkost <ehabk...@redhat.com> > >> >>> def exitcode(self): >> >>> if self._popen is None: >> >> return None >> >> return self._popen.returncode >> >> >> >> Why is this one safe? >> > >> > Here it's used just to retrieve the value from the Popen.returncode. >> > It's not being used to check whether the process is running or not. >> >> If self._popen is not None, we return self._popen.returncode. It's None >> if .poll() etc. haven't been called. Can this happen? If not, why not? >> If yes, why is returning None then okay? > > It can't happen because the only caller of exitcode() > (device-crash-test) calls it immediately after shutdown(). But > it would be nice to make exitcode() behavior consistent with > is_running(). > > -- > Eduardo