Dne 20.7.2017 v 22:14 Amador Pahim napsal(a): > On Thu, Jul 20, 2017 at 7:49 PM, Eduardo Habkost <ehabk...@redhat.com> wrote: >> On Thu, Jul 20, 2017 at 05:09:11PM +0200, Markus Armbruster wrote: >>> Amador Pahim <apa...@redhat.com> writes: >>> >>>> On Thu, Jul 20, 2017 at 1:49 PM, Markus Armbruster <arm...@redhat.com> >>>> wrote: >>>>> Amador Pahim <apa...@redhat.com> writes: >>>>> >>>>>> Current implementation is broken. It does not really test if the child >>>>>> process is running. >>>>> >>>>> What usage exactly is broken by this? Got a reproducer for me? >>>> >>>> Problem is that 'returncode' is not set without a calling >>>> poll()/wait()/communicate(), so it's only useful to test if the >>>> process is running after such calls. But if we use 'poll()' instead, >>>> it will, according to the docs, "Check if child process has >>>> terminated. Set and return returncode attribute." >>>> >>>> Reproducer is: >>>> >>>> >>> import subprocess >>>> >>> devnull = open('/dev/null', 'rb') >>>> >>> p = subprocess.Popen(['qemu-system-x86_64', '-broken'], >>>> stdin=devnull, stdout=devnull, stderr=devnull, shell=False) >>>> >>> print p.returncode >>>> None >>>> >>> print p.poll() >>>> 1 >>>> >>> print p.returncode >>>> 1 >>>> >>>>>> The Popen.returncode will only be set after by a poll(), wait() or >>>>>> communicate(). If the Popen fails to launch a VM, the Popen.returncode >>>>>> will not turn to None by itself. >>>>> >>>>> Hmm. What is the value of .returncode then? >>>> >>>> returncode starts with None and becomes the process exit code when the >>>> process is over and one of that three methods is called (poll(), >>>> wait() or communicate()). >>>> >>>> There's an error in my description though. The correct would be: "The >>>> Popen.returncode will only be set after a call to poll(), wait() or >>>> communicate(). If the Popen fails to launch a VM, the Popen.returncode >>>> will not turn from None to the actual return code by itself." >>> >>> Suggest to add ", and is_running() continues to report True". >>> >>>>>> Instead of using Popen.returncode, let's use Popen.poll(), which >>>>>> actually checks if child process has terminated. >>>>>> >>>>>> Signed-off-by: Amador Pahim <apa...@redhat.com> >>>>>> Reviewed-by: Eduardo Habkost <ehabk...@redhat.com> >>>>>> Reviewed-by: Fam Zheng <f...@redhat.com> >>>>>> --- >>>>>> scripts/qemu.py | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/scripts/qemu.py b/scripts/qemu.py >>>>>> index 880e3e8219..f0fade32bd 100644 >>>>>> --- a/scripts/qemu.py >>>>>> +++ b/scripts/qemu.py >>>>>> @@ -86,7 +86,7 @@ class QEMUMachine(object): >>>>>> raise >>>>>> >>>>>> def is_running(self): >>>>>> - return self._popen and (self._popen.returncode is None) >>>>>> + return self._popen and (self._popen.poll() is None) >>>>>> >> >> After re-reading shutdown(), I think this is _not_ OK: if >> is_running() return False before we call .wait(), we will never >> load the log file or run _post_shutdown() if QEMU exits between >> the launch() and shutdown() calls. > > Yes, I just noticed that while cleaning up the code. > >> >> Yes, it's fragile. >> >> The root problem on both launch() and shutdown() seems to be >> coupling the external "is QEMU running?" state with the internal >> "did we load the log file and ran _post_shutdown() already?" >> state. >> >> I see two possible approaches for this: >> >> 1) Benefit from the fact that the internal Popen state will not >> change under our feet unless we explicitly call >> poll()/wait()/etc, and keep the existing code. (Not my >> favorite option) >> >> 2) Rewrite the code so that we don't depend on the subtle Popen >> internal state rules, and track our own internal state in >> a QEMUMachine attribute. e.g.: > > +1 for this approach. I'm working on something similar, thanks for the > detailed "e.g." code here. > >> >> def _handle_shutdown(self): >> '''Load log file and call _post_shutdown() hook if necessary''' >> # Must be called only after QEMU actually exited. >> assert not self.is_running() >> if self._shutdown_pending: >> if self.exitcode() < 0: >> sys.stderr.write('qemu received signal %i: %s\n' % >> (-exitcode, ' '.join(self._args))) >> self._load_io_log() >> self._post_shutdown() >> self._shutdown_pending = False >> >> def _terminate(self): >> '''Terminate QEMU if it's still running''' >> if self.is_running(): >> try: >> self._qmp.cmd('quit') >> self._qmp.close() >> except: >> self._popen.kill() >> self._popen.wait() >> >> def _launch(self): >> '''Launch the VM and establish a QMP connection''' >> devnull = open('/dev/null', 'rb') >> qemulog = open(self._qemu_log_path, 'wb') >> self._shutdown_pending = True >> self._pre_launch() >> args = self._wrapper + [self._binary] + self._base_args() + >> self._args >> self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog, >> stderr=subprocess.STDOUT, shell=False) >> self._post_launch() >> >> def launch(self): >> try: >> self._launch() >> except: >> self._terminate() >> self._handle_shutdown() >> raise >> >> def shutdown(self): >> '''Terminate the VM and clean up''' >> self._terminate() >> self._handle_shutdown() >> This part also caught my attention and I also meant to improve it when this series is merged. Anyway let's state my suggestions here, take it or let it go:
1. `get_log` should check whether `self._iolog` is `None` and then it should check for process status 2. the `self._iolog` is `None` is the indication whether `shutdown` was called or not (not whether the process exists or not) 3. add `__del__` to cleanup in case one forgets to call `shutdown` (currently the files and processes are left behind) 4. use `name = "qemu-%d-%d" % (os.getpid(), id(self))` to allow multiple instances with default name at the same time. Also I just realized that even with just this patch (as is) files/processes can be left behind: >>> import qemu, os >>> a=qemu.QEMUMachine("/usr/bin/qemu-kvm", debug=True) >>> a.launch() QMP:>>> {'execute': 'qmp_capabilities'} QMP:<<< {u'return': {}} >>> a.is_running() False >>> a.shutdown() >>> os.path.exists(a._qemu_log_path) True Before this patch it worked well as (as Eduardo mentioned) the `is_running` was tracing internal state, not the process state. Regards, Lukáš >> >> Signed-off-by: Eduardo Habkost <ehabk...@redhat.com> >> >>>>>> def exitcode(self): >>>>>> if self._popen is None: >>>>> return None >>>>> return self._popen.returncode >>>>> >>>>> Why is this one safe? >>>> >>>> Here it's used just to retrieve the value from the Popen.returncode. >>>> It's not being used to check whether the process is running or not. >>> >>> If self._popen is not None, we return self._popen.returncode. It's None >>> if .poll() etc. haven't been called. Can this happen? If not, why not? >>> If yes, why is returning None then okay? >> >> It can't happen because the only caller of exitcode() >> (device-crash-test) calls it immediately after shutdown(). But >> it would be nice to make exitcode() behavior consistent with >> is_running(). >> >> -- >> Eduardo
signature.asc
Description: OpenPGP digital signature