Bugs item #1183780, was opened at 2005-04-15 16:27
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1183780&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Python 2.4
Status: Closed
Resolution: Accepted
Priority: 5
Private: No
Submitted By: Taale Skogan (tskogan)
Assigned to: Neal Norwitz (nnorwitz)
Summary: Popen4 wait() fails sporadically with threads

Initial Comment:
Calling wait() on a popen2.Popen4 object fails intermittently with the error

Traceback (most recent call last):
  ...
  File "/usr/local/lib/python2.3/popen2.py", line 90, in wait
    pid, sts = os.waitpid(self.pid, 0)
OSError: [Errno 10] No child processes

when using threads.

The problem seems to be a race condition when a thread calls wait() on a popen2.Popen4 object. This also applies to Popen3 objects. The constructor of Popen4 calls _cleanup(), which calls poll(), which calls the system call waitpid() for all active child processes. If another thread calls poll() before the current thread calls wait() on its child process and the child process has terminated, the child process is no longer waitable and the subsequent call to wait() fails.

Code to replicate this behavior is attached in popen_bug.py.

Solution: Popen4 and Popen3 should be threadsafe.

Related modules: A seemingly related error occurs with Popen from the new subprocess module. Use the -s option in the popen_bug.py script to test this.

Tested on Linux RedHat Enterprise 3 for Python 2.3.3, Python 2.3.5 and Python 2.4.1 and Solaris for Python 2.4.1. The error did not occur on a RedHat 7.3 machine with Python 2.3.5. See the attached file popen_bug.py for details on the platforms.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2007-07-10 23:50

Message:
Logged In: YES
user_id=21627
Originator: NO

If you are seeing a bug in subprocess, please report it separately. This one has been fixed.
Please don't assume that it is the "same" problem as the one reported here, unless you have a working patch that proves that it is indeed a similar problem.

----------------------------------------------------------------------

Comment By: Geoffrey Bache (gjb1002)
Date: 2007-07-10 18:00

Message:
Logged In: YES
user_id=769182
Originator: NO

As an additional note, I have also reproduced the popen problem using Python 2.4.4, though only once. It seems to occur more frequently in subprocess.py.

----------------------------------------------------------------------

Comment By: Geoffrey Bache (gjb1002)
Date: 2007-07-10 17:38

Message:
Logged In: YES
user_id=769182
Originator: NO

Did this get fixed in subprocess.py? The patches all seem to be for popen2.

I have been observing similar problems in subprocess.py, so I downloaded the test script and ran it with the -s option. It didn't work out of the box; I had to pass "shell=True" to subprocess.Popen before it did anything at all, which also made me wonder whether the subprocess variant of this problem got forgotten.

Having done that, I have observed failures using Python 2.4.4 on Red Hat EL3 Linux, and also using Python 2.4.3 on Red Hat EL4 Linux. Most of the time it works, sometimes it hangs forever, and sometimes we get something that looks like this:

Started 20 threads
Exception in thread Thread-19:
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "popen_bug.py", line 53, in run
    pipe.wait()
  File "/usr/lib/python2.4/subprocess.py", line 1007, in wait
    pid, sts = os.waitpid(self.pid, 0)
OSError: [Errno 10] No child processes

P.S. Googling for "[Errno 10] No child processes" suggests others have this problem. There have been long discussions on the Zope list as to why some people on Linux get exceptions that look like this, for example.
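
For readers unfamiliar with the error in the tracebacks above: errno 10 on Linux is ECHILD, which waitpid() reports when the requested child has already been reaped. The following is a minimal single-threaded sketch of that mechanism, not the attached popen_bug.py; the function name reap_twice is hypothetical, and the first waitpid() call merely stands in for the poll() that _cleanup() performs on behalf of another thread.

```python
import errno
import os


def reap_twice():
    """Fork a child, reap it, then try to reap it again.

    Returns the errno raised by the second waitpid() call, or None if
    the second call unexpectedly succeeds.
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers

    # First reap: stands in for poll() being run on this pid by someone else.
    os.waitpid(pid, 0)

    # Second reap: stands in for the owning thread's later call to wait().
    try:
        os.waitpid(pid, 0)
    except OSError as exc:
        return exc.errno
    return None


second_errno = reap_twice()
print(second_errno == errno.ECHILD)
```

In the threaded case the two waitpid() calls come from different threads, so whether the second one fails depends on timing, which would be consistent with the sporadic failures reported here.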

----------------------------------------------------------------------

Comment By: cheops (atila-cheops)
Date: 2006-04-10 16:55

Message:
Logged In: YES
user_id=1276121

See patch #1467770 for the subprocess.py library module.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-03-24 09:16

Message:
Logged In: YES
user_id=21627

Committed as 43286. I also added .cmd to Popen4.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-24 08:56

Message:
Logged In: YES
user_id=33168

It makes sense to remove from _active on ECHILD. I wondered the same thing about waitpid(), but left it as it was. I don't believe it's possible for waitpid to return any pid other than what you ask for unless the O/S is very, very broken.

This patch is fine with me, feel free to check it in. BTW, nice comments and precondition checks in the test.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-03-24 08:40

Message:
Logged In: YES
user_id=21627

This all looks fine. As a further issue, I think _cleanup should also clean processes which have already been waited on. So if waitpid gives ECHILD (in _cleanup), I think the object should get removed from _active - otherwise, it would stay there forever. Of course, care is then needed to avoid __del__ adding it back to _active.

Putting these all together, I propose v3 of the patch.

Another aspect that puzzles me is the repeated test that waitpid() really returns the pid you asked for. How could it not? If it fails, you get an os.error.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-24 06:17

Message:
Logged In: YES
user_id=33168

I agree with your comment about setting self.sts to 0. That was the problem I alluded to on python-dev.
Although I dislike __del__, this does seem like an appropriate place to do the modification of _active. Note that currently all os.error exceptions are swallowed in poll(). I'm not sure if that was the best idea, but that's the current interface. wait() does *not* catch any exceptions.

I wasn't really sure what to do about threads. The threads could always manage their calls into a popen object like you propose (don't try to handle simultaneous calls to poll and wait).

Another question I had was whether popen should be deprecated in favor of subprocess.

I've attached a patch which I think implements your suggestion. It also seems to fix all the problems.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-03-24 01:37

Message:
Logged In: YES
user_id=21627

I don't understand why you are setting self.sts to 0 if wait fails: most likely, there was a simultaneous call to .poll, which should have set self.sts to the real return value. So we should return that instead.

I think the whole issue can be avoided if we use resurrection: if __del__, rather than __init__, put unwaited objects into _active, it would not happen that _cleanup polls a pid which a thread still intends to wait for. In fact, it would be sufficient to only put the pid into _active (avoiding the need for resurrection).

If a thread then calls poll explicitly, and another calls wait, they deserve to lose (with ECHILD). I would claim the same error exists if part of the application calls os.wait[3,4](), and another calls .wait later - they also deserve the exception.

With that approach, I don't think further thread synchronization would be needed.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-23 09:41

Message:
Logged In: YES
user_id=33168

The attached patch fixes the problem for me. It also addresses another issue where wait could be called from outside the popen2 module.
I'm not sure this is the best solution. I'm not sure there really is a good solution. Perhaps it's best to allow an exception to be raised?

----------------------------------------------------------------------