Hi, as a result of the decision to defer thread-safety in 0.8, I switched our external subprocess handling to use temporary files. I created a new class pygr.classutil.FilePopen that follows Python's subprocess.Popen interface while addressing several problems:
- use of Popen.wait() is prone to deadlocks (see the subprocess docs). - the recommended read-write interface, Popen.communicate(), apparently cannot handle even modest amounts of data (>64k), again resulting in deadlocks (there's a veiled warning in the subprocess docs; see the following blog page for some real testing). http://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/ - more broadly, Popen.communicate() forces you to keep the entire data to be read and / or written as a single string in memory. This is not suitable for very large data transfers. Far more flexible is the standard model of reading / writing streams to transfer as much data as you want without ever having to keep all of it in memory. Streams are an easy, ubiquitous interface that would be compatible with a much broader range of existing code (any function that reads / write to a file-like object) than Popen.communicate(). - Popen provides stream attributes (Popen.stdin, Popen.stdout, Popen.stderr), but the docs say that using them directly will result in deadlocks. (That is probably why they try to make you use Popen.communicate()). Also these attributes are only set if that stream is a pipe, not if it is a regular file. The bottom line is that you can't use these stream attributes reliably, so why does the subprocess module present them as part of the public interface? - The subprocess module is not available on Python 2.3, which is a problem for Pygr. The older interfaces (os.popen, popen2, os.system etc.) have their own problems (the same deadlock issues; vulnerability to shell injection attacks, etc.). Even worse, we were already propagating a pattern throughout Pygr of writing duplicate code for every external process interaction: once using subprocess, and again using older interfaces for Python 2.3. That way lies madness. - all of these interfaces seem to force people to use multiple threading as the only way of evading deadlocks. That seems like an overly complicated solution, if all you want to do is pass a bunch of data to an external program, then block until it returns all its results (i.e. if you are "calling" the external program like a subroutine). Most code in the world (including Pygr 0.8) is not truly thread-safe, so invoking multiple threading is either asking for trouble, or drastically limits what code resources you can use. I have implemented the obvious solution: - use temporary files in place of pipes for a subprocess' stdin, stdout, and stderr. - Instead of using Popen.communicate() to read / write, you simply write to Popen.stdin, then call Popen.wait() to let the subprocess finish, and finally read from Popen.stdout (and if you wish, Popen.stderr). - when the Popen object is garbage collected, the temporary files are automatically deleted from the file system. - because you are simply reading / writing to a file (not a pipe to a subprocess), deadlocks are not possible, and there is no need for multi-threading. - this implementation uses the standard Python subprocess module if available, but in its absence (on Python 2.3) transparently provides the same functionality using older interfaces. (Since all interprocess communication is via files, a simple redirect of stdin / stdout is all that's required. Currently this doesn't support stderr redirection, but we could add that.) In this case each argument is "shell-escaped" against shell-injection attacks using commands.mkarg(). The older interface doesn't currently handle fancier aspects of subprocess's arguments (like change the child's environment); currently, those arguments are silently ignored. The new interface is called pygr.classutil.FilePopen, which mimics the subprocess.Popen interface. There is also a convenience function pygr.classutil.call_subprocess(), which mimics the subprocess.call() function. Some details: - if you specify the keyword argument stdin, stdout or stderr to be pygr.classutil.PIPE (same as subprocess.PIPE), it connects that stream to a temporary file (created and deleted automatically) instead of a pipe. - if you specify the argument stdinFlag, it appends that plus the stdin filename to the command line arguments for invoking the subprocess. This is useful for programs like megablast that apparently will only read from a filename argument given on the command line, and not from stdin. The stdinFlag switches the behavior of the stdin stream from being passed as stdin-redirection to being simply appended as a command line argument. For example, stdinFlag='- i' would cause the following to be appended to the command line arguments: -i /path/to/temp/file. If you pass stdinFlag=None, *only* the stdin filename is appended to the command line arguments (with no preceding flag argument). - same thing for stdoutFlag - use the attributes FilePopen.stdin, FilePopen.stdout, FilePopen.stderr for reading / writing from / to the subprocess. These attributes are present and usable regardless of whether they were passed as PIPE or as a regular file. Any attribute not passed PIPE or a file object will be None. - basic usage pattern: 1. create the FilePopen object; 2. write all its input to FilePopen.stdin; 3. call FilePopen.wait() to wait until the subprocess finishes; 4. if desired read from FilePopen.stdout / stderr to get its output; 5. releasing the FilePopen object will cause any temporary files (i.e. created with the PIPE argument) to be deleted from the file system. You don't need to flush() or rewind any of the streams; FilePopen.wait() does that for you. - See usage examples in pygr/downloader.py and pygr/blast.py - like Popen.wait(), FilePopen.wait() returns the return code of the subprocess. Note that the subprocess is not actually started until you can FilePopen.wait(). - most other aspects of the subprocess.Popen interface (such as communicate()) are not implemented on FilePopen, because they aren't relevant. All existing usages in Pygr of subprocess / popen and all but a few of the os.system calls have been switched to FilePopen / call_subprocess. All of this was pushed to master around 10 pm last night. Get it from the public repo.or.cz repository. I've tested so far on Python 2.5.2 and Python 2.6.1 (Mac OS 10.5.6) and Python 2.3.3 (linux 2.4.20), and it seems to have passed last night's megatests fine as well. Yours, Chris On Mar 31, 2009, at 9:42 PM, C. Titus Brown wrote: > > On Tue, Mar 31, 2009 at 09:42:00PM -0700, Alexander Alekseyenko wrote: > -> In my opinion we should make a note that pygr 0.8 is not thread > safe, > -> use the temporary file solution for now and put thread safety on > a to > -> do list for 1.0. > > +1 > > cheers, > --titus --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "pygr-dev" group. To post to this group, send email to pygr-dev@googlegroups.com To unsubscribe from this group, send email to pygr-dev+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/pygr-dev?hl=en -~----------~----~----~----~------~----~------~--~---