On Mon, 12 Jan 2009 06:37:35 -0800 (PST), "psaff...@googlemail.com" 
<psaff...@googlemail.com> wrote:
I'm building a bioinformatics application using the ipcress tool:

http://www.ebi.ac.uk/~guy/exonerate/ipcress.man.html

I'm using subprocess.Popen to execute ipcress, which takes a group of
files full of DNA sequences and returns some analysis on them. Here's
a code fragment:

cmd = "/usr/bin/ipcress ipcresstmp.txt --sequence /home/pzs/genebuilds/
human/*.fasta"
print "checking with ipcress using command", cmd
p = Popen(cmd, shell=True, bufsize=100, stdout=PIPE, stderr=PIPE)
retcode = p.wait()
if retcode != 0:
        print "ipcress failed with error code:", retcode
        raise Exception
output = p.stdout.read()

If I run the command at my shell, it finishes successfully. It takes
30 seconds - it uses 100% of one core and several hundred MB of memory
during this time. The output is 220KB of text.

However, running it through Python with the code above, it stalls after
5 seconds, not using any processor at all. I've tried leaving it for a
few minutes with no change. If I interrupt it, it's sitting at the
"retcode = p.wait()" line.

I've tried making the bufsize really large and that doesn't seem to
help. I'm a bit stuck - any suggestions? This same command has worked
fine on other ipcress runs. This one might generate more output than
the others, but 220KB isn't that much, is it?

You have to read the output.  Otherwise, the pipe behind the process's
stdout fills up and its write attempts eventually block, preventing it
from ever finishing.
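
With the standard library, the usual way to do that is Popen.communicate(),
which reads stdout and stderr to the end for you and then waits for the
process to exit.  A minimal sketch along those lines, reusing your command
(220KB of output is small enough that buffering it all in memory is fine):

from subprocess import Popen, PIPE

cmd = ("/usr/bin/ipcress ipcresstmp.txt "
       "--sequence /home/pzs/genebuilds/human/*.fasta")

p = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE)
# communicate() drains both pipes and only then waits for the process,
# so the child can never block on a full pipe buffer.
output, errors = p.communicate()
if p.returncode != 0:
    print "ipcress failed with error code:", p.returncode
    print errors
    raise Exception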

If you use Twisted's process API instead, the reading will be done for
you (without any of the race conditions that are likely when using the
subprocess module), and things will probably "just work".
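
For example, something like the sketch below, using
twisted.internet.utils.getProcessOutputAndValue, which collects stdout and
stderr for you and fires a Deferred with (out, err, exitcode) when the
process ends.  Handing the command to /bin/sh here is just my way of keeping
your *.fasta glob expansion; adjust to taste:

from twisted.internet import reactor
from twisted.internet.utils import getProcessOutputAndValue

cmd = ("/usr/bin/ipcress ipcresstmp.txt "
       "--sequence /home/pzs/genebuilds/human/*.fasta")

def done(result):
    out, err, code = result
    # By the time this fires, both pipes have been read to the end,
    # so there is nothing left for the child to block on.
    if code != 0:
        print "ipcress failed with error code:", code
        print err
    else:
        print "got %d bytes of output" % len(out)
    reactor.stop()

# Run through the shell so the *.fasta glob is expanded, mirroring
# shell=True in the Popen version above.
d = getProcessOutputAndValue("/bin/sh", ["-c", cmd])
d.addCallback(done)
reactor.run()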

Jean-Paul
--
http://mail.python.org/mailman/listinfo/python-list
