On Fri, Feb 6, 2009 at 4:11 AM, Daniel <daniel.chaow...@gmail.com> wrote:
> Hi Tutors,
>
> I want to use Python to automate some routine data processing tasks
> (on Windows).
>
> The main task can be split into small sub-tasks, each of which can be
> done by running a small tool like "awk" or another Python script. One
> example of such a task is a data processing job that:
>
> uses tool-A to produce some patterns,
> feeds tool-B those patterns to mine more related data,
> repeats these steps in a loop until some condition is met.
>
> The real task involves more tools, running in parallel or sequentially.
>
> I know how to do this with modules like subprocess, but the final
> Python program ends up messy and hard to adapt to changes.
>
> Do you have any best practices for this?
My first thought was to use shell pipelines and bash. Then I remembered that David Beazley shows how to use generators to implement a processing pipeline in Python:

http://www.dabeaz.com/generators-uk/

It's a fascinating read; it may take a couple of passes to sink in, but it could fit your needs quite well. You would write a generator that wraps a subprocess call and use that to access the external tools; the other stages and the control logic would be plain Python.
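For example, here is a rough sketch of what a subprocess-wrapping stage and your mining loop might look like. It's written for Python 3 (subprocess.run; on older versions you'd use Popen and communicate()), and run_tool, mine_until_done, and the "tool-a"/"tool-b" command names are made-up stand-ins for your actual tools, not anything from Beazley's slides:

import subprocess

def run_tool(argv, lines):
    # Pipeline stage: feed `lines` to an external command and
    # yield its output lines.  Buffering the whole input avoids
    # the stdin/stdout deadlock you can hit with raw Popen pipes.
    result = subprocess.run(
        argv,
        input="".join(line + "\n" for line in lines),
        capture_output=True,
        text=True,
        check=True,          # raise if the tool exits nonzero
    )
    for out_line in result.stdout.splitlines():
        yield out_line

def grep(substring, lines):
    # A pure-Python stage plugs into the same pipeline shape.
    for line in lines:
        if substring in line:
            yield line

def mine_until_done(seed, max_rounds=10):
    # Control loop: tool-A turns data into patterns, tool-B
    # expands patterns into more data; stop when a round finds
    # nothing new (or after max_rounds).
    seen = set(seed)
    frontier = list(seed)
    for _ in range(max_rounds):
        patterns = run_tool(["tool-a"], frontier)
        fresh = [d for d in run_tool(["tool-b"], patterns)
                 if d not in seen]
        if not fresh:
            break
        seen.update(fresh)
        frontier = fresh
    return seen

Because every stage just consumes and yields an iterable of lines, you can rearrange the pipeline, or swap an external tool for a Python function, without touching the control loop.

Kent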