Instead of using readTextFile on the pipeline, try using the read method with a
TextFileSource, which can accept a collection of paths.

https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/TextFileSource.java
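A minimal sketch of what that could look like (untested; it assumes the TextFileSource constructor that takes a List of Paths, and that `paths` and `pipeline` are the same variables from your snippet below):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.crunch.PCollection;
import org.apache.crunch.io.text.TextFileSource;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.fs.Path;

// Collect all the input paths into a single source, so the pipeline
// plans one read instead of one read per path.
List<Path> inputPaths = new ArrayList<Path>();
for (String path : paths) {
  inputPaths.add(new Path(path));
}

// One PCollection backed by every path.
PCollection<String> lines = pipeline.read(
    new TextFileSource<String>(inputPaths, Writables.strings()));
```

Note that with a single combined source you lose the per-path constructor argument you were passing to MyDoFn, so this works best when the DoFn doesn't need to know which file a record came from.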





On Fri, Jul 15, 2016 at 8:53 PM -0500, "Ben Juhn" <[email protected]> wrote:

Hello,

I have a job configured the following way:

for (String path : paths) {
    PCollection<String> col = pipeline.readTextFile(path);
    col.parallelDo(new MyDoFn(path), Writables.strings())
        .write(To.textFile("out/" + path), Target.WriteMode.APPEND);
}
pipeline.done();

It results in one Spark job for each path, and the jobs run in sequence even
though there are no dependencies.  Is it possible to have the jobs run in
parallel?

Thanks,
Ben




