Hello,
I have a job configured the following way:
for (String path : paths) {
    PCollection<String> col = pipeline.readTextFile(path);
    col.parallelDo(new MyDoFn(path), Writables.strings())
       .write(To.textFile("out/" + path), Target.WriteMode.APPEND);
}
pipeline.done();
It results in one Spark job for each path, and the jobs run in sequence even
though there are no dependencies between them. Is it possible to have the jobs
run in parallel?
Thanks,
Ben
