I enabled FAIR scheduling hoping that would help, but only one job is showing up at a time.
Thanks,
Ben

> On Jul 15, 2016, at 8:17 PM, Ben Juhn <[email protected]> wrote:
>
> Each input is of a different format, and the DoFn implementation handles them
> depending on instantiation parameters.
>
> Thanks,
> Ben
>
>> On Jul 15, 2016, at 7:09 PM, Stephen Durfey <[email protected]> wrote:
>>
>> Instead of using readTextFile on the pipeline, try using the read method with
>> a TextFileSource, which can accept a collection of paths.
>>
>> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/TextFileSource.java
>>
>> On Fri, Jul 15, 2016 at 8:53 PM -0500, "Ben Juhn" <[email protected]> wrote:
>>
>> Hello,
>>
>> I have a job configured the following way:
>>
>> for (String path : paths) {
>>     PCollection<String> col = pipeline.readTextFile(path);
>>     col.parallelDo(new MyDoFn(path), Writables.strings())
>>         .write(To.textFile("out/" + path), Target.WriteMode.APPEND);
>> }
>> pipeline.done();
>>
>> It results in one Spark job for each path, and the jobs run in sequence even
>> though there are no dependencies. Is it possible to have the jobs run in
>> parallel?
>>
>> Thanks,
>> Ben
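A possible reason FAIR scheduling alone did not help: Spark's job-scheduling documentation says that jobs inside one application run concurrently only when they are submitted from separate threads. A driver that triggers its jobs one after another from a single thread still gets them one at a time, whatever the scheduler mode. Whether Crunch's SparkPipeline submits independent jobs from multiple threads is not confirmed in this thread. Below is a minimal, stdlib-only sketch of the "one thread per independent job" pattern. `runJobForPath` is a hypothetical placeholder for building and running the job for one path; it is an assumption, not a Crunch API, and it is also an assumption that the per-path jobs are safe to drive concurrently (for example, each with its own pipeline).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelJobs {

    // Hypothetical stand-in for "read this path, apply the DoFn, write the output".
    // In a real driver this would trigger the Spark job for one input; here it only
    // reports which path it handled and on which thread.
    static String runJobForPath(String path) {
        return "processed " + path + " on " + Thread.currentThread().getName();
    }

    // Submits one task per path to a fixed-size thread pool, so that independent
    // jobs are submitted from separate threads, then waits for all of them.
    static List<String> runAll(List<String> paths, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : paths) {
                futures.add(pool.submit(() -> runJobForPath(path)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                // Blocks until that task finishes and rethrows any failure it hit.
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (String line : runAll(List.of("a", "b", "c"), 3)) {
            System.out.println(line);
        }
    }
}
```

Results come back in the order the paths were submitted, because the code collects the futures in that order, even though the tasks themselves may finish in any order.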
