What are the cluster resources available vs. what a single map uses?

On Sat, Jul 16, 2016, 3:04 PM Ben Juhn <[email protected]> wrote:
> I enabled FAIR scheduling hoping that would help, but only one job is
> showing up at a time.
>
> Thanks,
> Ben
>
> On Jul 15, 2016, at 8:17 PM, Ben Juhn <[email protected]> wrote:
>
> Each input is of a different format, and the DoFn implementation handles
> them depending on instantiation parameters.
>
> Thanks,
> Ben
>
> On Jul 15, 2016, at 7:09 PM, Stephen Durfey <[email protected]> wrote:
>
> Instead of using readTextFile on the pipeline, try using the read method
> with a TextFileSource, which can accept a collection of paths.
>
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/TextFileSource.java
>
> On Fri, Jul 15, 2016 at 8:53 PM -0500, "Ben Juhn" <[email protected]> wrote:
>
>> Hello,
>>
>> I have a job configured the following way:
>>
>> for (String path : paths) {
>>     PCollection<String> col = pipeline.readTextFile(path);
>>     col.parallelDo(new MyDoFn(path), Writables.strings())
>>        .write(To.textFile("out/" + path), Target.WriteMode.APPEND);
>> }
>> pipeline.done();
>>
>> It results in one Spark job for each path, and the jobs run in sequence even
>> though there are no dependencies. Is it possible to have the jobs run in
>> parallel?
>>
>> Thanks,
>>
>> Ben
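On the FAIR scheduling point in the thread: Spark's fair scheduler only shares resources among jobs that are active at the same time, and within one Spark application jobs overlap only when they are submitted from separate threads. A loop that queues everything and then calls done() from a single thread would presumably submit the jobs one after another, which is consistent with seeing one job at a time. The following is a minimal plain-Java sketch of the "one thread per path" submission pattern. It is not Crunch code: `ParallelPathRunner`, `runAll`, and `runForPath` are hypothetical names introduced for illustration, and whether Crunch's Spark pipeline can safely be driven from several threads is an assumption this thread does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Crunch or Spark.
class ParallelPathRunner {

    // Submits one task per path to a fixed-size thread pool and waits for all
    // of them. Results are returned in the same order as the input paths.
    static <R> List<R> runAll(List<String> paths,
                              Function<String, R> runForPath,
                              int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String path : paths) {
                // Each path becomes an independent unit of work on its own thread.
                Callable<R> task = () -> runForPath.apply(path);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that path's work is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("in/a", "in/b", "in/c");
        // Placeholder work: in a real job, this lambda would build and run the
        // per-path processing (an assumption, not shown here).
        List<String> outputs = runAll(paths, path -> "out/" + path, 3);
        System.out.println(outputs);
    }
}
```

In a real job the lambda passed to `runAll` would stand in for the per-path read, `MyDoFn`, and write steps. The pool size caps how many paths are in flight at once, so it should be chosen with the cluster resources in mind.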
