Nope, it queues up the jobs in series there too.
> On Jul 16, 2016, at 6:01 PM, David Ortiz <[email protected]> wrote:
>
> *run in parallel
>
>
> On Sat, Jul 16, 2016, 5:36 PM David Ortiz <[email protected]
> <mailto:[email protected]>> wrote:
> Just out of curiosity, if you use mrpipeline does it fun on parallel? If so,
> issue may be in spark since I believe crunch leaves it to spark to handle
> best method of execution.
>
>
> On Sat, Jul 16, 2016, 4:29 PM Ben Juhn <[email protected]
> <mailto:[email protected]>> wrote:
> Hey David,
>
> I have 100 active executors, each job typically only uses a few. It’s
> running on yarn.
>
> Thanks,
> Ben
>
>> On Jul 16, 2016, at 12:53 PM, David Ortiz <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> What are the cluster resources available vs what a single map uses?
>>
>>
>> On Sat, Jul 16, 2016, 3:04 PM Ben Juhn <[email protected]
>> <mailto:[email protected]>> wrote:
>> I enabled FAIR scheduling hoping that would help but only one job is showing
>> up a time.
>>
>> Thanks,
>> Ben
>>
>>> On Jul 15, 2016, at 8:17 PM, Ben Juhn <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> Each input is of a different format, and the DoFn implementation handles
>>> them depending on instantiation parameters.
>>>
>>> Thanks,
>>> Ben
>>>
>>>> On Jul 15, 2016, at 7:09 PM, Stephen Durfey <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>> Instead of using readTextFile on the pipeline, try using the read method
>>>> and use the TextFileSource, which can accept in a collection of paths.
>>>>
>>>> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/TextFileSource.java
>>>>
>>>> <https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/TextFileSource.java>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 15, 2016 at 8:53 PM -0500, "Ben Juhn" <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have a job configured the following way:
>>>> for (String path : paths) {
>>>> PCollection<String> col = pipeline.readTextFile(path);
>>>> col.parallelDo(new MyDoFn(path),
>>>> Writables.strings()).write(To.textFile(“out/“ + path),
>>>> Target.WriteMode.APPEND);
>>>> }
>>>> pipeline.done();
>>>> It results in one spark job for each path, and the jobs run in sequence
>>>> even though there are no dependencies. Is it possible to have the jobs
>>>> run in parallel?
>>>> Thanks,
>>>> Ben
>>>>
>>>
>>
>