Re: Reliably Parallelizing CPU-Intensive DoFns

Allan Shoup Thu, 25 Sep 2014 21:04:11 -0700

I failed to mention that I don't have an opportunity to read the source
- my input is a PTable of Avro keys and values.


On Thu, Sep 25, 2014 at 8:48 PM, Josh Wills <[email protected]> wrote:

> NLineSource, to control how many shards the small input is split up into?
>
> J
>
> On Thu, Sep 25, 2014 at 6:10 PM, Allan Shoup <[email protected]>
> wrote:
>
>> I have a very CPU-intensive DoFn which is running over a relatively small
>> input. Running on a Hadoop cluster, the job that it is run in sometimes
>> executes the function in map tasks and sometimes in reduce tasks. What's
>> the best way to reliably increase parallelization?
>>
>> One option may be to force a reduce step and control the number of
>> reducers. Are there any better options?
>>
>
>
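
For reference, the "force a reduce step and control the number of reducers"
option from the original question can be sketched outside of Crunch. The
stand-alone Java below is only an illustration under stated assumptions: the
class name ShardSketch and its methods are hypothetical, not Crunch or Hadoop
API. It mimics the formula used by Hadoop's default HashPartitioner to show how
keyed records are spread over a fixed number of partitions. In Crunch itself the
equivalent step would presumably be a groupByKey(int numPartitions) call on the
PTable ahead of the expensive DoFn.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch (not Crunch API): shows how hash
// partitioning spreads keyed records over a fixed number of "reducers".
class ShardSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Assign every record to exactly one of numPartitions buckets by key hash.
    static <K, V> List<List<Map.Entry<K, V>>> shard(
            List<Map.Entry<K, V>> records, int numPartitions) {
        List<List<Map.Entry<K, V>>> shards = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            shards.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> record : records) {
            shards.get(partitionFor(record.getKey(), numPartitions)).add(record);
        }
        return shards;
    }

    public static void main(String[] args) {
        // A small keyed input, standing in for the PTable in the thread.
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            records.add(new AbstractMap.SimpleEntry<>(i, "value-" + i));
        }
        List<List<Map.Entry<Integer, String>>> shards = shard(records, 8);
        for (int i = 0; i < shards.size(); i++) {
            System.out.println("shard " + i + ": " + shards.get(i).size() + " records");
        }
    }
}
```

Note that the achievable parallelism is bounded by the number of distinct keys:
if the input has only a handful of keys, or the keys are heavily skewed, most
partitions stay empty no matter how many reducers are requested.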
