NLineSource, to control how many shards the small input is split up into?

J

On Thu, Sep 25, 2014 at 6:10 PM, Allan Shoup <[email protected]> wrote:
> I have a very CPU-intensive DoFn which is running over a relatively small
> input. Running on a Hadoop cluster, the job that it is run in sometimes
> executes the function in map tasks and sometimes in reduce tasks. What's
> the best way to reliably increase parallelization?
>
> One option may be to force a reduce step and control the number of
> reducers. Are there any better options?
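
To make the N-line idea from the reply concrete: an N-line source hands each task at most N lines of input, so the number of shards is roughly the line count divided by N, rounded up. The sketch below only illustrates that arithmetic in plain Java. It is not Crunch or Hadoop code, and the class and method names (NLineSplitSketch, numSplits, split) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of N-line splitting; not part of Crunch or Hadoop.
public class NLineSplitSketch {

    // Number of shards produced when each shard holds at most linesPerSplit lines.
    public static int numSplits(int totalLines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        // Integer ceiling of totalLines / linesPerSplit.
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    // Groups the input lines into consecutive shards of at most linesPerSplit lines.
    public static <T> List<List<T>> split(List<T> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<T>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // Seven lines at three lines per shard give three shards.
        System.out.println(numSplits(input.size(), 3)); // prints 3
        System.out.println(split(input, 3));            // prints [[a, b, c], [d, e, f], [g]]
    }
}
```

With a small input and an expensive per-record function, a small N (even 1) spreads the records across more tasks, at the cost of more task startup overhead.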
