I failed to mention that I don't have an opportunity to read the source; my input is a PTable of Avro keys and values.
On Thu, Sep 25, 2014 at 8:48 PM, Josh Wills <[email protected]> wrote:
> NLineSource, to control how many shards the small input is split up into?
>
> J
>
> On Thu, Sep 25, 2014 at 6:10 PM, Allan Shoup <[email protected]> wrote:
>
>> I have a very cpu-intensive DoFn which is running over a relatively small
>> input. Running on a Hadoop cluster, the job that it is run in sometimes
>> executes the function in map tasks and sometimes in reduce tasks. What's
>> the best way to reliably increase parallelization?
>>
>> One option may be to force a reduce step and control the number of
>> reducers. Are there any better options?
