I have a very CPU-intensive DoFn that runs over a relatively small
input. On a Hadoop cluster, the job that contains it sometimes executes
the function in map tasks and sometimes in reduce tasks. What's the
best way to reliably increase parallelism?

One option may be to force a reduce step and control the number of
reducers. Are there any better options?
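
To make the reduce idea concrete, here is a minimal plain-Java sketch of what I have in mind: give each record a synthetic round-robin key so that a forced shuffle spreads the records evenly over a fixed number of reducers. The class `ShardSketch` and the helper `shardRoundRobin` are hypothetical names used only for illustration, and the code only simulates the key assignment; it does not call Crunch or Hadoop. In Crunch I assume this would correspond to emitting the synthetic key alongside each record and grouping by key with an explicit partition count, then running the expensive DoFn on the grouped values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShardSketch {

    // Hypothetical helper: distributes records over numShards buckets in
    // round-robin order, mimicking synthetic keys 0..numShards-1 being sent
    // through a shuffle to numShards reducers.
    static <T> List<List<T>> shardRoundRobin(List<T> records, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        List<List<T>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(new ArrayList<>());
        }
        int next = 0;
        for (T record : records) {
            shards.get(next).add(record);   // the synthetic key is the bucket index
            next = (next + 1) % numShards;  // advance to the next reducer
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        List<List<String>> shards = shardRoundRobin(input, 3);
        for (int i = 0; i < shards.size(); i++) {
            System.out.println("reducer " + i + " gets " + shards.get(i));
        }
    }
}
```

Round-robin keys keep the buckets within one record of each other in size, which matters here because the input is small and the per-record cost is high; hashing the record instead could leave some reducers nearly idle.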
