I have a very CPU-intensive DoFn that runs over a relatively small input. On a Hadoop cluster, the job containing it sometimes executes the function in map tasks and sometimes in reduce tasks. What's the best way to reliably increase parallelism?
One option may be to force a reduce step and control the number of reducers. Are there any better options?
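
To make that option concrete, here is a minimal sketch in plain Java of what I mean by forcing a reduce step. It is not Crunch or Hadoop API code, and `ShardSketch`, `shard`, and `numShards` are hypothetical names used only for illustration. The idea is to tag each record with a round-robin key and group by it, so the small input is spread evenly over a chosen number of groups. In a real job I assume that key would be the grouping key and `numShards` the configured number of reducers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardSketch {

    // Hypothetical helper: assigns record i to shard (i % numShards),
    // mimicking a group-by on an artificial round-robin key.
    static <T> Map<Integer, List<T>> shard(List<T> records, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        Map<Integer, List<T>> shards = new TreeMap<>();
        for (int i = 0; i < records.size(); i++) {
            shards.computeIfAbsent(i % numShards, k -> new ArrayList<>())
                  .add(records.get(i));
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // Each shard stands in for the records one reducer would receive.
        Map<Integer, List<String>> shards = shard(input, 3);
        System.out.println(shards);
    }
}
```

With a round-robin key, shard sizes differ by at most one record, so the expensive function would get roughly equal work per reducer regardless of how the input was split.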
