Reliably Parallelizing CPU-Intensive DoFns

Allan Shoup Thu, 25 Sep 2014 18:11:38 -0700

I have a very CPU-intensive DoFn that runs over a relatively small
input. On a Hadoop cluster, the job that contains it sometimes executes
the function in map tasks and sometimes in reduce tasks. What's the best
way to reliably increase parallelization?


One option may be to force a reduce step and control the number of
reducers. Are there any better options?
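
To make that option concrete, here is a minimal sketch of the idea in plain Java, deliberately not using the Crunch API: each record gets a synthetic round-robin shard key, so that a grouping step with a fixed number of partitions would hand each reducer roughly the same share of the input. The class name ShardSketch, the helper shard, and the shard count of 4 are all hypothetical names chosen for illustration. In an actual pipeline the grouping would be done by the framework (as far as I know, Crunch's groupByKey accepts a partition count), and the expensive DoFn would run after the group.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardSketch {
    // Hypothetical helper: assign records round-robin to numShards buckets.
    // Each bucket stands in for the records one reducer would receive.
    public static <T> List<List<T>> shard(List<T> records, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        List<List<T>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            // The synthetic key is simply the record index modulo numShards.
            shards.get(i % numShards).add(records.get(i));
        }
        return shards;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            input.add(i);
        }
        // Assumed shard count; in practice this would match the reducer count.
        List<List<Integer>> shards = shard(input, 4);
        for (int s = 0; s < shards.size(); s++) {
            System.out.println("shard " + s + ": " + shards.get(s));
        }
    }
}
```

The point of the round-robin key is that the split does not depend on the input's own keys or on how many input splits Hadoop creates, so shard sizes differ by at most one record.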
