If you are using the FairScheduler, you can cap the number of jobs that
run at once per pool in its config:
http://hadoop.apache.org/mapreduce/docs/r0.21.0/fair_scheduler.html
Relevant bits from that doc (an example allocation file follows the list):
The allocation file configures minimum shares, running job limits, weights
and preemption timeouts for each pool. Only users/pools whose values differ
from the defaults need to be explicitly configured in this file. The
allocation file is located in *HADOOP_HOME/conf/fair-scheduler.xml*. It can
contain the following types of elements:
- *pool* elements, which configure each pool. These may contain the
  following sub-elements:
  - *minMaps* and *minReduces*, to set the pool's minimum share of task
    slots.
  - *maxMaps* and *maxReduces*, to set the pool's maximum concurrent
    task slots.
  - *schedulingMode*, the pool's internal scheduling mode, which can be
    *fair* for fair sharing or *fifo* for first-in-first-out.
  - *maxRunningJobs*, to limit the number of jobs from the pool to run
    at once (defaults to infinite).
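
For the situation below, setting *maxRunningJobs* to 1 on the pool the
Pig jobs run in should serialize them. A minimal sketch of
HADOOP_HOME/conf/fair-scheduler.xml, assuming the jobs land in a pool
named "default" (substitute whatever pool they actually map to, e.g.
via mapred.fairscheduler.poolnameproperty):

<?xml version="1.0"?>
<allocations>
  <!-- Allow at most one job from this pool to run at a time -->
  <pool name="default">
    <maxRunningJobs>1</maxRunningJobs>
  </pool>
</allocations>

Alternatively, *maxMaps* and *maxReduces* on the pool would cap the
slots used without fully serializing the jobs, which may avoid the
thrashing while still letting the two jobs overlap.
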
On Tue, Apr 26, 2011 at 7:30 AM, Jay Hacker <[email protected]> wrote:
> I have a Pig script that sometimes submits two mapreduce jobs at once.
> This runs double the number of mappers and reducers that the cluster
> is configured for, which leads to oversubscription and thrashing.
> This may be more of a scheduler thing, but does anyone know how to
> tell Hadoop to only run one job at a time? Thanks.
>