If you are using the FairScheduler, you can cap the number of jobs that
run at once per pool in its config:
http://hadoop.apache.org/mapreduce/docs/r0.21.0/fair_scheduler.html
Relevant bits from that doc (an example allocation file follows the list):
The allocation file configures minimum shares, running job limits, weights
and preemption timeouts for each pool. Only users/pools whose values differ
from the defaults need to be explicitly configured in this file. The
allocation file is located in *HADOOP_HOME/conf/fair-scheduler.xml*. It can
contain the following types of elements:
- *pool* elements, which configure each pool. These may contain the
  following sub-elements:
  - *minMaps* and *minReduces*, to set the pool's minimum share of task
    slots.
  - *maxMaps* and *maxReduces*, to set the pool's maximum concurrent
    task slots.
  - *schedulingMode*, the pool's internal scheduling mode, which can be
    *fair* for fair sharing or *fifo* for first-in-first-out.
  - *maxRunningJobs*, to limit the number of jobs from the pool to run
    at once (defaults to infinite).
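
For the situation below, setting *maxRunningJobs* to 1 on the pool the
Pig jobs run in should serialize them. A minimal sketch of
HADOOP_HOME/conf/fair-scheduler.xml, assuming the jobs land in a pool
named "default" (substitute whatever pool they actually map to, e.g.
via mapred.fairscheduler.poolnameproperty):

<?xml version="1.0"?>
<allocations>
  <!-- Allow at most one job from this pool to run at a time -->
  <pool name="default">
    <maxRunningJobs>1</maxRunningJobs>
  </pool>
</allocations>

Alternatively, *maxMaps* and *maxReduces* on the pool would cap the
slots used without fully serializing the jobs, which may avoid the
thrashing while still letting the two jobs overlap.
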
On Tue, Apr 26, 2011 at 7:30 AM, Jay Hacker <[email protected]> wrote:
> I have a Pig script that sometimes submits two mapreduce jobs at once.
> This runs double the number of mappers and reducers that the cluster
> is configured for, which leads to oversubscription and thrashing.
> This may be more of a scheduler thing, but does anyone know how to
> tell Hadoop to only run one job at a time? Thanks.
>