Okay, that might be what I need.  Let's say I have 10 nodes in my cluster,
and they all have the same specs.  For Job A (the one that isn't
CPU-intensive) I want it to run with 50 mappers per node.  For Job B (the one
that is CPU-intensive) I want it to run with 25 mappers per node.  Let's
assume that when each job runs, there are no other jobs running on the
cluster.  Can I just tell Hadoop to run 500 simultaneous mappers for Job A,
and when Job A is done, can I tell Hadoop to run 250 simultaneous mappers
for Job B?  How do I go about doing this?
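The totals in the question are the node count times the per-node target. Here is a minimal Java sketch of that arithmetic. It assumes the 10 identical nodes described above, and the class and method names are hypothetical, not part of any Hadoop API:

```java
public class MapperTargets {

    // Cluster-wide number of simultaneous mappers implied by a per-node
    // target, assuming identical nodes, an even spread of tasks, and no
    // other jobs on the cluster.
    static int clusterWideMappers(int nodes, int mappersPerNode) {
        return nodes * mappersPerNode;
    }

    public static void main(String[] args) {
        int nodes = 10;
        System.out.println("Job A: " + clusterWideMappers(nodes, 50)); // Job A: 500
        System.out.println("Job B: " + clusterWideMappers(nodes, 25)); // Job B: 250
    }
}
```

The product only describes an even spread. It does not work in reverse: a cluster-wide total of 250 does not, by itself, hold each node at 25 mappers.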

I've read that mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum cannot be overridden from the
client.  Will I run into problems because of that?

Thanks for the help.

--Jeremy



On Fri, May 30, 2014 at 8:49 PM, Harsh J wrote:

> This has been discussed in the past. There is currently no dynamic way to
> control parallel execution on a per-node basis.
>
> Scheduler configurations will let you control the overall parallelism
> (number of simultaneous tasks) of specific jobs at the cluster level, but
> not at the per-node level.
>
> On Sat, May 31, 2014 at 4:08 AM, jeremy p wrote:
> > Hello all,
> >
> > I have two jobs, Job A and Job B.  Job A is not very CPU-intensive, and
> > so we would like to run it with 50 mappers per node.  Job B is very
> > CPU-intensive, and so we would like to run it with 25 mappers per node.
> > How can we request a different number of mappers per node for each job?
> > From what I've read, mapred.tasktracker.map.tasks.maximum and
> > mapred.tasktracker.reduce.tasks.maximum cannot be overridden from the
> > client.
> >
> > --Jeremy
>
>
>
> --
> Harsh J
>
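To illustrate why a cluster-level cap is not a per-node cap, here is a toy Java model. It is hypothetical and is not how the JobTracker or any Hadoop scheduler actually assigns tasks. It assumes every node still advertises 50 map slots (the value that suits Job A) and that Job B is limited to 250 simultaneous mappers cluster-wide. Both placement orders respect the same cap, but they end up very differently per node:

```java
import java.util.Arrays;

public class SlotPlacementSketch {

    // Toy model only: fill nodes one at a time, in node order, until the
    // job's cluster-wide cap is reached. Not Hadoop's real algorithm.
    static int[] greedyPlacement(int nodes, int slotsPerNode, int jobCap) {
        int[] running = new int[nodes];
        int remaining = jobCap;
        for (int n = 0; n < nodes && remaining > 0; n++) {
            int take = Math.min(slotsPerNode, remaining);
            running[n] = take;
            remaining -= take;
        }
        return running;
    }

    // Toy model only: hand out one task per node per pass until the cap is
    // reached or every node's slots are full.
    static int[] roundRobinPlacement(int nodes, int slotsPerNode, int jobCap) {
        int[] running = new int[nodes];
        int remaining = jobCap;
        boolean placed = true;
        while (remaining > 0 && placed) {
            placed = false;
            for (int n = 0; n < nodes && remaining > 0; n++) {
                if (running[n] < slotsPerNode) {
                    running[n]++;
                    remaining--;
                    placed = true;
                }
            }
        }
        return running;
    }

    public static void main(String[] args) {
        int nodes = 10;
        int slotsPerNode = 50; // per-node slot limit, tuned for Job A
        int jobBCap = 250;     // hypothetical cluster-wide cap for Job B

        // [50, 50, 50, 50, 50, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(greedyPlacement(nodes, slotsPerNode, jobBCap)));
        // [25, 25, 25, 25, 25, 25, 25, 25, 25, 25]
        System.out.println(Arrays.toString(roundRobinPlacement(nodes, slotsPerNode, jobBCap)));
    }
}
```

Under the greedy order, five nodes each run 50 of Job B's CPU-heavy mappers while the other five sit idle. Only the round-robin order gives the intended 25 per node. In this model, the outcome depends entirely on placement order, and the cluster-wide cap does not control that order.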
