Okay, that might be what I need. Let's say I have 10 nodes in my cluster, and they all have the same specs. For Job A (the one that isn't CPU-intensive), I want to run 50 mappers per node. For Job B (the one that is CPU-intensive), I want to run 25 mappers per node. Let's assume that when each job runs, no other jobs are running on the cluster. Can I just tell Hadoop to run 500 simultaneous mappers for Job A, and when Job A is done, tell Hadoop to run 250 simultaneous mappers for Job B? How do I go about doing this?
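Here is a minimal Python sketch of the arithmetic behind the question. With 10 identical nodes, a per-node target becomes a cluster-wide number only by multiplying it out, and a cluster-wide number is the only kind of limit a scheduler can express. The helper name `cluster_wide_cap` is hypothetical. The sketch assumes tasks spread evenly across nodes, which Hadoop does not guarantee.

```python
# Hypothetical helper: convert a per-node concurrency target into the
# cluster-wide cap a scheduler would need. Assumes identical nodes and
# an even spread of tasks (Hadoop does not promise an even spread).
def cluster_wide_cap(nodes: int, mappers_per_node: int) -> int:
    """Total simultaneous map tasks across the whole cluster."""
    if nodes <= 0 or mappers_per_node <= 0:
        raise ValueError("nodes and mappers_per_node must be positive")
    return nodes * mappers_per_node


if __name__ == "__main__":
    NODES = 10  # from the example: 10 nodes with identical specs
    for job, per_node in (("Job A", 50), ("Job B", 25)):
        print(f"{job}: {cluster_wide_cap(NODES, per_node)} simultaneous mappers")
```

Running it prints 500 for Job A and 250 for Job B. Those are the two numbers from the question.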
I've read that mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum cannot be overridden from the client. Will I run into problems because of that?

Thanks for the help.

--Jeremy

On Fri, May 30, 2014 at 8:49 PM, Harsh J wrote:
> This has been discussed in the past. There is currently no dynamic way to
> control parallel execution on a per-node basis.
>
> Scheduler configurations will let you control the overall parallelism (#
> of simultaneous tasks) of specific jobs on a cluster-level basis, but
> not on a per-node level.
>
> On Sat, May 31, 2014 at 4:08 AM, jeremy p wrote:
> > Hello all,
> >
> > I have two jobs, Job A and Job B. Job A is not very CPU-intensive, and so
> > we would like to run it with 50 mappers per node. Job B is very
> > CPU-intensive, and so we would like to run it with 25 mappers per node. How
> > can we request a different number of mappers per node for each job? From
> > what I've read, mapred.tasktracker.map.tasks.maximum and
> > mapred.tasktracker.reduce.tasks.maximum cannot be overridden from the
> > client.
> >
> > --Jeremy
>
> --
> Harsh J
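Below is a minimal sketch of the cluster-level approach described in the reply. It assumes the MRv1 Fair Scheduler, whose allocation file accepts a per-pool `maxMaps` cap. It also assumes each job is submitted into its own pool, for example through the `mapred.fairscheduler.pool` job property. The pool names `job_a` and `job_b` and both helper functions are hypothetical.

The second helper shows why this only bounds the aggregate. A single TaskTracker can still run as many of one job's tasks as it has map slots, and that slot count is the TaskTracker-side setting that cannot be overridden from the client.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_allocations(pool_caps: Dict[str, int]) -> str:
    """Render a Fair Scheduler allocations document with one pool per job.

    Assumption: each job runs in its own pool, and `maxMaps` caps that
    pool's simultaneous map tasks across the whole cluster (not per node).
    """
    root = ET.Element("allocations")
    for name, max_maps in pool_caps.items():
        if max_maps <= 0:
            raise ValueError(f"cap for pool {name!r} must be positive")
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "maxMaps").text = str(max_maps)
    return ET.tostring(root, encoding="unicode")


def worst_case_per_node(slots_per_node: int, cluster_cap: int) -> int:
    """Most tasks of one capped job that could land on a single node.

    A cluster-wide cap does not stop the scheduler from filling every
    free map slot on one TaskTracker, so the only per-node bound is the
    smaller of the node's slot count and the job's cluster-wide cap.
    """
    return min(slots_per_node, cluster_cap)


if __name__ == "__main__":
    # Hypothetical pool names; caps are 10 nodes x 50 and 10 nodes x 25.
    print(build_allocations({"job_a": 500, "job_b": 250}))
    # Slots must be at least 50 per node so that Job A can reach its target.
    print(worst_case_per_node(slots_per_node=50, cluster_cap=250))
```

Suppose the map-slot maximum is set to 50 on every node so that Job A can reach its target. Job B's cap of 250 then still allows up to 50 of its tasks on a single node, for example all 250 packed onto five nodes. This is the per-node control the reply says is not available.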
