On Sat 23 Aug 2014 01:52:38 PM EDT, S.L wrote:
> Thats what I thought too, but please check the Answer #2 here in this
> question , I am facing a similar problem.
>
> http://stackoverflow.com/questions/12135949/why-map-task-always-running-on-a-single-node
We were having the same problem; a job with 50 map tasks would run
all 50 on a single datanode (our datanodes have 64GB of memory). I
fixed it by changing the following configuration values in
mapred-site.xml:
mapreduce.map.memory.mb
mapreduce.map.java.opts
mapreduce.reduce.memory.mb
mapreduce.reduce.java.opts
These control the amount of memory allocated to each map and reduce
task. Our machines have 12 cores, so we wanted ~16-20 tasks per node
instead of the 63 per node we were getting, since
"mapreduce.map.memory.mb" defaults to 1024 as far as I know. If you set
these values appropriately (memory in box / tasks per node), you should
be good to go. Each of the "java.opts" values should be "-Xmx##M",
where ## is the heap size for the task JVM in MB.
Both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb are 3072 in
our installation, resulting in around 20 tasks per node.
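To make the arithmetic concrete, here is a rough Python sketch of the
"memory in box / tasks per node" calculation. The 63 * 1024 MB figure for
memory available to tasks is an assumption inferred from the 63 tasks per
node we saw at the 1024 MB default, and the 0.8 heap fraction is only a
common rule of thumb for keeping -Xmx below the task memory limit, not a
value from our configuration. The function names are made up for
illustration.

```python
def tasks_per_node(node_memory_mb: int, task_memory_mb: int) -> int:
    """Number of tasks of task_memory_mb that fit in node_memory_mb."""
    return node_memory_mb // task_memory_mb


def task_memory_for(node_memory_mb: int, target_tasks: int) -> int:
    """Memory per task (MB) for a target task count: memory in box / tasks per node."""
    return node_memory_mb // target_tasks


def java_opts(task_memory_mb: int, heap_fraction: float = 0.8) -> str:
    """Build an -Xmx string; heap_fraction is an assumed rule of thumb."""
    return f"-Xmx{int(task_memory_mb * heap_fraction)}M"


# Assumption: about 63 GB per node is available to tasks, inferred from
# seeing 63 tasks per node with the 1024 MB default.
NODE_MEMORY_MB = 63 * 1024

print(tasks_per_node(NODE_MEMORY_MB, 1024))  # 63 tasks at the default
print(tasks_per_node(NODE_MEMORY_MB, 3072))  # 21 tasks at 3072 MB
print(task_memory_for(NODE_MEMORY_MB, 20))   # 3225 MB for exactly 20 tasks
print(java_opts(3072))                       # -Xmx2457M
```

With 3072 MB per task this gives 21 tasks per node, which matches the
"around 20" we see in practice.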
Please note that I'm not sure if this is the "official" solution, but I
could not find a better one, since the old way of assigning a fixed
number of maps per node was deprecated. Also, as mentioned
earlier in this thread, you do need to have enough input splits before
tasks will be assigned to multiple nodes.
Hope this helps,
Alec