knowing the nodes on which reduce tasks will run

Abhay Ratnaparkhi Mon, 03 Sep 2012 07:19:38 -0700

Hello,

How can one get to know the nodes on which reduce tasks will run?


One of my jobs is running and all of its map tasks are completing.
My map tasks write a lot of intermediate data, and the intermediate
directory is filling up on all the nodes.
If a reduce task is scheduled on any of these nodes, it will try to copy
the map output onto the same disk and will eventually fail with
disk-space exceptions.

I have added a few more TaskTracker nodes to the cluster and now want the
reducers to run on the new nodes only.
Is it possible to choose the node on which a reducer will run? What
algorithm does Hadoop use to pick a node for a reducer?
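To make the question concrete, here is a minimal sketch of the kind of placement logic being asked about. It is a simplified, hypothetical model, assuming Hadoop 1.x-style heartbeat scheduling: each TaskTracker periodically reports its free reduce slots and free local disk, reducers have no data-locality preference, and a tracker is only given a reduce if it has room for the estimated reduce input. The class, record, and method names are illustrative only and are not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;

public class ReducePlacementSketch {

    // Hypothetical snapshot of what a TaskTracker reports in a heartbeat.
    record TrackerStatus(String host, int freeReduceSlots, long availableDiskBytes) {}

    /**
     * Simplified model (an assumption, not Hadoop's code): heartbeats are
     * processed in arrival order, and each tracker with a free reduce slot
     * and enough free disk for the estimated reduce input gets at most one
     * pending reduce. Data locality plays no role for reducers here.
     */
    static List<String> assignReduces(List<TrackerStatus> heartbeats,
                                      int pendingReduces,
                                      long estimatedReduceInputBytes) {
        List<String> placements = new ArrayList<>();
        for (TrackerStatus tt : heartbeats) {
            if (pendingReduces == 0) {
                break; // nothing left to schedule
            }
            if (tt.freeReduceSlots() <= 0) {
                continue; // tracker has no reduce slot to offer
            }
            if (tt.availableDiskBytes() < estimatedReduceInputBytes) {
                continue; // copying the map output here would fill the disk
            }
            placements.add(tt.host());
            pendingReduces--;
        }
        return placements;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // Hypothetical cluster: old nodes are nearly full, new nodes are empty.
        List<TrackerStatus> heartbeats = List.of(
            new TrackerStatus("old-node-1", 2, 1 * gb),
            new TrackerStatus("new-node-1", 2, 500 * gb),
            new TrackerStatus("old-node-2", 0, 300 * gb),
            new TrackerStatus("new-node-2", 2, 500 * gb));
        // Prints [new-node-1, new-node-2]
        System.out.println(assignReduces(heartbeats, 2, 50 * gb));
    }
}
```

In this model the only levers are which trackers offer reduce slots and how much disk they report, which is exactly what the question is trying to control.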

Thanks in advance.

Bye
Abhay
