Hi,

I have a 3-node cluster, with the JobTracker running on one machine and 
TaskTrackers on the other two (say, slave1 and slave2). Instead of using HDFS, I 
have written my own FileSystem implementation. Unlike HDFS, it does not give the 
JobTracker and TaskTrackers a shared filesystem view, so I mounted the root 
container of slave2 on a directory in slave1 (NFS mount). With this I am able to 
submit MR jobs to the JobTracker with input paths like 
my_scheme://slave1_IP:Port/dir1. The jobs run successfully, but data locality is 
not preserved: if files A, B, C are stored on slave1 and D, E, F on slave2, then 
data locality would require the map tasks for A, B, C to be scheduled on the 
TaskTracker running on slave1 and those for D, E, F on slave2. Instead, the map 
tasks are scheduled on either TaskTracker at random. If the map task for file A 
is assigned to the TaskTracker on slave2, then file A is being fetched over the 
network by slave2.
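My guess is that this happens because my FileSystem does not override 
getFileBlockLocations(), so every split comes back with the default "localhost" 
location and the JobTracker has nothing to match against the TaskTracker 
hostnames. Below is a rough sketch of the override I am considering; the 
fileLivesOnSlave2() helper, the hostnames and the port are just placeholders for 
my setup, and the real hostnames would have to match what the TaskTrackers 
report to the JobTracker.

import java.io.IOException;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

// Inside my FileSystem subclass (extends org.apache.hadoop.fs.FileSystem):

@Override
public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
    throws IOException {
  if (file == null) {
    return null;
  }
  // Decide which slave physically holds this file. fileLivesOnSlave2() is a
  // placeholder for however my FileSystem knows where a path is stored.
  String host = fileLivesOnSlave2(file.getPath()) ? "slave2" : "slave1";

  // Report the whole file as a single "block" hosted on that machine, so the
  // JobTracker can schedule the map task on the TaskTracker with that hostname.
  String[] hosts = new String[] { host };
  String[] names = new String[] { host + ":50010" }; // "host:port" pairs; port is arbitrary here
  return new BlockLocation[] { new BlockLocation(names, hosts, 0, file.getLen()) };
}

// Placeholder: in my layout, paths under the NFS-mounted directory live on slave2.
private boolean fileLivesOnSlave2(Path p) {
  return p.toUri().getPath().startsWith("/slave2");
}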

How do I prevent this from happening?

Thanks,
Nikhil

