Hi Nikhil,

Which scheduler are you using?
-Sandy

On Tue, May 14, 2013 at 3:55 AM, Agarwal, Nikhil <[email protected]> wrote:

> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two (say, slave1 and slave2). Instead of using
> HDFS, I have written my own FileSystem implementation. Unlike HDFS, I am
> unable to provide a shared filesystem view to the JobTracker and
> TaskTrackers, so I mounted the root container of slave2 on a directory in
> slave1 (NFS mount). By doing this I am able to submit an MR job to the
> JobTracker with an input path such as my_scheme://slave1_IP:Port/dir1.
> The MR job runs successfully, but data locality is not ensured: if files
> A, B, C are kept on slave1 and D, E, F on slave2, then data locality
> requires that the map tasks for A, B, C be submitted to the TaskTracker
> running on slave1 and those for D, E, F to the one on slave2. Instead,
> the scheduler randomly assigns map tasks to either TaskTracker. If the
> map task for file A is submitted to the TaskTracker running on slave2,
> that means file A is being fetched over the network by slave2.
>
> How do I avoid this from happening?
>
> Thanks,
> Nikhil
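[Editor's note: besides the scheduler, a common cause of lost locality with a custom FileSystem is that it never reports where data lives. The JobTracker only learns a split's preferred hosts via FileSystem.getFileBlockLocations(...), whose default implementation reports "localhost", so every split looks equally remote. A minimal sketch of overriding it is below; the helper hostFor(...) and the port 50010 are placeholders you would replace with however your filesystem tracks which slave stores a file. This assumes Hadoop's client jars on the classpath and is a sketch, not a tested implementation.]

```java
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;

// Inside your FileSystem subclass:
@Override
public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len) {
    // Resolve which slave physically holds this file (hostFor is a
    // hypothetical helper; the hostname must match what the TaskTracker
    // reports, e.g. "slave1" or "slave2").
    String host = hostFor(file.getPath());
    // Reporting one logical block spanning the whole file is enough for
    // FileInputFormat to attach the host as a locality hint to the split.
    return new BlockLocation[] {
        new BlockLocation(new String[] { host + ":50010" }, // names (host:port)
                          new String[] { host },            // hosts
                          0, file.getLen())
    };
}
```

With locations reported, a locality-aware scheduler can prefer the TaskTracker on the matching host; without them, random assignment is expected behavior.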
