Also, does your custom FS report block locations in exactly the same format as HDFS does?
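
For reference, a minimal sketch of what that could look like: a custom FileSystem overriding getFileBlockLocations() so the JobTracker can see where each file lives. The hostsFor() helper below is hypothetical, standing in for however your FS maps a path to the slave that stores it, and the hostnames it returns must match the hostnames the TaskTrackers report, or the scheduler cannot match tasks to nodes.

    import java.io.IOException;

    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public abstract class MyFileSystem extends FileSystem {

      @Override
      public BlockLocation[] getFileBlockLocations(FileStatus file,
          long start, long len) throws IOException {
        if (file == null) {
          return null;
        }
        // Hypothetical lookup: which slave physically stores this path,
        // e.g. {"slave1"} for A/B/C and {"slave2"} for D/E/F.
        String[] hosts = hostsFor(file.getPath());
        // Report the whole file as a single block hosted on those nodes
        // so the JobTracker can place the map task data-locally. (HDFS
        // uses "host:port" pairs for the first argument; plain hostnames
        // also work for locality matching.)
        return new BlockLocation[] {
            new BlockLocation(hosts, hosts, 0, file.getLen())
        };
      }

      // Hypothetical helper: resolve a path to the host(s) storing it.
      protected abstract String[] hostsFor(Path path);
    }

Without such an override, the default FileSystem implementation reports "localhost" for every file, which is why the scheduler falls back to placing map tasks arbitrarily.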
On Tue, May 14, 2013 at 4:25 PM, Agarwal, Nikhil <[email protected]> wrote:
> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two (say, slave1 and slave2). Instead of using
> HDFS, I have written my own FileSystem implementation. Since, unlike HDFS,
> I am unable to provide a shared filesystem view to the JobTracker and
> TaskTrackers, I mounted the root container of slave2 on a directory in
> slave1 (NFS mount). By doing this I am able to submit an MR job to the
> JobTracker with an input path such as my_scheme://slave1_IP:Port/dir1.
> The MR job runs successfully, but data locality is not ensured: if files
> A, B, C are kept on slave1 and D, E, F on slave2, then according to data
> locality, the map tasks for A, B, C should be submitted to the TaskTracker
> running on slave1 and those for D, E, F to the one on slave2. Instead, it
> randomly schedules the map tasks to either of the TaskTrackers. If the map
> task for file A is submitted to the TaskTracker running on slave2, that
> means file A is being fetched over the network by slave2.
>
> How do I prevent this from happening?
>
> Thanks,
> Nikhil

--
Harsh J
