Hi Mehal,

When a client makes a read request for a file, say foo.txt, the namenode sends back information about the first block (its block ID) and the datanodes it resides on.
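
To make the read path concrete, here is a rough sketch in Python of a client that is told where each block lives and falls back to another replica when a datanode fails. It is a toy simulation, not the HDFS client API: `read_file`, `fetch_block`, `ReplicaReadError`, `STORAGE`, `DEAD_NODES` and the `dn1`/`blk_1` names are all made up for illustration, and the block locations are passed in up front for simplicity.

```python
class ReplicaReadError(Exception):
    """Raised by a (simulated) datanode that cannot serve a block."""


def read_file(block_locations, fetch_block):
    """Read a file block by block, trying each replica in turn.

    block_locations: list of (block_id, [datanode, ...]) in file order,
        standing in for the information the namenode returns.
    fetch_block: callable(datanode, block_id) -> bytes, standing in for
        the client-to-datanode transfer; raises ReplicaReadError on failure.
    """
    data = bytearray()
    for block_id, datanodes in block_locations:
        for datanode in datanodes:
            try:
                data += fetch_block(datanode, block_id)
                break  # this block is done; move on to the next one
            except ReplicaReadError:
                continue  # try the next replica of the same block
        else:
            # Every replica failed, so the block cannot be read at all.
            raise IOError(f"no replica of block {block_id} could be read")
    return bytes(data)


# Toy cluster (hypothetical data): which blocks each datanode holds.
STORAGE = {
    "dn1": {"blk_1": b"hello "},
    "dn2": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn3": {"blk_2": b"world"},
}
DEAD_NODES = {"dn1"}  # pretend dn1 is down


def fetch_block(datanode, block_id):
    """Simulated transfer: dead nodes fail, live nodes return the bytes."""
    if datanode in DEAD_NODES:
        raise ReplicaReadError(datanode)
    return STORAGE[datanode][block_id]


locations = [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])]
content = read_file(locations, fetch_block)
```

In this sketch the read of `blk_1` fails on `dn1` and succeeds on `dn2`; each block is read exactly once, no matter how many replicas exist.
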
It is the client that decides which datanode to pull the data from. If the first request fails, it can retry and fetch another replica of the block from a different datanode. This process repeats until all the data is read.

Thanks and Regards,
Rishi Yadav
InfoObjects Inc || http://www.infoobjects.com

On Fri, Feb 8, 2013 at 4:40 PM, Mehal Patel <[email protected]> wrote:

> Hello All,
>
> I am confused over how MapReduce tasks select data blocks for processing
> user requests ?
>
> As data block replication replicates single data block over multiple
> datanodes, during job processing how uniquely
> data blocks are selected for processing user requests ? How does it
> guarantees that no same block gets chosen twice or thrice
> for different mapper task.
>
> Thank you
>
> -Mehal
