I think you misread it. If a given split has only one block, it uses all the locations of that block.
If it so happens that a given split has multiple blocks, it uses all the locations of the first block.

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On May 8, 2013, at 7:21 AM, Brian C. Huffman wrote:

> All,
>
> I'm trying to understand how the current FileInputFormat implements locality.
> As far as I can tell, it calculates splits using getSplit and each split
> will contain the node that hosts the first block of data in that split. Is
> my understanding correct?
>
> Looking at the FileInputFormat for the old API (mapred), it appears that it
> does more to implement locality, using getSplitHosts to "return the hosts
> that contribute most for a given split"
>
> If I understand correctly, why was this changed?
>
> Thanks,
> Brian
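
P.S. To make the rule in the reply concrete, here is a minimal sketch of it: a split gets every replica host of the block that holds its first byte, whether the split covers one block or several. This is not the actual FileInputFormat code. The class SplitHostsSketch, the Block type, and the hostsForSplit method are hypothetical names made up for illustration, and the block sizes and node names are invented.

```java
import java.util.Arrays;
import java.util.List;

public class SplitHostsSketch {
    // Hypothetical stand-in for a block's metadata: byte range plus replica hosts.
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Returns all replica hosts of the block that contains the split's first byte.
    // Blocks that the split merely runs into later are not consulted.
    static List<String> hostsForSplit(List<Block> blocks, long splitStart) {
        for (Block b : blocks) {
            if (splitStart >= b.offset && splitStart < b.offset + b.length) {
                return b.hosts;
            }
        }
        throw new IllegalArgumentException("offset " + splitStart + " is outside the file");
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block(0, 128, "node1", "node2", "node3"),
            new Block(128, 128, "node4", "node5", "node6"));

        // A split starting at byte 64 spills into the second block,
        // but only the first block's hosts are reported.
        System.out.println(hostsForSplit(blocks, 64)); // prints [node1, node2, node3]
    }
}
```

In this sketch a split that starts at byte 64 and crosses into the second block still reports only node1, node2 and node3, which is the multi-block case described in the reply.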
