I think you misread it.

If a given split has only one block, it uses all the locations of that block.

If a split spans multiple blocks, it uses all the locations of the first
block only.
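
A minimal sketch of that rule, in plain Java. Block and SplitHostsSketch are
hypothetical stand-ins made up for illustration; this is not the Hadoop source
and does not use the Hadoop API. It assumes the "first block" of a split is the
block that contains the split's starting offset.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one block's metadata (not Hadoop's BlockLocation).
final class Block {
    final long offset;        // byte offset of the block within the file
    final long length;        // block length in bytes
    final List<String> hosts; // nodes that hold a replica of this block

    Block(long offset, long length, String... hosts) {
        this.offset = offset;
        this.length = length;
        this.hosts = Arrays.asList(hosts);
    }
}

final class SplitHostsSketch {
    // Returns every host of the block that contains splitStart.
    // Blocks the split covers after that one do not contribute any hosts.
    static List<String> hostsForSplit(List<Block> blocks, long splitStart) {
        for (Block b : blocks) {
            if (splitStart >= b.offset && splitStart < b.offset + b.length) {
                return b.hosts;
            }
        }
        throw new IllegalArgumentException(
                "offset " + splitStart + " is outside the file");
    }
}
```

With two 128-byte blocks on hosts (n1, n2, n3) and (n4, n5, n6), a split that
starts at offset 100 and runs into the second block still reports only
n1, n2, n3.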

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/


On May 8, 2013, at 7:21 AM, Brian C. Huffman wrote:

> All,
> 
> I'm trying to understand how the current FileInputFormat implements locality.
> As far as I can tell, it calculates splits using getSplits, and each split
> will contain the node that hosts the first block of data in that split.  Is
> my understanding correct?
> 
> Looking at the FileInputFormat for the old API (mapred), it appears that it
> does more to implement locality, using getSplitHosts to "return the hosts
> that contribute most for a given split".
> 
> If I understand correctly, why was this changed?
> 
> Thanks,
> Brian
> 
