All,

I'm trying to understand how the current FileInputFormat (new API, mapreduce) implements locality. As far as I can tell, it calculates splits in getSplits, and each split records only the host(s) of the block that contains the split's first byte. Is my understanding correct?
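To make the question concrete, here is a minimal Python sketch of the behaviour I think I'm seeing. It uses a simplified model: `Block`, `block_index` and `make_splits` are hypothetical names, not Hadoop APIs. It also ignores refinements the real getSplits applies, such as how the final split is sized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for one HDFS block's location info."""
    offset: int   # byte offset of the block within the file
    length: int   # block length in bytes
    hosts: tuple  # datanodes holding a replica of this block


def block_index(blocks, offset):
    """Return the index of the block that contains byte `offset`."""
    for i, b in enumerate(blocks):
        if b.offset <= offset < b.offset + b.length:
            return i
    raise ValueError(f"offset {offset} is outside the file")


def make_splits(file_length, blocks, split_size):
    """Cut a file into fixed-size splits. Each split reports only the hosts
    of the block containing its first byte (the behaviour in question)."""
    splits = []
    start = 0
    while start < file_length:
        length = min(split_size, file_length - start)
        hosts = blocks[block_index(blocks, start)].hosts
        splits.append((start, length, hosts))
        start += length
    return splits


blocks = [Block(0, 128, ("nodeA",)), Block(128, 128, ("nodeB",)),
          Block(256, 64, ("nodeC",))]
print(make_splits(320, blocks, 192))
# [(0, 192, ('nodeA',)), (192, 128, ('nodeB',))]
```

The second split reads 64 bytes from nodeB's block and 64 bytes from nodeC's block, but only nodeB is reported, because nodeB holds the block containing byte 192.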

Looking at the FileInputFormat for the old API (mapred), I see that it does more to implement locality: it uses getSplitHosts to "return the hosts that contribute most for a given split".
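For contrast, here is a rough sketch of the "hosts that contribute most" idea: rank every host by how many bytes of the split it stores. This is only my reading of the javadoc quoted above, not the actual getSplitHosts code. `Block` and `split_hosts` are hypothetical names. The real method may also take rack topology into account, and this sketch ignores racks entirely.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for one HDFS block's location info."""
    offset: int   # byte offset of the block within the file
    length: int   # block length in bytes
    hosts: tuple  # datanodes holding a replica of this block


def split_hosts(blocks, start, length):
    """Rank hosts by how many bytes of [start, start + length) they store."""
    end = start + length
    bytes_per_host = defaultdict(int)
    for b in blocks:
        # Bytes shared by this block and the split (<= 0 if they don't overlap).
        overlap = min(end, b.offset + b.length) - max(start, b.offset)
        if overlap > 0:
            for host in b.hosts:
                bytes_per_host[host] += overlap
    # Most bytes first; ties broken by host name so the order is deterministic.
    return sorted(bytes_per_host, key=lambda h: (-bytes_per_host[h], h))


blocks = [Block(0, 128, ("nodeA", "nodeB")), Block(128, 128, ("nodeB", "nodeC"))]
print(split_hosts(blocks, 64, 192))
# ['nodeB', 'nodeC', 'nodeA']
```

In this example, a first-block rule would report nodeA and nodeB for the split starting at byte 64. Yet nodeC stores 128 bytes of that split, while nodeA stores only 64.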

If my understanding is correct, why was this changed?

Thanks,
Brian
