The code is slightly hard to follow since it's split between the client and the 
ApplicationMaster.

The client invokes InputFormat.getSplits to compute the splits and their 
locations, and writes them to a file in HDFS.
The ApplicationMaster then reads that file and creates resource requests based 
on the replica locations of each input split (typically 3 replicas). See 
TaskAttemptImpl.dataLocalHosts and TaskAttemptImpl.dataLocalRacks, and follow 
those variables around in the code base.
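
As a rough illustration of that flow (not the actual MapReduce code), here is a 
small dependency-free sketch. Split, rackOf, buildRequests and the topology map 
are hypothetical stand-ins I made up for this example; a real AM would use the 
Hadoop split classes and the YARN client APIs instead. The idea it shows: for 
each split, ask for a container on every host holding a replica, on each rack 
containing those hosts, and on "*" as the fallback.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model only: none of these types are Hadoop or YARN classes.
public class LocalityRequestSketch {

    // Wildcard resource name meaning "any node in the cluster".
    public static final String ANY = "*";

    // Hypothetical stand-in for an input split: a path plus the hosts
    // that hold replicas of the block it covers.
    public static final class Split {
        final String path;
        final List<String> hosts;

        public Split(String path, List<String> hosts) {
            this.path = path;
            this.hosts = hosts;
        }
    }

    // Hypothetical rack lookup; assumes a host-to-rack map is available.
    static String rackOf(String host, Map<String, String> topology) {
        return topology.getOrDefault(host, "/default-rack");
    }

    // Returns, per resource name (host, rack, or "*"), how many containers
    // are being asked for. Each split contributes one request per replica
    // host, one per distinct rack, and one for "*".
    public static Map<String, Integer> buildRequests(
            List<Split> splits, Map<String, String> topology) {
        Map<String, Integer> requests = new LinkedHashMap<>();
        for (Split split : splits) {
            Set<String> racks = new LinkedHashSet<>();
            for (String host : split.hosts) {
                requests.merge(host, 1, Integer::sum);
                racks.add(rackOf(host, topology));
            }
            for (String rack : racks) {
                requests.merge(rack, 1, Integer::sum);
            }
            requests.merge(ANY, 1, Integer::sum);
        }
        return requests;
    }

    public static void main(String[] args) {
        Map<String, String> topology = Map.of(
                "host1", "/rack1", "host2", "/rack1", "host3", "/rack2");
        List<Split> splits = List.of(
                new Split("/data/part-0", List.of("host1", "host2", "host3")),
                new Split("/data/part-1", List.of("host1")));
        buildRequests(splits, topology)
                .forEach((name, count) -> System.out.println(name + " -> " + count));
    }
}
```

The "*" entry always equals the total number of containers wanted, so the 
scheduler can still place a task somewhere if no local host or rack has room.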

hth,
Arun

On Oct 28, 2013, at 11:10 PM, ricky l <[email protected]> wrote:

> Hi Sandy, thank you very much for the information. It is good to know that 
> MapReduce AM considers the block location information. BTW, I am not very 
> familiar with the concept of splits. Is it specific to MR jobs? If possible, 
> code location would be very helpful for reference as I am trying to implement 
> an application master that needs to consider HDFS data-locality. thx.
> 
> r.
> 
> 
> On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <[email protected]> wrote:
> Hi Ricky,
> 
> The input splits contain the locations of the blocks they cover.  The AM gets 
> the information from the input splits and submits requests for those 
> locations.  Each container request spans all the replicas that the block is 
> located on.  Are you interested in something more specific?
> 
> -Sandy
> 
> 
> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <[email protected]> wrote:
> Well, I thought an application master could ask a namenode where the data 
> resides... isn't that true? If it does not know where the data resides, 
> does a MapReduce application master specify the resource name as "*", which 
> means data locality might not be preserved at all? thx,
> 
> r
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


