Hi Sandy, thank you very much for the information. It is good to know that the MapReduce AM takes the block location information into account. BTW, I am not very familiar with the concept of splits. Is it specific to MR jobs? If possible, a pointer to the relevant code would be very helpful for reference, as I am trying to implement an application master that needs to take HDFS data locality into account. Thanks.
r.

On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <[email protected]> wrote:

> Hi Ricky,
>
> The input splits contain the locations of the blocks they cover. The AM
> gets the information from the input splits and submits requests for those
> locations. Each container request spans all the replicas that the block is
> located on. Are you interested in something more specific?
>
> -Sandy
>
> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <[email protected]> wrote:
>
>> Well, I thought an application master could ask a namenode where the
>> data resides.... isn't that true? If it does not know where the data
>> resides, does a MapReduce application master specify the resource name as
>> "*", which means data locality might not be preserved at all? thx,
>>
>> r
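
For reference, the idea described in the thread (collect the replica hosts of the blocks a split covers, then name those hosts in the container request) might be sketched roughly as below. This is a stdlib-only illustration, not Hadoop code: `Block`, `Request`, and `requestFor` are hypothetical names invented for the example, and the fallback to "*" when no hosts are known is an assumption based on the question in the thread rather than a statement of what the MapReduce AM does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LocalityRequestSketch {
    /** Resource name meaning "any node", as mentioned in the thread. */
    static final String ANY = "*";

    /** Hypothetical stand-in for one HDFS block: the hosts holding its replicas. */
    static class Block {
        final List<String> hosts;
        Block(List<String> hosts) { this.hosts = hosts; }
    }

    /** Hypothetical stand-in for a container request: the preferred node names. */
    static class Request {
        final List<String> nodes;
        Request(List<String> nodes) { this.nodes = nodes; }
    }

    /**
     * Builds one request for a split by taking the union of the replica hosts
     * of every block the split covers (first-seen order, no duplicates).
     * Falls back to "*" when no host is known (assumption, see lead-in).
     */
    static Request requestFor(List<Block> blocksInSplit) {
        Set<String> hosts = new LinkedHashSet<>();
        for (Block b : blocksInSplit) {
            hosts.addAll(b.hosts);
        }
        if (hosts.isEmpty()) {
            return new Request(Collections.singletonList(ANY));
        }
        return new Request(new ArrayList<>(hosts));
    }

    public static void main(String[] args) {
        // A split covering two blocks, each with three replicas (made-up host names).
        List<Block> split = Arrays.asList(
            new Block(Arrays.asList("nodeA", "nodeB", "nodeC")),
            new Block(Arrays.asList("nodeB", "nodeD", "nodeE")));
        System.out.println("preferred nodes: " + requestFor(split).nodes);

        // No location information at all: the request names only "*".
        System.out.println("no locations:    " + requestFor(new ArrayList<Block>()).nodes);
    }
}
```

In a real application master the host lists would presumably come from HDFS, for example via `FileSystem.getFileBlockLocations(...)` and `BlockLocation.getHosts()`, and would be passed as the `nodes` argument of `AMRMClient.ContainerRequest`. Rack-level names and priorities are left out of this sketch.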
