Re: question about preserving data locality in MapReduce with Yarn

ricky l Mon, 28 Oct 2013 20:12:08 -0700

Hi Sandy, thank you very much for the information. It is good to know that
the MapReduce AM takes block location information into account. BTW, I am
not very familiar with the concept of splits. Is it specific to MR jobs? If
possible, a pointer to the relevant code would be very helpful for
reference, as I am trying to implement an application master that needs to
consider HDFS data locality. thx.


r.


On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <[email protected]> wrote:

> Hi Ricky,
>
> The input splits contain the locations of the blocks they cover.  The AM
> gets the information from the input splits and submits requests for those
> locations.  Each container request lists all the nodes that hold a replica
> of the block.  Are you interested in something more specific?
>
> -Sandy
>
>
> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <[email protected]> wrote:
>
>> Well, I thought an application master could somehow ask the namenode
>> where the data resides. Isn't that true? If it does not know where the
>> data resides, does a MapReduce application master specify the resource
>> name as "*", which means data locality might not be preserved at all? thx,
>>
>> r
>>
>
>
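
For reference, the mechanism Sandy describes can be sketched roughly as
follows. This is a simplified model in plain Python, not the actual MapReduce
AM code: the `Split` class, `build_requests`, the host names, and the
`rack_of` mapping are all made up for illustration. The assumption is that
each split carries the hosts of its block replicas, and that the AM asks for
one container per split at three levels: each replica host, each rack holding
a replica, and the wildcard "*" as a fallback.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

ANY = "*"  # wildcard resource name: any node in the cluster


@dataclass
class Split:
    """Hypothetical stand-in for an input split: a byte range plus replica hosts."""
    path: str
    start: int
    length: int
    hosts: List[str]


def build_requests(splits: List[Split], rack_of: Dict[str, str]) -> Counter:
    """Count the containers wanted per resource name (host, rack, and "*")."""
    wanted = Counter()
    for split in splits:
        # One container is needed per split; any replica host is acceptable.
        for host in split.hosts:
            wanted[host] += 1
        # Rack-level fallback: one entry per distinct rack holding a replica.
        for rack in {rack_of[h] for h in split.hosts}:
            wanted[rack] += 1
        # Last-resort fallback: the scheduler may place the container anywhere.
        wanted[ANY] += 1
    return wanted


# Made-up example: two splits of one file, three replicas each.
splits = [
    Split("/data/part-0", 0, 128, ["node1", "node2", "node3"]),
    Split("/data/part-0", 128, 128, ["node2", "node4", "node5"]),
]
rack_of = {"node1": "/rack1", "node2": "/rack1", "node3": "/rack2",
           "node4": "/rack2", "node5": "/rack2"}
requests = build_requests(splits, rack_of)
```

In this model "*" is always requested alongside the specific hosts, so it
acts as a fallback rather than a replacement for locality. In a custom
application master, the replica hosts would presumably come from the
namenode (for example via `FileSystem.getFileBlockLocations` in the Hadoop
Java API) rather than from a hard-coded list.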
