Hi,

I have a question about maintaining data locality in a MapReduce job
launched through YARN. Based on the YARN tutorial, it seems that an
application master can specify a resource name (a host or a rack), memory,
and CPU when requesting containers. By carefully choosing resource names, I
think data locality can be achieved. I am curious how the current MapReduce
application master does this. Does it check all the blocks a job needs and
then choose the subset of nodes holding the most of those blocks? If someone
could point me to the source code that makes this decision, it would be very
much appreciated. Thanks.
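
To make my question concrete, here is a rough sketch (not actual MRAppMaster
code) of how I understand an application master can express a locality
preference through the AMRMClient API; the hostname and rack name below are
made-up placeholders:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class LocalityRequestSketch {
  public static void main(String[] args) {
    AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    // A real AM would also init()/start() the client and call
    // registerApplicationMaster() before requesting containers.

    // Capability for one container: 1024 MB of memory, 1 virtual core.
    Resource capability = Resource.newInstance(1024, 1);

    // Placeholder node and rack where the needed HDFS blocks might live.
    String[] nodes = new String[] {"datanode-07.example.com"};
    String[] racks = new String[] {"/rack-2"};

    // Ask for a container preferably on that node, with the rack as a
    // fallback (relaxLocality defaults to true).
    ContainerRequest request =
        new ContainerRequest(capability, nodes, racks, Priority.newInstance(0));
    amrmClient.addContainerRequest(request);
  }
}

My understanding is that with relaxLocality left at its default, the
scheduler can fall back from node to rack to any host, which is why I am
wondering how the MapReduce application master picks the node list in the
first place.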

-r
