Hi, I have a question about maintaining data locality in a MapReduce job launched through YARN. Based on the YARN tutorial, it seems that an application master can specify a resource name, memory, and CPU when requesting containers, so by carefully choosing resource names, data locality could be achieved. I am curious how the current MapReduce application master does this. Does it check all the blocks a job needs and choose a subset of nodes holding the most of them? If someone can point me to the source code snippets that make this decision, it would be much appreciated. Thanks.
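To make the question concrete, here is a rough sketch (illustrative Python, not actual Hadoop source; the function and names are hypothetical) of what I imagine by "choosing resource names per split": for each input split, the AM could issue requests at the node level, the rack level, and "*" (any), letting the scheduler match the most local one.

```python
# Hypothetical illustration only -- NOT the real MapReduce AM code.
# For each input split, emit resource-name requests at node, rack,
# and "*" (any) levels, so a locality-aware scheduler can prefer
# the node holding the block, then the rack, then anywhere.

def locality_requests(splits, rack_of):
    """splits: list of (split_id, [hosts holding the split's block]).
    rack_of: mapping host -> rack name.
    Returns a list of (split_id, resource_name) request tuples."""
    requests = []
    for split_id, hosts in splits:
        for h in hosts:                           # node-level preference
            requests.append((split_id, h))
        for r in sorted({rack_of[h] for h in hosts}):  # rack-level fallback
            requests.append((split_id, r))
        requests.append((split_id, "*"))          # off-switch fallback
    return requests

# Example: one split replicated on two nodes in the same rack.
splits = [("split0", ["node1", "node2"])]
rack_of = {"node1": "/rack1", "node2": "/rack1"}
print(locality_requests(splits, rack_of))
```

My question is whether the real AM does something like this per split, or instead aggregates block locations across the whole job first.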
-r
