[
https://issues.apache.org/jira/browse/YARN-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632830#comment-13632830
]
Thomas Graves commented on YARN-580:
------------------------------------
Depending on what version you are using, MAPREDUCE-4893 helped make this better.
Even with that, though, you probably won't get perfect locality.
I looked at this with MR, and the problem there is in the way the AM asks for
containers, or rather the lack of any way to tell the RM "I need only one of
these 3 locations" (assuming a block replication factor of 3). As far as the RM
is concerned, in my case it was giving perfect locality for what was requested.
But since the MR AM generally asks for 3 locations per block, and the RM doesn't
know that filling one request negates the 2 others, it can give you containers
that don't get you perfect locality. In that case the AM would have to be
smarter about requesting, or about giving containers back and asking again.
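
To make the mismatch concrete, here is a rough toy model in Python. It is only a sketch and not actual RM or MR AM code: the function names, the node and block names, and the heartbeat order are all made up for illustration, and it assumes the AM does not get a chance to update its asks between the two allocations.

```python
from collections import Counter


def build_requests(blocks):
    """AM side (toy): ask for one container on every replica host of every block."""
    asks = Counter()
    for hosts in blocks.values():
        for host in hosts:
            asks[host] += 1
    return asks


def rm_allocate(asks, total_needed, heartbeat_order):
    """RM side (toy): grant a container on any heartbeating node with an open ask.

    The RM treats each per-node ask as independent. It has no idea that three
    of the asks belong to the same block.
    """
    remaining = Counter(asks)
    granted = []
    for node in heartbeat_order:
        if len(granted) == total_needed:
            break
        if remaining[node] > 0:
            remaining[node] -= 1
            granted.append(node)
    return granted


def am_assign(blocks, containers):
    """AM side (toy): match containers to blocks and count node-local matches."""
    pending = dict(blocks)
    local = 0
    for node in containers:
        if not pending:
            break
        match = next((b for b, hosts in pending.items() if node in hosts), None)
        if match is None:
            # No pending block has a replica here, so some task runs off-node.
            match = next(iter(pending))
        else:
            local += 1
        del pending[match]
    return local


# Hypothetical layout: 2 blocks, replication factor 3, no shared hosts.
blocks = {"b1": ("n1", "n2", "n3"), "b2": ("n4", "n5", "n6")}
asks = build_requests(blocks)

# n1 and n2 heartbeat first. Both grants match an ask, so the RM sees
# perfect locality, but both asks came from the same block b1.
granted = rm_allocate(asks, len(blocks), ["n1", "n2", "n3", "n4", "n5", "n6"])
local = am_assign(blocks, granted)
print(granted, "node-local tasks:", local, "of", len(blocks))
```

In this model every grant is node-local from the RM's point of view, yet only one of the two tasks ends up next to its data. If n4 had heartbeated right after n1, both tasks would have been local, so the outcome depends on heartbeat order rather than on anything the RM can reason about.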
> Delay scheduling in capacity scheduler is not ensuring 100% locality
> --------------------------------------------------------------------
>
> Key: YARN-580
> URL: https://issues.apache.org/jira/browse/YARN-580
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 2.0.2-alpha, 2.0.1-alpha
> Reporter: Nishan Shetty
> Assignee: Devaraj K
>
> Example
> --------
> Machine1: 3 blocks
> Machine2: 2 blocks
> Machine3: 1 block
> When we run a job on this data, 100% node locality is not ensured.
> Tasks run as shown below, even if slots are available on all nodes:
> --------------
> Machine1: 4 tasks
> Machine2: 2 tasks
> Machine3: no tasks