[ https://issues.apache.org/jira/browse/YARN-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050543#comment-14050543 ]
Karthik Kambatla commented on YARN-2175:
----------------------------------------
In MR1, mapred.task.timeout covers localization as well, and that has worked
very well for our customers. Should we do the same for MR2?
> Container localization has no timeouts and tasks can be stuck there for a
> long time
> -----------------------------------------------------------------------------------
>
> Key: YARN-2175
> URL: https://issues.apache.org/jira/browse/YARN-2175
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.4.0
> Reporter: Anubhav Dhoot
> Assignee: Anubhav Dhoot
>
> There are no timeouts that can be used to limit the time taken by various
> container startup operations. Localization, for example, could take a long
> time, and there is no automated way to kill a task if it is stuck in these
> states. These may have nothing to do with the task itself and could be an
> issue within the platform.
> Ideally there should be configurable time limits for the various states
> within the NodeManager. The RM does not care about most of these; it is only
> between the AM and the NM. We can start by making these global configurable
> defaults, and in the future we can make it fancier by letting the AM
> override them in the start container request.
> This JIRA will be used to limit localization time, and we can open others if
> we feel we need to limit other operations.
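
The description proposes a global, configurable default time limit that the AM could later override per container. A minimal sketch of that idea follows. It is not NodeManager code: the class `LocalizationTimeoutSketch`, its methods, and the millisecond values are all hypothetical, and localization is stood in for by an arbitrary `Callable`. It only illustrates bounding a step with `Future.get(timeout)` and cancelling it when the limit expires.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration only; none of these names exist in YARN. */
public class LocalizationTimeoutSketch {
    // Assumed global default, standing in for a NodeManager-wide setting.
    private final long defaultTimeoutMs;

    public LocalizationTimeoutSketch(long defaultTimeoutMs) {
        this.defaultTimeoutMs = defaultTimeoutMs;
    }

    /** A positive per-container override wins; otherwise the global default applies. */
    public long effectiveTimeoutMs(Long overrideMs) {
        return (overrideMs != null && overrideMs > 0) ? overrideMs : defaultTimeoutMs;
    }

    /**
     * Runs the (stand-in) localization work under the effective timeout.
     * Returns true if it finished in time, false if it timed out and was cancelled.
     */
    public boolean localize(Callable<Void> work, Long overrideMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Void> future = executor.submit(work);
        try {
            future.get(effectiveTimeoutMs(overrideMs), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck work
            return false;
        } finally {
            executor.shutdownNow(); // release the worker thread in every case
        }
    }

    public static void main(String[] args) throws Exception {
        LocalizationTimeoutSketch nm = new LocalizationTimeoutSketch(200);
        Callable<Void> fast = () -> null;
        Callable<Void> stuck = () -> { Thread.sleep(60_000); return null; };

        System.out.println(nm.localize(fast, null));  // true: finished within the default
        System.out.println(nm.localize(stuck, null)); // false: hit the 200 ms default
        System.out.println(nm.localize(stuck, 50L));  // false: hit the 50 ms override
    }
}
```

In this sketch a timed-out step is only reported to the caller. What the NM should then do with the container (fail it, retry, notify the AM) is left open, as it is in the description above.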
--
This message was sent by Atlassian JIRA
(v6.2#6252)