[
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920355#comment-13920355
]
Jian He commented on YARN-1783:
-------------------------------
The problem is that while the NM is resyncing with the RM, the NM cleans the
finished containers from its context before it processes the resync command.
After the RM restarts, however, it is still waiting for the Finished event of
the previous AM container from the NM so that it knows to launch a new attempt.
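
To make the ordering issue concrete, here is a minimal, self-contained sketch
of the race described above. It is not the actual NodeManager or
ResourceManager code: the class ResyncSketch, the NodeManagerModel type, its
methods, and the container id are all hypothetical names used only for
illustration. Keeping finished containers until they have been reported is
shown as one possible ordering, not necessarily the fix applied for this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class ResyncSketch {

    // Hypothetical model of the NM-side list of finished containers.
    static final class NodeManagerModel {
        private final List<String> finished = new ArrayList<>();
        private final boolean cleanBeforeResync;

        NodeManagerModel(boolean cleanBeforeResync) {
            this.cleanBeforeResync = cleanBeforeResync;
        }

        void containerFinished(String containerId) {
            finished.add(containerId);
        }

        // Models a heartbeat that the restarted RM answers with a resync
        // command. Returns what the NM reports when it resyncs afterwards.
        List<String> heartbeatThenResync() {
            if (cleanBeforeResync) {
                // Ordering described in the comment: finished containers are
                // dropped before the resync command is processed.
                finished.clear();
            }
            List<String> report = new ArrayList<>(finished);
            // Once reported, the entries can be forgotten.
            finished.clear();
            return report;
        }
    }

    // Hypothetical RM-side check: a new attempt is launched only once the
    // previous AM container has been reported as finished.
    static boolean rmLaunchesNewAttempt(List<String> report, String amContainerId) {
        return report.contains(amContainerId);
    }

    public static void main(String[] args) {
        String am = "previous_am_container"; // illustrative id only

        NodeManagerModel cleansEarly = new NodeManagerModel(true);
        cleansEarly.containerFinished(am);
        // prints "false": the RM never sees the Finished event and keeps waiting
        System.out.println(rmLaunchesNewAttempt(cleansEarly.heartbeatThenResync(), am));

        NodeManagerModel keepsUntilReported = new NodeManagerModel(false);
        keepsUntilReported.containerFinished(am);
        // prints "true": the Finished event survives the resync
        System.out.println(rmLaunchesNewAttempt(keepsUntilReported.heartbeatThenResync(), am));
    }
}
```

In the first case the application would appear stuck from the RM's point of
view, which matches the "no progress" symptom in the report below.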
> yarn application does not make any progress even when no other application is
> running when RM is being restarted in the background
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-1783
> URL: https://issues.apache.org/jira/browse/YARN-1783
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Arpit Gupta
> Assignee: Jian He
> Priority: Critical
>
> Noticed that during HA tests some tests took over 3 hours to run when the
> test failed.
> Looking at the logs, I see the application made no progress for a very long
> time. However, if I look at the application log from YARN, it actually ran in
> 5 minutes.
> I am seeing the same behavior when the RM was being restarted in the
> background and when both the RM and AM were being restarted. This does not
> happen for all applications, but a few will hit this in the nightly run.
--
This message was sent by Atlassian JIRA
(v6.2#6252)