[
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922920#comment-13922920
]
Xuan Gong commented on YARN-1783:
---------------------------------
The logic to handle NodeAction.RESYNC looks good to me, but there is one more
issue. It is quite possible for a container to be not yet completed when we
generate the NodeStatus and send it to the RM, and then to become COMPLETE by
the time we receive the response. This patch removes all completed containers
at that point, so such a container is removed from the context and its
completed status is never reported to the RM.
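
To make the window concrete, here is a minimal sketch of the race. It does not use the real NodeManager classes: HeartbeatRaceSketch, its context map, and all method names are hypothetical stand-ins, and removeReported is only one possible way to close the window, not necessarily what the patch should do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the NM context; not the real YARN classes.
class HeartbeatRaceSketch {
    enum State { RUNNING, COMPLETE }

    // containerId -> current state (plays the role of the NM context).
    final Map<String, State> context = new HashMap<>();

    // Ids of containers that are already COMPLETE when the NodeStatus is built.
    List<String> snapshotCompleted() {
        List<String> reported = new ArrayList<>();
        for (Map.Entry<String, State> e : context.entrySet()) {
            if (e.getValue() == State.COMPLETE) {
                reported.add(e.getKey());
            }
        }
        return reported;
    }

    // Behaviour described above: drop every container that is COMPLETE now.
    void removeAllCompleted() {
        context.values().removeIf(s -> s == State.COMPLETE);
    }

    // Assumed alternative: drop only what the last heartbeat actually reported.
    void removeReported(List<String> reported) {
        context.keySet().removeAll(reported);
    }

    // Returns true if container_2 is dropped without ever having been reported.
    static boolean lostWith(boolean removeAll) {
        HeartbeatRaceSketch nm = new HeartbeatRaceSketch();
        nm.context.put("container_1", State.COMPLETE);
        nm.context.put("container_2", State.RUNNING);

        // NodeStatus is generated and sent to the RM.
        List<String> reported = nm.snapshotCompleted();

        // container_2 finishes while the heartbeat is still in flight.
        nm.context.put("container_2", State.COMPLETE);

        // The response arrives and completed containers are cleaned up.
        if (removeAll) {
            nm.removeAllCompleted();
        } else {
            nm.removeReported(reported);
        }
        return !nm.context.containsKey("container_2")
                && !reported.contains("container_2");
    }
}
```

In this sketch lostWith(true) drops container_2 although its completion was never sent, while lostWith(false) keeps it in the context so a later heartbeat can still report it.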
> yarn application does not make any progress even when no other application is
> running when RM is being restarted in the background
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-1783
> URL: https://issues.apache.org/jira/browse/YARN-1783
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Arpit Gupta
> Assignee: Jian He
> Priority: Critical
> Attachments: YARN-1783.1.patch, YARN-1783.2.patch
>
>
> Noticed that during HA tests some tests took over 3 hours to run when the
> test failed.
> Looking at the logs, I see the application made no progress for a very long
> time. However, if I look at the application log from YARN, it actually ran in
> 5 mins.
> I am seeing the same behavior when RM was being restarted in the background
> and when both RM and AM were being restarted. This does not happen for all
> applications, but a few will hit this in the nightly run.