[
https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739178#comment-13739178
]
Rohith Sharma K S commented on YARN-1061:
-----------------------------------------
The actual issue occurred in a 5-node cluster (1 RM and 5 NMs). It is hard to
reproduce the scenario of the ResourceManager being in a hung state in a real
cluster.
The same scenario can be simulated manually by putting the ResourceManager into
a hung state with the Linux command "kill -STOP <RM_PID>". All NM->RM calls then
wait indefinitely. Another case where the indefinite wait can be observed is
"adding a new NodeManager while the ResourceManager is in a hung state".
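The symptom can also be approximated outside Hadoop. The sketch below uses plain java.net sockets, not Hadoop's RPC client, so it is only an analogy. It assumes that a stopped server behaves roughly like a peer that completes the TCP connection but never replies, which is plausible because the kernel can still accept connections for a SIGSTOP'd process. The class and method names are hypothetical. A read with no socket timeout is still blocked after the wait period, mirroring the NM->RM call that never returns.

```java
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class HungServerDemo {
    // Returns true if a read with no timeout is still blocked after waitMs.
    static boolean readStillBlockedAfter(long waitMs) throws Exception {
        // Hypothetical stand-in for a stopped ResourceManager: a local server
        // that accepts the connection but never writes a reply.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // No setSoTimeout() call, so read() can block forever.
            Thread reader = new Thread(() -> {
                try {
                    InputStream in = client.getInputStream();
                    in.read();
                } catch (Exception ignored) {
                    // Reached once the socket is closed below.
                }
            });
            reader.setDaemon(true);
            reader.start();
            reader.join(waitMs);
            boolean blocked = reader.isAlive();
            client.close(); // unblock the reader so the demo can exit
            reader.join();
            return blocked;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("still waiting: " + readStillBlockedAfter(500));
    }
}
```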
> NodeManager is indefinitely waiting for nodeHeartBeat() response from
> ResouceManager.
> -------------------------------------------------------------------------------------
>
> Key: YARN-1061
> URL: https://issues.apache.org/jira/browse/YARN-1061
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.0.5-alpha
> Reporter: Rohith Sharma K S
>
> It is observed that in one scenario, the NodeManager waits indefinitely
> for the nodeHeartbeat response from the ResourceManager when the
> ResourceManager is in a hung state.
> The NodeManager should get a timeout exception instead of waiting indefinitely.
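As an illustration of the requested behaviour, here is a minimal sketch with plain java.net sockets. It does not show how Hadoop's RPC layer configures timeouts, and the class and method names are hypothetical. Bounding the blocking read with Socket.setSoTimeout turns an indefinite wait on a silent peer into a SocketTimeoutException that the caller can handle.

```java
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Returns true if a read against a silent peer fails with SocketTimeoutException.
    static boolean timesOut(int timeoutMs) throws Exception {
        // Hypothetical stand-in for a hung ResourceManager: accepts the
        // connection but never sends a response.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Bound how long read() may block; a value of 0 means wait forever.
            client.setSoTimeout(timeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // received data or EOF, so no timeout occurred
            } catch (SocketTimeoutException e) {
                return true; // the caller can now log, retry, or fail over
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + timesOut(200));
    }
}
```

The choice of timeout value is a trade-off. If it is too short, a briefly slow server is treated as hung. If it is too long, the client stays stuck for that whole period.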