[
https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984722#comment-13984722
]
Jian He commented on YARN-1885:
-------------------------------
Thanks for the update!
- Some lines exceed the 80-column limit, e.g. the RMAppImpl transitions.
- Instead of app.isAppFinalStateStored(), would it be better to use isAppInFinalState?
- Sleeping for a fixed amount of time is not deterministic, so the test may fail
randomly. It's better to heartbeat in a while loop and exit the loop once the
condition is met.
{code}
// sleep for a while before do next heartbeat
Thread.sleep(1000);
NodeHeartbeatResponse response = nm1.nodeHeartbeat(true);
{code}
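A minimal sketch of the polling approach suggested above, assuming a hypothetical helper `PollUntil.poll` (not an existing Hadoop test utility). Each iteration runs one step, such as an NM heartbeat, and the loop exits as soon as the condition holds or a bounded deadline passes, so the test neither sleeps blindly nor waits on a very long timeout.

```java
import java.util.function.BooleanSupplier;

/** One polling step, e.g. a single NM heartbeat; may throw like the real RPC call. */
@FunctionalInterface
interface PollStep {
  void run() throws Exception;
}

/** Hypothetical test helper: poll with heartbeats instead of one fixed sleep. */
final class PollUntil {
  private PollUntil() {}

  /**
   * Runs {@code step}, then checks {@code condition}, repeating every
   * {@code intervalMs} milliseconds until the condition is true or
   * {@code timeoutMs} milliseconds have elapsed.
   *
   * @return true if the condition was met before the deadline
   */
  static boolean poll(PollStep step, BooleanSupplier condition,
      long intervalMs, long timeoutMs) throws Exception {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      step.run();                          // e.g. nm1.nodeHeartbeat(true)
      if (condition.getAsBoolean()) {
        return true;                       // exit as soon as the state is reached
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;                      // bounded wait; caller fails the test
      }
      Thread.sleep(intervalMs);            // short pause between heartbeats
    }
  }
}
```

In the test, the call might look like `assertTrue(PollUntil.poll(() -> nm1.nodeHeartbeat(true), condition, 100, 10_000))`, where `condition` is a hypothetical `BooleanSupplier` checking whatever state the test is waiting for.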
- timeout = 600000 (10 minutes) is too long.
- Can these two transitions actually happen? In general, we should not register
events on states where the transition can never occur; doing so will hide bugs.
{code}
.addTransition(RMAppState.NEW, RMAppState.NEW, RMAppEventType.NODE_ADDED,
    new NodeAddedTransition())
.addTransition(RMAppState.NEW_SAVING, RMAppState.NEW_SAVING,
    RMAppEventType.NODE_ADDED,
    new NodeAddedTransition())
{code}
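To illustrate why registering impossible transitions hides bugs, here is a minimal table-driven state machine. `TinyStateMachine`, `AppState` and `AppEvent` are hypothetical simplifications, not Hadoop's `StateMachineFactory`. An event with no registered transition for the current state throws, so an unexpected `NODE_ADDED` in `NEW` surfaces immediately; adding a `NEW -> NEW` self-loop for it would let the same stray event pass silently.

```java
import java.util.EnumMap;
import java.util.Map;

enum AppState { NEW, NEW_SAVING }

enum AppEvent { START, NODE_ADDED }

/** Hypothetical, simplified table-driven state machine. */
final class TinyStateMachine {
  private final Map<AppState, Map<AppEvent, AppState>> table =
      new EnumMap<>(AppState.class);
  private AppState current;

  TinyStateMachine(AppState initial) {
    current = initial;
  }

  /** Registers "in state 'from', on 'event', move to 'to'". */
  TinyStateMachine addTransition(AppState from, AppState to, AppEvent event) {
    table.computeIfAbsent(from, s -> new EnumMap<>(AppEvent.class)).put(event, to);
    return this;
  }

  AppState handle(AppEvent event) {
    Map<AppEvent, AppState> row = table.get(current);
    AppState next = (row == null) ? null : row.get(event);
    if (next == null) {
      // Unregistered (state, event) pair: fail loudly so the bug is visible.
      throw new IllegalStateException("Invalid event " + event + " at " + current);
    }
    current = next;
    return current;
  }

  AppState current() {
    return current;
  }
}
```

With only the transitions that can really happen registered, a `NODE_ADDED` arriving in `NEW` points straight at the code that sent it too early.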
- These two loops may block the register RPC call for a while. Could we instead
send them as the payload of RMNodeStartEvent and handle them in
RMNodeAddTransition?
{code}
// Handle container statuses reported by NM
if (!request.getContainerStatuses().isEmpty()) {
  LOG.info("received container statuses on node manager register :"
      + request.getContainerStatuses());
  for (ContainerStatus containerStatus : request.getContainerStatuses()) {
    handleContainerStatus(containerStatus);
  }
}
// Handle running applications reported by NM
if (null != request.getRunningApplications()) {
  for (ApplicationId appId : request.getRunningApplications()) {
    handleRunningAppOnNode(appId, request.getNodeId());
  }
}
{code}
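A rough sketch of the suggested restructuring. Every name here is hypothetical, and `String` stands in for `ContainerStatus`, `ApplicationId` and `NodeId`. The register path only packages the reported container statuses and running applications into an event (playing the role of `RMNodeStartEvent`) and hands it to a dispatcher thread, where a handler (playing the role of `RMNodeAddTransition`) would do the per-container and per-application work, so the RPC can return without it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical event carrying what the NM reported at registration. */
final class NodeStartEvent {
  final String nodeId;
  final List<String> containerStatuses;
  final List<String> runningApplications;

  NodeStartEvent(String nodeId, List<String> containerStatuses,
      List<String> runningApplications) {
    this.nodeId = nodeId;
    this.containerStatuses = copyOrEmpty(containerStatuses);
    this.runningApplications = copyOrEmpty(runningApplications); // may be null in the request
  }

  private static List<String> copyOrEmpty(List<String> in) {
    return in == null
        ? Collections.<String>emptyList()
        : Collections.unmodifiableList(new ArrayList<>(in));
  }
}

/** Minimal async dispatcher: a single daemon thread drains a queue of events. */
final class SimpleDispatcher {
  private final BlockingQueue<NodeStartEvent> queue = new LinkedBlockingQueue<>();
  private final Thread worker;

  SimpleDispatcher(Consumer<NodeStartEvent> handler) {
    worker = new Thread(() -> {
      try {
        while (true) {
          handler.accept(queue.take()); // per-container work happens here
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop() was called
      }
    }, "node-event-dispatcher");
    worker.setDaemon(true);
    worker.start();
  }

  void dispatch(NodeStartEvent event) {
    queue.add(event);
  }

  void stop() {
    worker.interrupt();
  }
}

final class RegistrationSketch {
  /** RPC path: no per-container handling here, just hand the payload off. */
  static String registerNodeManager(SimpleDispatcher dispatcher, String nodeId,
      List<String> containerStatuses, List<String> runningApps) {
    dispatcher.dispatch(new NodeStartEvent(nodeId, containerStatuses, runningApps));
    return "REGISTERED";
  }
}
```

The trade-off is that the reported containers are processed shortly after registration returns rather than before it, which is the ordering the state machine transition would then have to account for.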
> RM may not send the finished signal to some nodes where the application ran
> after RM restarts
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-1885
> URL: https://issues.apache.org/jira/browse/YARN-1885
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Arpit Gupta
> Assignee: Wangda Tan
> Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch
>
>
> During our HA testing we have seen cases where YARN application logs are not
> available through the CLI, but I can look at the AM logs through the UI. The RM
> was also being restarted in the background while the application was running.