[
https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984945#comment-13984945
]
Wangda Tan commented on YARN-1885:
----------------------------------
[~jianhe], Thanks for your review!
bq. some places exceed the 80 column limit, like the RMAppImpl transitions.
Will correct this later
bq. app.isAppFinalStateStored() better use isAppInFinalState instead ?
Agreed, using isAppFinalStateStored() there is a bug.
bq. sleeping for a fixed amount of time is not deterministic, so the test may
fail randomly. it’s better to do it in a while loop with heartbeats, and exit
the loop once the condition is met.
Agree
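A minimal sketch of that polling approach, in plain Java with no YARN
dependencies. The names PollUntil, waitFor and the heartbeat callback are
hypothetical stand-ins: in the real test the heartbeat would be an NM heartbeat
and the condition a check on the app or node state. This is an illustration
under those assumptions, not the code in the patch.

```java
import java.util.function.BooleanSupplier;

public class PollUntil {
    /**
     * Sends a heartbeat, then checks the condition, repeating every
     * intervalMs until the condition holds or timeoutMs has elapsed.
     * Returns true if the condition was met, false on timeout or interrupt.
     */
    public static boolean waitFor(BooleanSupplier condition, Runnable heartbeat,
                                  long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            heartbeat.run();                      // hypothetical: e.g. one NM heartbeat
            if (condition.getAsBoolean()) {
                return true;                      // exit as soon as the condition is met
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;                     // bounded wait instead of hanging
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }
}
```

A test would then assert on the return value, so it finishes as soon as the
condition holds and fails with a clear timeout otherwise.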
bq. timeout = 600000, timeout too long.
Sorry for this typo :)
bq. these two transitions cannot happen? Generally, we should not add events to
states where the transitions can never happen, that’ll hide bugs.
Agreed. I think SUBMITTED also cannot happen, because an app in the SUBMITTED
state has not launched any containers, so no NM will have the app in its
runningApplication list. Do you agree?
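To illustrate why unreachable transitions are better left out, here is a
minimal sketch of a strict transition table in plain Java. The class, state and
event names are hypothetical and simplified; this is not YARN's
StateMachineFactory or the RMAppImpl transition set. The point is only that an
unregistered (state, event) pair fails loudly instead of being silently
accepted.

```java
import java.util.EnumMap;
import java.util.Map;

public class StrictStateMachine {
    // Hypothetical, simplified states and events for illustration only.
    public enum State { NEW, SUBMITTED, RUNNING, FINISHED }
    public enum Event { START, LAUNCH, APP_RUNNING_ON_NODE, FINISH }

    private static final Map<State, Map<Event, State>> TRANSITIONS =
        new EnumMap<State, Map<Event, State>>(State.class);

    private static void add(State from, Event on, State to) {
        TRANSITIONS
            .computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class))
            .put(on, to);
    }

    static {
        add(State.NEW, Event.START, State.SUBMITTED);
        add(State.SUBMITTED, Event.LAUNCH, State.RUNNING);
        add(State.RUNNING, Event.APP_RUNNING_ON_NODE, State.RUNNING);
        add(State.RUNNING, Event.FINISH, State.FINISHED);
        add(State.FINISHED, Event.APP_RUNNING_ON_NODE, State.FINISHED);
        // Deliberately no (SUBMITTED, APP_RUNNING_ON_NODE) entry: if that
        // event ever arrives in SUBMITTED, it indicates a bug and should
        // surface as an error rather than be swallowed.
    }

    /** Returns the next state, or throws if the transition is not registered. */
    public static State next(State current, Event event) {
        Map<Event, State> row = TRANSITIONS.get(current);
        State target = (row == null) ? null : row.get(event);
        if (target == null) {
            throw new IllegalStateException(
                "Invalid event " + event + " in state " + current);
        }
        return target;
    }
}
```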
bq. These two loops may block the register RPC call for a while, I think we may
send them as the payload of RMNodeStartEvent and handle them in
RMNodeAddTransition ?
IMO, this shouldn't be a big problem, because there are no blocking calls in
handleRunningAppOnNode/handleContainerStatus. The additional microseconds of
latency (just looping over an array) should be fine. Does that sound OK?
Attached new patch.
> RM may not send the finished signal to some nodes where the application ran
> after RM restarts
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-1885
> URL: https://issues.apache.org/jira/browse/YARN-1885
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Arpit Gupta
> Assignee: Wangda Tan
> Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch
>
>
> During our HA testing we have seen cases where YARN application logs are not
> available through the CLI, but I can look at the AM logs through the UI. The RM
> was also being restarted in the background while the application was running.